Running commands with Snakemake
- Before running Snakemake you need to write a Snakefile
- A Snakefile is a text file which defines a list of rules
- Rules have inputs, outputs, and shell commands to be run
- You tell Snakemake what file to make and it will run the shell command defined in the appropriate rule
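For example, a minimal Snakefile might contain a single rule like this (the file names and command are illustrative):

```
# Count the lines in a data file and save the result
rule count_lines:
    input: "data/reads.txt"
    output: "results/line_count.txt"
    shell:
        "wc -l {input} > {output}"
```

Asking for the output file, e.g. `snakemake --cores 1 results/line_count.txt`, makes Snakemake run the `wc` command from this rule.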
Running Python code with Snakemake
- Snakemake will manage Conda environments for you, to help ensure that workflows always use a consistent set of packages
- Use the `--use-conda` option to `snakemake` to enable this behaviour
- Use `conda:` to specify a Conda environment definition (`.yml`) file. The path of this is relative to the file in which it is specified.
- Conda environment files are conventionally put in the `workflow/envs` directory
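As a sketch, the environment file and the rule that uses it might look like this (the package list, script, and file names are illustrative):

```
# workflow/envs/plotting.yml
channels:
  - conda-forge
dependencies:
  - python=3.11
  - matplotlib
```

```
# In workflow/Snakefile -- the conda: path is relative to this file
rule make_plot:
    input: "results/summary.csv"
    output: "results/plot.png"
    conda: "envs/plotting.yml"
    shell:
        "python scripts/plot.py {input} {output}"
```

Run with `snakemake --use-conda ...` so that Snakemake creates and activates the environment before the job runs.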
Placeholders and wildcards
- Snakemake rules are made generic with placeholders and wildcards
- Snakemake chooses the appropriate rule by replacing wildcards such that the output matches the target
- Placeholders in the shell part of the rule are replaced with values based on the chosen wildcards
- Snakemake checks for various error conditions and will stop if it sees a problem
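As an illustration, `{sample}` below is a wildcard, while `{input}` and `{output}` are placeholders in the shell command (file names are made up):

```
# Requesting results/A.sorted.txt matches this rule with sample="A"
rule sort_file:
    input: "data/{sample}.txt"
    output: "results/{sample}.sorted.txt"
    shell:
        "sort {input} > {output}"
```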
Chaining rules
- Snakemake links up rules by iteratively looking for rules that make missing inputs
- Careful choice of filenames allows this to work
- Rules may have multiple named input files (and output files)
- Use `expand()` to generate lists of filenames from a template
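A sketch of chaining with `expand()` (sample names and commands are illustrative):

```
SAMPLES = ["A", "B"]

rule all_results:
    input:
        expand("results/{sample}.sorted.txt", sample=SAMPLES)

# The sorted file needs the trimmed file, which is made from the raw
# data file, so Snakemake chains these rules automatically.
rule trim:
    input: "data/{sample}.txt"
    output: "trimmed/{sample}.txt"
    shell:
        "head -n 100 {input} > {output}"

rule sort_file:
    input: "trimmed/{sample}.txt"
    output: "results/{sample}.sorted.txt"
    shell:
        "sort {input} > {output}"
```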
Metadata and parameters
- Use a YAML file to define parameters for the workflow, and attach it using `configfile:` near the top of the file.
- Override individual options at run-time with the `--config` option.
- Load additional parameter files at run-time using the `--configfile` option.
- Load ensemble-specific metadata from a CSV file into a Pandas dataframe.
- Use `lookup()` to get information out of the dataframe in a rule.
- Use `params:` to define job-specific parameters that do not describe filenames.
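A minimal sketch of these ideas, assuming an illustrative `config.yaml` containing `cutoff: 10` and a `samples.csv` with `sample` and `condition` columns. The `lookup()` helper comes from recent Snakemake releases, and the exact arguments shown here (`query=`, `within=`, `cols=`) should be checked against the documentation for your version; `summarise.py` is a hypothetical script.

```
configfile: "config.yaml"

import pandas as pd
samples = pd.read_csv("samples.csv")

rule summarise:
    input: "data/{sample}.txt"
    output: "results/{sample}.summary.txt"
    params:
        cutoff=config["cutoff"],
        # Pull this sample's condition out of the dataframe
        condition=lookup(query="sample == '{sample}'", within=samples, cols="condition"),
    shell:
        "python scripts/summarise.py --cutoff {params.cutoff}"
        " --condition {params.condition} {input} > {output}"
```

At run time, `snakemake --config cutoff=20 ...` would override the YAML value, and `--configfile extra.yaml` would layer on another parameter file.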
Multiple inputs and outputs
- Rules can have multiple inputs and outputs, separated by commas
- Use `name=value` to give names to inputs/outputs
- Inputs themselves can be lists
- Use placeholders like `{input.name}` to refer to single named inputs
- Where there are multiple inputs, `{input}` will insert them all, separated by spaces
- Use `log:` to list log outputs, which will not be removed when jobs fail
- Errors are an expected part of developing Snakemake workflows, and usually give enough information to track down what is causing them
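A sketch of named inputs, multiple outputs, and a log (the `aligner` command and file names are hypothetical):

```
rule align:
    input:
        reads="data/{sample}.fastq",
        ref="reference/genome.fa",
    output:
        bam="aligned/{sample}.bam",
        stats="aligned/{sample}.stats.txt",
    log:
        "logs/align_{sample}.log"
    # {input.reads} and {input.ref} pick out single named inputs;
    # plain {input} would insert both, separated by a space.
    shell:
        "aligner --ref {input.ref} --stats {output.stats}"
        " {input.reads} > {output.bam} 2> {log}"
```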
How Snakemake plans jobs
- A job in Snakemake is a rule plus wildcard values (determined by working back from the requested output)
- Snakemake plans its work by arranging all the jobs into a DAG (directed acyclic graph)
- If output files already exist, Snakemake can skip parts of the DAG
- Snakemake compares file timestamps and a log of previous runs to determine what needs regenerating
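You can also ask Snakemake to show its plan without running anything (the target name is illustrative; the second command assumes Graphviz's `dot` is installed):

```
# Dry run: list the jobs that would run, and why
snakemake --dry-run --cores 1 results/plot.png

# Render the DAG of jobs for a target
snakemake --dag results/plot.png | dot -Tpng > dag.png
```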
Optimising workflow performance
- To make your workflow run as fast as possible, try to match the number of threads to the number of cores you have
- You also need to consider RAM, disk, and network bottlenecks
- Profile your jobs to see what is taking most resources
- Use `--cores all` to enable using all CPU cores
- Snakemake is great for running workflows on compute clusters
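A sketch of declaring threads in a rule (`pigz` here stands in for any multi-threaded tool):

```
rule compress:
    input: "data/{sample}.txt"
    output: "data/{sample}.txt.gz"
    # Snakemake caps this at the number of cores given via --cores
    threads: 4
    shell:
        "pigz -p {threads} -c {input} > {output}"
```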
Awkward corners
- Use Python input functions that take a dict of wildcards, and return a list of strings, to handle complex dependency issues that can’t be expressed in pure Snakemake.
- Import `glob.glob` to match multiple files on disk that fit a specific pattern. Don’t rely on this to find intermediate files in final production workflows, since it won’t find files that are not present at the start of the workflow run.
- Use `snakemake --touch` if you need to mark files as up-to-date, so that Snakemake won’t try to regenerate them.
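A sketch of an input function that uses `glob.glob` (the directory layout is made up; remember the caveat above about files that do not exist at the start of the run):

```
from glob import glob

# Input functions receive the wildcards and return a list of filenames
def batch_inputs(wildcards):
    return sorted(glob(f"raw/{wildcards.batch}/*.txt"))

rule merge_batch:
    input: batch_inputs
    output: "merged/{batch}.txt"
    shell:
        "cat {input} > {output}"
```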
Tidying up
- Use `.smk` files in `workflow/rules` to compartmentalise the Snakefile, and use `include:` lines in the main Snakefile to link them into the workflow.
- Add a rule at the top of the Snakefile with `default_target: True` to specify the default output of a workflow.
- Use `.gitignore` to avoid committing input or output data or the Snakemake cache.
- Use `.git_keep` files to preserve empty directories.
- Use Git submodules to link to libraries you have written that aren’t on PyPI.
- Include a `README.md` in your repository explaining how to run the workflow.
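A sketch of a tidied-up main Snakefile (file names are illustrative):

```
# workflow/Snakefile
configfile: "config.yaml"

rule all:
    default_target: True
    input:
        "results/final_report.html"

# Pull the actual rules in from separate .smk files
include: "rules/preprocess.smk"
include: "rules/report.smk"
```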
Publishing your workflow
- Test the workflow by making a fresh clone and following the README instructions
- Use `zip -r -9 --exclude "**/.git/*" --exclude "**/.git" filename.zip dirname` to prepare a ZIP file of a freshly-cloned repository, with submodules.
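For the fresh-clone test, something along these lines (the repository URL is a placeholder):

```
# Clone the repository, including any submodules, into a new directory
git clone --recurse-submodules https://example.org/you/workflow.git test-clone
cd test-clone

# Dry run first, then run the workflow as described in the README
snakemake --dry-run
snakemake --cores all --use-conda
```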