Running commands with Snakemake


  • Before running Snakemake you need to write a Snakefile
  • A Snakefile is a text file which defines a list of rules
  • Rules have inputs, outputs, and shell commands to be run
  • You tell Snakemake what file to make and it will run the shell command defined in the appropriate rule

Running Python code with Snakemake


Placeholders and wildcards


  • Snakemake rules are made generic with placeholders and wildcards
  • Snakemake chooses the appropriate rule by replacing wildcards such the the output matches the target
  • Placeholders in the shell part of the rule are replaced with values based on the chosen wildcards
  • Snakemake checks for various error conditions and will stop if it sees a problem

Chaining rules


  • Snakemake links up rules by iteratively looking for rules that make missing inputs
  • Careful choice of filenames allows this to work
  • Rules may have multiple named input files (and output files)
  • Use expand() to generate lists of filenames from a template

Metadata and parameters


Multiple inputs and outputs


How Snakemake plans jobs


  • A job in Snakemake is a rule plus wildcard values (determined by working back from the requested output)
  • Snakemake plans its work by arranging all the jobs into a DAG (directed acyclic graph)
  • If output files already exist, Snakemake can skip parts of the DAG
  • Snakemake compares file timestamps and a log of previous runs to determine what need regenerating

Optimising workflow performance


  • To make your workflow run as fast as possible, try to match the number of threads to the number of cores you have
  • You also need to consider RAM, disk, and network bottlenecks
  • Profile your jobs to see what is taking most resources
  • Snakemake is great for running workflows on compute clusters