Instructor Notes


Running commands with Snakemake


Environment activation

The most likely issue learners will encounter here is needing to activate their Snakemake environment when they have opened a fresh terminal. This is hopefully as simple as

conda activate snakemake

If Conda isn’t set up to automatically activate itself on starting a shell session, they may also need to run something like

source ~/miniconda3/bin/activate

where the exact path to run will depend on their specific setup.



Directories

Learners less experienced with the shell may want to cd into directories to edit files; if they do this and forget to cd back out again, they will encounter difficulties as Snakemake may not be able to find the Snakefile or the input files.

If they try to work around this, they may end up with multiple Snakefiles or ones with inputs pointing at incorrect relative paths.

Technically, you can specify absolute paths in Snakefiles, but this is not recommended because it hurts portability. For example, if Snakemake is used to execute some rules on another machine, the workflow would fail because it cannot gather the dependencies into the correct location. Similarly, if someone else were to run the workflow on their own machine, their home directory is unlikely to be the same, so the workflow would fail.

New researchers frequently like to hardcode absolute paths to their data, so this is an important point to reinforce.
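
If it helps to make the point concrete for learners, the following is a minimal Python sketch (not part of the lesson materials) of why both habits cause trouble. The directory names and the helper seen_from are hypothetical; the sketch only manipulates path strings and does not touch the file system.

```python
from pathlib import PurePosixPath


def seen_from(cwd, relative):
    """Return the location a relative path refers to when the shell is in cwd."""
    return PurePosixPath(cwd) / relative


# Hypothetical project directory, used only for illustration.
project = "/home/alice/workflow"

# From the project root, "Snakefile" refers to the real Snakefile...
print(seen_from(project, "Snakefile"))

# ...but after `cd data` the same relative name refers to a different location.
print(seen_from(project + "/data", "Snakefile"))

# An absolute path names the same location from anywhere on this machine,
# but it bakes in one user's home directory, so it will not exist elsewhere.
hardcoded = PurePosixPath("/home/alice/workflow/data/input.csv")
print(hardcoded.is_absolute())
```

The first two lines of output differ only because the working directory changed, which is the situation learners end up in when they forget to cd back out.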



Use of the --forceall flag

In the first few episodes we always run Snakemake with the --forceall flag, and it’s not explained what this does until Ep. 04. The rationale is that the default Snakemake behaviour when pruning the DAG leads to learners seeing different output (typically the message “nothing to be done”) when repeating the exact same command. This can seem strange to learners who are used to scripting and imperative programming.

The internal rules used by Snakemake to determine which jobs in the DAG are to be run, and which skipped, are fairly complex, but the behaviour seen under --forceall is much simpler and more consistent: Snakemake runs every job in the DAG every time. You can think of --forceall as disabling the lazy evaluation feature of Snakemake until we are ready to properly introduce and understand it.
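
For instructors who want a mental model to fall back on, here is a deliberately simplified Python sketch of the difference. It is a toy model, not Snakemake's actual logic: it assumes a job is skipped only when its output exists and is newer than all of its inputs, whereas the real rules take more into account. The function names needs_run and demo are hypothetical.

```python
import os
import tempfile


def needs_run(inputs, output, forceall=False):
    """Toy decision: should the job that makes `output` from `inputs` run?"""
    if forceall:
        # Analogue of --forceall: skip all checks and always run.
        return True
    if not os.path.exists(output):
        # Nothing has been produced yet, so the job must run.
        return True
    out_mtime = os.path.getmtime(output)
    # Re-run only if some input was modified after the output was written.
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


def demo():
    """Return the decisions for a first run, a repeat run, and a forced run."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        out = os.path.join(tmp, "output.txt")
        with open(src, "w") as fh:
            fh.write("data\n")
        first = needs_run([src], out)  # output is missing
        with open(out, "w") as fh:
            fh.write("result\n")
        # Set timestamps explicitly so the output is clearly newer than the input.
        os.utime(src, (1000, 1000))
        os.utime(out, (2000, 2000))
        repeat = needs_run([src], out)  # output is up to date
        forced = needs_run([src], out, forceall=True)
        return first, repeat, forced


print(demo())
```

The middle value corresponds to the "nothing to be done" message that surprises learners when they repeat a command; with forceall set, the answer is the same on every run.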



Running Python code with Snakemake


Placeholders and wildcards


Chaining rules


Metadata and parameters


Multiple inputs and outputs


Definitely run through the spectrum plot

This plot is referred to from subsequent lessons, so you definitely need to go through it.



How Snakemake plans jobs


Version differences

Older versions of Snakemake only support outputting the DAG in dot format, so that argument is not needed there.



Optimising workflow performance


Running on cluster and cloud

Running workflows on HPC or Cloud systems could be a whole course in itself. The topic is too important not to be mentioned here, but it is also complex to teach because you need a cluster to work on.

If you are teaching this lesson and have access to institutional HPC then ideally you should liaise with the administrators of the system to make a suitable installation of a recent Snakemake version and a profile to run jobs on the cluster job scheduler. In practice this may be easier said than done!

If you are able to demonstrate Snakemake running on cloud as part of a workshop then we’d much appreciate any feedback on how you did this and how it went.



Awkward corners


Closure alternatives

Here we choose to use a closure (a function returned by another function, where the behaviour of the returned function depends on the arguments passed to the outer one), so that the same code can be used for both the start and the end of the plateau. There are other ways to phrase this: you could define a free function get_plateau(wildcards, position), and then in the rule definition either use functools.partial(get_plateau, position=...) to set the position, or use a lambda such as lambda wildcards: get_plateau(wildcards, ...). We choose the closure here because defining functions should already be familiar to most learners, and passing functions as values needs to be learned anyway (since we have to pass one to Snakemake), whereas lambdas and functools may not be familiar, and aren’t needed elsewhere in the lesson.
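
If a learner asks how the three phrasings compare, a side-by-side sketch along these lines may help. It is an illustration under assumptions, not the lesson's code: the PLATEAUS table, the wildcard name sample, the helper plateau_getter and the numbers are all hypothetical, and a SimpleNamespace stands in for the wildcards object that Snakemake would pass.

```python
from functools import partial
from types import SimpleNamespace

# Hypothetical metadata table standing in for whatever the real workflow reads.
PLATEAUS = {"run1": {"start": 10, "end": 50}}


def get_plateau(wildcards, position):
    """Free function: look up one end of the plateau for the requested sample."""
    return PLATEAUS[wildcards.sample][position]


def plateau_getter(position):
    """Closure: return a one-argument function that remembers `position`."""
    def getter(wildcards):
        return get_plateau(wildcards, position)
    return getter


# Stand-in for the wildcards object supplied when a rule is evaluated.
wildcards = SimpleNamespace(sample="run1")

via_closure = plateau_getter("start")
via_partial = partial(get_plateau, position="start")
via_lambda = lambda wildcards: get_plateau(wildcards, "start")

# All three are callables taking only `wildcards`, and they agree.
print(via_closure(wildcards), via_partial(wildcards), via_lambda(wildcards))
print(plateau_getter("end")(wildcards))
```

Each variant yields a function of wildcards alone, which is the form needed when passing it to Snakemake; they differ only in how position gets fixed in advance.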



Change of command line

If learners skipped over the previous section, note that we’ve changed the standard snakemake call we were previously using: now, we don’t use --forceall, so we only regenerate when necessary (which makes the run quicker). --jobs all, meanwhile, tells Snakemake to use all available CPU cores. In this case it doesn’t make a difference, since only one job is needed by this run, but it’s a useful default invocation for production use.