Instructor Notes
Running commands with Snakemake
Environment activation
The most likely issue learners will encounter here is needing to activate their Snakemake environment when they have opened a fresh terminal. This is hopefully as simple as
conda activate snakemake
If Conda isn’t set up to automatically activate itself on starting a shell session, they may also need to run something like
source ~/miniconda3/bin/activate
where the exact path to run will depend on their specific setup.
Directories
Learners less experienced with the shell may want to cd into directories to edit files; if they do this and forget to cd back out again, they will encounter difficulties, as Snakemake may not be able to find the Snakefile or the input files.
If they try to work around this, they may end up with multiple Snakefiles or ones with inputs pointing at incorrect relative paths.
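If it helps to show learners why this happens, the following is a minimal sketch in plain Python (not Snakemake itself). It assumes a hypothetical project layout with a Snakefile at the top level and a data subdirectory, built in a temporary directory, and shows that a relative path is resolved against the current working directory.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical layout for illustration: project/Snakefile and project/data/
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve() / "project"
    (project / "data").mkdir(parents=True)
    (project / "Snakefile").touch()

    relative = Path("Snakefile")  # a relative path, as a learner would type it
    original_cwd = Path.cwd()
    try:
        os.chdir(project)
        found_in_project = relative.exists()  # looked up in project/
        os.chdir(project / "data")
        found_in_subdir = relative.exists()   # looked up in project/data/
    finally:
        # Return to where we started so the temporary directory can be removed.
        os.chdir(original_cwd)

print(found_in_project, found_in_subdir)
```

The same relative name is found from the project directory but not from the subdirectory, which mirrors what a learner sees after forgetting to cd back out.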
Technically, you can specify absolute paths in Snakefiles, but this is not recommended, for portability reasons. For example, when using Snakemake to execute some rules on another machine, this would fail as it cannot gather the dependencies into the correct location; similarly if someone else were to run a workflow on their own machine, the home directory is unlikely to be the same, so the workflow would fail.
New researchers frequently like to hardcode absolute paths to their data, so this is an important point to reinforce.
Use of the --forceall flag
In the first few episodes we always run Snakemake with the --forceall flag, and what this does is not explained until Ep. 04. The rationale is that the default Snakemake behaviour when pruning the DAG leads to learners seeing different output (typically the message “nothing to be done”) when repeating the exact same command. This can seem strange to learners who are used to scripting and imperative programming.
The internal rules used by Snakemake to determine which jobs in the DAG are to be run, and which skipped, are pretty complex, but the behaviour seen under --forceall is much simpler and more consistent: Snakemake simply runs every job in the DAG every time. You can think of --forceall as disabling the lazy evaluation feature of Snakemake until we are ready to properly introduce and understand it.
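For instructors who want a mental model to fall back on, here is a deliberately simplified toy sketch in plain Python. It is not Snakemake's actual logic (which also considers things such as file timestamps); it assumes only that a job is skipped when its output already exists, and the job and file names are hypothetical.

```python
def jobs_to_run(jobs, existing_outputs, forceall=False):
    """Return the names of the jobs a toy scheduler would run.

    jobs: list of (job_name, output_file) pairs, in execution order.
    existing_outputs: set of output files already present.
    forceall: if True, run every job regardless of existing outputs.
    """
    if forceall:
        return [name for name, _ in jobs]
    # Toy "lazy" rule: skip any job whose output is already there.
    return [name for name, output in jobs if output not in existing_outputs]


# Hypothetical two-job workflow.
jobs = [("count_lines", "lines.txt"), ("summarise", "summary.txt")]
done = {"lines.txt", "summary.txt"}

first_run = jobs_to_run(jobs, set())             # nothing exists yet
second_run = jobs_to_run(jobs, done)             # same command, repeated
forced_run = jobs_to_run(jobs, done, forceall=True)

print(first_run, second_run, forced_run)
```

In this toy model the repeated run has nothing to do, which corresponds to the “nothing to be done” message, while the forced run behaves identically every time.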
Running Python code with Snakemake
Placeholders and wildcards
Chaining rules
Metadata and parameters
Multiple inputs and outputs
Definitely run through the spectrum plot
This plot is referred to from subsequent lessons, so you definitely need to go through it.
How Snakemake plans jobs
Version differences
Older versions of Snakemake only support outputting the DAG in dot format, so that argument is not needed there.
Optimising workflow performance
Running on cluster and cloud
Running workflows on HPC or cloud systems could be a whole course in itself. The topic is too important not to be mentioned here, but it is also complex to teach because you need a cluster to work on.
If you are teaching this lesson and have access to institutional HPC, then ideally you should liaise with the administrators of the system to set up a suitable installation of a recent Snakemake version and a profile to run jobs on the cluster job scheduler. In practice this may be easier said than done!
If you are able to demonstrate Snakemake running on cloud as part of a workshop then we’d much appreciate any feedback on how you did this and how it went.
Awkward corners
Closure alternatives
Here we choose to use a closure (a function returned by another function, where the former’s behaviour depends on the arguments to the latter), so that the same code can be used for both the start and the end of the plateau. There are other ways to phrase this: you could define a free function get_plateau(wildcards, position), and then in the rule definition, use functools.partial(get_plateau, position=...) to set the position, or use a lambda such as lambda wildcards: get_plateau(wildcards, ...). We choose the closure here because defining functions should already be familiar to most learners, and passing functions as values needs to be learned anyway (since we have to pass one to Snakemake), whereas lambdas and functools may not be familiar, and aren’t needed elsewhere in the lesson.
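The three phrasings can be compared side by side in plain Python. This is a sketch under stated assumptions: the PLATEAUS lookup table, the plateau_param helper name, the "sample" wildcard, and the body of get_plateau are all hypothetical stand-ins, and a SimpleNamespace stands in for the wildcards object that Snakemake would pass.

```python
from functools import partial
from types import SimpleNamespace

# Hypothetical plateau boundaries keyed by a wildcard value (illustration only).
PLATEAUS = {"sampleA": {"start": 3, "end": 12}}


def get_plateau(wildcards, position):
    """Free function: return the requested plateau boundary for this sample."""
    return PLATEAUS[wildcards.sample][position]


def plateau_param(position):
    """Closure: build a one-argument function that remembers `position`."""
    def get_plateau_param(wildcards):
        return get_plateau(wildcards, position)
    return get_plateau_param


# Three equivalent callables, each taking only `wildcards`.
via_closure = plateau_param("start")
via_partial = partial(get_plateau, position="start")
via_lambda = lambda wildcards: get_plateau(wildcards, "start")

# Stand-in for the wildcards object passed in by Snakemake.
wildcards = SimpleNamespace(sample="sampleA")
print(via_closure(wildcards), via_partial(wildcards), via_lambda(wildcards))
```

All three return the same value for the same wildcards, so the choice between them is about what learners already know rather than about behaviour.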
Change of command line
If learners skipped over the previous section, note that we’ve changed the standard snakemake call we were previously using: now, we don’t use --forceall, so we only regenerate when necessary (which makes the run quicker). Meanwhile, --jobs all tells Snakemake to use all available CPU cores. In this case it doesn’t make a difference, since only one job is needed by this run, but it’s a useful default invocation for production use.