Summary and Setup

Reliably running a range of tools, across multiple inputs and in the right order, is a very common problem in computing, including in scientific data analysis. If you develop data analysis scripts from scratch, you will usually run up against these challenges. Rather than reinventing the wheel, we can lean on existing tools to do this for us, letting us focus on the aspects unique to our own work. Such tools are called workflow managers.

In this lesson, we introduce Snakemake, a workflow manager that was originally developed for bioinformatics applications but maps well onto the needs of data analysis in lattice quantum field theory.

By defining rules, each of which specifies how to translate one or more input files into one or more output files, we can build up a workflow that takes our raw data as input, and produces plots, tables, and other definitions that we can include in our publications.
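To make the idea of a rule concrete, here is a minimal sketch that writes a one-rule workflow into a scratch directory. The rule name count_lines and the file names data/raw.txt and results/line_count.txt are hypothetical, chosen only for illustration; they are not part of the lesson data.

```bash
# Create a scratch directory holding a tiny, hypothetical one-rule workflow.
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
printf 'first\nsecond\nthird\n' > "$workdir/data/raw.txt"

# The quoted 'EOF' writes the text literally, with no shell expansion.
# The rule says: to make results/line_count.txt, count the lines of data/raw.txt.
cat > "$workdir/Snakefile" <<'EOF'
rule count_lines:
    input:
        "data/raw.txt"
    output:
        "results/line_count.txt"
    shell:
        "wc -l < {input} > {output}"
EOF

echo "Example workflow written to $workdir"
```

Once Snakemake is installed and its environment is active, running snakemake --cores 1 inside that directory should build results/line_count.txt from the input file.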

Software installation


These instructions set out how to obtain and install the software and data on Linux or macOS. It is assumed that you have:

  • access to the Bash or Zsh shell on a fairly modern Linux or macOS system
  • sufficient disk space (~1 GB) to store the software and data

You do not need root/administrator access.
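
If you are unsure whether your system meets these assumptions, the following sketch reports the shell in use and the free space under your home directory. The variable names target and free_kb are our own, and the check is only a rough guide.

```bash
# Show which shell is in use (SHELL is normally set at login).
echo "Shell: ${SHELL:-unknown}"

# Check the home directory, falling back to the current directory if HOME is unset.
target="${HOME:-.}"

# df -P prints one line per filesystem; -k reports sizes in 1024-byte blocks.
# Column 4 of the second line is the space available on that filesystem.
free_kb=$(df -Pk "$target" | awk 'NR==2 {print $4}')

# Integer division rounds down, so this slightly under-reports the free space.
echo "Free space under $target: $((free_kb / 1024 / 1024)) GB (about 1 GB is needed)"
```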

Data Sets


Download the data zip file and unzip it to your Desktop.

Software Setup


Conda

We will use Conda both to install Snakemake itself, and to manage dependencies of our workflows. Miniforge provides a minimal Conda environment, on which we will build.

Download the correct file for your operating system from the Miniforge repository, and execute it at the terminal.

This lesson has not been tested with Windows. We recommend using the Windows Subsystem for Linux, and following the instructions for Linux.

BASH

curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh"

When prompted, unless you have a reason not to, pick the option for Conda to set up your environment with conda init.
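
Since conda init only takes effect in terminals opened afterwards, it can be worth confirming that the conda command is visible before continuing. The helper below is a small sketch; check_tool is a name we have made up, not part of Conda.

```bash
# Report whether a command is available in this shell.
# Returns non-zero if it is missing, so it can be used in conditions.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 is available: $(command -v "$1")"
    else
        echo "$1 was not found; try opening a new terminal" >&2
        return 1
    fi
}

# '|| true' keeps a missing tool from aborting a script that uses 'set -e'.
check_tool conda || true
```

If conda is reported as available, running conda --version will additionally show which version was installed.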

Snakemake


With Conda available, we can create an environment containing Snakemake and its dependencies. This can be used not just for this lesson, but for your work in Snakemake going forward.

BASH

conda create -n snakemake -c conda-forge -c bioconda snakemake
conda activate snakemake
conda install -c conda-forge 'mamba<2.0.0'

After starting a new terminal, or rebooting your computer, you will need to run

BASH

conda activate snakemake

to activate the environment before you can use Snakemake.
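
A quick way to tell whether the environment is active in the current terminal is to look at the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment. The sketch below assumes the environment was created with the name snakemake, as above; env_status is a variable of our own.

```bash
# CONDA_DEFAULT_ENV holds the name of the currently active Conda environment.
if [ "${CONDA_DEFAULT_ENV:-}" = "snakemake" ]; then
    env_status="active"
    # With the environment active, Snakemake can report its version.
    snakemake --version
else
    env_status="inactive"
    echo "The snakemake environment is not active; run: conda activate snakemake"
fi
```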

LaTeX


We will be using Matplotlib to generate plots formatted with LaTeX, which relies on having LaTeX installed.

This lesson has not been tested with Windows. You may try using MiKTeX.