Summary and Setup
Reliably running a range of tools, across multiple inputs and in the right order, is a very common problem in computing, including in scientific data analysis. If you develop data analysis scripts from scratch, you will usually run up against these challenges. Rather than reinventing the wheel, we can lean on existing tools to do this for us, letting us focus on the aspects unique to our own work. Such tools are called workflow managers.
In this lesson, we introduce Snakemake, a workflow manager that was originally developed for bioinformatics applications but maps well onto the needs of data analysis in lattice quantum field theory.
By defining rules, each of which specifies how to translate one or more input files into one or more output files, we can build up a workflow that takes our raw data as input and produces plots, tables, and other definitions that we can include in our publications.
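To make the idea of rules concrete, the following is a toy sketch in plain Python, not Snakemake syntax and not how Snakemake is implemented. The rule names and file paths are hypothetical, chosen only for illustration. It shows how knowing each rule's inputs and outputs is enough to work out a valid order in which to run them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules for illustration only: each maps input files to output files.
rules = {
    "plot_mass": {
        "inputs": ["intermediary/mass.json"],
        "outputs": ["assets/mass.pdf"],
    },
    "tabulate_mass": {
        "inputs": ["intermediary/mass.json"],
        "outputs": ["assets/mass.tex"],
    },
    "fit_correlator": {
        "inputs": ["raw/correlators.dat"],
        "outputs": ["intermediary/mass.json"],
    },
}


def execution_order(rules):
    """Order rule names so each rule comes after the rules that produce its inputs."""
    # Map every output file to the rule that produces it.
    producer = {
        output: name for name, rule in rules.items() for output in rule["outputs"]
    }
    # For each rule, collect the rules it depends on. Inputs with no producer
    # (such as raw data) are assumed to exist already.
    graph = {
        name: {producer[i] for i in rule["inputs"] if i in producer}
        for name, rule in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(rules)
print(order)
```

Here `fit_correlator` always comes before the two rules that consume its output, while the relative order of `plot_mass` and `tabulate_mass` is unconstrained. A workflow manager automates this kind of bookkeeping, along with much more.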
Software installation
These instructions set out how to obtain and install the software and data on Linux or macOS. It is assumed that you have:
- access to the Bash or Zsh shell on a fairly modern Linux or macOS system
- sufficient disk space (~1GB) to store the software and data
You do not need root/administrator access.
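If you are unsure whether your system meets these assumptions, a short check along the following lines may help. This is a minimal sketch using only the Python standard library. The 1 GB threshold mirrors the approximate figure above, and checking your home directory is an assumption about where you will store the software and data.

```python
import os
import shutil
from pathlib import Path

REQUIRED_GB = 1.0  # approximate requirement quoted in the list above


def free_space_gb(path):
    """Return the free space, in gigabytes, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9


home = Path.home()
free_gb = free_space_gb(home)
shell = os.environ.get("SHELL", "unknown")

print(f"Login shell: {shell}")
print(f"Free space under {home}: {free_gb:.1f} GB")
if free_gb < REQUIRED_GB:
    print("Warning: there may not be enough space for the software and data.")
```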
Data Sets
Download the data zip file and unzip it to your Desktop.
Software Setup
Conda
We will use Conda both to install Snakemake itself and to manage the dependencies of our workflows. Miniforge provides a minimal Conda environment, on which we will build.
Download the correct file for your operating system from the Miniforge repository, and execute it at the terminal.
This lesson has not been tested on Windows. We recommend using the Windows Subsystem for Linux and following the instructions for Linux.
Snakemake
With Conda available, we can create an environment containing Snakemake and its dependencies. This can be used not just for this lesson, but for your work in Snakemake going forward.
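Once the environment has been created and activated, you may want to confirm that the tools are visible from your shell. The following is a minimal sketch, not part of the lesson's official instructions. It assumes only that `conda` and `snakemake` each accept a `--version` flag, and the helper name `tool_version` is our own.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the output of `<name> --version`, or None if the tool is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some tools print their version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()


for tool in ("conda", "snakemake"):
    version = tool_version(tool)
    print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

If `snakemake` is reported as not found, the most likely cause is that the environment containing it has not been activated in the current shell.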
LaTeX
We will be using Matplotlib to generate plots formatted with LaTeX, which relies on having LaTeX installed.
This lesson has not been tested on Windows. You may try using MiKTeX.
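As an illustration of why LaTeX is needed, Matplotlib only hands text rendering to LaTeX when its `text.usetex` setting is enabled, and it then calls an external `latex` executable. The sketch below is a hedged example rather than the lesson's own configuration: the helper name `latex_plot_settings` and the choice of a serif font are assumptions. It enables LaTeX rendering only when `latex` is found on your `PATH`.

```python
import shutil


def latex_plot_settings():
    """Build Matplotlib settings, enabling LaTeX text only if `latex` is installed."""
    has_latex = shutil.which("latex") is not None
    return {"text.usetex": has_latex, "font.family": "serif"}


settings = latex_plot_settings()
print(settings)

try:
    import matplotlib
except ImportError:
    print("Matplotlib is not installed in this environment.")
else:
    # Apply the settings globally for any figures created afterwards.
    matplotlib.rcParams.update(settings)
```

If `text.usetex` comes out as `False`, LaTeX was not found, and plots that rely on LaTeX formatting are likely to fail until it is installed.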