GNU Parallel for quick gains
Overview
Teaching: 20 min
Exercises: 10 minQuestions
How and when do I use GNU Parallel with Python programs?
Objectives
Identify what sort of tasks are suitable for GNU Parallel to run in parallel
Learn how to add command-line arguments to variables that GNU Parallel is to control
Refresh how to use GNU Parallel to run in parallel a program that accepts command-line arguments
Sometimes, you have a piece of software that processes one thing, and you would like it to process many things. That thing may be anything an image, a parameter set, a chunk of genetic data… The problem that you have is that you have a script geared up to processing one thing, and now you need to scale up to process many of them.
In many cases, the way
that you will approach this is to add a for
loop around the block of
code, and iterate through a list (or other collection) of things.
This causes problems, however, if each thing takes more than a few minutes
to process, and you have many hundreds or thousands (or hundreds of
thousands!) of these things to process. Running all of these in a for
loop would take hours to days or weeks to run, and you can’t rely on your
laptop being available for that long (or if you’re already using
Supercomputing Wales, then the queue won’t let you run for longer
than 3 days).
Fortunately, there is tooling available to help with this. Unlike the other tools that will be discussed in the remainder of this lesson, this tool is not Python-specific, and if you have previously taken the “Introduction to High Performance Computing with Supercomputing Wales” course then you will already have encountered it—it is GNU Parallel.
GNU Parallel takes a program, and runs it many times with a list of possible inputs that you supply. It does this using as many cores as are available on the node that you are using, and if you are using multiple nodes, then with a little extra work then you can tell it to make use of all available cores on all available nodes, too.
The only catch here is that your program must run without input, and be able to run as a command-line script, accepting arguments to tell it which thing to process. If at the moment you have a Jupyter notebook that you hand-adjust each time you want to process a different thing, then some changes will be needed to let you use GNU Parallel.
Using command-line arguments
GNU Parallel uses command-line arguments to communicate to the many different copies of your software that it runs. So in order to take advantage of GNU Parallel to run your Python programs across many different input files or sets of data, then it needs to be able to accept command-line arguments for the parameters that you want GNU Parallel to be able to control.
Command line?
If you’re currently using a Jupyter Notebook for your analysis, you will need to convert it to running as an independent Python script in order to use GNU Parallel and command-line arguments. For more information on this, see the Command-line programs episode of the Software Carpentry Python lesson.
A common pattern for quick Python programs is to hard-code a filename or other parameter, and then adjust it by hand between runs. If you’re doing this at the moment, then you’ll need to make a few changes to your programs in order to take full advantage of the power that GNU Parallel has to offer.
We are going to look at an example program fourier_orig.py
, from the
code package you copied earlier. First off, try running this program to
see what it does:
$ python fourier_orig.py
This should take a few seconds to run; use ls -lrt
to see the most
recently created files in the directory once it finishes to see what
has been added. You will see that three PDF files
(fourier_restricted.pdf
, noise_isolation.pdf
, and phase_contrast.pdf
)
have been created.
Next up, inspect the file in a text editor.
$ nano fourier_orig.py
This program simulates a set of experiments in optics, where an image
is processed using a series of different manipulations. It reads in
an image file (currently einstein1_7.jpg
), does some processing to it,
and outputs the three PDF files mentioned above, which are generated
independently of each other.
This program works fine for testing a single image, or a handful, but for a full run of dozens or hundres of images, this is too cumbersome to be practical. So, we would like to adjust the program so that the filename to read can be controlled from the command line.
A plan of attack
Looking at
fourier_orig.py
, what sections of the program need to be changed in order for the program to be able to control the filename to read from the command-line? What other changes might need to be made so that the program will work properly in parallel when processing image files given as command-line arguments?Discuss your thoughts on this with one or more neighbours.
Solution
To accept an image as a command-line argument, you will need:
- To add an import in the first few lines, from a module that gives access to command line arguments
- To get the filename from the list of command line arguments and put it in a variable, somewhere before line 11
- To read from the given filename instead of from
einstein1_7.jpg
at line 11If this is run in parallel, the results files will always be called the same name. So to make the program work in parallel, you will additionally need to
- Before lines 69, 132, and 153, either get three output filenames from the command-line arguments, or decide based on the input filename what the output filenames should be.
- At lines 69, 132, and 153, use the filenames from the previous bullet rather than the hard-coded ones that are currently used.
- If using command-line arguments for filenames, then decide what to do if the filenames are not provided. This could raise an error message, or could skip the step that was not given and not output anything.
For this example, we will choose to specify the output filenames as arguments, and skip steps that don’t have an output file.
If you’ve worked through the
Command-line programs
episode of the Software Carpentry Python lesson, you may remember that
you can use sys.argv
to access the list of command-line arguments
passed to your program. However, this requires a lot of extra code to
handle edge cases and error checking if you want to do it robustly.
Instead, here we are going to use a module called argparse
to parse
the command-line arguments for us. This module is part of the Python
standard library, so you don’t need to install any additional packages
to make use of it.
To start off, make a copy of the file so you are not editing the original
program. You will need to press Ctrl+X
if you are still inside nano
(or quit your editor if you are using a different one). Then
$ cp fourier_orig.py fourier_new.py
$ nano fourier_new.py
Then add a line importing the part of the argparse
module
that we are going to use, at the start of the file:
from argparse import ArgumentParser
Now, before the call to plt.imread
, we need to use ArgumentParser
to
get a filename to read. To do this, we will create an ArgumentParser
object, tell it what arguments we would like, and then tell it to get
them for us.
parser = ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args()
With this done, args
will hold any filename that gets supplied
as a command-line argument. Since in principle there can be more than one
argument, the filename will be an attribute of args
.
Now, tell the call to imread
to use this new filename by editing the line
to read:
image = plt.imread(args.filename)
At this point, the program should still do exactly what it did before, provided the same input file is specified. Check that this is true, by running:
python fourier_new.py einstein1_7.jpg
If all is well, you will see no errors, and ls -lrt
will show
that the three PDF files generated by the script have been updated.
We can now specify the input filename from the command line. If your
program only outputs to stdout (e.g. only outputs via print
), and
you can tell from each line of output what the input parameters were,
then this is all you need to do, However, in the case of the Fourier
example, some output PDF files are also generated. We need to tell
the program where to put these so that they don’t get overwritten by
subsequent runs of the program.
First off, before the call to parse_args()
,
tell the ArgumentParser
that we would like three more
arguments, and that they should be optional, with a default value of
None
:
parser.add_argument('--fourier_restricted_output', default=None)
parser.add_argument('--noise_isolation_output', default=None)
parser.add_argument('--phase_contrast_output', default=None)
Now, after the call to parse_args()
, the args
objects will
have three extra attributes, representing the three optional arguments.
If they are not provided, then they are set to None
.
The initial few lines are setup, and are needed for all three (or at least more than one) of the tasks the program carries out. From the line
# Move to Fourier plane
onwards is specific to each task. So, immediately before this line, add a check as to whether the program is carrying out this task:
if args.fourier_restricted_output:
Indent everything up to the first call to plt.savefig
, and change this
line to:
plt.savefig(args.fourier_restricted_output)
We can then repeat this step for the other two tasks that the program carries out.
With this done, we can test that the program still works, by running:
$ python fourier_new.py einstein1_7.jpg \
--fourier_restricted_output=fourier_restricted.pdf \
--noise_isolation_output=noise_isolation.pdf \
--phase_contrast_output=phase_contrast.pdf
Again, ls -lrt
will let you check that the output files are up to date.
With this done, we are now ready to use GNU Parallel to run this program in parallel for many image files at once.
Running your Python programs with GNU Parallel
Now that all the parameters that we want to control are exposed as command-line arguments, we are ready to use GNU Parallel to run the program across an entire batch of images at once.
To do this, let’s create a new job script to let this run as a batch job:
$ nano submit_fourier.sh
#!/bin/bash --login
###
# Number of processors we will use
#SBATCH --ntasks 10
# Output file location
#SBATCH --output fourier.out.%J
# Time limit for this job
#SBATCH --time 00:10:00
# specify our current project
# change this for your own work
#SBATCH --account=scw1389
# specify the reservation we have for the training workshop
# remove this for your own work
# replace XX with the code provided by your instructor
#SBATCH --reservation=scw1389_XX
###
# Ensure that parallel is available to us
module load parallel
# Load Python and activate the environment we will use for this work
module load anaconda/2021.05
source activate scw_test
# Only use one thread per copy of Python, since we are using GNU Parallel
# for parallelism
export OMP_NUM_THREADS=1
# Define srun arguments:
srun="srun --nodes 1 --ntasks 1"
# --nodes 1 --ntasks 1 allocates a single core to each task
# Define parallel arguments:
parallel="parallel --max-procs $SLURM_NTASKS --joblog parallel_joblog"
# --max-procs $SLURM_NTASKS is the number of concurrent tasks parallel runs, so number of CPUs allocated
# --joblog name parallel's log file of tasks it has run
# Run the tasks:
$parallel "$srun python fourier_new.py {1} \
--fourier_restricted_output=fourier_restricted_\$(basename {1}).pdf \
--noise_isolation_output=noise_isolation_\$(basename {1}).pdf \
--phase_contrast_output=phase_contrast_\$(basename {1}).pdf" :::: files_to_process.txt
This is adapted from the GNU Parallel script in the Supercomputing Wales tutorial. The changes that we have made:
- The job requests 10 CPU cores rather than 80.
- The job now loads Anaconda and loads the tutorial environment we created earlier in the lesson
- The variable
OMP_NUM_THREADS
is set to 1. This is because Numpy will by default try and use all the cores available on a machine to do computation; this is great if it is the only copy running, but if we are trying to fill the CPU with parallel copies of the same program, then these would all end up competing for resources. Instead, we tell Numpy to only use a single thread per copy of Python, and let GNU Parallel do the work of filling up the CPU. - The last line now calls our Python script rather than the
goostats
script from the Sotware Carpentry shell lesson
Set the reservation ID correctly (or remove it if you do not have a reservation) and submit this job.
$ sbatch submit_fourier.sh
$ squeue -u $USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
7166510 compute submit_f s.e.j.be R 0:00:14 1 scs0098
Checking the queue status, we can see that this job is running. Once the program has had a few seconds to run, you should see a collection of PDF files fill up your working directory.
$ ls
einstein1_7.jpg noise_isolation_Hendrik_lorentz.jpg.pdf
files_to_process.txt noise_isolation_Landau.jpg.pdf
fourier_new.py noise_isolation_Louis_de_Broglie.jpg.pdf
fourier_orig.py noise_isolation_Mariecurie.jpg.pdf
fourier.out.7166510 noise_isolation_Michael_Faraday_001.jpg.pdf
fourier_restricted_Abdus_Salam.jpg.pdf noise_isolation_Murray_Gell-Mann.png.pdf
fourier_restricted_Bardeen.jpg.pdf noise_isolation_Oppenheimer.jpg.pdf
fourier_restricted_Bethe.jpg.pdf noise_isolation.pdf
fourier_restricted_Boltzmann.jpg.pdf noise_isolation_Schrödinger.jpg.pdf
fourier_restricted_Dirac_4.jpg.pdf noise_isolation_Wigner.jpg.pdf
fourier_restricted_Enrico_Fermi.jpg.pdf out.gv
fourier_restricted_Feynman.jpg.pdf parallel_joblog
fourier_restricted_Gauss.jpg.pdf phase_contrast_Abdus_Salam.jpg.pdf
fourier_restricted_Heisenberg_10.jpg.pdf phase_contrast_Bardeen.jpg.pdf
fourier_restricted_Hendrik_lorentz.jpg.pdf phase_contrast_Bethe.jpg.pdf
fourier_restricted_Landau.jpg.pdf phase_contrast_Boltzmann.jpg.pdf
fourier_restricted_Louis_de_Broglie.jpg.pdf phase_contrast_Dirac_4.jpg.pdf
fourier_restricted_Mariecurie.jpg.pdf phase_contrast_Enrico_Fermi.jpg.pdf
fourier_restricted_Michael_Faraday_001.jpg.pdf phase_contrast_Feynman.jpg.pdf
fourier_restricted_Murray_Gell-Mann.png.pdf phase_contrast_Gauss.jpg.pdf
fourier_restricted_Oppenheimer.jpg.pdf phase_contrast_Heisenberg_10.jpg.pdf
fourier_restricted.pdf phase_contrast_Hendrik_lorentz.jpg.pdf
fourier_restricted_Schrödinger.jpg.pdf phase_contrast_Landau.jpg.pdf
fourier_restricted_Wigner.jpg.pdf phase_contrast_Louis_de_Broglie.jpg.pdf
mc.dat phase_contrast_Mariecurie.jpg.pdf
mc.py phase_contrast_Michael_Faraday_001.jpg.pdf
noise_isolation_Abdus_Salam.jpg.pdf phase_contrast_Murray_Gell-Mann.png.pdf
noise_isolation_Bardeen.jpg.pdf phase_contrast_Oppenheimer.jpg.pdf
noise_isolation_Bethe.jpg.pdf phase_contrast.pdf
noise_isolation_Boltzmann.jpg.pdf phase_contrast_Schrödinger.jpg.pdf
noise_isolation_Dirac_4.jpg.pdf phase_contrast_Wigner.jpg.pdf
noise_isolation_Enrico_Fermi.jpg.pdf __pycache__
noise_isolation_Feynman.jpg.pdf solutions
noise_isolation_Gauss.jpg.pdf submit_fourier.py
noise_isolation_Heisenberg_10.jpg.pdf
Getting more parallel
Since we deliberately adjusted the Fourier program to let us specify which output files we wanted to generate, we can get more parallelism by running the three tasks in each job in parallel too.
- How would you change
submit_fourier.sh
to parallelise this aspect of the computation?- When would doing this give a speedup?
- What would the disadvantages of this approach be?
Solution
- The last line of the script would need to change, to add the operation to perform as an extra parameter to parallelise over.
- This would give a speedup when not all CPU cores are busy all the time. This would either be because there were fewer files to process than CPUs allocated to the job (10 in this case), or because some files took much longer than others to process, leaving some cores waiting for others to finish.
- The lines of code common to all tasks now run three times rather than one, limiting the speedup you will see. Profiling will help you understand how much of a problem this will be, but at a minimum the imports of libraries will take a couple of seconds each.
More parameters
The filenames are not the only parameters you could imagine wanting to tune for this program. Identify another variable that you may want to do a parameter sweep of, and adapt both
fourier_new.py
andsubmit_fourier.sh
so that it does a parameter sweep of this variable instead of changing the image.Set the input image to
einstein1_7.jpg
insubmit_fourier.sh
rather than doing a full scan of all images, so that the job will finish in a reasonable time for today’s lesson.In your research, there’s nothing stopping you scanning multiple parameters at once like this, beyond the computational resource and time limits available on the machine.
Key Points
Let your Python programs be controlled by command-line arguments so that GNU Parallel can run them in parallel
Use
argparse
to let command-line arguments control your programs with relatively little work