
Python for High Performance Computing: Glossary

Key Points

Getting started with Python on Supercomputing Wales
  • Use module load anaconda/2021.05 and source activate to get started with Anaconda on Sunbird.

  • Create new conda environments with conda create when the base Anaconda set of packages doesn’t meet your needs.

Hardware and software design
  • Python is a dynamically typed, interpreted language, meaning we must do extra work to achieve performance comparable to that of compiled, statically typed languages.

  • Python has a Global Interpreter Lock (GIL), which enables performant single-threaded code at the expense of multithreaded performance (the sketch after this list demonstrates the effect).

  • Typically, performance is gained from efficient use of available vector units, CPU cores, accelerators such as GPUs, and multiple nodes. To get optimal performance, we need to make sure that we don’t accidentally prevent our programs from making use of these resources.
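
As a rough illustrative sketch of the GIL’s effect (not from the lesson; the function and workloads are hypothetical), the following times a CPU-bound function run twice in sequence and then across two threads. Because of the GIL, the threaded version typically takes about as long as the serial one:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count(n):
    # CPU-bound work: the GIL allows only one thread at a time to run Python bytecode
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()
count(10_000_000)
count(10_000_000)
serial = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(count, [10_000_000, 10_000_000]))
threaded = time.perf_counter() - start

# Expect the two times to be similar, despite using two threads
print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```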

GNU Parallel for quick gains
  • Let your Python programs be controlled by command-line arguments, so that GNU Parallel can run many instances of them in parallel.

  • Use argparse to let command-line arguments control your programs with relatively little work; a minimal sketch follows below.
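
A minimal sketch of such a script (the file name square.py and the --value option are illustrative, not from the lesson):

```python
import argparse

def main():
    # Command-line arguments control what this particular run computes
    parser = argparse.ArgumentParser(description="Square a single number")
    parser.add_argument("--value", type=int, required=True,
                        help="the number to square")
    args = parser.parse_args()
    print(args.value ** 2)

if __name__ == "__main__":
    main()
```

GNU Parallel could then run one instance per input with something like `parallel python square.py --value {} ::: 1 2 3 4`.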

Profiling to identify bottlenecks
  • Use the cProfile module to get an overview of where your program is spending its time.

  • Save the results to a file and copy them to your own machine to analyse them visually; the sketch after this list shows saving a profile and printing a quick summary.

  • Check profiles again after optimising, to see whether the optimisation was successful.
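
A minimal sketch of this workflow (the function work stands in for your real program):

```python
import cProfile
import pstats

def work():
    # Placeholder for the code being profiled
    return sum(i * i for i in range(1_000_000))

# Run under the profiler and save the statistics to a file; the file can be
# copied to your own machine and opened with a visual tool such as SnakeViz
cProfile.run("work()", "profile.out")

# A quick text summary of the five most expensive calls
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)
```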

Numpy (and Scipy)
  • Numpy provides data structures for arbitrary-dimensional arrays of homogeneous data.

  • Whole-array operations are significantly faster than Python loops over arrays (or lists), as the sketch after this list illustrates.

  • Scipy is very comprehensive; if you are doing something that someone has probably done before, search for an existing library function before writing your own implementation, since the library version will probably be faster.
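
A minimal sketch contrasting a whole-array operation with the equivalent Python loop:

```python
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Whole-array operation: a single vectorised expression, executed in compiled code
squares = values ** 2

# The equivalent explicit Python loop is typically orders of magnitude slower
squares_loop = np.empty_like(values)
for i, v in enumerate(values):
    squares_loop[i] = v ** 2
```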

Multiprocessing with Pathos
  • The ParallelPool in Pathos can spread work across multiple cores (see the sketch after this list).

  • Look closely at loops and other large operations in your program to identify places with no data dependencies, where parallelism can be added.

  • Pyina has functions that extend the parallel map to multiple nodes.
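
A minimal sketch of the ParallelPool, assuming Pathos is installed (the function and the choice of four workers are illustrative):

```python
from pathos.pools import ParallelPool

def square(x):
    # Each call is independent of the others, so the pool can run them in parallel
    return x ** 2

if __name__ == "__main__":
    pool = ParallelPool(nodes=4)  # four worker processes
    results = pool.map(square, range(16))
    print(results)
    pool.close()
```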

Numba for automatic optimisation
  • Use the @jit decorator to just-in-time compile Python functions with Numba (see the sketch after this list).

  • Pay attention to the kinds of objects and operations you use in your functions, so that Numba can optimise them.
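
A minimal sketch, assuming Numba is installed:

```python
from numba import jit

@jit(nopython=True)  # nopython mode gives the best performance when compilation succeeds
def sum_of_squares(n):
    # A plain numeric loop like this is exactly what Numba compiles well
    total = 0.0
    for i in range(n):
        total += i * i
    return total

sum_of_squares(10)                 # the first call triggers compilation
print(sum_of_squares(10_000_000))  # later calls run the compiled machine code
```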

Dask for parallel data operations
  • Dask will parallelise work across as many resources as it is asked to use.

  • Dask also has drop-in replacements for many other common interfaces, e.g. parts of scikit-learn; the sketch below uses its Numpy-like arrays.
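
A minimal sketch using Dask’s Numpy-like arrays, assuming Dask is installed (the array size and chunking are arbitrary):

```python
import dask.array as da

# A large array divided into chunks that can be processed in parallel
x = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))

# Operations build a lazy task graph; compute() runs it on the available workers
result = (x + x.T).mean().compute()
print(result)
```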

Summary

Glossary

The glossary would go here, formatted as:

{:auto_ids}
key word 1
:   explanation 1

key word 2
:   explanation 2

({:auto_ids} is needed at the start so that Jekyll will automatically generate a unique ID for each item to allow other pages to hyperlink to specific glossary entries.) This renders as:

key word 1
explanation 1
key word 2
explanation 2