## Getting started with Python on Supercomputing Wales

- Use `module load anaconda/2021.05` and `source activate` to get started with Anaconda on Sunbird
- Create new conda environments with `conda create` when the base Anaconda set of packages doesn't meet your needs (see the example session below)
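For example, a typical first session on Sunbird might look like the following; the environment name and package list are illustrative, not prescribed by the lesson.

```bash
# Load the Anaconda module and activate the base environment
module load anaconda/2021.05
source activate

# Create and activate a custom environment when the base one isn't enough
# ("myproject" and the package list are placeholders)
conda create --name myproject python numpy scipy
source activate myproject
```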
## Hardware and software design

- Python is a dynamically-typed, interpreted language, so we must do extra work to gain performance comparable to compiled, statically-typed languages
- Python has a Global Interpreter Lock (GIL), which allows performant single-threaded code at the expense of multithreading performance (the demonstration below shows this in action)
- Typically, performance is gained from efficient use of available vector units, CPU cores, accelerators such as GPUs, and nodes; we need to ensure that we don't accidentally prevent our programs from making use of these if we want optimal performance
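As a rough illustration of the GIL, the sketch below times a CPU-bound function run twice sequentially and then in two threads; exact timings vary by machine, but the threaded version typically shows no speedup.

```python
import threading
import time

def countdown(n):
    # CPU-bound work: the GIL lets only one thread run Python bytecode at a time
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
print(f"sequential:  {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - start:.2f} s (no faster, due to the GIL)")
```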
## GNU Parallel for quick gains

- Let your Python programs be controlled by command-line arguments so that GNU Parallel can run them in parallel (as in the example below)
- Use `argparse` to let command-line arguments control your programs with relatively little work
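As a minimal sketch, the hypothetical script below uses `argparse` to expose its parameters on the command line; the script name, arguments, and workload are illustrative.

```python
# run_simulation.py -- a hypothetical script controlled by command-line arguments
import argparse

parser = argparse.ArgumentParser(description="Run one simulation")
parser.add_argument("--temperature", type=float, required=True,
                    help="temperature to simulate at")
parser.add_argument("--output", required=True,
                    help="file to write results to")
args = parser.parse_args()

# ... the real computation would go here ...
print(f"Simulating at T={args.temperature}, writing to {args.output}")
```

GNU Parallel can then sweep a parameter across many runs, one process per value:

```bash
parallel python run_simulation.py --temperature {} --output result_{}.txt ::: 290 300 310
```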
## Profiling to identify bottlenecks

- Use the `cProfile` module to get an overview of where your program is spending time (see the example below)
- Save the results to a file and copy them to your machine to analyse them visually
- Check profiles again after optimisation to see whether it was successful
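You can profile a whole script with `python -m cProfile -o profile.out program.py`, or from within Python as sketched below; the `main()` function here is a placeholder for real work. Tools such as snakeviz can visualise the saved file.

```python
import cProfile
import pstats

def main():
    # placeholder for the code being profiled
    return sum(i * i for i in range(1_000_000))

# Profile main() and save the statistics to a file for later analysis
cProfile.run("main()", "profile.out")

# Quick text summary: the ten calls with the largest cumulative time
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)
```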
## Numpy (and Scipy)

- Numpy provides data structures for N-dimensional arrays of homogeneous data
- Whole-array operations are significantly faster than Python loops over arrays (or lists), as the comparison below shows
- Scipy is very comprehensive; if you are doing something that someone has probably done before, search for an existing library function before writing your own implementation, since it will probably be faster
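For instance, summing the squares of a million numbers element by element in Python is far slower than the equivalent whole-array operation:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Pure-Python loop: every element access goes through the interpreter
total = 0.0
for x in a:
    total += x * x

# Whole-array equivalent: the loop runs in compiled code
total = np.dot(a, a)
```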
## Multiprocessing with Pathos

- The `ParallelPool` in Pathos can spread work across multiple cores (see the sketch below)
- Look closely at loops and other large operations in your program to identify where there are no data dependencies, and so parallelism can be added
- Pyina has functions that extend the parallel map to multiple nodes
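A minimal sketch, assuming a workload where each input can be processed independently; the function body and worker count are illustrative.

```python
from pathos.pools import ParallelPool

def expensive(x):
    # stand-in for a real computation with no data dependencies between inputs
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    pool = ParallelPool(nodes=4)  # assuming four cores are available
    results = pool.map(expensive, range(10_000, 10_010))
    print(results)
```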
## Numba for automatic optimisation

- Numba's `@njit` decorator compiles a Python function to machine code the first time it is called, often making numerical loops much faster with no other changes
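A minimal sketch of the idea; the function and test array are illustrative.

```python
import numpy as np
from numba import njit

@njit  # compile this function to machine code on first call
def sum_squares(a):
    total = 0.0
    for x in a:
        total += x * x
    return total

a = np.arange(1_000_000, dtype=np.float64)
print(sum_squares(a))  # the first call includes compilation time; later calls are fast
```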
## Dask for parallel data operations

- Dask will parallelise across as many resources as it is asked to (see the sketch below)
- Dask also has drop-in replacements for many other common operations, e.g. in scikit-learn
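A minimal sketch using `dask.array`; the array shape and chunk size are illustrative.

```python
import dask.array as da

# A large array split into chunks; Dask schedules work on the chunks in parallel
x = da.random.random((20_000, 20_000), chunks=(4_000, 4_000))

# Operations build a task graph; compute() executes it on the available workers
result = (x * x).sum().compute()
print(result)
```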
## Summary
{:auto_ids}
Global Interpreter Lock (GIL)
: A lock in the Python interpreter that allows only one thread to execute Python code at a time, limiting multithreading performance

conda environment
: An isolated set of packages, created with `conda create`, that can be activated in place of the base Anaconda environment

profiling
: Measuring where a program spends its time, e.g. with the `cProfile` module, so that optimisation effort targets the real bottlenecks

vectorisation
: Replacing explicit Python loops with whole-array operations, as in Numpy, so that the work runs in efficient compiled code