## Getting started with Python on Supercomputing Wales

- Use `module load anaconda/2021.05` and `source activate` to get started with Anaconda on Sunbird
- Create new conda environments with `conda create` when the base Anaconda set of packages doesn't meet your needs (see the example session below)
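For example, a typical first session on Sunbird might look like the following; the environment name and package list are illustrative, not prescribed by the lesson.

```bash
# Load the Anaconda module and activate the base environment
module load anaconda/2021.05
source activate

# Create and activate a custom environment when the base one isn't enough
# ("myproject" and the package list are placeholders)
conda create --name myproject python numpy scipy
source activate myproject
```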
## Hardware and software design

- Python is a dynamically-typed, interpreted language, so we must do extra work to gain performance comparable to compiled, statically-typed languages
- Python has a Global Interpreter Lock (GIL), which allows performant single-threaded code at the expense of multithreading performance (the demonstration below shows this in action)
- Typically, performance is gained from efficient use of available vector units, CPU cores, accelerators such as GPUs, and nodes; we need to ensure that we don't accidentally prevent our programs from making use of these if we want optimal performance
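As a rough illustration of the GIL, the sketch below times a CPU-bound function run twice sequentially and then in two threads; exact timings vary by machine, but the threaded version typically shows no speedup.

```python
import threading
import time

def countdown(n):
    # CPU-bound work: the GIL lets only one thread run Python bytecode at a time
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
print(f"sequential:  {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
threads = [threading.Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"two threads: {time.perf_counter() - start:.2f} s (no faster, due to the GIL)")
```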
## GNU Parallel for quick gains

- Let your Python programs be controlled by command-line arguments so that GNU Parallel can run them in parallel (as in the example below)
- Use `argparse` to let command-line arguments control your programs with relatively little work
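As a minimal sketch, the hypothetical script below uses `argparse` to expose its parameters on the command line; the script name, arguments, and workload are illustrative.

```python
# run_simulation.py -- a hypothetical script controlled by command-line arguments
import argparse

parser = argparse.ArgumentParser(description="Run one simulation")
parser.add_argument("--temperature", type=float, required=True,
                    help="temperature to simulate at")
parser.add_argument("--output", required=True,
                    help="file to write results to")
args = parser.parse_args()

# ... the real computation would go here ...
print(f"Simulating at T={args.temperature}, writing to {args.output}")
```

GNU Parallel can then sweep a parameter across many runs, one process per value:

```bash
parallel python run_simulation.py --temperature {} --output result_{}.txt ::: 290 300 310
```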
## Profiling to identify bottlenecks

- Use the `cProfile` module to get an overview of where your program is spending time (see the example below)
- Save the results to a file and copy them to your machine to analyse them visually
- Check profiles again after optimisation to see whether it was successful
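You can profile a whole script with `python -m cProfile -o profile.out program.py`, or from within Python as sketched below; the `main()` function here is a placeholder for real work. Tools such as snakeviz can visualise the saved file.

```python
import cProfile
import pstats

def main():
    # placeholder for the code being profiled
    return sum(i * i for i in range(1_000_000))

# Profile main() and save the statistics to a file for later analysis
cProfile.run("main()", "profile.out")

# Quick text summary: the ten calls with the largest cumulative time
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)
```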
## Numpy (and Scipy)

- Numpy provides data structures for N-dimensional arrays of homogeneous data
- Whole-array operations are significantly faster than Python loops over arrays (or lists), as the comparison below shows
- Scipy is very comprehensive; if you are doing something that someone has probably done before, search for an existing library function before writing your own implementation, since it will probably be faster
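For instance, summing the squares of a million numbers element by element in Python is far slower than the equivalent whole-array operation:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Pure-Python loop: every element access goes through the interpreter
total = 0.0
for x in a:
    total += x * x

# Whole-array equivalent: the loop runs in compiled code
total = np.dot(a, a)
```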
## Multiprocessing with Pathos

- The `ParallelPool` in Pathos can spread work across multiple cores (see the sketch below)
- Look closely at loops and other large operations in your program to identify where there are no data dependencies, and so parallelism can be added
- Pyina has functions that extend the parallel map to multiple nodes
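A minimal sketch, assuming a workload where each input can be processed independently; the function body and worker count are illustrative.

```python
from pathos.pools import ParallelPool

def expensive(x):
    # stand-in for a real computation with no data dependencies between inputs
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    pool = ParallelPool(nodes=4)  # assuming four cores are available
    results = pool.map(expensive, range(10_000, 10_010))
    print(results)
```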
## Numba for automatic optimisation

- Numba's `@njit` decorator compiles a Python function to machine code the first time it is called, often making numerical loops much faster with no other changes
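A minimal sketch of the idea; the function and test array are illustrative.

```python
import numpy as np
from numba import njit

@njit  # compile this function to machine code on first call
def sum_squares(a):
    total = 0.0
    for x in a:
        total += x * x
    return total

a = np.arange(1_000_000, dtype=np.float64)
print(sum_squares(a))  # the first call includes compilation time; later calls are fast
```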
## Dask for parallel data operations

- Dask will parallelise across as many resources as it is asked to (see the sketch below)
- Dask also has drop-in replacements for many other common operations, e.g. in scikit-learn
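A minimal sketch using `dask.array`; the array shape and chunk size are illustrative.

```python
import dask.array as da

# A large array split into chunks; Dask schedules work on the chunks in parallel
x = da.random.random((20_000, 20_000), chunks=(4_000, 4_000))

# Operations build a task graph; compute() executes it on the available workers
result = (x * x).sum().compute()
print(result)
```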
## Summary
{:auto_ids}
Global Interpreter Lock (GIL)
: A lock in the Python interpreter that allows only one thread to execute Python code at a time, limiting multithreading performance

conda environment
: An isolated set of packages, created with `conda create`, that can be activated in place of the base Anaconda environment

profiling
: Measuring where a program spends its time, e.g. with the `cProfile` module, so that optimisation effort targets the real bottlenecks

vectorisation
: Replacing explicit Python loops with whole-array operations, as in Numpy, so that the work runs in efficient compiled code