This lesson is being piloted (Beta version)

Summary

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • Which techniques should I use for which types of problem?

  • What should I do next?

Objectives

Breakdown by component

Vector units

If you are not utilising vector units, consider using NumPy data types rather than built-in ones. If that is impractical, then look into using Numba to just-in-time compile your application.
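As a minimal sketch of the first suggestion, the example below computes the same sum of squares twice: once with a loop over a built-in list, and once on a NumPy array, where the loop runs inside compiled code that can make use of vector units. The function names and the test data are hypothetical and chosen only for illustration.

```python
import numpy as np


def sum_of_squares_builtin(values):
    """Pure-Python loop over a built-in list, one element at a time."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same calculation on a NumPy array; the loop runs in compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical test data: the floats 0.0, 1.0, ..., 999.0
data = [float(i) for i in range(1000)]
builtin_result = sum_of_squares_builtin(data)
numpy_result = sum_of_squares_numpy(data)
```

Both functions return the same value; timing them on a larger input (for example with `timeit`) is a quick way to check whether the NumPy version pays off for your own data.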

Multiple cores

Does your program naturally divide into many work items that can be processed simultaneously (for example, a few thousand images, each of which needs to be independently analysed, or different parameter sets that need to be fitted)?

If so, and the work items are large (a minute or longer each), then consider using GNU Parallel to run many copies of an existing program on one node. If the work items are small (a few seconds each), then consider using Pathos to run many copies of individual Python functions on one node.

If not, then examine where the bottlenecks in your application are, and which data structures dominate them. If your application maps nicely to NumPy data structures and/or SciPy algorithms, then these will automatically use multiple cores where possible. Otherwise, you may be able to use Pathos to run individual subtasks in parallel, or you may be able to use Dask to partition large data structures across multiple cores.
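The "many small, independent work items" case can be sketched as a parallel map over a pool of worker processes. To stay self-contained, this example uses `concurrent.futures` from the Python standard library as a stand-in for Pathos; it is not Pathos code, although Pathos pools offer a comparable `map` call. The function `analyse_item` and the work items are hypothetical placeholders for your own analysis.

```python
from concurrent.futures import ProcessPoolExecutor


def analyse_item(item):
    """Hypothetical work item, standing in for analysing one image
    or fitting one parameter set."""
    return sum(i * i for i in range(item))


def analyse_all(items, workers=2):
    """Run analyse_item on every item, spread over worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map returns results in the same order as the inputs
        return list(pool.map(analyse_item, items))


if __name__ == "__main__":
    # Hypothetical work items; each one is processed independently
    work_items = [10_000, 20_000, 30_000, 40_000]
    print(analyse_all(work_items))
```

Because each call to `analyse_item` is independent of the others, the work items can be handed to whichever worker is free. Starting processes and passing data between them has a cost, so this approach only helps when each item takes long enough to outweigh that overhead.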

GPUs

Check whether any existing GPU-enabled libraries address the problem you are solving. If so, then try making use of these. Otherwise, look into using Numba to compile specific functions to run on the GPU.

Multiple nodes

Are you already fully using parallelism on a single node? If not, then look into this first. If yes, then most of the technologies described above for multiple cores will also work across multiple nodes: specifically, GNU Parallel (when used carefully), Pathos, and Dask.

What next?

This lesson has covered a lot of new topics in a short period of time. It’s tempting to jump in and rearchitect an entire application from scratch to use them all at once, but that isn’t likely to be the best course of action. Small, targeted changes are the best place to start: they show that things are working before you have spent a huge amount of time on them.

Some general thoughts:

A hackathon event is a good opportunity to start on this road, with others there to share the journey with you.

Key Points