Supercomputing Wales Introduction
|
A cluster is a group of computers connected together to act as one.
Clusters are formed of nodes; each node usually has several processors and tens or hundreds of gigabytes of RAM.
Supercomputing Wales has clusters for researchers at Welsh universities to use.
|
Logging in to Supercomputing Wales
|
Use ssh sunbird.swansea.ac.uk or ssh hawklogin.cf.ac.uk to log in to the system.
sinfo shows partitions and how busy they are.
slurmtop shows another view of how busy the system is.
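For example, a first session might look like this (s.jane.doe is a placeholder username, as in the SFTP example at the end):
ssh s.jane.doe@sunbird.swansea.ac.uk   # log in to the Swansea cluster
sinfo                                  # list the partitions and their current state
slurmtop                               # alternative live view of how busy the nodes are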
|
Filesystems and Storage
|
The home directory is the default place to store data.
The scratch directory is a larger space for temporary files.
On Hawk in Cardiff, home is backed up, but it is also on slower disks.
On Sunbird in Swansea, neither home nor scratch is backed up.
Quotas on home are much smaller than those on scratch.
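As a minimal sketch of staging data, assuming scratch is mounted at /scratch/$USER (the exact path may differ between Hawk and Sunbird):
cd /scratch/$USER     # move to your scratch space; this path is an assumption
cp ~/input.dat .      # copy a hypothetical input file over from home
Remember that files here are temporary and may not be backed up.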
|
Running Jobs with Slurm
|
Interactive jobs let you test out the behaviour of a command, but they aren’t practical for running lots of jobs.
Batch jobs are suited for submitting a job to run without user interaction.
Job arrays are useful to submit lots of jobs.
Slurm lets you set parameters such as how many processors or nodes are allocated, how much memory the job may use, and how long it can run (see the sketch below).
Slurm can email you when a job starts or finishes.
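A minimal batch script combining these options might look like the following sketch; the job name, resource requests, and email address are illustrative placeholders, and your site may require further options such as a project account:
#!/bin/bash
#SBATCH --job-name=example                     # placeholder job name
#SBATCH --ntasks=4                             # how many processors to allocate
#SBATCH --mem=2G                               # memory for the whole job
#SBATCH --time=01:00:00                        # the job may run for at most 1 hour
#SBATCH --array=0-9                            # optional: submit 10 copies as a job array
#SBATCH --mail-type=BEGIN,END                  # email when the job starts and finishes
#SBATCH --mail-user=s.jane.doe@example.ac.uk   # placeholder address

echo "Array task $SLURM_ARRAY_TASK_ID running on $(hostname)"
Submit the script with sbatch and watch its progress with squeue.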
|
Working with Modules and Installing Software
|
A lot of software is only available by loading extra modules. This helps prevent problems where two packages are incompatible.
Python 3 is one such package.
If you want to install pip packages use the --user option to store the packages in your home directory.
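For example (the exact module name and version vary between clusters, so check module avail first):
module avail python        # list the available Python modules
module load python         # load one; the unversioned name here is an assumption
pip install --user numpy   # installs the package under your home directory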
|
What next?
|
Remember that HPCs are shared systems, so try to avoid allocating resources that you don’t use.
Don’t create millions of files.
You will need to apply for a new project on Supercomputing Wales or join an existing one.
Make use of the Research Software Engineers to help you use the system effectively.
|
Optimising for Parallel Processing
|
|
Running on GPUs
|
|
Profiling Single Core Performance
|
Each programming language typically provides tools called profilers with which you can analyse the runtime of your code.
The estimate of pi spends most of its time generating random numbers.
The estimation of pi with the Monte Carlo method is a compute-bound problem, because pseudo-random numbers are produced purely by computation.
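For example, Python’s built-in cProfile profiler can be run from the shell; pi.py stands in for a hypothetical script containing the Monte Carlo estimate:
python3 -m cProfile -s cumtime pi.py   # list functions sorted by cumulative run time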
|
Parallel Estimation of Pi
|
Amdahl’s law describes the speed-up you can expect from your parallelisation efforts, given the fraction of the program that can run in parallel.
Use the profiling data to calculate the time consumption of hot spots in the code.
The generation and processing of random numbers can be parallelised, as it is a data-parallel task.
Time consumption of a single application can be measured using the time utility.
The run time of a serial implementation divided by the run time of the equivalent parallel program is called the speed-up.
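As a sketch with illustrative timings, where pi-serial and pi-parallel are hypothetical implementations:
time ./pi-serial                 # suppose this reports 60 seconds
time mpirun -n 4 ./pi-parallel   # suppose this reports 20 seconds
# speed-up = 60 s / 20 s = 3 on 4 processors, less than 4 as Amdahl's law predicts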
|
Distributing computations among computers with MPI
|
The MPI launcher mpirun sends compute jobs to a set of allocated computers.
The MPI software then executes these jobs on the remote hosts and lets the processes share data and synchronise by passing messages.
MPI assigns a rank to each process; usually the process with rank zero does the coordination.
MPI can be used to split a task into components and have several nodes run them.
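For example, launching the hypothetical pi-parallel program from the previous section on four processes:
mpirun -n 4 ./pi-parallel   # starts 4 copies of the program, with ranks 0 to 3
Inside a Slurm batch job, srun can play the same role using the resources the job was allocated.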
|
If you want to use SFTP from the command line instead of FileZilla, then these commands might be helpful.
sftp s.jane.doe@sunbird.swansea.ac.uk:/home/s.jane.doe/data
s.jane.doe@sunbird.swansea.ac.uk's password:
Connected to sunbird.swansea.ac.uk.
Changing to: /home/s.jane.doe/data
sftp> ls