Hpc05 do's and don'ts

Dig into some details

Learn about different levels of parallelization (thread, process, node).

  • Which one are you using? MPI? OpenMP? Python multiprocessing?
  • Is it efficient? Where is your bottleneck? Are you sending much data over the interconnect?
  • How does the code scale with the amount of resources you use? Maybe it is more efficient to use fewer resources?
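One way to answer the scaling question is to time the same job with different numbers of workers and compare speedup and parallel efficiency. The sketch below assumes you have collected wall-clock times yourself; the function name scaling_table and the timings in the example are hypothetical and only illustrate the arithmetic.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency.

    timings: dict mapping number of workers -> wall-clock time in seconds.
    The run with the fewest workers is used as the baseline.
    """
    base_workers = min(timings)
    base_time = timings[base_workers]
    rows = []
    for workers in sorted(timings):
        speedup = base_time / timings[workers]
        # Efficiency 1.0 means perfect scaling relative to the baseline.
        efficiency = speedup * base_workers / workers
        rows.append((workers, speedup, efficiency))
    return rows


# Hypothetical timings, not measured on hpc05.
example = {1: 100.0, 2: 55.0, 4: 32.0, 8: 25.0}
for workers, speedup, efficiency in scaling_table(example):
    print(f"{workers:>3} workers: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

If efficiency drops sharply past some worker count, requesting fewer resources per job is probably the better choice.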

Learn to request the exact amount of resources.

  • hpc05 uses PBSPro as its scheduler; its documentation is easy to find online. Learn how the commands qsub, qdel, and qstat work.
  • Maybe you need more memory per process? Or maybe you are ok with sharing a node with someone else? Learn which options and parameters control resource allocation in PBSPro.
  • Check whether it works as intended. Learn how to log into the node and check its runtime load with top / htop. Learn how to read the CPU load and memory usage.
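The load check described above can also be scripted. The following is a minimal sketch that compares the 1-minute load average (the same number top shows) with the number of logical CPUs on the node. The helper names and the 25% tolerance are arbitrary assumptions, and the load average covers the whole node, so on a shared node it includes other users' jobs.

```python
import os


def load_per_core(load_1min, n_cores):
    """Return the 1-minute load average divided by the number of cores."""
    return load_1min / n_cores


def describe(ratio, tolerance=0.25):
    """Classify a load-per-core ratio; the tolerance is an arbitrary choice."""
    if ratio > 1 + tolerance:
        return "oversubscribed"
    if ratio < 1 - tolerance:
        return "underused"
    return "about right"


# os.getloadavg() is only available on Unix-like systems.
if hasattr(os, "getloadavg"):
    try:
        load_1min = os.getloadavg()[0]
        n_cores = os.cpu_count() or 1  # logical CPUs, hyperthreads included
        ratio = load_per_core(load_1min, n_cores)
        print(f"load {load_1min:.2f} on {n_cores} CPUs: {describe(ratio)}")
    except OSError:
        print("load average is not available on this system")
```

A ratio well above 1 suggests more runnable processes than CPUs; a ratio well below 1 while your job is running suggests that the requested cores are sitting idle.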

Make sure things do not interfere (destructively).

  • Your code can probably run in parallel in several different ways. Maybe you are using several of them at once? A typical scenario is process-level parallelization (such as MPI) combined with OpenMP. Are you sure you are running the correct number of processes? Maybe you spawn too many processes and they compete for CPU time because each process uses several cores/CPUs? What is the value of OMP_NUM_THREADS and who sets it?
  • You are using Python. Are you sure you know what the global interpreter lock (GIL) means? Is it a problem for your code?
  • Just in case: be aware of the hardware. Make sure you are not using hyperthreading or any other kind of “virtual” resources. If you are using a virtualization technology, make sure that all relevant hardware (well, CPU) is correctly mapped. Avoid using swap.
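The question about process counts and OMP_NUM_THREADS can be checked with a few lines of Python. This sketch assumes a hybrid job where each process starts OMP_NUM_THREADS threads; the helper names and the process count of 4 are hypothetical. Note that the core count reported here is the number of logical CPUs, so it includes hyperthreads if they are enabled.

```python
import os


def parse_omp_num_threads(value):
    """Return the outermost thread count from an OMP_NUM_THREADS string.

    Returns None if the variable is unset or not a positive integer.
    A comma-separated list (nested parallelism) yields its first entry.
    """
    if not value:
        return None
    first = value.split(",")[0].strip()
    if first.isdigit() and int(first) > 0:
        return int(first)
    return None


def available_cores():
    """Logical CPUs this process may run on (affinity-aware on Linux)."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def is_oversubscribed(n_processes, threads_per_process, n_cores):
    """True if the total thread count exceeds the available cores."""
    return n_processes * threads_per_process > n_cores


n_processes = 4  # hypothetical: number of processes you launch on this node
threads = parse_omp_num_threads(os.environ.get("OMP_NUM_THREADS"))
cores = available_cores()
if threads is None:
    print("OMP_NUM_THREADS is not set; the OpenMP runtime picks its own default.")
else:
    fits = not is_oversubscribed(n_processes, threads, cores)
    print(f"{n_processes} processes x {threads} threads on {cores} CPUs:",
          "fits" if fits else "oversubscribed")
```

Running this inside the job itself, rather than on the login node, shows the environment and CPU set your processes actually see.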