Hpc05 do's and don'ts

Let’s collect best practices for using the cluster: it’s a shared resource, and its proper use requires coordination between users (the alternative—implementing strict usage limits—would be inefficient).

This post is a wiki: feel free to edit!

Getting cluster access

To get an account, email your request and NetID to Jos Thijssen if you are in QN or QuTech, and to Timon Idema or Martin Depken if you are in BN.

Use a fair share of cores

  1. You may freely reserve about 100 CPU cores for a couple of days.
  2. If there are plenty of free cores, you may reserve more, but not for long, and you should be ready to shut the extra jobs down if somebody asks.

The cluster has around 100 users and some 1000 CPU cores, but most of the time only a handful of people use the cluster simultaneously. We could set up strict usage limits, but it would likely be less efficient.

Keep an eye on your usage

Right now there’s only a minimal monitoring setup, but it does tell you how CPU-efficient your code is. If you see very low percentages, something is wrong.
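To get a feeling for what such a percentage means, here is a minimal sketch in Python. It assumes one common definition of CPU efficiency (used CPU time divided by walltime times reserved cores); whether the monitoring page computes exactly this is an assumption. The function names and the example job are made up, and the time strings use the HH:MM:SS form in which PBS reports CPU time and walltime.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, walltime: str, ncpus: int) -> float:
    """Return the percentage of the reserved core time that was actually used."""
    reserved = hms_to_seconds(walltime) * ncpus
    if reserved <= 0:
        raise ValueError("walltime and ncpus must both be positive")
    return 100.0 * hms_to_seconds(cpu_time) / reserved


# Hypothetical job: 10 reserved cores, 2 hours of walltime, 5 hours of CPU time.
print(f"{cpu_efficiency('05:00:00', '02:00:00', 10):.0f}%")  # prints 25%
```

A job like the one above keeps three quarters of its reserved cores idle on average, which usually means it reserved more cores than it can use.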

Contacting the users

This forum category is the central point of discussion for cluster users. If you have a message for other cluster users, share it here. If you want to notify a cluster user who isn’t on the forum yet, look up their username, then search for it in the Outlook address book. Also invite that user to join.

Other

See the TUD HPC wiki for additional information.

Hi all, I am Leila, a PhD student from BN, and I would like to know if we can also access this cluster, for example to do sequencing analysis.

Thanks in advance,
cheers!

Dig into some details

Learn about different levels of parallelization (thread, process, node).

  • Which one are you using? MPI? OpenMP? Python multiprocessing?
  • Is it efficient? Where is your bottleneck? Are you sending much data over the interconnect?
  • How does the code scale with the amount of resources you use? Maybe it is more efficient to use fewer resources?
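One way to answer the scaling question is to time the same problem on a few core counts and compare. The sketch below is only an illustration: the function name is hypothetical and the walltimes are made up, not measured on hpc05. Speedup is the walltime of the smallest run divided by the walltime of the current run, and efficiency is the speedup divided by the factor by which the core count grew.

```python
from typing import Dict, List, Tuple


def scaling_table(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


# Made-up walltimes in seconds for the same problem on 1, 4, 16 and 64 cores.
timings = {1: 1000.0, 4: 300.0, 16: 125.0, 64: 100.0}
for cores, speedup, efficiency in scaling_table(timings):
    print(f"{cores:3d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

In this made-up example, going from 16 to 64 cores shortens the run by only a fifth while efficiency drops from 50% to about 16%, so the extra 48 cores would be better left to other users.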

Learn to request the exact amount of resources.

  • hpc05 uses PBSPro, whose documentation is easy to find online. Learn how the commands qsub, qdel, and qstat work.
  • Maybe you need more memory per process? Or maybe you are OK with sharing a node with someone else? Learn which options and parameters control resource allocation in PBSPro.
  • Check whether it works as intended. Learn how to log into the node and check its runtime load with top / htop. Learn how to read CPU load and memory usage.
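As a starting point for the points above, here is a minimal sketch in Python that assembles a qsub command with an explicit resource request. It uses the PBSPro `-l select=...:ncpus=...:mem=...` and `-l walltime=...` syntax; the helper name, the script name `job.sh`, and all numbers are placeholders, and the values that make sense on hpc05 should be checked against the cluster documentation. The sketch only prints the command and does not submit anything.

```python
import shlex


def build_qsub_command(script, nodes=1, ncpus=1, mem_gb=2, walltime="01:00:00"):
    """Build a qsub command that requests exactly the given resources.

    nodes    -- number of resource chunks (usually one chunk per node)
    ncpus    -- CPU cores per chunk
    mem_gb   -- memory per chunk in gigabytes
    walltime -- maximum run time as HH:MM:SS
    """
    select = f"select={nodes}:ncpus={ncpus}:mem={mem_gb}gb"
    return ["qsub", "-l", select, "-l", f"walltime={walltime}", script]


# Hypothetical request: one chunk with 4 cores and 8 GB for at most 12 hours.
command = build_qsub_command("job.sh", ncpus=4, mem_gb=8, walltime="12:00:00")
print(shlex.join(command))
# qsub -l select=1:ncpus=4:mem=8gb -l walltime=12:00:00 job.sh
```

After submitting, qstat shows the state of the job and qdel followed by the job ID removes it.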

Make sure things do not interfere (destructively).

  • Your code likely has many different ways to be parallel. Maybe you are using several of them at once? A typical scenario is having process-level parallelization (such as MPI) together with OpenMP. Are you sure you are running the correct number of processes? Maybe you spawn too many processes and they compete for CPU time because each process uses several cores/CPUs? What is the value of OMP_NUM_THREADS and who sets it?
  • You are using Python. Are you sure you know what the global interpreter lock (GIL) means? Is it a problem for your code?
  • Just in case: be aware of the hardware. Make sure you are not using hyperthreading or any other kind of “virtual” resources. If you are using a virtualization technology, make sure that all relevant hardware (well, CPU) is correctly mapped. Avoid using swap.
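To illustrate the first point, here is a minimal sketch of how one might keep processes and OpenMP threads from competing for the same cores. The helper name and the numbers are hypothetical. It assumes that the numerical libraries you use read OMP_NUM_THREADS when they are loaded, which is why the variable is set before importing them.

```python
import os


def plan_workers(reserved_cores: int, threads_per_process: int) -> int:
    """Return how many worker processes fit without oversubscribing the cores."""
    if reserved_cores < 1 or threads_per_process < 1:
        raise ValueError("both arguments must be at least 1")
    workers = reserved_cores // threads_per_process
    if workers == 0:
        raise ValueError("one process already needs more threads than cores reserved")
    return workers


# Set the thread count explicitly, before importing libraries that use OpenMP,
# instead of relying on whatever default the library or the node provides.
threads_per_process = 2
os.environ["OMP_NUM_THREADS"] = str(threads_per_process)

# Hypothetical reservation of 16 cores, matching the ncpus value of the job.
reserved_cores = 16
print(plan_workers(reserved_cores, threads_per_process))  # prints 8
```

With these numbers, 8 processes with 2 threads each fill the 16 reserved cores exactly. Starting 16 processes with the same thread setting would run 32 threads on 16 cores, which is the kind of competition for CPU time described above.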

This cluster is primarily for simulations, with hardware and software geared to supporting those. There are other clusters at BN for other purposes. Depending on your software needs, these can be better suited for you.
To get access to the hpc05 cluster, you need an account, which for BN runs through the theory PIs (Martin Depken and myself); while these accounts are not limited to people from the theory groups, we do try to limit them to people for whom they’ll actually be useful.

@tidema thanks for the clarification. I’ve added the cluster points of contact for BN to the top post. Feel free to edit if there is some information missing.

A post was split to a new topic: Using PuTTY and MobaXTerm