Long queuing times

Hi :wave: Recently the queuing time has been much longer than usual. My jobs have been waiting for 3 hours even though the monitoring page suggests that almost all cores (872) are free.

A colleague of mine is facing the same issue.
Can you help us out here?

Thank you :pray:

Hey @Luise, how do you confirm this? Launching an interactive job with qsub -I does take long, but it’s 20 seconds rather than hours.

Hey @anton-akhmerov, I use the qstat command to check the status, “R” or “Q”. I submitted the following jobs yesterday (lprielinger) and just checked: they are still in the “Q” status.

Screenshot 2023-11-02 at 09.38.43

I think the issue is related to the size of the requested resources – I tried different test submissions now

#PBS -l nodes=1:ppn=25 got the “R” status within a few seconds
#PBS -l nodes=1:ppn=30 stayed in “Q” for 1 min, then I cancelled the job

In my initial script I specified #PBS -l nodes=3:ppn=30. I wonder if this is simply too large, but then previous submissions of the same size were usually accepted?

You seem to be asking for 30 CPU cores per node. Each node has two 10-core CPUs with 2x hyperthreading, so each node shows up as having 40 CPU cores. I believe that for practical purposes using more than the number of physical cores does not provide a speedup, so you’re better off not using more than ppn=20. Still, I don’t actually know why ppn=30 doesn’t run, unless all nodes have more than 10 cores reserved.
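
The core arithmetic above can be sketched as a small shell snippet. This is only an illustration: the variable names are made up for the example, the node layout (2 sockets, 10 cores each, 2 hardware threads per core) is taken from the description above, and nodes=3 is carried over from the original script. The `#PBS` line is an ordinary comment to bash, so the snippet also runs outside the scheduler.

```shell
#!/bin/bash
#PBS -l nodes=3:ppn=20

# Assumed node layout, as described in this thread (not queried from the cluster).
sockets=2
cores_per_socket=10
threads_per_core=2

# Physical cores are what the ppn request should stay within.
physical=$((sockets * cores_per_socket))
# Logical CPUs are what the node reports with hyperthreading enabled.
logical=$((physical * threads_per_core))

echo "physical cores per node: $physical"
echo "logical CPUs per node:   $logical"
echo "suggested directive:     #PBS -l nodes=3:ppn=$physical"
```

With this layout, ppn=30 exceeds the physical core count of a node, while ppn=20 matches it exactly.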

Thanks, I followed your instructions and all jobs are getting executed again. :slight_smile:
