Cluster Status:
All systems are working properly.
The next scheduled reboot of the head node is 4-Feb-2025.
Slurm Partition Status
Interpreting FRCE metrics
Wait times on this page are presented in two different formats.
The data presented in the Measured Wait Times section is taken from a script that submits jobs to each partition every 15 minutes. Each job requests a reasonable amount of resources for the partition selected. The script then measures the difference between the submission time and the time the job starts executing. For this measurement, the jobs have no dependencies and will never hit any limits on the number of jobs allowed per user.
Wait times presented by XDMoD are more complicated. First, these are averages over all jobs submitted to any partition over the past month. The wait time can also be affected by several variables:
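As a rough illustration of the measurement described above, the following Python sketch computes a wait time from a job's submission and start timestamps. It is not the script used on this cluster. It assumes the timestamps come from a pipe-delimited line such as the output of `sacct --parsable2 --noheader --format=JobID,Submit,Start`; the function names and the sample line are hypothetical.

```python
from datetime import datetime


def wait_seconds(submit, start):
    """Return the seconds elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(start) - datetime.fromisoformat(submit)
    return delta.total_seconds()


def parse_sacct_line(line):
    """Split one 'JobID|Submit|Start' line into (job_id, wait_in_seconds)."""
    job_id, submit, start = line.strip().split("|")
    return job_id, wait_seconds(submit, start)


# Hypothetical sample line; real values would come from the scheduler.
sample = "12345|2025-01-15T10:00:00|2025-01-15T10:02:30"
print(parse_sacct_line(sample))
```

A job that is still pending has no valid start timestamp, so a real script would need to skip or retry such lines before parsing them.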
the user may be exceeding limits on the number of cores allowed in a given partition
a particular job may have a dependency on other jobs so that it is not allowed to start until the previous jobs finish
the job may be held for administrative reasons
Summary: the XDMoD wait time may not accurately reflect how long any given job submission will take to begin executing.
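To see why an average can mislead, consider this small Python sketch. The wait times are invented for illustration and are not cluster data: four jobs start within about a minute, while a fifth is held for a full day.

```python
from statistics import mean, median

# Invented wait times in seconds: four quick starts and one job held for 24 hours.
waits = [30, 45, 60, 40, 86400]

average_wait = mean(waits)   # pulled far upward by the single held job
typical_wait = median(waits) # closer to what most of these jobs experienced

print(f"mean:   {average_wait} s")
print(f"median: {typical_wait} s")
```

In this made-up case the mean is several hours even though most jobs started in under a minute, which is the kind of distortion that held or dependent jobs can introduce into a monthly average.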
Measured wait times:
A script is submitted to each of these four partitions every 30 minutes, requesting a set of resources appropriate for each partition. The graphs below show the measured time between when the script was submitted and when it started executing.
Cluster activity for last month:
A full set of metrics with configurable time frames can be found on the XDMoD site.
Metrics for the last calendar month include: