Slurm node overhead

Wondering if anyone has experience actually administering Slurm clusters and could shed a little light on something I’ve never found a definitive answer to.

When assigning processes to specific nodes and queues and requesting memory, I believe our nodes boot their OS image into RAM (or so I was told). There’s no physical disk in each node.

Because of that, our HPC admins quoted me around 10 GB of memory overhead for the OS a few years ago.
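
If a site wants to make that overhead explicit, Slurm has a knob for it: MemSpecLimit on the node definition reserves memory for system use, so jobs can only be allocated RealMemory minus that amount. A minimal admin-side sketch, assuming the reservation is done in slurm.conf (the node names and the 10 GB figure are just illustrative, not our actual config):

# slurm.conf (admin-side): reserve ~10 GB per node for the OS and slurmd
NodeName=compute-1-1-[0-1] CPUs=20 RealMemory=256000 MemSpecLimit=10240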

I’ve used this command to get node specs:

sinfo --Node --long
Sun Jan 28 17:07:52 2024
NODELIST        NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
compute-1-1-0       1      256i   allocated   20   2:10:1 256000        0      1   (null) none                
compute-1-1-1       1      256i   allocated   20   2:10:1 256000        0      1   (null) none                

Which shows these nodes have 256 GB (the MEMORY column is in MB).
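
One thing that helps compare the configured memory against what is actually free right now (assuming a reasonably recent sinfo) is a custom format string, with both columns in MB:

# %N = node, %P = partition, %m = configured memory (MB), %e = free memory (MB)
sinfo -N -o "%N %P %m %e"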

I’ve also used

scontrol show nodes
NodeName=compute-7-6-39 Arch=x86_64 CoresPerSocket=8 
   CPUAlloc=16 CPUTot=16 CPULoad=21.24
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=compute-7-6-39 NodeHostName=compute-7-6-39 Version=18.08
   OS=Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 
   RealMemory=32106 AllocMem=0 FreeMem=188 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=normal 
   BootTime=2023-11-30T14:30:57 SlurmdStartTime=2023-11-30T14:32:06
   CfgTRES=cpu=16,mem=32106M,billing=16
   AllocTRES=cpu=16,mem=32106M,billing=16
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Which will say 32 GB (RealMemory=32106 MB), but occasionally a job will run over the memory limit when I set it to 32 GB, e.g. samtools sort: couldn't allocate memory for bam_mem (#108) · Issues · GUDMAP_RBK / RNA-seq · GitLab
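
When that happens, it’s worth checking what the job actually requested versus what it peaked at. A sketch using sacct (the job ID is a placeholder):

# ReqMem = memory requested, MaxRSS/MaxVMSize = peak usage of the job's steps
sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,MaxVMSize,State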

I think it’s standard practice with HPC schedulers not to expect to be able to use the full memory available on a node. Some schedulers report the full physical memory, some report an amount adjusted for what they think the OS requires, and some cap how much memory you can request per node. Either way, if you aren’t sure how much memory the OS actually needs and there are no extra guardrails, you just have to experiment and see how much you can get away with.
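
As a rough sketch of that experimentation (the ~10% headroom and the partition name are assumptions, not site policy), look up the node’s RealMemory and request a bit less than the full amount:

# RealMemory here is 32106 MB, so ask for roughly 90% of it instead of 32G
scontrol show node compute-7-6-39 | grep -o 'RealMemory=[0-9]*'
sbatch --partition=normal --mem=29000M job.sh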
