Slurm node overhead

Wondering if anyone has experience actually administering Slurm clusters and could shed a little light on something I’ve never found a definitive answer to.

When assigning processes to specific nodes and queues and requesting memory, I believe our nodes boot their OS image into RAM (or so I was told). There’s no physical disk in each node.

Because of that, our HPC admins quoted me around 10 GB of memory overhead for the OS a few years ago.
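
If a site wants to make that overhead explicit, Slurm has a knob for it: MemSpecLimit on the node definition reserves memory for system use, so jobs can only be allocated RealMemory minus that amount. A minimal admin-side sketch, assuming the reservation is done in slurm.conf (the node names and the 10 GB figure are just illustrative, not our actual config):

# slurm.conf (admin-side): reserve ~10 GB per node for the OS and slurmd
NodeName=compute-1-1-[0-1] CPUs=20 RealMemory=256000 MemSpecLimit=10240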

I’ve used this command to get node specs:

sinfo --Node --long
Sun Jan 28 17:07:52 2024
NODELIST        NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
compute-1-1-0       1      256i   allocated   20   2:10:1 256000        0      1   (null) none                
compute-1-1-1       1      256i   allocated   20   2:10:1 256000        0      1   (null) none                

Which shows these nodes have 256 GB (the MEMORY column is in MB).
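
One thing that helps compare the configured memory against what is actually free right now (assuming a reasonably recent sinfo) is a custom format string, with both columns in MB:

# %N = node, %P = partition, %m = configured memory (MB), %e = free memory (MB)
sinfo -N -o "%N %P %m %e"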

I’ve also used

scontrol show nodes
NodeName=compute-7-6-39 Arch=x86_64 CoresPerSocket=8 
   CPUAlloc=16 CPUTot=16 CPULoad=21.24
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=compute-7-6-39 NodeHostName=compute-7-6-39 Version=18.08
   OS=Linux 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 
   RealMemory=32106 AllocMem=0 FreeMem=188 Sockets=2 Boards=1
   State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=normal 
   BootTime=2023-11-30T14:30:57 SlurmdStartTime=2023-11-30T14:32:06
   CfgTRES=cpu=16,mem=32106M,billing=16
   AllocTRES=cpu=16,mem=32106M,billing=16
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Which will say 32 GB (RealMemory=32106 MB), but occasionally a job will run over the memory limit when I set it to 32 GB, e.g. samtools sort: couldn't allocate memory for bam_mem (#108) · Issues · GUDMAP_RBK / RNA-seq · GitLab
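
When that happens, it’s worth checking what the job actually requested versus what it peaked at. A sketch using sacct (the job ID is a placeholder):

# ReqMem = memory requested, MaxRSS/MaxVMSize = peak usage of the job's steps
sacct -j <jobid> --format=JobID,ReqMem,MaxRSS,MaxVMSize,State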

I think it’s standard practice with HPC schedulers not to expect to be able to use the full memory available on a node. Some schedulers report the full physical memory, some report an amount adjusted for what they think the OS requires, and some cap how much memory you can request per node. Either way, if you aren’t sure how much memory the OS actually needs and there are no extra guardrails, you just have to experiment and see how much you can get away with.
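
As a rough sketch of that experimentation (the ~10% headroom and the partition name are assumptions, not site policy), look up the node’s RealMemory and request a bit less than the full amount:

# RealMemory here is 32106 MB, so ask for roughly 90% of it instead of 32G
scontrol show node compute-7-6-39 | grep -o 'RealMemory=[0-9]*'
sbatch --partition=normal --mem=29000M job.sh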
