OOMs using Nextflow that don't happen when submitting jobs manually

Hi team,

I previously constructed a Nanopore analysis pipeline in an ad-hoc manner, using a bunch of bash scripts manually submitted to SLURM. I am now trying to make this easier to use and more reproducible by converting it to a Nextflow pipeline.

However, when my scripts are submitted via Nextflow, I am getting OOM crashes even with several times the resource allocation that worked for the individual shell scripts. For example, a basic Nanopore sequencing indexing script that takes FASTQ.GZ input, maps using minimap2, then sorts using samtools sort, works perfectly when submitted as an individual SLURM job with 48 GB and 16 CPUs.
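Roughly, the manual version looks like the following (the reference, input paths, and resource lines are simplified placeholders rather than my exact script):

    #!/bin/bash
    #SBATCH --mem=48G
    #SBATCH --cpus-per-task=16

    # Map Nanopore reads with minimap2 and sort the output with samtools,
    # connected by a pipe (reference and read paths are placeholders).
    minimap2 -ax map-ont -t 16 reference.fa reads.fastq.gz |
        samtools sort -@ 16 -o sorted.bam -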

However, when the equivalent script is submitted via Nextflow, I am consistently getting OOM crashes even after raising the memory to 96 GB (which I have tried with anywhere from 2 to 32 CPUs; nothing makes a difference). I have tried all sorts of reconfigurations, such as changing whether minimap2 and samtools sort are connected by a pipe or by writing to an intermediate file, and changing SLURM submission options such as hyperthreading and node exclusivity. None of it has worked, and I have not seen a single run of this script via Nextflow that hasn't failed due to running out of memory.
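For reference, the Nextflow process is structured along these lines (the process name, variable names, and commented-out options are illustrative rather than my exact code):

    process MAP_AND_SORT {
        cpus 16
        memory '96 GB'
        // among other things I have tried passing SLURM options, e.g.:
        // clusterOptions '--exclusive --hint=nomultithread'

        input:
        path reads
        path reference

        output:
        path 'sorted.bam'

        script:
        """
        minimap2 -ax map-ont -t ${task.cpus} ${reference} ${reads} |
            samtools sort -@ ${task.cpus} -o sorted.bam -
        """
    }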

Do you have any idea why resource requirements would be so incredibly different between a single bash script submitted manually and the same script submitted via Nextflow?

Many thanks,

Evelyn

Have you tried a smaller test job to check that the Nextflow pipeline is running as expected?
I also wonder if you could share which module(s) specifically are crashing with OOM.

Hi, jobs with trivially small input datasets (e.g. pre-filtered) work fine with the exact same code. When inputting a <100 MB FASTQ, everything works as expected, and input files up to around 50 GB also seem to be okay. However, very large inputs of >100 GB give an OOM crash without fail.

I contacted the support team for my HPC cluster and they suspect that something in the Nextflow configuration is causing SAMtools to store its temporary files in memory rather than on disk. When running normally via SLURM (without Nextflow), SAMtools will in some cases produce hundreds of GB of temporary files, then merge and delete them afterwards. If these are being held in memory rather than written to disk, that would explain why I am getting OOM crashes.
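For context, my understanding of samtools sort is that it keeps roughly the -m per-thread limit in memory (768 MiB per thread by default) and spills everything else to temporary BAM chunks, whose location is controlled by -T; without -T they are written next to the output file, or in the current working directory when sorting to stdout. So a manual run effectively does something like this (the scratch path is just a placeholder):

    # Sort an alignment, keeping ~2 GB per thread in memory and writing
    # temporary chunks to an explicit on-disk scratch directory (placeholder path).
    samtools sort -@ 16 -m 2G \
        -T /scratch/somewhere/sort_tmp \
        -o sorted.bam aligned.bam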

If this is the issue, how can I configure Nextflow so that SAMtools writes its temporary files to disk rather than trying to hold everything in memory?

Thanks,
Evelyn