Handling large numbers of processes on an HPC

I need to run Sarek on a large number of genomes (more than 10,000). Our HPC is set up with Apptainer and works pretty well. I think I can go two ways:

  1. Run Sarek as I usually do, and let it handle all the particularities of process management (memory, etc.).
  2. Create a bash script that requests a fixed amount of memory and other resources, and run Sarek inside it without specifying a profile.

I am just concerned that (1) might overwhelm our cluster, but I'm not sure. I think (2) might be friendlier for the sysadmins. I was also thinking of adding a sleep to the loop that starts the processes, pausing every 100 genomes for some time to give the system a chance to catch up. Do you have any suggestions for handling these kinds of situations? The question is really about how to run things without creating problems for the HPC and/or the sysadmins.

Use the executor scope to control the number of jobs and the submission rate:

`executor.queueSize` and `executor.submitRateLimit` should help ease the load, so you can go with option (1) via this route.
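
For example, a minimal sketch of what this could look like in your `nextflow.config` (the values here are illustrative, not recommendations — pick limits your sysadmins are comfortable with):

```groovy
// Throttle how hard Nextflow hits the scheduler, cluster-wide.
executor {
    // Max number of jobs queued/running at any one time
    // (caps the load regardless of how many samples you launch).
    queueSize = 50

    // Max job submission rate: here, at most 10 jobs per minute.
    submitRateLimit = '10/1min'
}
```

With these two settings Nextflow itself does the throttling you were planning to do with sleeps in a bash loop, so you can launch all the genomes at once and let the executor scope keep the scheduler happy.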
