Single Nextflow job vs the Slurm executor

I recently asked this to one of our sysadmins:

…the pipeline consists of running many processes, so I could reserve, say, one node with 32 processors, 32 GB, and 3 hours, or our workflow manager could submit each process separately, requesting 1 node, 1 GB, and a few minutes each time. Is either “style” better for our HPC?

In other words, I can run a Nextflow pipeline as a single larger Slurm job with all the processes running locally, or have Nextflow use the Slurm executor to run each individual process as a separate, smaller Slurm job.
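
To make the two styles concrete, here is a minimal `nextflow.config` sketch (an illustration I put together, not from the original thread; the executor names are Nextflow's built-in ones):

```groovy
// Style 1: run everything inside one big Slurm allocation.
// Nextflow is launched inside the allocation and uses the
// default local executor, so no extra Slurm jobs are created.
process.executor = 'local'

// Style 2: let Nextflow submit every task as its own Slurm job.
// process.executor = 'slurm'
```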

This is what he said:

Most definitely! It is much better for our HPC and for you if you can ‘bundle’ up steps that do not take much time into a single Slurm job. Slurm is quite fast at starting jobs, but there is still overhead when starting a job, and that will add up if you have many of them.

What do people think? Do you have any suggestions?

You shouldn’t have to worry about how Slurm works; Nextflow will take care of that for you. Just make sure you tell Nextflow what resources each process needs, and Nextflow will make sure the work is scheduled the best way, regardless of where it runs (through Slurm, in your case).
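
As a sketch of what “tell Nextflow what resources each process needs” looks like (the process name and script here are hypothetical), the directives below are what Nextflow translates into each Slurm job's resource request:

```groovy
// Hypothetical process in a DSL2 pipeline script: cpus, memory,
// and time become the sbatch options for this task's Slurm job.
process ALIGN_READS {
    cpus 4
    memory '8 GB'
    time '30m'

    script:
    """
    echo "running on ${task.cpus} CPUs"
    """
}
```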

But yes, if you have tasks that are very fast (are you sure you have those, and that there are a lot of them?), submitting each of them as a separate Slurm job is not the best approach. You can batch your tasks (more info here), or use job arrays (more info here).
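
For the job-array route, recent Nextflow releases (24.04 and later) have an experimental `array` directive that groups task submissions into Slurm job arrays; a one-line sketch:

```groovy
// nextflow.config: submit tasks in Slurm job arrays of up to
// 100 tasks each, instead of one sbatch call per task.
process.array = 100
```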

Also, for the sake of clarity, “reserve one node …” means Nextflow itself wouldn’t be using Slurm (you would not set `executor = 'slurm'`). You’d be running Nextflow the same way you would on your laptop, with the local executor.
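
In that style, the whole pipeline is wrapped in one submission script. Here is a sketch using the numbers from the question (the script name `main.nf` and the exact resources are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=nextflow-pipeline
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=32G
#SBATCH --time=03:00:00

# Nextflow runs inside this single allocation with the default
# local executor, so all tasks share the 32 CPUs reserved above.
nextflow run main.nf
```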

Thank you @mribeirodantas! I don’t think I have a lot of quick jobs, but a job array sounds like a good option!
