Advice on best practices for submitting processes that wait for others to complete before starting

Dan_Higgins · March 13, 2024, 10:02pm

Hi,

I’m seeking advice on the architecture of my Nextflow workflow, which currently works well but is approaching its capacity limit.

The workflow calls an external web API that allows approximately 10 concurrent calls per IP address. I’ve implemented a process that handles common rate-limiting errors by sleeping and retrying (up to 3 times), and I’ve found that 10 concurrent jobs is the optimal number to avoid having too many jobs waiting on HTTP 429 (Too Many Requests) errors.
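For reference, Nextflow can also drive the retry loop itself through process directives rather than inside the script. This is a minimal sketch, assuming a hypothetical CALL_API process and a hypothetical call_api.sh script that exits non-zero when the API is still returning 429. errorStrategy and maxRetries are standard Nextflow directives; the closure sleeps for an exponentially growing delay and then returns 'retry'.

```nextflow
// Hypothetical process and script names: only the retry directives matter here.
process CALL_API {
    // Before each retry, wait 2^attempt seconds (2 s, 4 s, then 8 s).
    // Assumes call_api.sh exits non-zero while it is still being rate-limited.
    errorStrategy { sleep(Math.pow(2, task.attempt) * 1000 as long); return 'retry' }
    maxRetries 3

    input:
    path batch

    output:
    path "${batch.baseName}.out"

    script:
    """
    call_api.sh ${batch} > ${batch.baseName}.out
    """
}
```

Whether this should replace or complement the in-script sleep-and-retry is a judgment call: directive-level retries re-run the whole task, so they fit best when a batch is cheap to repeat.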

While Nextflow’s -resume functionality does a great job of recovering failed batches, I want to keep each batch small enough for recovery to be manageable (~5,000 per batch) and to limit the number of batches running at any time to 10.
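One way to get batches of ~5,000 is to let Nextflow split the input. This sketch assumes the requests live in a single text file with one request per line (params.input and requests.txt are hypothetical). splitText with by: and file: true emits one chunk file per 5,000 lines, and each chunk file can then be the path input of one task, so -resume only re-runs the chunks whose tasks failed.

```nextflow
// Hypothetical input: a text file with one API request per line
params.input = 'requests.txt'

workflow {
    // Emit one chunk file per 5,000 lines; each chunk can feed one task
    batches = Channel.fromPath(params.input).splitText(by: 5000, file: true)
    batches.view()
}
```

If the inputs are already separate files, the collate(5000) operator on a Channel.fromPath glob would instead group existing paths into lists of 5,000.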

Specifically, I’m looking for advice on best practices for creating and submitting batches of batches. For example, Batch A should not be submitted until Batch B has finished. I’m also open to suggestions for implementing a process pool with a limit of 10, where a new task starts as soon as a running one completes, until all batches are processed.

Some additional notes:

- Input and output to the process are file paths.
- I’m not looking for specific code examples but rather for an architectural pattern.
- Currently, I’m using a workaround that uses .collect() to wait for all output from the previous process before invoking an alias of the same process, but I’m sure there’s a better approach.

Thanks for any insights you can provide.
-Dan

mribeirodantas · March 14, 2024, 2:30am

Have you considered using the maxForks directive so that no more than N tasks call this API at the same time? More info on this process directive is in the Nextflow documentation.
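A minimal sketch of that suggestion, combined with the batching described in the question. CALL_API, call_api.sh, and requests.txt are hypothetical placeholders. maxForks 10 caps how many instances of the process run in parallel, and Nextflow launches the next queued batch as soon as a running one finishes. That gives the process-pool behaviour asked about without chaining batches through .collect() and process aliases.

```nextflow
// Hypothetical input file and script name; maxForks is the directive suggested above
params.input = 'requests.txt'

process CALL_API {
    // At most 10 tasks of this process run at once; when one finishes,
    // the next queued batch starts, until every batch has been processed
    maxForks 10

    input:
    path batch

    output:
    path "${batch.baseName}.out"

    script:
    """
    call_api.sh ${batch} > ${batch.baseName}.out
    """
}

workflow {
    // One chunk file per 5,000 lines; each chunk is one CALL_API task
    batches = Channel.fromPath(params.input).splitText(by: 5000, file: true)
    CALL_API(batches)
}
```

The same limit can also be set from nextflow.config with a withName: 'CALL_API' selector in the process scope, which keeps the throttle out of the workflow code.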

Dan_Higgins · March 14, 2024, 12:08pm

Hi @mribeirodantas,
This is exactly what I need.
A simple and elegant solution.

Thanks so much!
-Dan

system · March 21, 2024, 12:08pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
