After running my Nextflow workflow, it seems to heavily prioritize the Downloader process. Although around 50 Downloader tasks have finished, only 2 downstream tasks have run so far.
What I would prefer is for the pipeline to prioritize progressing through the entire workflow, rather than focusing mostly on the first process. How can I adjust the execution behavior to achieve this?
You can use the maxForks directive to make sure there are CPUs free for other processes to start. Nextflow simply goes through the script and fires off as many jobs as it can, scheduling them according to available resources (if you're not using an external scheduler). As long as maxForks × cpus for the Downloader process is less than the total number of available CPUs, it will leave room for the remaining tasks to get in and execute.
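A minimal nextflow.config sketch of that idea, assuming the local executor on a 16-CPU machine and that Downloader needs 1 CPU per task. The values 16, 1 and 4 are illustrative assumptions, not recommendations: with them, Downloader can use at most 4 × 1 = 4 CPUs, leaving 12 for downstream tasks.

```groovy
// nextflow.config (sketch; CPU counts below are assumed example values)
executor {
    cpus = 16              // total CPUs the local executor may use (assumed machine size)
}

process {
    withName: 'Downloader' {
        cpus     = 1       // CPUs requested per Downloader task (assumed)
        maxForks = 4       // at most 4 Downloader tasks run at once: 4 x 1 = 4 of 16 CPUs
    }
}
```

If Downloader is mostly network-bound, a small maxForks also avoids saturating bandwidth while the rest of the pipeline keeps moving.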
Since you already have some data on how long each process takes, you can probably refine maxForks further, together with cpus and memory, to get a better balance if the current one isn't optimal. You could also try Seqera's AI to see whether it can read the .nextflow.log and suggest a better-optimized configuration.
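One way to collect per-task timing and resource data for this tuning is Nextflow's trace report, enabled either with the -with-trace command-line option or in nextflow.config. The sketch below assumes the config route; the file name is a hypothetical choice, and the field list is just one useful subset.

```groovy
// nextflow.config (sketch): write a per-task trace file to guide maxForks/cpus tuning
trace {
    enabled   = true
    overwrite = true                       // replace a trace file left by a previous run
    file      = 'pipeline_trace.txt'       // hypothetical output file name
    fields    = 'task_id,process,name,status,cpus,realtime,%cpu,peak_rss'
}
```

Comparing realtime and %cpu for Downloader against the downstream processes shows whether Downloader actually uses the CPUs it requests, which is the main input for choosing maxForks and cpus.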