The second process is not being executed frequently enough compare to the first process

Hello,

here is some part of my workflow:

    def input_channel = Channel.from(file_list)
        .filter { 
            def sampleId = it[0].toString()
                
            return !(filter_list.contains(sampleId))
        }

    def downloader_input = input_channel.map{
            tuple(it[0], it[1])
        }	

def cramfile = Downloader(downloader_input)

def downloaded_cram = cramfile.combine(input_channel, by: 0)

def check_sum_input = downloaded_cram
        .map{
            def sampleId = it[0]
            def file_path = it[1]
            def checksum = it[-1]
            tuple(sampleId, file_path, checksum)
        }

def check_result = Checksum(check_sum_input)

After running my Nextflow workflow, it seems to heavily prioritize the Downloader process. I noticed that although around 50 Downloader tasks have finished, only 2 downstream processes have been executed so far.

What I would prefer is for the pipeline to prioritize progressing through the entire workflow, rather than focusing mostly on the first process. How can I adjust the execution behavior to achieve this?

You can use the maxForks directives to make sure there are cpus available for other processes to start. Nextflow will simply go through the script and fire off as many jobs as it can, and schedule them according to available resources (if not using an external scheduler). As long as process.withName:Downloader.(maxForks * cpus) is less than the available cpus, it’ll leave space for the remaining tasks to get in an execute.

1 Like

I try to set maxForks of Downloader less than other process

here some part of my .config file :

process {
  cache = 'lenient'
  executor = 'slurm'
  queue = 'memory'
  maxForks = 10
  withName: Downloader {
    maxForks = 1
  }

executor {
    queueSize = 20
}
}

It’s getting better, but still only around 50% of the completed Downloader tasks are triggering the downstream process.

A couple of things I think might be affecting this:

export NXF_OPTS='-Xms4g -Xmx8g'

and also channel is more than thousand.

I think since you have some data on how long the processes are taking, you can probably refine that maxForks a bit more along with cpus/memory to get a better balance if you think it’s not optimal. Perhaps try with Seqera’s AI to see if it can read the .nextflow.log and achieve a better optimization profile.