The second process is not being executed frequently enough compare to the first process

CryALotz · May 7, 2025, 7:28am

Hello,

here is some part of my workflow:

    def input_channel = Channel.from(file_list)
        .filter { 
            def sampleId = it[0].toString()
                
            return !(filter_list.contains(sampleId))
        }

    def downloader_input = input_channel.map{
            tuple(it[0], it[1])
        }	

def cramfile = Downloader(downloader_input)

def downloaded_cram = cramfile.combine(input_channel, by: 0)

def check_sum_input = downloaded_cram
        .map{
            def sampleId = it[0]
            def file_path = it[1]
            def checksum = it[-1]
            tuple(sampleId, file_path, checksum)
        }

def check_result = Checksum(check_sum_input)

After running my Nextflow workflow, it seems to heavily prioritize the Downloader process. I noticed that although around 50 Downloader tasks have finished, only 2 downstream processes have been executed so far.

What I would prefer is for the pipeline to prioritize progressing through the entire workflow, rather than focusing mostly on the first process. How can I adjust the execution behavior to achieve this?

mahesh.binzerpanchal · May 7, 2025, 9:21am

You can use the maxForks directives to make sure there are cpus available for other processes to start. Nextflow will simply go through the script and fire off as many jobs as it can, and schedule them according to available resources (if not using an external scheduler). As long as process.withName:Downloader.(maxForks * cpus) is less than the available cpus, it’ll leave space for the remaining tasks to get in an execute.

CryALotz · May 8, 2025, 1:09am

I try to set maxForks of Downloader less than other process

here some part of my .config file :

process {
  cache = 'lenient'
  executor = 'slurm'
  queue = 'memory'
  maxForks = 10
  withName: Downloader {
    maxForks = 1
  }

executor {
    queueSize = 20
}
}

It’s getting better, but still only around 50% of the completed Downloader tasks are triggering the downstream process.

A couple of things I think might be affecting this:

export NXF_OPTS='-Xms4g -Xmx8g'

and also channel is more than thousand.

mahesh.binzerpanchal · May 8, 2025, 6:48am

I think since you have some data on how long the processes are taking, you can probably refine that maxForks a bit more along with cpus/memory to get a better balance if you think it’s not optimal. Perhaps try with Seqera’s AI to see if it can read the .nextflow.log and achieve a better optimization profile.

Topic		Replies	Views
Sequential Process execution question Ask for help	0	5	June 27, 2025
More advanced learning of nf-core and nextflow Ask for help	4	155	July 18, 2024
Is the parallelization working properly? Ask for help nextflow , nf-core	2	47	August 9, 2024
How to wait for a subworkflow to complete before proceeding to another subworkflow? Ask for help nextflow , nf-core	3	37	April 23, 2025
Pipeline only processes first element of channel Ask for help nextflow	2	227	May 21, 2024

The second process is not being executed frequently enough compare to the first process

Related topics