Help with Parallelizing Sample Processing in Nextflow

Dear Nextflow Community,

I’m new to Nextflow and still learning. After going through the training, I tried building my own preprocessing workflow for scRNA-seq data in main.nf. My current workflow looks like this:

nextflow.enable.dsl=2

workflow {

    Channel
        .fromPath("${params.input_dir}/*", type: 'dir')
        .ifEmpty { error "❌ No sample directories found in: ${params.input_dir}" }
        .set { sample_dirs_ch }
    
    // Read metadata
    raw_ch = ReadMetadata(sample_dirs_ch)
}

// Process to read metadata from each sample
process ReadMetadata {
    tag "${input_file.getName()}"

    conda "${params.env_dir}/read_metadata.yaml"

    publishDir "${params.output_dir}", mode: 'copy', overwrite: true

    input:
    path input_file

    output:
    path "${input_file.simpleName}.raw.h5ad"

    script:
    """
    python ${workflow.projectDir}/scripts/read_metadata.py \
        --input ${input_file} \
        --output ${input_file.simpleName}.raw.h5ad
    """
}

This workflow works as expected, but the ReadMetadata process is quite slow. I believe it processes each sample sequentially, although I don't know how to verify whether the tasks inside the process actually run in parallel.

I would like to parallelize this process so that each sample is processed independently, leveraging multiple CPU cores. Could someone guide me on the best way to achieve parallel execution for each sample in a process?

Thank you in advance for your help!

Best regards

Assuming that each sample has its own directory in params.input_dir (it's confusing that you call the input to your ReadMetadata process input_file), the samples will be processed in parallel by Nextflow, provided you have enough resources to do so. Since this appears to be running with the local executor and you don't specify any process resource requirements such as memory or cpus, Nextflow has no idea how many resources each task of the process needs. You might have a default for all processes somewhere in your config, but without properly setting both the available system resources and the per-process requirements, Nextflow cannot effectively parallelize your tasks.
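To answer the side question of how to check this: run with the built-in trace report, e.g. nextflow run main.nf -with-trace, and compare the start and complete timestamps of the ReadMetadata tasks; overlapping intervals mean they ran concurrently. As for the requirements, one option is to declare them directly in the process definition. This is only a sketch, and the values are placeholders you would tune for your own data:

process ReadMetadata {
    cpus 1        // placeholder: cores each task may use
    memory 4.GB   // placeholder: expected per-sample memory footprint
    time 1.h      // placeholder: generous per-task limit

    // ... rest of the process (tag, conda, publishDir, input, output, script) unchanged
}

With one core per task, the local executor can then run as many tasks side by side as it has cores available.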

// nextflow.config
params {
    experiment = "msc"
    input_dir  = "data/${params.experiment}/data/base"
    output_dir = "data/${params.experiment}/data/temp/temp1"
    env_dir = "envs"
}

process {
    executor = 'local'
    memory   = '100 GB'
    time     = '100d'

    // Dynamically detect total CPUs
    cpus = Runtime.runtime.availableProcessors()
}

conda {
    enabled     = true
    useMicromamba = true
}

It was my error that I didn't share my nextflow.config. Here I specified cpus and memory.

OK, you are mixing up two things here: available resources for the executor vs. resource requirements for process tasks. At the moment you are telling Nextflow that every task needs 100 GB of memory and all available CPUs, so only one task can run at a time. Process requirements belong under the process scope, available resources under the executor scope; see Configuration options in the Nextflow documentation.

Your config should therefore look like this:

process {
    executor = 'local'
    cpus     = 1
    memory   = 4.GB
    time     = 1.h
}

executor {
    name   = 'local'
    cpus   = Runtime.runtime.availableProcessors()
    memory = 100.GB
}
Oh, that’s embarrassing :sweat_smile:. Thank you so much, this is exactly the solution I needed!