Is the parallelization working properly?

I have a simple pipeline that just concatenates FASTQ files, runs FastQC, and collects the results for MultiQC:

workflow CHAMPLAIN {

    take:
    ch_samplesheet // channel: samplesheet read in from --input

    main:

    ch_versions = Channel.empty()
    ch_multiqc_files = Channel.empty()

    ch_samplesheet
        .map { sample_ID, fastqList ->
            // split the comma-separated FASTQ list into individual files
            def files = fastqList[0].split(',')
            [sample_ID, tuple(files)]
        }
        .set { ch_samples }

    // one concatenation task per sample
    CAT_FASTQ (ch_samples)

    // one FastQC task per sample
    FASTQC (CAT_FASTQ.out.reads)

    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{ it[1] })
    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
    // etc.
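To be clear about what I expect: the map step emits one tuple per sample, so CAT_FASTQ and FASTQC should each get one task per sample, and those tasks should be able to run concurrently. Here is the transform in isolation, as a standalone script with made-up sample IDs and file names:

    // standalone sketch of the samplesheet transform (inputs are made up)
    workflow {
        Channel
            .of(['sample1', ['run1_R1.fastq.gz,run2_R1.fastq.gz']],
                ['sample2', ['run3_R1.fastq.gz']])
            .map { sample_ID, fastqList ->
                [sample_ID, tuple(fastqList[0].split(','))]
            }
            .view() // prints one [sample_ID, files] tuple per sample
    }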

But I don’t think it’s parallelizing correctly. Attached is the file from pipeline_info (the run is still going, so I’m attaching the .txt file, and some tasks were cached), but the point is that the execution seems to be sequential:
execution_trace_2024-08-08_10-01-07.txt (9.9 KB)

It’s two steps, concatenation and then FASTQC, and everything seems to be running sequentially.
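In the trace, each task’s start timestamp seems to come after the previous task’s complete timestamp, which is why I think nothing is overlapping. For reference, those columns come from Nextflow’s standard trace options, with something like this in nextflow.config:

    // nextflow.config -- trace columns used to check for task overlap
    trace {
        enabled = true
        fields  = 'task_id,name,status,submit,start,complete,%cpu,peak_rss'
    }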

The SLURM parameters I have are:

#!/bin/bash
#SBATCH --partition=short
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH --time=3:00:00
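Inside that job I just launch Nextflow directly, roughly like this (the script name here is just a placeholder):

    # launched from inside the sbatch script above
    nextflow run main.nf --input samplesheet.csv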

Any suggestions?

Well, what are your resource allocations for these processes? If each task requests anywhere close to 32 GB of memory or 16 CPUs (the Nextflow runner process will also use part of your total allocation), only one task will be able to run at a time inside that single 64 GB / 32-CPU job. Why don’t you use the slurm executor instead?
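If you switch, something along these lines in nextflow.config would make Nextflow submit each task as its own SLURM job instead of running them all inside one allocation (a sketch; the queue name is taken from your sbatch script, and the queueSize value is illustrative):

    // nextflow.config -- sketch of a slurm executor setup
    process {
        executor = 'slurm'
        queue    = 'short'   // your partition
    }
    executor {
        queueSize = 50       // max jobs queued/running at once
    }

Then run the nextflow command itself from a login node or a small long-running job; it only needs a couple of CPUs to orchestrate the tasks.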

Thank you!! And this was the issue, very basic indeed. I had no idea that the slurm executor existed, so I will use that.
