Understanding Subworkflow logic

MTDouglas · June 26, 2024, 5:29pm

Hello,
I have a Nextflow subworkflow that performs Minimap2 Align → Bam_Sort_Stats_Samtools → Mosdepth. The input to this subworkflow is a channel with the structure:
ch_sample // channel: [tuple val(meta), path(fastqs), path(scaffolds), path(fasta)]
When I run the pipeline that calls this subworkflow with '-resume', I get inconsistent errors at the Mosdepth step. About half the time it runs without errors, but when an error does occur I can see that the BAM and BAI files passed to Mosdepth do not belong to the same sample. I'm trying to figure out how this is possible, since I assumed that only one sample's data would be processed at a time within the subworkflow, given its single input channel. Am I missing something?
Subworkflow:

workflow ALIGNMENT_PER_REF {

    take:
    ch_sample // channel: [tuple val(meta), path(fastqs), path(scaffolds), path(fasta)]

    main:

    ch_versions = Channel.empty()

    //
    // MODULE: Minimap2 for aligning to Hepatitis C
    //
    MINIMAP2_ALIGN (
        ch_sample,
        true,
        'bai',
        false,
        false
    )
    ch_versions = ch_versions.mix(MINIMAP2_ALIGN.out.versions)

    //
    // BAM_SORT_STATS_SAMTOOLS: get stats from the alignment
    //
    BAM_SORT_STATS_SAMTOOLS(
        MINIMAP2_ALIGN.out.bam,
    )
    ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS.out.versions)

    //
    // MOSDEPTH: Calculate genome wide sequencing coverage
    //
    MOSDEPTH(
        BAM_SORT_STATS_SAMTOOLS.out.bam,
        BAM_SORT_STATS_SAMTOOLS.out.bai,
        ch_sample.map{
            meta, fastqs, scaffolds, fasta -> [fasta]
        },
        [],
        // BAM_SORT_STATS_SAMTOOLS.out.fasta
    )

    ch_versions = ch_versions.mix(MOSDEPTH.out.versions)

    emit:
    // TODO nf-core: edit emitted channels

    global_txt      = MOSDEPTH.out.global_txt
    summary_txt     = MOSDEPTH.out.summary_txt
    regions_txt     = MOSDEPTH.out.regions_txt
    bam         = BAM_SORT_STATS_SAMTOOLS.out.bam   // channel: [ val(meta), [ bam ] ]
    bai         = BAM_SORT_STATS_SAMTOOLS.out.bai   // channel: [ val(meta), [ bai ] ]


    versions = ch_versions                     // channel: [ versions.yml ]
}

Alexander_Nater · June 27, 2024, 4:45pm

I assume you are using nf-core subworkflows and modules here? Which version of the MOSDEPTH module are you using? Your inputs don't correspond at all to the current input specification of the module. In any case, your problem has nothing to do with resume. You are wrongly assuming that the samples in the bam and bai channels from the subworkflow arrive in the same order. They are the outputs of two different processes (SAMTOOLS_SORT and SAMTOOLS_INDEX), and the order of samples in these channels is pretty much arbitrary. For that reason, I prefer to always keep files and their indices paired up in channels, i.e. [ val(meta), path(bam), path(bai) ], and to emit a single bam_bai channel instead of two separate channels.
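
To illustrate the ordering point, here is a minimal toy script (not taken from the pipeline above). The sample names and file names are made up, and plain strings stand in for real files. The two channels carry the same samples in a different order, which is what can happen when the tasks of two independent processes finish at different times. The `join` operator matches items on their first element, here the meta map, so each emitted tuple holds a BAM together with its own index.

```nextflow
// toy.nf: hypothetical stand-ins for the outputs of two independent processes
workflow {
    // Same samples, deliberately listed in a different order
    ch_bam = Channel.of(
        [ [id: 'sampleA'], 'sampleA.bam' ],
        [ [id: 'sampleB'], 'sampleB.bam' ]
    )
    ch_bai = Channel.of(
        [ [id: 'sampleB'], 'sampleB.bam.bai' ],
        [ [id: 'sampleA'], 'sampleA.bam.bai' ]
    )

    // join pairs items by their first element (the meta map);
    // failOnMismatch raises an error if a key has no partner in the other channel
    ch_bam_bai = ch_bam.join(ch_bai, failOnMismatch: true)

    // Each item has the shape [ meta, bam, bai ]
    ch_bam_bai.view()
}
```

Passing `ch_bam` and `ch_bai` to a process as two separate inputs would instead pair items purely by arrival order, which is where mismatched BAM/BAI files come from.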

MTDouglas · July 2, 2024, 1:06pm

Yes, I had made some changes to the inputs of MOSDEPTH. I added join statements to combine my BAM, BAI, and FASTA channels on the meta key, and that resolved my issue! I had falsely assumed that a subworkflow runs all of its processes on one input item at a time. Joining the channels on the meta key ensures that all the files for each sample stay together before they are passed to the next process.
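
For readers who land here with the same problem, the fix described above might look roughly like the fragment below. This is a sketch under assumptions, not the exact code used: it is meant to sit in the main: block of ALIGNMENT_PER_REF, it assumes the meta map is identical in all three channels (otherwise `join` will not find matches), and it keeps the four-input call shape of the locally modified MOSDEPTH module shown in the question. The names `ch_fasta`, `ch_joined` and `ch_mosdepth` are made up for illustration.

```nextflow
// Fragment for the main: block of the subworkflow (not a standalone script)

// Key the reference by meta so it can be joined like the other files
ch_fasta = ch_sample.map { meta, fastqs, scaffolds, fasta -> [ meta, fasta ] }

// Bring all per-sample files together on the shared meta key
ch_joined = BAM_SORT_STATS_SAMTOOLS.out.bam
    .join(BAM_SORT_STATS_SAMTOOLS.out.bai)   // [ meta, bam, bai ]
    .join(ch_fasta)                           // [ meta, bam, bai, fasta ]

// Split back into separate channels that emit in the same order,
// so the items consumed together by the process belong to one sample
ch_mosdepth = ch_joined.multiMap { meta, bam, bai, fasta ->
    bam:   [ meta, bam ]
    bai:   [ meta, bai ]
    fasta: [ fasta ]
}

MOSDEPTH(
    ch_mosdepth.bam,
    ch_mosdepth.bai,
    ch_mosdepth.fasta,
    []
)
```

If the module is changed to take a single tuple input instead, `ch_joined` can be passed in directly and the `multiMap` step is not needed.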

