Understanding Subworkflow logic

Hello,
I have a Nextflow subworkflow that performs Minimap2 Align → Bam_Sort_Stats_Samtools → Mosdepth. The input to this subworkflow is a channel with the structure:
ch_sample // channel: [tuple val(meta), path(fastqs), path(scaffolds), path(fasta)]
When I run the pipeline that calls this subworkflow with '-resume', I get inconsistent errors at the Mosdepth step. About 50% of the time it runs without errors, but when an error does occur, I can see that the input BAM and BAI files do not belong to the same sample. I'm trying to figure out how this is possible, since I expected only one sample's information to be processed at a time within the subworkflow, given the single channel input. Am I missing something?
Subworkflow:

workflow ALIGNMENT_PER_REF {

    take:
    ch_sample // channel: [tuple val(meta), path(fastqs), path(scaffolds), path(fasta)]

    main:

    ch_versions = Channel.empty()

    //
    // MODULE: Minimap2 for aligning to Hepatitis C
    //
    MINIMAP2_ALIGN (
        ch_sample,
        true,
        'bai',
        false,
        false
    )
    // mix() accumulates versions instead of overwriting the channel
    ch_versions = ch_versions.mix(MINIMAP2_ALIGN.out.versions)

    //
    // BAM_SORT_STATS_SAMTOOLS: get stats from the alignment
    //
    BAM_SORT_STATS_SAMTOOLS(
        MINIMAP2_ALIGN.out.bam,
    )
    ch_versions = ch_versions.mix(BAM_SORT_STATS_SAMTOOLS.out.versions)

    //
    // MOSDEPTH: Calculate genome wide sequencing coverage
    //
    MOSDEPTH(
        BAM_SORT_STATS_SAMTOOLS.out.bam,
        BAM_SORT_STATS_SAMTOOLS.out.bai,
        ch_sample.map{
            meta, fastqs, scaffolds, fasta -> [fasta]
        },
        [],
        // BAM_SORT_STATS_SAMTOOLS.out.fasta
    )

    ch_versions = ch_versions.mix(MOSDEPTH.out.versions)

    emit:
    // TODO nf-core: edit emitted channels

    global_txt      = MOSDEPTH.out.global_txt
    summary_txt     = MOSDEPTH.out.summary_txt
    regions_txt     = MOSDEPTH.out.regions_txt
    bam         = BAM_SORT_STATS_SAMTOOLS.out.bam   // channel: [ val(meta), [ bam ] ]
    bai         = BAM_SORT_STATS_SAMTOOLS.out.bai   // channel: [ val(meta), [ bai ] ]


    versions = ch_versions                     // channel: [ versions.yml ]
}


I assume you are using nf-core subworkflows and modules here? Which version of the MOSDEPTH module are you using? Your inputs don't correspond at all to the module's current input specification. Anyway, your problem has nothing to do with resume. You are wrongly assuming that the samples in the bam and bai channels emitted by the subworkflow arrive in the same order. They are outputs of two different processes (SAMTOOLS_SORT and SAMTOOLS_INDEX), and because tasks run in parallel and emit their outputs as they finish, the order of samples in these channels is pretty much arbitrary. For that reason, I prefer to always keep files and their indices paired up in channels, i.e. [ val(meta), path(bam), path(bai) ], and to emit a single bam_bai channel instead of two separate channels.
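
The pairing suggested above can be sketched with toy channels. This is a minimal illustration rather than the subworkflow itself: the sample names and file names are made up, plain strings stand in for real paths, and the BAI channel is deliberately given in a different order than the BAM channel to mimic tasks finishing at different times.

```nextflow
workflow {
    // Hypothetical stand-ins for BAM_SORT_STATS_SAMTOOLS.out.bam / .out.bai
    ch_bam = Channel.of(
        [ [id: 'sampleA'], 'sampleA.bam' ],
        [ [id: 'sampleB'], 'sampleB.bam' ]
    )
    ch_bai = Channel.of(
        [ [id: 'sampleB'], 'sampleB.bam.bai' ],
        [ [id: 'sampleA'], 'sampleA.bam.bai' ]
    )

    // join() matches items on their first element (the meta map) by default,
    // so each BAM is paired with the BAI carrying the same meta,
    // regardless of arrival order.
    // failOnMismatch makes the run fail if a sample lacks a partner.
    ch_bam_bai = ch_bam.join(ch_bai, failOnMismatch: true)   // [ meta, bam, bai ]

    ch_bam_bai.view()
}
```

In the subworkflow, the same join could be applied to the two real output channels and the result emitted as a single bam_bai channel. This assumes both processes pass the meta map through unchanged, since join only pairs items whose keys are equal.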

Yes, I had made some changes to the inputs of MOSDEPTH. I added some join statements to join my BAM, BAI, and FASTA channels on the meta tag, and that resolved my issue! I had falsely assumed that a subworkflow runs all of its processes on one channel item at a time. Joining the channels on the meta tag ensures that all the information for each sample stays together before the next process is executed.
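
The exact join statements were not posted, so the following is only a guess at what such a fix might look like, again with toy channels and made-up file names. It joins three channels on the meta map and then uses multiMap to split each joined item back into separate, synchronized channels, which is one way to feed a process that still expects BAM, BAI, and FASTA as separate inputs (as the modified MOSDEPTH in the question appears to).

```nextflow
workflow {
    // Hypothetical stand-ins for the real channels; the orders differ on purpose
    ch_bam   = Channel.of( [ [id: 's1'], 's1.bam' ],     [ [id: 's2'], 's2.bam' ] )
    ch_bai   = Channel.of( [ [id: 's2'], 's2.bam.bai' ], [ [id: 's1'], 's1.bam.bai' ] )
    ch_fasta = Channel.of( [ [id: 's1'], 'ref1.fasta' ], [ [id: 's2'], 'ref2.fasta' ] )

    ch_in = ch_bam
        .join(ch_bai)      // [ meta, bam, bai ]
        .join(ch_fasta)    // [ meta, bam, bai, fasta ]
        .multiMap { meta, bam, bai, fasta ->
            // each joined item is emitted once on every named output,
            // so the n-th item of each output belongs to the same sample
            bam:   [ meta, bam ]
            bai:   [ meta, bai ]
            fasta: fasta
        }

    ch_in.bam.view   { v -> "bam:   ${v}" }
    ch_in.bai.view   { v -> "bai:   ${v}" }
    ch_in.fasta.view { v -> "fasta: ${v}" }
}
```

In the real subworkflow, ch_in.bam, ch_in.bai, and ch_in.fasta would be passed to MOSDEPTH in place of the three independent channels. If the process can be changed, taking a single [ meta, bam, bai ] tuple as input avoids the need for multiMap altogether.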
