Nextflow only processes one of my paired-end samples

Hello,
I’ve just started in Nextflow programming and I don’t understand why in the “bwa_map” step it only processes one paired-end sample, whereas in the “fastqc_trimmed” step it processes them all.

params.reads="/path/to/data/*_{1,2}_subsample.fastq"
    params.outdir = "/path/to/results"
    params.datadir = "/path/to/data"
    params.genome = "/path/to/data/genome.fasta"

    reads_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
   genome_ch = Channel.fromPath(params.genome, checkIfExists: true )

   process fastqc_raw {

       tag "FASTQC on $sample_id"
       conda 'bioconda::fastqc=0.11.9'
       publishDir "$params.outdir/Raw_data_total/fastQC/$sample_id"

       input:
           tuple val(sample_id), path(reads)

       output:
         file("*.{html,zip}")

       script:
       """
       fastqc -t 4 ${reads}
       """
   }

   process trim_raw {

       tag "trimmomatic on $sample_id"
       conda 'bioconda::trimmomatic=0.39'
       publishDir "$params.outdir/Cleaned_data/$sample_id"

       input:
           tuple val(sample_id), path(reads)

       output:
           tuple val(sample_id), file("*.trimmed_Q20.fastq.gz")

       script:
       """
       trimmomatic PE -phred33 -threads 4 ${reads[0]} ${reads[1]} ${sample_id}_1.trimmed_Q20.fastq.gz output_forward_unpaired.fq.gz ${sample_id}_2.trimmed_Q20.fastq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:/trinity/shared/apps/Trimmomatic-0.39/adapters/Total_adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:75
       """
   }

process fastqc_trimmed {

    tag "FASTQC on trimmed $sample_id"
    conda 'bioconda::fastqc=0.11.9'
    publishDir "$params.outdir/Cleaned_data/$sample_id"

    input:
        tuple val(sample_id), path(reads)

    output:
      file("*.{html,zip}")

    script:
    """
    fastqc -t 4 ${reads}
    """
}

process bwa_build_bwt {

    tag "bwa-mem2 indexing on ${genome}"
    conda 'bioconda::bwa-mem2=2.2.1'
    publishDir "$params.datadir"

    input:
        val genome

    output:
      file("*.0123")
      file("*.bwt.2bit.64")
      file("*.amb")
      file("*.ann")
      file("*.pac")

    script:
    def prefix = task.ext.prefix ?: "${genome.baseName}"
    """
    bwa-mem2 index -p ${prefix}.fasta ${genome}
    """

}

process samtools_faidx {

    tag "samtools faidx on ${genome}"
    conda 'bioconda::samtools=1.15.1'
    publishDir "$params.datadir"

    input:
        val genome

    output:
      file("*.fai")

    script:
    def prefix = task.ext.prefix ?: "${genome.baseName}"
    """
    samtools faidx --fai-idx ${prefix}.fasta.fai ${genome}
    """

}

process bwa_map {

    tag "BWA map $sample_id"
    conda 'bioconda::bwa-mem2=2.2.1 bioconda::samtools=1.15.1'
    publishDir "$params.outdir/Mapped_bam/"

    input:
        tuple val(sample_id), path(reads)
        val bwa_build_bwt
        val samtools_faidx
        val genome

    output:
      tuple val(sample_id), file("*_BWA.bam")

    script:
    rg="@RG\\tID:${sample_id}\\tPL:ILLUMINA\\tSM:${sample_id}"
    """
    bwa-mem2 mem -t 4 -R '${rg}' ${genome} ${reads[0]} ${reads[1]} | samtools view -Sb - > ${sample_id}_BWA.bam
    """
}

workflow {
    fastqc_raw(reads_ch)
    trim_raw(reads_ch)
    fastqc_trimmed(trim_raw.out)
    bwa_build_bwt(genome_ch)
    samtools_faidx(genome_ch)
    bwa_map(trim_raw.out, bwa_build_bwt.out[0], samtools_faidx.out, genome_ch)

}

Here are the inputs of bwa_map process :

genome_ch : /path/to/data/genome.fasta

bwa_build_bwt.out[0] : /path/to/genome.fasta.0123

samtools_faidx.out : /path/to/genome.fasta.fai

trim_raw.out :
[ERR8014964, [/path/to/ERR8014964_1.trimmed_Q20.fastq.gz, /path/to/ERR8014964_2.trimmed_Q20.fastq.gz]]
[ERR8014965, [/path/to/ERR8014965_1.trimmed_Q20.fastq.gz, /path/to/ERR8014965_2.trimmed_Q20.fastq.gz]]

There are two input samples but only one is analysed, why? From the documentation I can read that : “When two or more channels are declared as process inputs, the process waits until there is a complete input configuration, i.e. until it receives a value from each input channel. When this condition is satisfied, the process consumes a value from each channel and launches a new task, repeating this logic until one or more channels are empty.” So I have to create a genome channel with many fasta files as fastq paired files but I don’t know how to do that …

If you have a channel with a single element and you want a process to consume this same element multiple times, you must convert it to a value channel. More info on value channels here.

In your case, you can easily convert the genome_ch queue channel to a value channel adding a channel operator that returns a single value, such as first in:

  Channel
    .fromPath(params.genome,
              checkIfExists: true)
    .first()
    .set { genome_ch }

Or providing it to the process as a simple string. This way, Nextflow will automatically convert it to a value channel, such as in the example below:

...
    bwa_map(trim_raw.out,
            bwa_build_bwt.out[0],
            samtools_faidx.out,
            params.genome)
...

It works with :

  Channel
    .fromPath(params.genome,
              checkIfExists: true)
    .first()
    .set { genome_ch }

Thank you very much !

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.