Hello,
I’ve just started in Nextflow programming and I don’t understand why in the “bwa_map” step it only processes one paired-end sample, whereas in the “fastqc_trimmed” step it processes them all.
params.reads="/path/to/data/*_{1,2}_subsample.fastq"
params.outdir = "/path/to/results"
params.datadir = "/path/to/data"
params.genome = "/path/to/data/genome.fasta"
reads_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
genome_ch = Channel.fromPath(params.genome, checkIfExists: true )
process fastqc_raw {
tag "FASTQC on $sample_id"
conda 'bioconda::fastqc=0.11.9'
publishDir "$params.outdir/Raw_data_total/fastQC/$sample_id"
input:
tuple val(sample_id), path(reads)
output:
file("*.{html,zip}")
script:
"""
fastqc -t 4 ${reads}
"""
}
process trim_raw {
tag "trimmomatic on $sample_id"
conda 'bioconda::trimmomatic=0.39'
publishDir "$params.outdir/Cleaned_data/$sample_id"
input:
tuple val(sample_id), path(reads)
output:
tuple val(sample_id), file("*.trimmed_Q20.fastq.gz")
script:
"""
trimmomatic PE -phred33 -threads 4 ${reads[0]} ${reads[1]} ${sample_id}_1.trimmed_Q20.fastq.gz output_forward_unpaired.fq.gz ${sample_id}_2.trimmed_Q20.fastq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:/trinity/shared/apps/Trimmomatic-0.39/adapters/Total_adapters.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:75
"""
}
process fastqc_trimmed {
tag "FASTQC on trimmed $sample_id"
conda 'bioconda::fastqc=0.11.9'
publishDir "$params.outdir/Cleaned_data/$sample_id"
input:
tuple val(sample_id), path(reads)
output:
file("*.{html,zip}")
script:
"""
fastqc -t 4 ${reads}
"""
}
process bwa_build_bwt {
tag "bwa-mem2 indexing on ${genome}"
conda 'bioconda::bwa-mem2=2.2.1'
publishDir "$params.datadir"
input:
val genome
output:
file("*.0123")
file("*.bwt.2bit.64")
file("*.amb")
file("*.ann")
file("*.pac")
script:
def prefix = task.ext.prefix ?: "${genome.baseName}"
"""
bwa-mem2 index -p ${prefix}.fasta ${genome}
"""
}
process samtools_faidx {
tag "samtools faidx on ${genome}"
conda 'bioconda::samtools=1.15.1'
publishDir "$params.datadir"
input:
val genome
output:
file("*.fai")
script:
def prefix = task.ext.prefix ?: "${genome.baseName}"
"""
samtools faidx --fai-idx ${prefix}.fasta.fai ${genome}
"""
}
process bwa_map {
tag "BWA map $sample_id"
conda 'bioconda::bwa-mem2=2.2.1 bioconda::samtools=1.15.1'
publishDir "$params.outdir/Mapped_bam/"
input:
tuple val(sample_id), path(reads)
val bwa_build_bwt
val samtools_faidx
val genome
output:
tuple val(sample_id), file("*_BWA.bam")
script:
rg="@RG\\tID:${sample_id}\\tPL:ILLUMINA\\tSM:${sample_id}"
"""
bwa-mem2 mem -t 4 -R '${rg}' ${genome} ${reads[0]} ${reads[1]} | samtools view -Sb - > ${sample_id}_BWA.bam
"""
}
workflow {
fastqc_raw(reads_ch)
trim_raw(reads_ch)
fastqc_trimmed(trim_raw.out)
bwa_build_bwt(genome_ch)
samtools_faidx(genome_ch)
bwa_map(trim_raw.out, bwa_build_bwt.out[0], samtools_faidx.out, genome_ch)
}
Here are the inputs of bwa_map process :
genome_ch : /path/to/data/genome.fasta
bwa_build_bwt.out[0] : /path/to/genome.fasta.0123
samtools_faidx.out : /path/to/genome.fasta.fai
trim_raw.out :
[ERR8014964, [/path/to/ERR8014964_1.trimmed_Q20.fastq.gz, /path/to/ERR8014964_2.trimmed_Q20.fastq.gz]]
[ERR8014965, [/path/to/ERR8014965_1.trimmed_Q20.fastq.gz, /path/to/ERR8014965_2.trimmed_Q20.fastq.gz]]
There are two input samples but only one is analysed, why? From the documentation I can read that : “When two or more channels are declared as process inputs, the process waits until there is a complete input configuration, i.e. until it receives a value from each input channel. When this condition is satisfied, the process consumes a value from each channel and launches a new task, repeating this logic until one or more channels are empty.” So I have to create a genome channel with many fasta files as fastq paired files but I don’t know how to do that …