Hi there,
I’m processing datasets from WES and RNA sequencing. Each workflow comprises multiple steps.
I’d like the pipeline to keep running for the other samples if a few samples fail.
How do I keep track of which samples/IDs have failed, and at which step?
How can I achieve this?
For example, I have a fastp step that removes adapters. fastp fails on one of the samples, and I’d like the pipeline to keep going for the remaining FASTQ files/samples.
process FASTP {
    conda '/data1/software/miniconda/envs/MMRADAR/'
    maxForks 5
    debug true
    errorStrategy 'retry'
    maxRetries 2
    label 'low_mem'
    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/normal/", mode: 'copy', pattern: '*_N*'
    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/tumor/", mode: 'copy', pattern: '*_T*'

    input:
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(tumor_read1, stageAs: 'fastp_reads/*'), path(tumor_read2, stageAs: 'fastp_reads/*')
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(normal_read1, stageAs: 'fastp_reads/*'), path(normal_read2, stageAs: 'fastp_reads/*')

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("${patient_id_tumor}_trim_{1,2}.fq.gz"), emit: reads_tumor
    path("${patient_id_tumor}.fastp.json"), emit: json_tumor
    path("${patient_id_tumor}.fastp.html"), emit: html_tumor
    tuple val(batch), val(patient_id_normal), val(timepoint), path("${patient_id_normal}_trim_{1,2}.fq.gz"), emit: reads_normal
    path("${patient_id_normal}.fastp.json"), emit: json_normal
    path("${patient_id_normal}.fastp.html"), emit: html_normal

    script:
    patient_id_normal = timepoint + "_N"
    patient_id_tumor = timepoint + "_T"
    //def(r1_normal, r2_normal)=normal_reads
    //def(r1_tumor,r2_tumor)=tumor_reads
    """
    echo "starting with fastp"
    fastp --in1 "${tumor_read1}" --in2 "${tumor_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe --out1 "${patient_id_tumor}_trim_1.fq.gz" \
        --out2 "${patient_id_tumor}_trim_2.fq.gz" --json "${patient_id_tumor}.fastp.json" \
        --html "${patient_id_tumor}.fastp.html" --thread 10
    fastp --in1 "${normal_read1}" --in2 "${normal_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe --out1 "${patient_id_normal}_trim_1.fq.gz" \
        --out2 "${patient_id_normal}_trim_2.fq.gz" --json "${patient_id_normal}.fastp.json" \
        --html "${patient_id_normal}.fastp.html" --thread 10
    """
}

workflow.onComplete {
    log.info ( workflow.success ? "completed fastp primary WES!" : "Oops .. something went wrong in fastp primary WES")
}
In the code above, the tumor sample fails. I’d like to skip fastp for that sample and/or continue with the normal data, and carry on with the samples that ran fine. The failed samples should be excluded from subsequent steps.
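For illustration, the behaviour described above might be approximated with a dynamic error strategy plus a trace file. This is only a sketch of a `nextflow.config` fragment, not a tested solution: the trace file name `pipeline_trace.txt` and the chosen trace fields are assumptions, and the retry limit simply mirrors the `maxRetries 2` already set in the process.

```groovy
// nextflow.config (sketch; file name and field selection are assumptions)
process {
    withName: 'FASTP' {
        // Retry the first two failures, then ignore the task so the run continues.
        errorStrategy = { task.attempt <= 2 ? 'retry' : 'ignore' }
        maxRetries = 2
    }
}

// Write a per-task record so failed tasks can be looked up after the run.
trace {
    enabled = true
    overwrite = true
    file = 'pipeline_trace.txt'
    fields = 'task_id,process,tag,name,status,exit,attempt'
}
```

With `ignore`, a task that still fails after its retries emits nothing on its output channels, so downstream processes should never receive that sample, and the `status` and `exit` columns of the trace file should show which task failed in which process. Adding a `tag` directive to the process (for example the batch and timepoint) would make the sample easier to identify in the trace. Note that because the tumor and normal reads are trimmed in the same task here, a tumor failure also discards the normal output; keeping the normal sample would probably require running fastp as one task per sample.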
Let me know if any other information is required.