How to keep the pipeline going if an error occurs at any step for any sample?

Hi there,

I’m processing datasets from WES and RNA sequencing. Each workflow comprises multiple steps.

I’d like to keep the pipeline going for the remaining samples if a few of them fail.
How do I keep track of the samples/IDs that failed, and at which step?

For example, I have a fastp step that removes adapters. For one of the samples there’s an issue and fastp won’t run on it. I’d like the pipeline to keep going for the remaining FASTQ files/samples.

process FASTP {
    conda '/data1/software/miniconda/envs/MMRADAR/'
    maxForks 5
    debug true
    errorStrategy 'retry'
    maxRetries 2
    label 'low_mem'

    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/normal/", mode: 'copy', pattern: '*_N*'
    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/tumor/", mode: 'copy', pattern: '*_T*'

    input:
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(tumor_read1, stageAs: 'fastp_reads/*'), path(tumor_read2, stageAs: 'fastp_reads/*')
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(normal_read1, stageAs: 'fastp_reads/*'), path(normal_read2, stageAs: 'fastp_reads/*')

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("${patient_id_tumor}_trim_{1,2}.fq.gz"), emit: reads_tumor
    path("${patient_id_tumor}.fastp.json"), emit: json_tumor
    path("${patient_id_tumor}.fastp.html"), emit: html_tumor

    tuple val(batch), val(patient_id_normal), val(timepoint), path("${patient_id_normal}_trim_{1,2}.fq.gz"), emit: reads_normal
    path("${patient_id_normal}.fastp.json"), emit: json_normal
    path("${patient_id_normal}.fastp.html"), emit: html_normal

    script:
    patient_id_normal = timepoint + "_N"
    patient_id_tumor  = timepoint + "_T"

    """
    echo "starting with fastp"

    fastp --in1 "${tumor_read1}" --in2 "${tumor_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe \
        --out1 "${patient_id_tumor}_trim_1.fq.gz" --out2 "${patient_id_tumor}_trim_2.fq.gz" \
        --json "${patient_id_tumor}.fastp.json" --html "${patient_id_tumor}.fastp.html" --thread 10

    fastp --in1 "${normal_read1}" --in2 "${normal_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe \
        --out1 "${patient_id_normal}_trim_1.fq.gz" --out2 "${patient_id_normal}_trim_2.fq.gz" \
        --json "${patient_id_normal}.fastp.json" --html "${patient_id_normal}.fastp.html" --thread 10
    """
}

workflow.onComplete { 
	log.info ( workflow.success ? "completed fastp primary WES!" : "Oops .. something went wrong in fastp primary WES")
}

In the code above, the tumor sample fails. I’d like fastp to exit for that sample (and/or continue with the normal data) and carry on with the samples that ran fine. The failed samples should be excluded from all downstream steps.

Let me know if any other information is required.

Nextflow provides a process directive called errorStrategy. You can read about it in the official documentation here.

The specific strategy that helps in the situation you described is ignore. If a task from a process with errorStrategy set to ignore fails, the pipeline continues. You will still be informed that the task failed, but the pipeline won’t be stopped because of it. Since an ignored task emits no output, its sample simply never reaches the downstream processes, which takes care of excluding failed samples from later steps. See the example below:

process DO_SOMETHING {
  errorStrategy 'ignore'

  input:
  path ifile

  output:
  path 'output_file'

  script:
  """
  do something
  """
}
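If you still want transient failures (e.g. out-of-memory kills) to be retried before a task is finally skipped, you can combine both behaviours with a dynamic errorStrategy. This is a sketch based on your FASTP process, which already uses retry with maxRetries 2:

process FASTP {
    // retry a failing task up to twice, then ignore it so the run continues
    errorStrategy { task.attempt <= 2 ? 'retry' : 'ignore' }
    maxRetries 2

    // ... rest of the process (inputs, outputs, script) unchanged
}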

Whatever your process is, simply add errorStrategy 'ignore' at the top of the process definition, or set it in a config file, just like any other process directive.
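For example, to set it in nextflow.config for the trimming step only (a sketch, using the FASTP process name from your snippet):

process {
    // only FASTP tasks are ignored on failure; other processes keep the default
    withName: 'FASTP' {
        errorStrategy = 'ignore'
    }
}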

You may also want to check the output of ignored tasks. I wrote a snippet about it here.
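As for tracking which samples failed and at which step: one option is Nextflow’s trace report (enabled below in nextflow.config, or ad hoc with -with-trace). Every task is recorded with its status and exit code, and if you add a tag directive to your processes (e.g. tag "${timepoint}" in FASTP), the sample ID shows up in the tag column. A sketch:

trace {
    enabled = true
    file    = 'pipeline_trace.txt'
    fields  = 'task_id,name,tag,status,exit'
}

After the run, grep FAILED pipeline_trace.txt lists exactly which tasks (and therefore which samples) failed, and the name column tells you at which step.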