How to keep the pipeline going if an error occurs at any step for any sample?

Hi there,

I’m processing datasets from WES and RNA sequencing. Each workflow comprises multiple steps.

I’d like to keep the pipeline going for the remaining samples if a few of them fail.
How do I keep track of the samples/IDs that failed, and at which step?

For example, I have a fastp step that removes adapters. For one of the samples there’s an issue and fastp won’t run on it. I’d like the pipeline to keep going for the remaining FASTQ files/samples.

process FASTP {
    conda '/data1/software/miniconda/envs/MMRADAR/'
    maxForks 5
    debug true
    errorStrategy 'retry'
    maxRetries 2
    label 'low_mem'

    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/normal/", mode: 'copy', pattern: '*_N*'
    publishDir path: "${params.outdir}/${batch}/${timepoint}/WES/primary/fastp/tumor/", mode: 'copy', pattern: '*_T*'

    input:
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(tumor_read1, stageAs: 'fastp_reads/*'), path(tumor_read2, stageAs: 'fastp_reads/*')
    tuple val(batch), val(timepoint), val(tissue), val(seq_type), path(normal_read1, stageAs: 'fastp_reads/*'), path(normal_read2, stageAs: 'fastp_reads/*')

    output:
    tuple val(batch), val(patient_id_tumor), val(timepoint), path("${patient_id_tumor}_trim_{1,2}.fq.gz"), emit: reads_tumor
    path("${patient_id_tumor}.fastp.json"), emit: json_tumor
    path("${patient_id_tumor}.fastp.html"), emit: html_tumor

    tuple val(batch), val(patient_id_normal), val(timepoint), path("${patient_id_normal}_trim_{1,2}.fq.gz"), emit: reads_normal
    path("${patient_id_normal}.fastp.json"), emit: json_normal
    path("${patient_id_normal}.fastp.html"), emit: html_normal

    script:
    patient_id_normal = timepoint + "_N"
    patient_id_tumor  = timepoint + "_T"

    """
    echo "starting with fastp"

    fastp --in1 "${tumor_read1}" --in2 "${tumor_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe \
        --out1 "${patient_id_tumor}_trim_1.fq.gz" --out2 "${patient_id_tumor}_trim_2.fq.gz" \
        --json "${patient_id_tumor}.fastp.json" --html "${patient_id_tumor}.fastp.html" --thread 10

    fastp --in1 "${normal_read1}" --in2 "${normal_read2}" -q 20 -u 20 -l 40 --detect_adapter_for_pe \
        --out1 "${patient_id_normal}_trim_1.fq.gz" --out2 "${patient_id_normal}_trim_2.fq.gz" \
        --json "${patient_id_normal}.fastp.json" --html "${patient_id_normal}.fastp.html" --thread 10
    """
}

workflow.onComplete { 
	log.info ( workflow.success ? "completed fastp primary WES!" : "Oops .. something went wrong in fastp primary WES")
}

In the code above, the tumor sample fails. I’d like fastp to exit for that sample (and/or continue with the normal data) and carry on with the samples that ran fine. The failed samples should be excluded from all downstream steps.

Let me know if any other information is required.

Nextflow provides a process directive called errorStrategy. You can read about it in the official documentation here.

The specific strategy that helps in the situation you described is ignore. If a task from a process with errorStrategy set to ignore fails, the pipeline continues. You will still be informed that the task failed, but the pipeline won’t be stopped because of it. Since an ignored task emits no output, its sample simply never reaches the downstream processes, which takes care of excluding failed samples from later steps. See the example below:

process DO_SOMETHING {
  errorStrategy 'ignore'

  input:
  path ifile

  output:
  path 'output_file'

  script:
  """
  do something
  """
}
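If you still want transient failures (e.g. out-of-memory kills) to be retried before a task is finally skipped, you can combine both behaviours with a dynamic errorStrategy. This is a sketch based on your FASTP process, which already uses retry with maxRetries 2:

process FASTP {
    // retry a failing task up to twice, then ignore it so the run continues
    errorStrategy { task.attempt <= 2 ? 'retry' : 'ignore' }
    maxRetries 2

    // ... rest of the process (inputs, outputs, script) unchanged
}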

Whatever your process is, simply add errorStrategy 'ignore' at the top of the process definition, or set it in a config file, just like any other process directive.
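For example, to set it in nextflow.config for the trimming step only (a sketch, using the FASTP process name from your snippet):

process {
    // only FASTP tasks are ignored on failure; other processes keep the default
    withName: 'FASTP' {
        errorStrategy = 'ignore'
    }
}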

You may also want to check the output of ignored tasks. I wrote a snippet about it here.
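As for tracking which samples failed and at which step: one option is Nextflow’s trace report (enabled below in nextflow.config, or ad hoc with -with-trace). Every task is recorded with its status and exit code, and if you add a tag directive to your processes (e.g. tag "${timepoint}" in FASTP), the sample ID shows up in the tag column. A sketch:

trace {
    enabled = true
    file    = 'pipeline_trace.txt'
    fields  = 'task_id,name,tag,status,exit'
}

After the run, grep FAILED pipeline_trace.txt lists exactly which tasks (and therefore which samples) failed, and the name column tells you at which step.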