Best practices for error handling in Nextflow

Hello everyone!

Could you please tell me any best practices for writing Nextflow code to flexibly configure pipeline error handling?

If the samples are OK and the pipeline runs without failures, the scenarios from Nextflow Training work perfectly. However, I’ve found virtually no examples of how to handle process errors competently and flexibly.

For example, in my pipeline I created a module that checks md5sums for the raw reads:

```groovy
process MD5_CHECKSUM {

    // directives must come before the input/output blocks,
    // so errorStrategy is moved up here with the others
    conda "${CONDA_PREFIX_1}/envs/multiqc"
    tag "Md5sum on ${sample_id}"
    publishDir "${params.outdir}/md5sum/${sample_id}", mode: "copy"
    cpus 1
    maxForks 20
    errorStrategy 'ignore'

    input:
    tuple val(sample_id), path(reads)
    path md5sum_txt

    output:
    tuple val(sample_id), path(reads), emit: sample_id__reads
    path("${sample_id}.md5sum.ok"), emit: md5_ok

    script:
    """
    set -euo pipefail
    r1="${reads[0]}"
    r2="${reads[1]}"
    b1=\$(basename "\$r1")
    b2=\$(basename "\$r2")

    md5_1=\$(grep -E "([[:space:]]|\\\\*)\${b1}\$" "${md5sum_txt}" | awk '{print \$1}' | head -n1)
    md5_2=\$(grep -E "([[:space:]]|\\\\*)\${b2}\$" "${md5sum_txt}" | awk '{print \$1}' | head -n1)

    if [[ -z "\$md5_1" || -z "\$md5_2" ]]; then
      echo "ERROR: md5 not found for \$b1 or \$b2 in ${md5sum_txt}" >&2
      echo "Matches for b1/b2 in md5 file:" >&2
      grep -n -F "\$b1" "${md5sum_txt}" >&2 || true
      grep -n -F "\$b2" "${md5sum_txt}" >&2 || true
      exit 2
    fi
    printf "%s  %s\\n" "\$md5_1" "\$r1" >  check.md5
    printf "%s  %s\\n" "\$md5_2" "\$r2" >> check.md5
    md5sum -c check.md5 | tee md5sum_for_pair.log
    touch "${sample_id}.md5sum.ok"
    """
}
```

Currently, this is implemented with `errorStrategy 'ignore'`: if the md5 doesn’t match, the task fails, its outputs are simply not emitted, and only valid samples are passed to subsequent steps (for example, FastQC). This generally works, but is this approach considered correct?

Perhaps it would be better to intercept the exit status within the process itself (so that the final exit code is always 0), emit the check status as a flag in the output, and filter valid samples at the workflow level before the next steps? And if we go this route, what’s the best way to collect overall statistics on problematic samples?
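For reference, here is roughly what I have in mind for that flag-based variant (a sketch only; `MD5_CHECKSUM_FLAGGED`, `ch_reads`, `FASTQC`, and the `MD5_STATUS` variable are illustrative names, not tested code):

```groovy
process MD5_CHECKSUM_FLAGGED {

    input:
    tuple val(sample_id), path(reads)
    path md5sum_txt

    output:
    // the task always exits 0; the pass/fail result travels as an env value
    tuple val(sample_id), path(reads), env('MD5_STATUS'), emit: checked

    script:
    """
    # build check.md5 from ${md5sum_txt} as in the original script, then:
    MD5_STATUS=PASS
    md5sum -c check.md5 > md5sum_for_pair.log 2>&1 || MD5_STATUS=FAIL
    """
}

workflow {
    // ch_reads: tuple(sample_id, reads)
    MD5_CHECKSUM_FLAGGED(ch_reads, file(params.md5sum_txt))

    // keep only PASS samples before downstream steps
    valid = MD5_CHECKSUM_FLAGGED.out.checked
        .filter { sample_id, reads, status -> status == 'PASS' }
        .map    { sample_id, reads, status -> tuple(sample_id, reads) }

    FASTQC(valid)
}
```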

I’d be grateful for any advice or practical examples!

Hi @magletdinov , good question.

We are discussing this problem in this issue. The `ignore` strategy is a common approach right now, but it has some limitations.

I think the better approach is to wrap certain error conditions in the process script with `exit 0` so that the task “succeeds”, but with different outputs indicating some failure condition. In the issue I distinguish between “domain errors” and “execution errors”: the former you might want to handle in your pipeline logic (e.g. produce a report of “failed” samples), whereas the latter might be transient, so it makes more sense to retry or fail the pipeline.
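As a rough sketch of handling domain errors in pipeline logic (channel and file names here are made up; `branch` and `collectFile` are standard operators), assuming the check process emits a `PASS`/`FAIL` status alongside each sample:

```groovy
workflow {
    // checked: channel of tuple(sample_id, reads, status)
    checked.branch {
        pass: it[2] == 'PASS'
        fail: true                 // everything else is a domain error
    }.set { md5 }

    // collect a report of failed samples instead of failing the run
    md5.fail
        .map { sample_id, reads, status -> "${sample_id}\tmd5_mismatch" }
        .collectFile(name: 'failed_samples.tsv', newLine: true,
                     storeDir: params.outdir)

    // only passing samples continue downstream
    FASTQC(md5.pass.map { sample_id, reads, status -> tuple(sample_id, reads) })
}
```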

Right now the problem is that the process output syntax is not always flexible enough to accommodate this approach, although you might be able to make it work in your particular case.