How to skip specific/failed samples on next `-resume`

Processes fail for different reasons. Sometimes the input is simply not viable, and you want to skip that sample when you `-resume`. Failed tasks are not cached, so by default Nextflow will try those samples again on every resumed run as long as the input stays the same. This can waste a lot of resources. One way around this is to select samples with an input file (a sample sheet) and have the workflow write out a similar file that lists only the samples to keep. That file can then be used as the input on the next resumed run.

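A toy example, main.nf:

```groovy
workflow {
    // Read the sample sheet and turn each row into a [meta, file] tuple
    ch_in = Channel.fromPath(params.inputs, checkIfExists: true)
        .splitCsv(header: true)
        .map { row ->
            def fname = file(row.file, checkIfExists: true)
            tuple([id: fname.name], fname)
        }

    TASK(ch_in)

    // Failed (ignored) TASKs emit nothing, so the join keeps only passing samples;
    // collect their original input paths into a single list
    ch_kept = ch_in.join(TASK.out.passed)
        .collect { meta, orig_path, task_path -> orig_path }

    // Write the surviving samples to a new sample sheet for the next run
    FILTER_INPUT(ch_kept)
}

process TASK {
    input:
    tuple val(meta), path(txt)

    output:
    tuple val(meta), path(txt), emit: passed

    script:
    // Exit with status 1 when the file does not contain "PASS"
    """
    grep "PASS" $txt || exit 1
    """
}

process FILTER_INPUT {
    publishDir "${params.outdir}/resume/${workflow.runName}"

    input:
    val to_resume

    output:
    path "inputs_to_resume.csv"

    exec:
    // Native task: write a CSV with the same "file" header as the original sample sheet
    file("${task.workDir}/inputs_to_resume.csv").text = (["file"] + to_resume).join("\n")
}
```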

nextflow.config:

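```groovy
params.inputs = 'inputs.csv'
params.outdir = 'results'

process {
    // Keep the run going when a sample fails TASK
    withName: 'TASK' {
        errorStrategy = 'ignore'
    }
}

// Equivalent to passing -resume on every run
resume = true
```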

inputs.csv:

```
file
sample1.txt
sample2.txt
sample3.txt
```

sample{1,3}.txt:

```
PASS
```

sample2.txt:

```
FAIL
```
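To reproduce the toy example, the input files above can be created from a shell. This is a minimal sketch using only standard shell commands; it assumes you run it in the launch directory (next to main.nf and nextflow.config) and that overwriting existing files with these names is fine.

```shell
# Create the toy sample sheet and sample files.
# Assumption: run from the launch directory; existing files with these names are overwritten.

# Sample sheet with a "file" header column, as read by splitCsv(header: true)
printf 'file\nsample1.txt\nsample2.txt\nsample3.txt\n' > inputs.csv

# sample1.txt and sample3.txt contain PASS, so TASK succeeds for them
for s in sample1.txt sample3.txt; do
    echo PASS > "$s"
done

# sample2.txt contains FAIL, so grep finds no match and TASK exits with status 1
echo FAIL > sample2.txt
```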

terminal:

```console
$ nextflow run main.nf

 N E X T F L O W   ~  version 24.10.5

Launching `main.nf` [maniac_kalman] DSL2 - revision: 5e373e9cf3

executor >  local (1)
[6b/c6128e] TASK (2)     [100%] 3 of 3, cached: 2, failed: 1 ✔
[72/75e7f3] FILTER_INPUT [100%] 1 of 1, cached: 1 ✔
[6b/c6128e] NOTE: Process `TASK (2)` terminated with an error exit status (1) -- Error is ignored

$ nextflow run main.nf --inputs results/resume/maniac_kalman/inputs_to_resume.csv

 N E X T F L O W   ~  version 24.10.5

Launching `main.nf` [compassionate_brown] DSL2 - revision: 5e373e9cf3

[58/dbf2df] TASK (1)     [100%] 2 of 2, cached: 2 ✔
[72/75e7f3] FILTER_INPUT [100%] 1 of 1, cached: 1 ✔
```

This highlights a nice feature of Nextflow's resume: only task executions are cached, not the workflow logic.

You might expect the entire run to be re-executed because the input file has changed. In fact, the workflow logic is always re-executed, because it is cheap to do so. As long as the workflow logic produces the same task executions (same script and same inputs) as the previous run, those tasks are restored from the cache. That is why the second run above reports `cached: 2` for TASK even though it was launched with a different sample sheet. This gives you a great deal of flexibility to reorganize your workflow logic without breaking your cache.
