Processes fail for different reasons. Sometimes the input is simply not viable, and you want to skip that sample when you `-resume`. By default, Nextflow will retry those samples on every resume as long as their input hasn't changed, which can waste a lot of resources. One way around this is to drive the workflow from an input file and write out a similar file containing only the samples to keep; that file can then be used as the input on the next resume.
A toy example:
```groovy
workflow {
    ch_in = Channel.fromPath(params.inputs, checkIfExists: true)
        .splitCsv(header: true)
        .map { row ->
            def fname = file(row.file, checkIfExists: true)
            tuple([id: fname.name], fname)
        }

    TASK(ch_in)

    // Join the original inputs with the samples that passed TASK,
    // then collect only the original paths into a single list
    ch_kept = ch_in.join(TASK.out.passed)
        .collect { meta, orig_path, task_path -> orig_path }

    FILTER_INPUT( ch_kept )
}

process TASK {
    input:
    tuple val(meta), path(txt)

    output:
    tuple val(meta), path(txt), emit: passed

    script:
    // grep exits non-zero when nothing matches, failing the task;
    // the explicit exit 1 just makes the intent obvious
    """
    grep "PASS" $txt || exit 1
    """
}

process FILTER_INPUT {
    publishDir "${params.outdir}/resume/${workflow.runName}"

    input:
    val to_resume

    output:
    path "inputs_to_resume.csv"

    exec:
    // Write a CSV with the same layout as the original input file,
    // listing only the samples that passed
    file("${task.workDir}/inputs_to_resume.csv").text = (["file"] + to_resume).join("\n")
}
```
nextflow.config:
```groovy
params.inputs = 'inputs.csv'
params.outdir = 'results'

process {
    withName: 'TASK' {
        errorStrategy = 'ignore'
    }
}

resume = true
```
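A note on the error strategy: `'ignore'` drops a sample after a single failure. If some failures may be transient (e.g. out-of-memory kills), one possible variant — not part of the original example — is a dynamic `errorStrategy` that retries a couple of times before giving up:

```groovy
process {
    withName: 'TASK' {
        // Retry up to two times, then ignore so the sample
        // can be filtered out of the next resume
        errorStrategy = { task.attempt <= 2 ? 'retry' : 'ignore' }
        maxRetries = 2
    }
}
```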
inputs.csv:

```csv
file
sample1.txt
sample2.txt
sample3.txt
```
sample{1,3}.txt:

```text
PASS
```
sample2.txt:

```text
FAIL
```
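With these inputs, only sample2.txt fails, so the published `inputs_to_resume.csv` should list just the two passing samples. Since the workflow writes the resolved input paths, the entries below are illustrative — the actual paths depend on where the files live:

```csv
file
/path/to/sample1.txt
/path/to/sample3.txt
```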
terminal:

```console
$ nextflow run main.nf
N E X T F L O W ~ version 24.10.5
Launching `main.nf` [maniac_kalman] DSL2 - revision: 5e373e9cf3
executor > local (1)
[6b/c6128e] TASK (2)     [100%] 3 of 3, cached: 2, failed: 1 ✔
[72/75e7f3] FILTER_INPUT [100%] 1 of 1, cached: 1 ✔
[6b/c6128e] NOTE: Process `TASK (2)` terminated with an error exit status (1) -- Error is ignored
$ nextflow run main.nf --inputs results/resume/maniac_kalman/inputs_to_resume.csv
N E X T F L O W ~ version 24.10.5
Launching `main.nf` [compassionate_brown] DSL2 - revision: 5e373e9cf3
[58/dbf2df] TASK (1)     [100%] 2 of 2, cached: 2 ✔
[72/75e7f3] FILTER_INPUT [100%] 1 of 1, cached: 1 ✔
```