The default option for Nextflow is to start a workflow from the beginning. When a few processes have failed, you often want instead to resume from the last successfully completed processes. This can be done by using -resume on the command line. An alternative is placing the following statement in your nextflow.config, which enables the -resume behaviour by default.
resume = true
However there are times when you only want this behaviour for certain processes. Here you can use Nextflow’s powerful selector expressions to disable caching for certain processes, for example by negating a match to a process name, or using wildcards or grouping to select a subset of processes to disable caching.
resume = true
process {
withName: '!TASK' {
cache = false // Only -resume for process TASK
}
}
or:
resume = true
process {
withName: '.*:SUBWORKFLOW:(TASK1|TASK2)' {
cache = false // TASK1 and TASK2 in SUBWORKFLOW should always restart
}
}
I encountered a scenario along these lines where I had a pipeline with some long initial steps that ran through to completion. There was a problem with the output of a process in the middle of the pipeline. It didn’t cause the pipeline to crash but the pipeline still needed to be rerun from that point, and because the first few steps were quite long, I wanted to avoid rerunning the entire pipeline if possible.
Before I found your post, I used a hack-y way of restarting the pipeline from the desired process by changing the script section of the process slightly (e.g. by adding an echo statement).
That said, it would be nice if Nextflow had a CLI parameter one could use to restart the pipeline from a desired step. Something like:
nextflow run main.nf -resume TASK
or
nextflow run main.nf -resume -resume_from TASK
Where TASK is the process you want to restart from.
There is a mechanism for this - the cache directive! Given workflow
workflow {
Channel.of("Rob", "Charles")
| One
| Two
}
process One {
input: val(i)
output: val(i)
script: true
}
process Two {
input: val(i)
output: val(i)
script: true
}
You can resume a run and force recalculation of tasks from the Two process onwards by adding a couple of lines of configuration:
process {
withName: 'Two' {
cache = 'false'
}
}
Example:
$ nextflow run . -resume
N E X T F L O W ~ version 25.07.0-edge
Launching `./main.nf` [sleepy_montalcini] DSL2 - revision: e0daab3f3e
executor > local (2)
[ea/82d69e] process > One (2) [100%] 2 of 2, cached: 2 ✔
[79/614c85] process > Two (2) [100%] 2 of 2 ✔