Thanks.
Wouldn’t it be a useful feature?
I have been very enthusiastic about Nextflow since I started using it 3 months ago. At the moment, however, my one big problem with it is the caching mechanism: I end up involuntarily restarting many time-consuming tasks because of it.
More precisely:
- I think a less strict criterion for resuming would be useful. Possible solutions:
  - a configuration option to loosen it, like `cache = "filenames+timestamp"` (see the config sketch right after this list);
  - a way to prevent relaunching processes with a specific name.
- In the absence of the above point, a way to predict caching/resuming without executing any process. The `-preview` option seems like the most natural one to enhance with this.
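To make the first option concrete, here is a rough sketch of how it could look in `nextflow.config`. The value `"filenames+timestamp"` is only my proposed syntax, not something that exists today, and `bam_coverage` is just the process from my example further down:

```groovy
// nextflow.config -- proposed syntax, not an existing Nextflow feature
process {
    // Loosen the cache key for one specific process: consider a task
    // up to date as long as the declared output file names and the
    // input file timestamps are unchanged, instead of hashing the
    // whole script block and all declared inputs/outputs.
    withName: 'bam_coverage' {
        cache = "filenames+timestamp"
    }
}
```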
Based on my recent experience, here are some examples of modifications to processes that prevented resuming:
- removing one of the files from the output, e.g. going from `output: tuple path('out.ext1'), path('out.ext2')` to `output: path('out.ext1')`;
- adding input/output `val`s that don't impact the executed command, for example to pass metadata (see the process sketch after this list);
- adding line returns or other whitespace, or reordering arguments in a way that should not change the result of the script block. This might be impossible to detect programmatically, which is why a looser criterion for resuming would be nice;
- many cases where I don't have an explanation. For example, I currently have this independent block in a workflow, `Channel.fromPath('data/raw/*.bam') | bam_coverage`, which got entirely restarted (222 long-running processes). I cannot tell whether it's because I changed the `maxForks` directive of the process, or because I updated `nextflow.config` or the other config and param files I set up. I don't remember changing anything more closely related to this process, but I might be wrong.
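To illustrate the metadata `val` point above, this is the kind of change I mean (the process body and `compute_coverage` command are made up for the example):

```groovy
// Hypothetical example: the only change is carrying a metadata val
// through the input/output declarations; the executed command is
// identical, yet in my experience the task is not resumed.
process bam_coverage {
    input:
    tuple val(sample_id), path(bam)    // was just: path bam

    output:
    tuple val(sample_id), path('coverage.txt')    // was just: path 'coverage.txt'

    script:
    """
    compute_coverage $bam > coverage.txt
    """
}
```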
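As for the last example, the closest workaround I have found so far, assuming I'm reading the docs correctly, is to run with `-dump-hashes` before and after the modification and diff the per-task hash breakdown written to `.nextflow.log` (`main.nf` stands in for my pipeline script):

```bash
# Run once and keep the log containing the task hash breakdown
nextflow run main.nf -resume -dump-hashes
cp .nextflow.log hashes.before.log

# ...apply the supposedly harmless modification, then run again...
nextflow run main.nf -resume -dump-hashes
cp .nextflow.log hashes.after.log

# The differing lines hint at which component of the cache key changed
diff hashes.before.log hashes.after.log
```

But this still requires actually launching the run, which is exactly what I would like to avoid; hence the suggestion to enhance `-preview`.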
I'd like to know what you and the other Nextflow developers think. In fact, I would imagine a similar request has already been made.