Show cached tasks in nextflow run preview

Thanks.

Wouldn’t it be a useful feature?

I have been very enthusiastic about Nextflow since I started 3 months ago. However, at the moment I have one big problem with it: how to deal with the caching mechanism. I end up involuntarily restarting many time-consuming tasks because of it.

More precisely:

  1. I think a less strict criterion for resuming would be useful. Possible solutions:
    • a configuration option to loosen it, like cache = "filenames+timestamp"
    • a way to prevent relaunching processes with a specific name.
  2. Failing that, a way to predict which tasks would be cached or re-run, without executing any process. The -preview option seems like the most natural place to add this.
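
To make point 1 more concrete, here is a rough sketch of the kind of configuration I have in mind, next to what I believe already exists. As far as I understand, cache = 'lenient' is an existing setting that hashes input files by path and size only (ignoring timestamps), and withName is the existing config selector for a single process. The 'filenames+timestamp' value is hypothetical and only illustrates the request. Note that 'lenient' only relaxes how input files are hashed, so on its own it would not cover the script and output changes listed below.

```groovy
// nextflow.config (sketch, not a tested configuration)
process {
    // Existing option, to my knowledge: ignore input file timestamps
    // when computing the task hash, for one process selected by name.
    withName: 'bam_coverage' {
        cache = 'lenient'
    }

    // Hypothetical value illustrating the request above;
    // this is NOT an accepted setting today.
    // cache = 'filenames+timestamp'
}
```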

Based on my recent experience, examples of modifications to processes that prevented resuming:

  • removing one of the files from the output. E.g. from:

    output:
    tuple path('out.ext1'), path('out.ext2')
    

    to:

    output:
    path('out.ext1')
    
  • adding an input or output val that doesn’t affect the executed command, for example to pass metadata.

  • adding line breaks or other whitespace, or reordering arguments in a way that should not change the result of the script block. This might be impossible to detect programmatically, which is why a looser criterion for resuming would be nice.

  • many cases for which I don’t have an explanation. For example, I currently have this independent block in a workflow, which got entirely restarted (222 long-running processes):

    Channel.fromPath('data/raw/*.bam') | bam_coverage
    

    I cannot tell whether it’s because I changed the maxForks directive of the process, or because I updated nextflow.config or one of the other config and params files I set up. I don’t remember changing anything more closely related to this process, but I might be wrong.

I’d like to know what you and the other Nextflow developers think. I imagine a similar request has already been made.