Resume not loading retries from cache

We have a process whose input file sizes vary widely, so we retry it with increasing memory:

process {
    withLabel: IndexDedup {
        queue = 'regular'
        cpus = 5
        memory = { 10.GB * task.attempt }
        time = '8h'
        errorStrategy = { 'retry' }
        maxRetries = 5
    }
}

When I resume, the result for an input that worked with 10 GB is successfully retrieved from cache. The result for a large input file that failed twice before succeeding on the third attempt, with memory = 10.GB * 3 = 30.GB, is not retrieved from cache.

Increasing the initial memory so that the process succeeds on the first attempt fixed the issue. However, I don’t particularly want to allocate 30 GB for all inputs, and I want to resume from the successful attempt.

I see in the Nextflow documentation (Caching and resuming) that the task attempt is part of the cache hash. Do I need a way of setting it to the final, successful task attempt?

Thanks,
Jocelyn

Can you share a minimal reproducible example?

Are you using $task.memory in the script block, for example?

Thanks for replying so fast. No, I am not using task.memory in the scripts, but my first attempt at a stripped-down example did not reproduce the problem. I will reply again when I have managed to reproduce it with something smaller than the full suite.

Thanks, @JocelynSP. In the meantime, I’d recommend reading this blog post, which shows some cases in which a process was re-run, perhaps unexpectedly for the user. More here, and here.
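
To narrow down which of those cases applies, it can help to record the hash, attempt number, and requested memory of every task, then compare the original run with the resumed one. A minimal sketch using the built-in trace report, assuming it is acceptable to enable tracing in your setup (the choice of fields is only a suggestion):

```groovy
// nextflow.config (sketch): write a trace report with per-task cache details
trace {
    enabled = true
    // hash and workdir identify the cache entry; attempt and memory show
    // which retry produced it and what was requested for that retry
    fields = 'task_id,hash,name,status,exit,attempt,memory,workdir'
}
```

Keep a copy of the trace file from the first run before resuming. Tasks restored from cache should appear with status CACHED in the resumed run; a task that appears with a different hash was re-executed. Adding `-dump-hashes` to the `nextflow run ... -resume` command should also log what went into each hash in `.nextflow.log`.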

Hi, are there any updates on this? It is quite cumbersome to re-run part of the analysis when a task only succeeded on a later attempt…

No, sorry. My first attempt at a minimal example didn’t reproduce it, and I got busy elsewhere. I will try to revisit it this month.

I found this thread: "Should `-resume` always resubmit processes that failed but had errorStrategy set to ignore?" (nextflow-io/nextflow, Discussion #4062 on GitHub).
Perhaps a workflow.skipIgnoredOnResume option will be implemented soon…

I can’t make a reproducible example. It seems to be an intermittent problem, and my attempts to reproduce it with a small example are not succeeding in failing.
Please close this. If it recurs, I may raise it again.
