This is linked to the slack message here: Slack
and probably linked to this old/closed post: https://community.seqera.io/t/resume-not-loading-retries-from-cache/892
My initial message was:
A pipeline step succeeds (but only on a retry attempt), the pipeline continues and fails at a later step, and then on pipeline resume the step that succeeded on retry is not picked up from the cache and runs again. :/
I figured out that this isn't always the case; the step also needs to have been attempted at least once before (i.e., a relevant work folder for that module must already exist).
Here are the steps to reproduce the caching failure:
- clone proteinfamilies
- alter conf/test.config so that the step fails on the 1st attempt but passes on the 2nd:
withName: 'NFCORE_PROTEINFAMILIES:PROTEINFAMILIES:FAA_SEQFU_SEQKIT:SEQKIT_SEQ' {
    memory = { task.attempt == 1 ? 1.MB : 2.GB } // memory = { 1.4.GB * task.attempt }
    time   = { 2.m * task.attempt }
}
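Note that the lowered memory directive only produces a retry if the process error strategy allows it; nf-core pipelines typically set a retrying errorStrategy in conf/base.config, so no extra change should be needed. As a hedged sketch, the usual nf-core pattern looks roughly like this (the exact exit-status list and retry count may differ in proteinfamilies):

```groovy
// Sketch of the typical nf-core base.config retry behaviour (values here are
// illustrative, not copied from proteinfamilies): retry once on exit codes
// commonly associated with out-of-memory kills or signals, otherwise finish.
process {
    errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
    maxRetries    = 1
    maxErrors     = '-1'
}
```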
- Your executor has to be able to kill a process that uses more memory than requested (or use Docker), so I used Slurm with this slurm.config:
profiles {
    slurm {
        executor {
            name = "slurm"
            queueSize = 100
            queueGlobalStatus = true
        }
        workDir = "/path/to/work_proteinfamilies_cache_test/"
        process {
            queue = 'standard'
            cache = 'lenient'
        }
        params {
            // Boilerplate options
            outdir = "${launchDir}/results"
        }
    }
}
- Run: NXF_VER=25.10.4 nextflow run proteinfamilies -c conf/slurm.config -profile singularity,test,slurm -resume
- As soon as the work folder for SEQKIT_SEQ is created, kill the pipeline (Ctrl+C). (This step is important; otherwise, if the step passes on its 2nd attempt without having failed at least once before, the cache works properly.)
- Resume the pipeline with the same command and wait for SEQKIT_SEQ to complete successfully on its 2nd attempt.
- Kill the pipeline again.
- Resume the pipeline with the same command. Verdict: SEQKIT_SEQ starts running again, while SEQFU_STATS_BEFORE, which ran in parallel and completed on its first attempt, is cached properly.
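To confirm which tasks were restored from the cache versus re-executed on the final resume, the per-task record of the most recent run can be inspected with `nextflow log` (the field names below are standard trace fields):

```shell
# List each task of the last run with its cache status and attempt count.
# Tasks shown as CACHED were restored; tasks shown as COMPLETED re-executed.
nextflow log last -f 'name,status,attempt,hash'
```

On the buggy resume described above, SEQKIT_SEQ would be expected to show up as a freshly executed task rather than a CACHED one, while SEQFU_STATS_BEFORE appears CACHED.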