Steps for reproducible example for broken Nextflow cache, after successful retries

This is linked to the Slack message here: Slack
and probably linked to this old/closed post: https://community.seqera.io/t/resume-not-loading-retries-from-cache/892

My initial message was:

A pipeline step succeeds (but only on a retry attempt), the pipeline continues and fails on a later step, and then on resume the retry-succeeded step is not picked up from the cache and runs again? :/

I figured out that this isn’t always the case; you also need to have attempted to run that step at least once before (i.e., a work folder for that module must already exist from a previous run).

Here are the steps to reproduce the cache failure:

  1. Clone proteinfamilies.
  2. Alter conf/test.config to make the step fail on the 1st attempt but pass on the 2nd:
   withName: 'NFCORE_PROTEINFAMILIES:PROTEINFAMILIES:FAA_SEQFU_SEQKIT:SEQKIT_SEQ' {
       memory = { task.attempt == 1 ? 1.MB : 2.GB } // memory = { 1.4.GB * task.attempt }
       time   = { 2.m * task.attempt }
   }
       
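For context, a sketch of where this override sits in conf/test.config: it belongs inside the process scope, and the retry only happens if the process has a retry error strategy. nf-core pipelines typically set errorStrategy = 'retry' for OOM exit codes in conf/base.config; the explicit errorStrategy/maxRetries lines below are an assumption, added in case that default is absent:

```groovy
// Sketch (assumed placement): process scope of conf/test.config.
process {
    withName: 'NFCORE_PROTEINFAMILIES:PROTEINFAMILIES:FAA_SEQFU_SEQKIT:SEQKIT_SEQ' {
        memory        = { task.attempt == 1 ? 1.MB : 2.GB } // fail on attempt 1, pass on attempt 2
        time          = { 2.m * task.attempt }
        errorStrategy = 'retry' // assumption: make sure a 2nd attempt actually happens
        maxRetries    = 1
    }
}
```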
  3. Your executor has to be able to kill a process when it uses more memory than what was defined (or use Docker), so I used Slurm with this slurm.config:
profiles {
   slurm {
       executor {
           name              = "slurm"
           queueSize         = 100
           queueGlobalStatus = true

       }
       workDir = "/path/to/work_proteinfamilies_cache_test/"
       process {
           queue  = 'standard'
           cache  = 'lenient'
       }
       params {
           // Boilerplate options
           outdir = "${launchDir}/results"
       }
   }
}
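If you don't have a Slurm cluster, a local Docker profile should behave the same way, since Docker enforces the container memory limit and the over-limit attempt gets killed (exit code 137) and retried. This is a hedged sketch, not tested with this pipeline:

```groovy
// Assumed alternative profile: local execution with Docker enforcing memory limits.
profiles {
    docker_local {
        docker.enabled = true
        process {
            cache = 'lenient'
        }
        params {
            outdir = "${launchDir}/results"
        }
    }
}
```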
  4. NXF_VER=25.10.4 nextflow run proteinfamilies -c conf/slurm.config -profile singularity,test,slurm -resume

  5. As soon as the work folder for SEQKIT_SEQ is created, kill the pipeline (Ctrl+C). (This step is important: if the step passes on its 2nd attempt without having failed at least once in a previous run, the cache works properly.)

  6. Resume the pipeline with the same command and wait for SEQKIT_SEQ to complete successfully on its 2nd attempt.

  7. Kill the pipeline again.

  8. Resume the pipeline with the same command. Verdict: SEQKIT_SEQ starts running again, while SEQFU_STATS_BEFORE, which ran in parallel and completed on its first attempt, is cached properly.
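To confirm which task attempts Nextflow recorded for each run, the task-level log is useful. A sketch (the `last` run name and the field list are standard `nextflow log` options, but verify them against your Nextflow version):

```shell
# List past runs, then dump cache-relevant per-task fields for the latest one.
nextflow log
nextflow log last -f name,attempt,status,hash,workdir | grep SEQKIT_SEQ
```

On the final resume, the SEQKIT_SEQ hash from the successful 2nd attempt should match a cached entry but doesn't, which is the bug being reproduced.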

I fed these steps to the bots, and they suggested a potential fix.

I am leaving the relevant PR here: bot attempt to figure the bug, based on reproducible example by vagkaratzas · Pull Request #6882 · nextflow-io/nextflow · GitHub