Caching and resuming to skip one of the parallel jobs

I have a Nextflow pipeline where two processes run multiple parallel tasks (e.g., 19 groups through step1() and then step2()). The pipeline is stuck on step2(19), while the other 18 tasks are cached successfully. Is there a way to stop and resume the pipeline so that it uses the cached results for the 18 completed groups and skips or excludes the failed group (19) going forward?

The ideal goal is to retain our current progress and remove the one “group” that was running in parallel so we can move forward. My initial thought was to remove the information needed for that parallel run so that step2 fails, and then set it to fail silently. But I did not know whether this would somehow mess up the current caching.

For example, would something like this work? I want to make sure I can still move forward to the next step with the other 18 cached results.

step1_out
    .filter { !it.name.contains('group19') }
    .set { step1_filtered }

Hi Rachel! Welcome to the community forum :slight_smile:

The first question that comes to mind is: Why is this task (process instance) getting stuck? Did you check the logs? If you did, but didn’t find a reason, you can ask Seqera AI for help debugging the issue (there is also a VS Code integration that makes life with Nextflow much easier :wink:).

As for getting rid of this task and moving forward, you can terminate this pipeline just fine, remove the troublesome sample(s) (those going to task 19), and run the pipeline again with `-resume`. This way, you wouldn’t even need to change the code. Simply don’t provide sample 19 as input to the pipeline.
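If you’d rather not edit your input files by hand, a workflow-side alternative is a channel filter. A minimal sketch, assuming your input is a CSV samplesheet with a `sample` column (the file name and column name here are hypothetical, so adjust to your setup):

```nextflow
// Hypothetical sketch: assumes a CSV samplesheet with a 'sample' column
Channel
    .fromPath('samplesheet.csv')
    .splitCsv(header: true)
    .filter { row -> row.sample != 'sample19' }  // drop the problem sample
    .set { samples_ch }
```

Either way, the cached tasks for the other samples should still be picked up on `-resume`, since their inputs are unchanged.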

Does that work?


Great question. While we are working on adjusting the pipeline, we just want to keep moving with the data we already have, for a student on a deadline. Eventually we will go back and rerun it once that step is fixed, but we don’t want to lose their progress for a class. Definitely an odd scenario.

I actually have steps before this that manually split one large manifest into 19 separate pieces, and then send them to the next step to run in parallel. In other words, I do not manually feed in 18 different samples; the pipeline splits up my samples for me. I am putting that portion here. Because of this, I would not be able to remove those samples without removing files from the work directory, which I assume would mess up the resume flag? (s5a takes in one file and splits it into many, these run through the next steps, and s5d puts them back together. s5c is failing for the 19th group. We want to take groups 1-18 and move forward, dropping the samples in the 19th group.)

s5a_out = s5a_QIIME2_split2denoise(s3_out, base_name_ch, trunc_forward_ch, trunc_reverse_ch, s4_out.done)

s5b1_out = s5b1_QIIME2_splitManifest(s5a_out, metadata_ch, input_format_ch, base_name_ch)

s5b_out = s5b2_QIIME2_importGroups(s5b1_out, s5a_out, input_type_ch, input_format_ch)

s5c_in = s5b_out
            .flatten()
            .combine(trunc_forward_ch)
            .combine(trunc_reverse_ch)
            .combine(base_name_ch)


s5c_out = s5c_QIIME2_denoiseGroups(s5c_in)
        
s5_out = s5d_QIIME2_mergeDenoised(s5c_out.rep_seqs, s5c_out.table, base_name_ch)
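Building on the filter idea above, my tentative plan would be to filter the flattened channel before it reaches s5c, assuming the split files coming out of s5b carry their group label in the filename (e.g. `group19` — I would still need to verify that naming):

```nextflow
// Tentative sketch: drop group 19 before denoising.
// Assumes each split file's name contains its group label, e.g. 'group19'.
s5c_in = s5b_out
            .flatten()
            .filter { f -> !f.name.contains('group19') }
            .combine(trunc_forward_ch)
            .combine(trunc_reverse_ch)
            .combine(base_name_ch)
```

Would the 18 cached s5c tasks still resume under this, since their inputs are unchanged?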

For what it’s worth, we are discussing a possible language improvement that seems related to your issue:

The current idea is to add an error: section that works like the output: section but lets you handle certain kinds of failures in your workflow logic instead of through Nextflow’s error handling mechanism. For example, you could save errors due to “bad samples” to a report and allow the run to finish “successfully”.

Feel free to jump in on that discussion if you’re interested.
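Purely as an illustration (this is not real Nextflow syntax — the design is still under discussion, and every name below is made up), the idea might look something like:

```nextflow
// Illustrative only: hypothetical syntax for the proposed 'error:' section
process DENOISE_GROUP {
    input:
    path group_qza

    output:
    path 'denoised.qza'

    error:
    path 'denoise_error.log'  // on failure, emit this instead of aborting the run

    script:
    """
    denoise.sh $group_qza > denoised.qza 2> denoise_error.log
    """
}
```

The workflow could then collect the error: outputs into a report of bad samples while the rest of the run completes.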


This would be great to have. I will definitely follow along with that.

Something some people have been doing in the nf-core community is making use of the `when:` block.

Every nf-core module has this in it:

process SOME_PROCESS {
    input:
    tuple val(meta), path(some_file)
    ....

    when:
    task.ext.when == null || task.ext.when

    script:
    ....
}

This effectively moves the ability to control flow from the workflow to the config.

process {
    withName: 'SOME_PROCESS' {
        ext.when = { meta.id !in ['sample19'] } // Exclude sample 19
    }
}

Loved @mahesh.binzerpanchal’s solution, @rachelgriffard! It’s easy to turn it on/off along the way by just commenting or uncommenting these few lines in a configuration file.
