Caching and resuming to skip one of the parallel jobs

I have a Nextflow pipeline where two processes run multiple parallel tasks (e.g., 19 groups through step1() and then step2()). The pipeline is stuck on step2(19), while the other 18 tasks are cached successfully. Is there a way to stop and resume the pipeline so that it uses the cached results for the 18 completed groups and skips or excludes the failed group (19) going forward?

The ideal goal is to retain our current progress and remove the one “group” that was running in parallel so we can move forward. My initial thought was to remove the information needed for that parallel run so that step2 fails, and then set it to fail silently. But I did not know whether this would somehow mess up the current caching.

For example, would something like this work? I want to make sure I can still move forward to the next step with the other 18 cached results.

step1_out
    .filter { !it.name.contains('group19') }
    .set { step1_filtered }

Hi Rachel! Welcome to the community forum :slight_smile:

The first question that comes to mind is: Why is this task (process instance) getting stuck? Did you check the logs? If you did, but didn’t find a reason, you can ask Seqera AI for help debugging the issue (there is also a VS Code integration that makes life with Nextflow much easier :wink:).

As for getting rid of this task and moving forward, you can terminate this pipeline just fine, remove the troublesome sample(s) (those going to task 19), and run the pipeline again with `-resume`. This way, you wouldn’t even need to change the code. Simply don’t provide sample 19 as input to the pipeline.
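If you’d rather not edit your input files by hand, a workflow-side alternative is a channel filter. A minimal sketch, assuming your input is a CSV samplesheet with a `sample` column (the file name and column name here are hypothetical, so adjust to your setup):

```nextflow
// Hypothetical sketch: assumes a CSV samplesheet with a 'sample' column
Channel
    .fromPath('samplesheet.csv')
    .splitCsv(header: true)
    .filter { row -> row.sample != 'sample19' }  // drop the problem sample
    .set { samples_ch }
```

Either way, the cached tasks for the other samples should still be picked up on `-resume`, since their inputs are unchanged.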

Does that work?


Great question. While we are working on adjusting the pipeline, we just want to keep moving with the data we already have, for a student on a deadline. Eventually we will go back and rerun it once that step is fixed, but we don’t want to lose their progress for a class. Definitely an odd scenario.

I actually have steps before this that manually split one large manifest into 19 separate pieces, and then send them to the next step to run in parallel. In other words, I do not manually feed in 18 different samples; the pipeline splits up my samples for me. I am putting that portion here. Because of this, I would not be able to remove those samples without removing files from the work directory, which I assume would mess up the resume flag? (s5a takes in one file and splits it into many, these run through the next steps, and s5d puts them back together. s5c is failing for the 19th group. We want to take groups 1-18 and move forward, dropping the samples in the 19th group.)

s5a_out = s5a_QIIME2_split2denoise(s3_out, base_name_ch, trunc_forward_ch, trunc_reverse_ch, s4_out.done)

s5b1_out = s5b1_QIIME2_splitManifest(s5a_out, metadata_ch, input_format_ch, base_name_ch)

s5b_out = s5b2_QIIME2_importGroups(s5b1_out, s5a_out, input_type_ch, input_format_ch)

s5c_in = s5b_out
            .flatten()
            .combine(trunc_forward_ch)
            .combine(trunc_reverse_ch)
            .combine(base_name_ch)


s5c_out = s5c_QIIME2_denoiseGroups(s5c_in)
        
s5_out = s5d_QIIME2_mergeDenoised(s5c_out.rep_seqs, s5c_out.table, base_name_ch)
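Building on the filter idea above, my tentative plan would be to filter the flattened channel before it reaches s5c, assuming the split files coming out of s5b carry their group label in the filename (e.g. `group19` — I would still need to verify that naming):

```nextflow
// Tentative sketch: drop group 19 before denoising.
// Assumes each split file's name contains its group label, e.g. 'group19'.
s5c_in = s5b_out
            .flatten()
            .filter { f -> !f.name.contains('group19') }
            .combine(trunc_forward_ch)
            .combine(trunc_reverse_ch)
            .combine(base_name_ch)
```

Would the 18 cached s5c tasks still resume under this, since their inputs are unchanged?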

For what it’s worth, we are discussing a possible language improvement that seems related to your issue:

The current idea is to add an error: section that works like the output: section but lets you handle certain kinds of failures in your workflow logic instead of through Nextflow’s error handling mechanism. For example, you could save errors due to “bad samples” to a report and allow the run to finish “successfully”.

Feel free to jump in on that discussion if you’re interested.
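Purely as an illustration (this is not real Nextflow syntax — the design is still under discussion, and every name below is made up), the idea might look something like:

```nextflow
// Illustrative only: hypothetical syntax for the proposed 'error:' section
process DENOISE_GROUP {
    input:
    path group_qza

    output:
    path 'denoised.qza'

    error:
    path 'denoise_error.log'  // on failure, emit this instead of aborting the run

    script:
    """
    denoise.sh $group_qza > denoised.qza 2> denoise_error.log
    """
}
```

The workflow could then collect the error: outputs into a report of bad samples while the rest of the run completes.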


This would be great to have. I will definitely follow along with that.

Something some people have been doing in the nf-core community is making use of the `when:` block.

Every nf-core module has this in it:

process SOME_PROCESS {
    input:
    tuple val(meta), path(some_file)
    ....

    when:
    task.ext.when == null || task.ext.when

    script:
    ....
}

This effectively moves the ability to control flow from the workflow to the config.

process {
    withName: 'SOME_PROCESS' {
        ext.when = { meta.id !in ['sample19'] } // Exclude sample 19
    }
}

Loved @mahesh.binzerpanchal’s solution, @rachelgriffard! It’s easy to turn it on/off along the way by just commenting or uncommenting these few lines in a configuration file.
