Which features of a task must be unchanged for resuming to work?

Hi people!

Now that I have run a very long task with my first workflow, I am interested in resuming. My understanding of “resuming” was that if an intermediate output already exists (and is newer than its dependencies), the run starts again from it. This is what I have experienced with Snakemake, for example.

In Nextflow, I understand that it is much more restrictive, as it will only resume from exactly identical tasks, as described in the “Caching and resuming” page of the Nextflow documentation: the task hash incorporates several values, such as the process script, the process inputs, and the session id.

This session id troubles me in particular, because the only information I found is “Unique identifier (UUID) associated to current execution.” How is it computed exactly?

So for example, if I start the same unchanged workflow script a day later, will the session id be identical?
What if I change code in the workflow that does not affect the process to resume (such as adding downstream processes)?

Thanks in advance!

Generally, a task restarts under these conditions:

  • The task input data change
  • The process script changes
  • The output is missing
  • The previous attempt failed
  • A directory generated by one process is given as input to the next process, which then writes to it
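The first four conditions above can be sketched as a small decision function. This is only a toy model in Python, not Nextflow's implementation: `CachedTask` and `must_rerun` are hypothetical names, and the directory case from the last bullet is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedTask:
    """What a toy cache remembers about the previous attempt (hypothetical)."""
    script: str            # process script used last time
    inputs: tuple          # input values used last time
    succeeded: bool        # whether that attempt finished successfully
    outputs_present: bool  # whether its outputs still exist


def must_rerun(cached: Optional[CachedTask], script: str, inputs: tuple) -> bool:
    """Return True if the task has to run again instead of being reused."""
    if cached is None:              # never ran before
        return True
    if cached.inputs != inputs:     # task input data changed
        return True
    if cached.script != script:     # process script changed
        return True
    if not cached.outputs_present:  # output is missing
        return True
    if not cached.succeeded:        # previous attempt failed
        return True
    return False


previous = CachedTask("echo hello", ("sample1",), True, True)
print(must_rerun(previous, "echo hello", ("sample1",)))  # unchanged task
print(must_rerun(previous, "echo hi", ("sample1",)))     # edited script
```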

You don’t need to worry about the session id. As far as I understand it, the session id here is the name of the last session (of the form adjective_scientist). -resume can take a session name as a value, e.g. -resume intergalactic_wescoff, but if the session name is left off, it will resume from the last one in the Nextflow history. This makes a new session id, but Nextflow tracks which tasks are associated with which sessions. I would suggest trying a toy workflow to explore the -resume behaviour, for example by adding extra input samples. You’ll find the behaviour very similar to Snakemake, except for the last point in the list above.

The session ids change per run, but their purpose is to tell Nextflow which process working directories to check.
If you’re adding a downstream process without affecting the inputs of the existing ones, then the workflow will continue from the last successfully executed tasks with -resume.
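To see why adding a downstream process or extra samples leaves earlier tasks reusable, here is a rough Python sketch of a cache keyed per task. It is a simplified stand-in for the idea, not Nextflow's real hashing: `task_key`, `run`, and the process names and scripts are made up for illustration, and the session id is deliberately not part of the key.

```python
import hashlib


def task_key(name: str, script: str, inputs: tuple) -> str:
    """Toy cache key built from the process name, its script and its inputs."""
    return hashlib.sha256(repr((name, script, inputs)).encode()).hexdigest()


def run(workflow: list, samples: list, cache: dict) -> list:
    """Run every (name, script) process on every sample.

    Tasks whose key is already in the cache are skipped; the labels of the
    tasks that actually executed are returned.
    """
    executed = []
    for name, script in workflow:
        for sample in samples:
            key = task_key(name, script, (sample,))
            if key in cache:
                continue  # identical task already done: reuse it
            label = f"{name}:{sample}"
            cache[key] = label
            executed.append(label)
    return executed


cache = {}
# First run: one process, two samples.
first = run([("align", "echo align")], ["s1", "s2"], cache)
# Second run: a downstream process and an extra sample are added.
second = run(
    [("align", "echo align"), ("count", "echo count")],
    ["s1", "s2", "s3"],
    cache,
)
print(first)   # ['align:s1', 'align:s2']
print(second)  # ['align:s3', 'count:s1', 'count:s2', 'count:s3']
```

In the second run only the new sample is aligned and only the new process runs on all samples; the two original alignment tasks are reused because nothing that defines them changed.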
