task.hash is null on resume

In Nextflow I have processes that accept a number of files from a channel, one at a time.

In my config I set the publish dir to be based on task.hash (or task.inputs), which works fine for the first run of the workflow. However, these variables seem to evaluate to null when the resume option is used, so I only get one dir in my publish dir rather than the multiple ones expected.

    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.hash}" },
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]

Output on first run:

  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:14 00173ae51b98f95373c353c17ee00ecc
  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:08 001c4bfe7782888fca798a1dadcbe73f
  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:03 001f1c9805ca32094c58ef7cb6e4a911
...

On second run with resume:

  drwxrwxr-x  3 sruv domain_users 512 Feb 19 10:39 null

How can I set these to stay unique even on resume?

Welcome, @Sam_Neaves!

Help me understand your problem, please. The resumed jobs are already published by the previous run, aren’t they?

Yes they are.

On the initial run, all results are published as expected, with a dir for each time the process runs with a different input. But for subsequent runs using resume, it seems that the var task.hash evaluates to null, so each of the files overwrites the output in the ‘null’ dir and I only get one output.


Could you please share a minimal reproducible example?

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

process echoValue {

    input:
    val x 

    output:
    path "value.txt"
    
    script:
    """
    echo "${x}" > value.txt
    """
}

workflow {
    value_ch = Channel.of('Value1', 'Value2', 'Value3')
    echoValue(value_ch)
}

With config:

process {
    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.hash}" },
        mode: 'copy',
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]
}

First run:

nextflow run main.nf --outdir /delete90d/sruv/out/y_2024_02_19_01 -w /delete90d/sruv/work/

In my echovalue dir I get three folders with the hashes as expected:

  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 05018bb830f082a3c15217783802f0ee
  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 6416b2ccc0a205595bd466057a62e53f
  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 dce9a02e785e116f04e1c7240ccb085a

But then if I run:

 nextflow run main.nf --outdir /delete90d/sruv/out/y_2024_02_19_02 -resume -w /delete90d/sruv/work/

In my echovalue dir I only get one folder “null”.

I am trying to achieve something like:

https://www.nextflow.io/docs/latest/faq.html#how-do-i-get-a-unique-id-based-on-the-file-name

But with a config that applies to all processes rather than having something specific for each process.
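For context, the FAQ-style approach applied to the toy process above would be a per-process dynamic directive along these lines (just a sketch; the echovalue sub-directory name is illustrative):

    process echoValue {

        // Dynamic directive: the path closure can reference the task's input
        // value, so each invocation publishes into its own sub-directory and
        // the path does not depend on task.hash at all.
        publishDir path: { "${params.outdir}/echovalue/${x}" }, mode: 'copy'

        input:
        val x

        output:
        path "value.txt"

        script:
        """
        echo "${x}" > value.txt
        """
    }

That is exactly the kind of per-process repetition I was hoping to avoid.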

task.index seems to do the job
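In config form that would look roughly like this, i.e. the same closure as before with task.index swapped in for task.hash (a sketch, not tested on every Nextflow version):

    process {
        publishDir = [
            // task.index is the task's sequential index within the process,
            // so it still resolves on resumed runs, unlike task.hash here.
            path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.index}" },
            mode: 'copy',
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }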

It does, indeed, but I was wondering how much information these folder names give you about the runs. Are you sure you don’t want to add a timestamp or something like that? When you check the folders, how do you know which folder is the one you’re looking for? :sweat_smile:

I manually set the outdir for each run I do, so I can see the output of each process. I think I can also do something more clever with dynamic tags based on inputs.

Another good suggestion from @robsyme is to add the runName to the folder name. You gotta see what’s useful for you. As long as it’s clear which run those outputs came from, I think it should be OK :slight_smile:
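A sketch of that idea, assuming workflow.runName is resolvable inside the publishDir closure in your Nextflow version (if it isn’t, a run label passed in via params is an alternative):

    process {
        publishDir = [
            // Assumption: workflow.runName (the mnemonic run name, or whatever
            // you pass with -name) is accessible in this closure. Combined with
            // task.index, outputs from different runs land in different folders.
            path: { "${params.outdir}/${workflow.runName}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.index}" },
            mode: 'copy'
        ]
    }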