task.hash is null on resume

In Nextflow I have processes that accept a number of files from a channel, one at a time.

In my config I set the publish dir to be based on task.hash (or task.inputs), which works fine for the first run of the workflow. However, these variables seem to evaluate to null when the resume option is used, so I only get one dir in my publish dir rather than the multiple ones expected.

    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.hash}" },
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]

Output on first run:

  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:14 00173ae51b98f95373c353c17ee00ecc
  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:08 001c4bfe7782888fca798a1dadcbe73f
  drwxrwxr-x    3 sruv domain_users  25K Feb 19 10:03 001f1c9805ca32094c58ef7cb6e4a911
...

On second run with resume:

  drwxrwxr-x  3 sruv domain_users 512 Feb 19 10:39 null

How can I set these to stay unique even on resume?

Welcome, @Sam_Neaves!

Help me understand your problem, please. The resumed jobs are already published by the previous run, aren’t they?

Yes they are.

On the initial run, all results are published as expected, with a dir for each time the process runs with a different input. But for subsequent runs using resume, it seems that the var task.hash evaluates to null, so each of the files overwrites the output in the ‘null’ dir and I only get one output.


Could you please share a minimal reproducible example?

#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

process echoValue {

    input:
    val x 

    output:
    path "value.txt"
    
    script:
    """
    echo "${x}" > value.txt
    """
}

workflow {
    value_ch = Channel.of('Value1', 'Value2', 'Value3')
    echoValue(value_ch)
}

With config:

process {
    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.hash}" },
        mode: 'copy',
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]
}

First run:

nextflow run main.nf --outdir /delete90d/sruv/out/y_2024_02_19_01 -w /delete90d/sruv/work/

In my echovalue dir I get three folders with the hashes as expected:

  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 05018bb830f082a3c15217783802f0ee
  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 6416b2ccc0a205595bd466057a62e53f
  drwxrwxr-x 2 sruv domain_users 25K Feb 19 15:29 dce9a02e785e116f04e1c7240ccb085a

But then if I run:

 nextflow run main.nf --outdir /delete90d/sruv/out/y_2024_02_19_02 -resume -w /delete90d/sruv/work/

In my echovalue dir I only get one folder “null”.

I am trying to achieve something like:

https://www.nextflow.io/docs/latest/faq.html#how-do-i-get-a-unique-id-based-on-the-file-name

But with a config that applies to all processes rather than having something specific for each process.
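For context, the FAQ-style approach applied to the toy process above would be a per-process dynamic directive along these lines (just a sketch; the echovalue sub-directory name is illustrative):

    process echoValue {

        // Dynamic directive: the path closure can reference the task's input
        // value, so each invocation publishes into its own sub-directory and
        // the path does not depend on task.hash at all.
        publishDir path: { "${params.outdir}/echovalue/${x}" }, mode: 'copy'

        input:
        val x

        output:
        path "value.txt"

        script:
        """
        echo "${x}" > value.txt
        """
    }

That is exactly the kind of per-process repetition I was hoping to avoid.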

task.index seems to do the job
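In config form that would look roughly like this, i.e. the same closure as before with task.index swapped in for task.hash (a sketch, not tested on every Nextflow version):

    process {
        publishDir = [
            // task.index is the task's sequential index within the process,
            // so it still resolves on resumed runs, unlike task.hash here.
            path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.index}" },
            mode: 'copy',
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }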

It does, indeed, but I was wondering how much information these folder names give you about the runs. Are you sure you don’t want to add a timestamp or something like that? When you check the folders, how do you know which folder is the one you’re looking for? :sweat_smile:

I manually set the outdir for each run I do, so I can see the output of each process. I think I can also do something more clever with dynamic tags based on inputs.

Another good suggestion from @robsyme is to add the runName to the folder name. You gotta see what’s useful for you. As long as it’s clear which run those outputs came from, I think it should be OK :slight_smile:
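A sketch of that idea, assuming workflow.runName is resolvable inside the publishDir closure in your Nextflow version (if it isn’t, a run label passed in via params is an alternative):

    process {
        publishDir = [
            // Assumption: workflow.runName (the mnemonic run name, or whatever
            // you pass with -name) is accessible in this closure. Combined with
            // task.index, outputs from different runs land in different folders.
            path: { "${params.outdir}/${workflow.runName}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}/${task.index}" },
            mode: 'copy'
        ]
    }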