Complete DAG: show task (not process) dependencies

I know I can generate a DAG or a report from a workflow run. It shows the list of tasks on one side and the process dependencies on the other. What I would like to see is the dependencies between the tasks themselves.

For instance, look at this pseudo workflow:

process A {
 input: path in
 output: path out
}

process B {
 input: path in
 output: path out
}

workflow {
 Channel.of(file1, file2) | A | B
}

Trace would report:

taskid1 A(1)
taskid2 A(2)
taskid3 B(1)
taskid4 B(2)

The information I’m missing is taskid1 → taskid3 and taskid2 → taskid4. This would help, for instance, to track the processing of the files from the input to the output.

Is this information already available? Or would I need to write my own plugin?

Thanks

No plugin required; this is a core feature of Nextflow, but it’s not very obvious at first.

Tag Directive

Firstly, the exact solution you are looking for is the tag directive. This lets you attach a custom label to each task, which then appears in the log.

In this example, I use the simpleName file attribute to get the base name of the input file and use it as the tag, which will appear in the log:

process A {

    tag "${in.simpleName}"

    input: 
        path in
    output: 
        path "*_out"

    script:
    """
    mv $in ${in.simpleName}_out
    """

}

process B {

    tag "${in.simpleName}"

    input: 
        path in
    output: 
        stdout

    script:
    """
    echo $in 
    """
}

workflow {

    def file1 = file("${workDir}/hello.txt")
    file1.text = "Hello, world!"
    
    def file2 = file("${workDir}/morning.txt")
    file2.text = "Good morning!"
    

    Channel.of(file1, file2) | A | B
}
> nextflow run . -ansi-log false
N E X T F L O W  ~  version 24.10.5
Launching `./main.nf` [crazy_jennings] DSL2 - revision: 4a7461e06d
[e8/39fc6d] Submitted process > A (hello)
[96/58ffcd] Submitted process > A (morning)
[d0/d55e07] Submitted process > B (hello_out)
[05/6d8792] Submitted process > B (morning_out)

Metadata Propagation

Of course, this is a very simple example and relies on filenames. Filenames are deeply unreliable and should never be used to hold metadata.

Nextflow supports propagating data with the files, i.e. you can pass sample information such as the ID, treatment etc along with the files themselves and use that information in each process. This is extremely valuable because you can construct complex instructions from all the data you have accessible. For a deep dive into this topic, check out the advanced training: Metadata Propagation - training.nextflow.io

In this example, I build some files using a map from a greeting and return a tuple of [ greeting, file ]. I then use the greeting as the tag to identify the task:

process A {

    tag "${greeting}"

    input: 
        tuple val(greeting), path(in)
    output: 
        tuple val(greeting), path("*_out")

    script:
    """
    mv $in ${greeting}_out
    """

}

process B {

    tag "${greeting}"

    input: 
        tuple val(greeting), path(in)
    output: 
        stdout

    script:
    """
    echo $in 
    """
}

workflow {

    Channel.of("hello", "morning")
        .map { greeting ->
            def greetingFile = file("${workDir}/${greeting}.txt")
            greetingFile.text = "${greeting} world!"
            return [ greeting, greetingFile ]
        }
        .set { greetings }

    greetings | A | B
}
> nextflow run . -ansi-log false
N E X T F L O W  ~  version 24.10.5
Launching `./main.nf` [determined_lamarck] DSL2 - revision: c1789b1008
[0a/a31ba3] Submitted process > A (morning)
[9b/f30dbc] Submitted process > A (hello)
[b7/5b3025] Submitted process > B (hello)
[1b/016608] Submitted process > B (morning)

There’s nothing stopping you from constructing complex tags, e.g. "${sampleId}_${referenceName}", to indicate combinations of inputs.
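As a sketch of that idea (the ALIGN process and its inputs here are hypothetical, not from the thread above):

```nextflow
process ALIGN {

    // Combined tag: identifies the sample/reference pairing in the log
    tag "${sampleId}_${referenceName}"

    input:
        tuple val(sampleId), val(referenceName), path(reads)
    output:
        tuple val(sampleId), path("${sampleId}.bam")

    script:
    """
    echo aligning $reads to $referenceName > ${sampleId}.bam
    """
}
```

With this, the log would show entries like ALIGN (sample1_grch38), making it easy to see which combination of inputs each task processed.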

Provenance and Dependency Chaining

Maybe I’ve missed the point entirely, and you are really after task provenance, which in this situation means knowing which task derives from which earlier tasks. This is more complicated and can’t really be expressed with tags, but it’s something we’re working on.

Thanks Adam. I’m actually looking for the provenance of the tasks, and for some way to get it for existing pipelines. I’ve looked at the entry points for plugins and at the task handlers, but didn’t find any attribute/method with this information. The closest thing I’ve found is looking at the symbolic links in the work directory to see where the data comes from (i.e. from which task).
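In shell terms, the idea looks something like this. The directory names below are mocked up (real work directories use per-run hashes), but the principle is the same: staged inputs in a task directory are symlinks back to the work directory of the task that produced them, so following them reveals the upstream task.

```shell
# Mock layout standing in for a real Nextflow work directory
# (hashes like aa/upstream are placeholders for real task dirs):
mkdir -p work/aa/upstream work/bb/downstream
touch work/aa/upstream/sample_out
ln -sf "$PWD/work/aa/upstream/sample_out" work/bb/downstream/sample_out

# For each symlinked input of the downstream task,
# print the directory of the task that produced it:
for link in work/bb/downstream/*; do
    [ -L "$link" ] && dirname "$(readlink "$link")"
done
```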

Check out the dag format in nf-prov; it will give you the task DAG as a Mermaid diagram. You might also look at the source code to see how I use the TraceObserver to infer the provenance.

The only caveat is that it uses input/output files to track provenance, so it can’t track e.g. a val output being passed to a val input. But this is a rare edge case and there are ways to avoid it if needed.
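For reference, enabling the plugin in nextflow.config looks roughly like this (the exact prov config schema and available options may differ between nf-prov versions, so check the plugin’s README):

```groovy
// nextflow.config
plugins {
    id 'nf-prov'
}

prov {
    enabled = true
    formats {
        // 'dag' renders the task-level DAG as a Mermaid diagram
        dag {
            file = 'provenance.html'   // output file name is an assumption
            overwrite = true
        }
    }
}
```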


Great, this is what I was looking for. Thanks a lot!
