Bundling channels into a tuple gives different results in process output vs. workflow

I’m trying to understand how to bundle channels into tuples and pass them as process or subworkflow inputs and outputs.

Here is a minimal example that confuses me.

process p1 {
    input:
    path A
    path B
    
    output:
    tuple(path(A), path(B))

    script:
    "touch ${A}; touch ${B}"
}

process p2 {
    input:
    tuple(path(A), path(B))

    script:
    ":"
}

workflow {
    A = Channel.fromPath("A.txt")
    B = Channel.fromPath("B.txt")

    // This works
    //p2(p1(A, B))

    // This gives the error
    p2(tuple(A, B))
}

Running the above example with nextflow run demo.nf (Nextflow 23.10.1 on Ubuntu 22.04.4 LTS) gives the error:

Not a valid path value type: groovyx.gpars.dataflow.DataflowBroadcast (DataflowBroadcast around DataflowStream[?])

This surprised me. Both the commented-out example labeled ‘this works’ and the ‘this gives the error’ lines seem like they ought to produce equivalent results.

It seems like bundling channels into a tuple in the process output section leads to a different result than doing it in the workflow section. What’s the difference under the hood? Is there a way to modify the ‘this gives the error’ example to make it work as an input to p2?

The p1 process outputs a channel whose element is a tuple with two items (two paths). The p2 process expects exactly that as input: a channel element containing a tuple with two paths, so the commented-out line works just fine. tuple(A, B), however, creates a tuple containing two channels, which is not what p2 expects. Inside process instances (tasks) there are no channels, just the items inside a channel element: paths, values, and so on. When a channel object ends up there, Nextflow lets you know that a channel (groovyx.gpars.dataflow.DataflowBroadcast) is not a valid path.

I confess it’s a bit confusing if you’re not well versed in the concepts of channels and processes, but what you’re doing is effectively passing a Channel.of(Channel.of(...)). Why? Because when you pass a non-channel value (such as a tuple) to a process, Nextflow implicitly converts it to a value channel, so what you’re actually doing is Channel.value(tuple(Channel.fromPath("A.txt"), Channel.fromPath("B.txt")))
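
A small sketch makes the mismatch visible (the printed class is the same DataflowBroadcast reported in the error above):

workflow {
    A = Channel.fromPath("A.txt")
    B = Channel.fromPath("B.txt")

    // tuple(A, B) is evaluated right away in the workflow body: it is a single
    // Groovy tuple holding the two channel objects, not a channel of tuples
    println tuple(A, B)*.getClass()
    // [class groovyx.gpars.dataflow.DataflowBroadcast, class groovyx.gpars.dataflow.DataflowBroadcast]
}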

Thank you very much for your help. Sounds like the combine operator is what I’d use to assemble an input to p2 via: p2(A.combine(B))
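
In full, the fixed workflow block would look something like this (a minimal sketch, keeping the process definitions above unchanged):

workflow {
    A = Channel.fromPath("A.txt")
    B = Channel.fromPath("B.txt")

    // combine emits the Cartesian product of A and B; with a single path in
    // each channel that is one element, the tuple [A.txt, B.txt], which
    // matches p2's tuple(path(A), path(B)) input declaration
    p2(A.combine(B))
}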

So just to check my understanding: it sounds like every line of a process’s input and output blocks implicitly corresponds to a single channel, and what’s declared in the input/output blocks describes the contents of those channels.

When we call A = Channel.fromPath("A.txt"), we are creating a channel named A which contains a path item (sun.nio.fs.UnixPath according to A.map{ it.getClass() }) whose path string is “[working directory]/A.txt”. A.getClass(), by contrast, is groovyx.gpars.dataflow.DataflowBroadcast, which I’m assuming is the under-the-hood implementation of a channel. Is that correct?

Yes. Every line in the input block refers to a single element from an input channel. That’s why we often say that usually the only places you’re dealing with regular variables (not channels) in Nextflow are inside a process or inside a channel operator.

When you pass a channel to a process, the process consumes an element from that channel (a regular variable). If you pass a channel whose elements are themselves channels (or a multi-channel output, e.g. from branch or from a process with multiple outputs), Nextflow will complain.
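
For example, inside a workflow block (a minimal sketch; getFileName() is just the standard java.nio.file.Path method):

A = Channel.fromPath("A.txt")

// Outside of operators and processes, A refers to the channel object itself
println A.getClass()                 // groovyx.gpars.dataflow.DataflowBroadcast

// Inside an operator closure, 'it' is the element: a regular Path variable
A.map { it.getFileName() }.view()    // A.txt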

Just to clarify terminology: you’re not wrong to refer to the entities inside channels as items, but we usually refer to them as elements, and to the parts of an element (if it’s a list, for example) as items. It makes things easier to follow when they get complicated.

For example:

Channel.of([1, 2, 3], [4, 5, 6])

The channel above has two elements with three items each.
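
You can check this with view(), which fires once per element (a quick sketch):

// Prints two lines, one per element; each element is a list with three items
Channel.of([1, 2, 3], [4, 5, 6]).view()
// [1, 2, 3]
// [4, 5, 6]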

Marcel has given a great explanation, but allow me to use a little analogy.

Your processes are the metro stations, each element in a channel is a train, each line is a channel. Your workflow is the whole tube map.

In your example:
Stations: p1, p2
Lines: A, B
Trains (consisting of either): path(A), path(B) or tuple(path(A), path(B))

And you gotta make sure those trains fit on the platform. That’s what the operators are for.

I have to apologise if I’m explaining that monads are burritos.
