Multiple Inputs vs Tuple Inputs

I’m a little confused about multiple inputs in nextflow. I find myself using tuple inputs (almost) exclusively when a process should take more than one input (which is pretty much every process).

Tuples seem clearly superior, because they allow passing arbitrary combinations of values to a process. Using a single, long tuple as a process input means the process can be reasoned about the same as a function in a procedural language (albeit an async function).

The only time I do use multiple inputs (without wrapping them in a tuple) is when passing values that are “global” to the entire workflow execution, for example a “Run ID”.
But this could also be achieved with a tuple (after re-reading the docs for .join and .combine).
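To illustrate, attaching a run ID to every sample tuple with combine could look something like this (channel names and SOME_PROCESS are made up for illustration):

samples_ch = Channel.of(['s1', file('s1.fq')], ['s2', file('s2.fq')])
run_id_ch  = Channel.of('run-2024-01')

// combining with a single-element channel appends the run ID
// to every sample tuple: ['s1', s1.fq, 'run-2024-01'], ...
samples_ch.combine(run_id_ch) | SOME_PROCESS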

So what’s the utility of having multiple inputs function the way they currently do?
Is this a remnant from early nextflow that has to be kept for backward compatibility? Or am I too stuck in the procedural paradigm to grok this dataflow concept?


I would probably say yes, this is a remnant of early Nextflow.
This is basically the direction I think the developers want to go with the introduction of record types and a reduction in channel operators. The aim would be to make the combine explicit.


Great question! You’re not missing anything — your intuition about tuples is solid, and this is a common point of confusion.

When multiple inputs make sense:

The key distinction is between queue channels (multiple items, consumed as they flow through) and value channels (single item, reusable). When your second input is truly a “global” value that won’t change per-sample (like your Run ID example, or a reference genome path), using a separate value channel input is a valid choice:

process ALIGN {
    input:
    tuple val(sample_id), path(reads)
    path reference  // value channel - same for all samples

    // ...
}

This works cleanly because value channels automatically repeat for each item in the queue channel. Note, however, that this multi-channel approach locks you into a workflow that cannot scale to multiple reference genomes — avoid it if there is any chance you’ll eventually want to supply multiple items in both channels.
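A call site for a process like the one above might look as follows (sample names and file paths are invented):

workflow {
    samples = Channel.of(
        ['sampleA', file('sampleA.fastq.gz')],
        ['sampleB', file('sampleB.fastq.gz')]
    )

    // a single path passed directly is treated as a value channel,
    // so the reference is re-used for every sample in the queue channel
    ALIGN(samples, file('reference.fa'))
}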

The tuple-everything approach (your “wide tuple”):

Using combine to merge everything into one tuple before the process is also perfectly valid! You’re right that it has some great features (some of my personal favourites):

  • Pipe-friendly: Clean chaining with |

  • Explicit data flow: All the “shape manipulation” happens in channel operations, making the process signature simpler

  • Flexibility: Easy to switch from combining with one item to many
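For completeness, here is a minimal sketch of the wide-tuple style (process name, channels, and paths are illustrative):

process ALIGN_WIDE {
    input:
    tuple val(sample_id), path(reads), path(reference)

    // ...
}

workflow {
    samples   = Channel.of(['s1', file('s1.fq')], ['s2', file('s2.fq')])
    reference = Channel.of(file('reference.fa'))

    // all the "shape manipulation" happens in the channel expression,
    // so the process signature stays a single tuple
    samples.combine(reference) | ALIGN_WIDE
}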

The nf-core community generally favors keeping tuples focused on sample-specific data and using separate input channels for shared resources. This isn’t because wide tuples are wrong; it’s more about readability at scale and consistency across many authors (a convention everyone follows, even if not strictly “better”).

I don’t think you’re stuck in procedural thinking. In fact, I think that the multi-channel/narrow-tuple approach indicates a more procedural approach (it looks closer to a regular function call). Both approaches are valid Nextflow. If wide tuples work well for your team and workflow complexity, keep using them! The “best” choice often depends on who else needs to read and maintain the code.

As Mahesh notes, the combine approach is probably more likely to become the standard when Record types become available next year.

