Imposing names, cardinality, and structure on channel elements?

Processes can enforce a specific cardinality and type on the elements of channels they receive as input. Is it possible to do this with workflow inputs and outputs?

This question stems from a larger challenge I’m having, which is figuring out how to structure channels.

In an object-oriented setting, we define classes to bundle together related methods, give meaningful names to variables, and enforce expectations on how the data they carry is structured.

Channels lack much of that functionality. We can give them a name (which is analogous to a class name), but the variables they store are amorphous, untyped and unnamed. There’s no central reference akin to a class definition that enforces how a channel must be structured.

So far, my best attempt to create something like this is to define channel-creation workflows that simply return a single channel with a specific structure. This works OK, but it still doesn’t, for example, permit accessing specific channel elements by name.

For example, channel.fromFilePairs(params.fastq).map{it[0]} does not make it obvious that the output is a channel containing just the sample IDs, or that it[0] is a value rather than a path.
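One partial improvement, sketched here rather than offered as a full answer, is to name the closure parameters so the shape of each element is visible where it is used. The names ch_sample_ids, sample_id, and reads are illustrative, and the snippet assumes params.fastq is a glob that matches paired FASTQ files.

```groovy
// fromFilePairs emits one tuple per pair: [ grouping key, [ file1, file2 ] ]
// Destructuring names the two parts; sample_id and reads are illustrative names.
ch_sample_ids = channel
    .fromFilePairs(params.fastq)
    .map { sample_id, reads -> sample_id }   // keep only the grouping key (a value, not a path)
```

This documents the structure only at the call site; nothing enforces it, which is the gap the question is about.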

I could imagine creating Groovy classes that store channels and have named accessor functions that return the channel elements, but this seems cumbersome, so I’m not sure what to do.

Thanks for any information you can provide on features Nextflow might provide to help with this.

This is something we are working to improve in the language. It is still in the early stages, but we want to add better static type inference and support for custom record types to Nextflow (similar to WDL). We’re also working on an IDE plugin that leverages these features so that, for example, you could hover over a function or variable and see its (return) type.

With these pieces in place, I don’t think you’ll ever need to annotate the type of a channel, because it will be inferred from the types of params, channel factories, operators, and processes. The IDE should be able to show the inferred type, and the compiler should be able to report a type mismatch if there is one.

The documentation could likely also be improved by adding proper function signatures, with argument and return types, for channel factories and operators.

As a stopgap, nf-core pipelines use comments to denote the type and structure of workflow takes and emits.
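A minimal sketch of that comment convention might look like the following. The workflow name SAMPLE_IDS and the channel names are hypothetical, and the comments are documentation only: Nextflow does not check them.

```groovy
// Hypothetical subworkflow: the names are illustrative, the trailing comments are the convention.
workflow SAMPLE_IDS {
    take:
    ch_reads        // channel: [ val(sample_id), [ path(read1), path(read2) ] ]

    main:
    ch_ids = ch_reads.map { sample_id, reads -> sample_id }

    emit:
    ids = ch_ids    // channel: val(sample_id)
}

workflow {
    // Assumes params.fastq is a glob matching paired FASTQ files
    SAMPLE_IDS( channel.fromFilePairs(params.fastq) )
    SAMPLE_IDS.out.ids.view()
}
```

The named emit at least gives callers a name for the output channel (SAMPLE_IDS.out.ids), even though the structure of its elements is still unchecked.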