OK so let's break it down.
- Create the ability to insert pause points into the run.
I have a class WorkflowSequencePermit which has params and a description of the workflow steps loaded in, and is passed between subworkflows. Its allows() method lets the subworkflows determine if they should be run.
This assigns responsibility for parsing this aspect of nextflow params to the WorkflowSequencePermit and allows a description of the expected linear sequence of processing steps to be articulated in the base workflow only.
I'm not 100% sure what the purpose of this is over the normal execution of Nextflow. I can't see anything here that can't be achieved with channels. Could you provide a concrete example?
- Wrapping importable common closures to facilitate operating on elements structured as a single LinkedHashMap item rather than as a list of items.
I use a variety of closures to enable more powerful manipulation of channel contents than Nextflow affords. Since Nextflow doesn't provide functionality to import Groovy classes and methods using the include statement, and I don't want to compile them, I wrap the closures as statics in a class called ExOp. This is imported in a common.groovy file using the GroovyClassLoader, which can then be included in my .nf modules via evaluate(new File("common.groovy")).
The reason I use these closures is that I want the ability to conveniently identify channel item elements with keys, so that instead of a list of elements, we have a LinkedHashMap of elements. The closures I wrote facilitate extracting and ordering channel items as process inputs and receiving outputs back into the LinkedHashMap format. They also facilitate doing joins with the join operator by splintering the appropriately-ordered LinkedHashMap into individual key:value pairs, doing the join, and then reassembling the complete LinkedHashMap.
This functionality gives several advantages:
- Each subworkflow is shielded from having to know the order in which outputs are provided/needed by previous/subsequent subworkflows.
- We can select desired elements from subworkflow inputs using legible key strings rather than inscrutable integer offsets.
- We can much more easily organize parameters provided by users in a .csv specifying per-sample processing params and trivially update them with sensible defaults.
- Channels become more inspectable, since the view operator will display key labels along with values.
Although it may not solve all of your issues, you can use a map (LinkedHashMap) as a process input in Nextflow and access elements by name. This is a very common pattern in nf-core and is widely used across Nextflow pipelines. Here's a simple example:
process CAT_FILE {
    input:
    val input

    output:
    // Note how we preserve the map as a through value for joining etc.
    tuple val(input), path("${input.name}.txt")

    script:
    """
    cat ${input.file} > ${input.name}.txt
    """
}

workflow {
    // Channel of 2 maps.
    ch_input = Channel.of(
        [name: "test_copy"        , file: file("test.txt", checkIfExists: true)],
        [name: "test_another_copy", file: file("test.txt", checkIfExists: true)]
    )
    CAT_FILE(ch_input).view()
}
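The same idea extends to the join case you describe. Here is a minimal sketch, assuming every map carries an `id` key; the channel names, the `fastq` and `depth` fields, and the `mergeMaps` helper are all hypothetical and only there for illustration. Each map is keyed by its `id`, joined with the standard join operator, and then merged back into a single map, so nothing downstream depends on element order:

```groovy
// Hypothetical helper: merge two maps; keys in `right` win on conflict.
def mergeMaps(Map left, Map right) {
    return left + right
}

workflow {
    // Two channels of maps describing the same samples (made-up values).
    ch_reads = Channel.of(
        [id: "s1", fastq: "s1.fq"],
        [id: "s2", fastq: "s2.fq"]
    )
    ch_meta = Channel.of(
        [id: "s2", depth: 10],
        [id: "s1", depth: 30]
    )

    ch_reads
        // Put the join key first so join can match on it (index 0 by default).
        .map { m -> [m.id, m] }
        .join(ch_meta.map { m -> [m.id, m] })
        // join emits [id, left_map, right_map]; fold them back into one map.
        .map { id, left, right -> mergeMaps(left, right) }
        .view()
}
```

This only uses map, join and Groovy's map addition, so it should not need any extra class loading, though it is obviously less general than your wrapped closures.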
However, this is hardly complete. It sounds like what you are after is a full typing system, with the ability to parse arbitrary objects as inputs and outputs. This is exactly something @bentsherman has been working on, and he has some prototypes already. I don't think we can give any firm timelines yet, but a more robust input and output system than tuples is a clear improvement for the Nextflow language and a key objective. I'm sure he'd appreciate some input on design decisions, or code, if you have any thoughts.
- I am still experimenting with this, but I will probably create classes specialized for parsing specific aspects of nextflow params.
I want to decouple parsing parameters from implementing the behavior those params are meant to control. The natural way to do this is to create a class that parses params and provides more stable methods that workflows can poll to determine how they ought to behave. The WorkflowSequencePermit I described above is an example.
My overriding goal here is to make it so that workflows don't have any responsibility for keeping track of the UI (i.e. the mapping from nextflow params to desired behaviors). This is because they have the responsibility for implementing the control logic that governs which processes are run and how. Something has to map params to desired behaviors, though, and the best answer I've come up with is a class.
So a simple use case of yours might be:
workflow {
    ch_qc_files
        // Skip QC boolean based on a number of parameters
        | filter { !WorkflowSequencePermit.skipQc() }
        | QC
}
Here, WorkflowSequencePermit.skipQc() checks a number of parameters to decide whether QC should be skipped.
Of course, this example doesn't have to use a class:
def skip_qc(channel, params) {
    // extremely complicated logic here
    return true
}

workflow {
    ch_qc_files
        | filter { !skip_qc(it, params) }
        | QC
}
Naturally, you could do this with the filter operator (as above), if statements, or the when statement, in conjunction with functions and/or closures. You can inspect the contents of a channel with these tools to make fairly complicated evaluations; however, if your logic is exceedingly complex, you may want a method. You can do this using a class in the lib/ directory, although I note that nf-core has moved away from this recently because they found it wasn't necessary: GitHub - nf-core/sarek at dev.
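If you do go the lib/ route, a minimal sketch might look like the following. I don't know what your actual WorkflowSequencePermit looks like, so everything here is an assumption: the file location, the `skip_qc` and `stop_before` parameter names, and the choice to pass `params` in explicitly (as far as I know, classes in lib/ don't see `params` unless you hand it to them):

```groovy
// lib/WorkflowSequencePermit.groovy (hypothetical sketch, not the poster's class)
class WorkflowSequencePermit {

    // Decide whether QC should be skipped, looking only at the params map.
    // `skip_qc` and `stop_before` are assumed parameter names.
    static boolean skipQc(Map params) {
        return params.skip_qc == true || params.stop_before == 'qc'
    }
}
```

In a workflow this would be called as `filter { !WorkflowSequencePermit.skipQc(params) }`, which keeps all of the parameter parsing in one place while the workflow itself only asks a yes/no question.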
In summary, I think you can achieve a good chunk of this without adding classes, and the remaining parts should be language-level features, some of which we may already be working on. I think you should be able to write your pipeline with Nextflow in its current form, but additional features would help, and I think these would make excellent examples of the direction the language could take. We always appreciate feedback and ideas, so don't hesitate to post them here or on GitHub.