Here is a toy example of a pipeline architecture with a somewhat counterintuitive behavior:
workflow A {
    take:
    ch

    main:
    ch | filter{it.containsKey("b")} | view

    emit:
    ch
}

workflow B {
    take:
    ch

    main:
    ch | map{it.b = true} | set{ch}
}

workflow {
    ch = channel.fromList([[:], [b:false]])
    ch | A | B
}
Output
[b:true]
[b:true]
In this example, an alteration that a later subworkflow makes to a channel item is visible in an earlier subworkflow. On reflection this makes some sense: channel items are object references, so B's map{it.b = true} mutates the very hashmap instances that A passes along, and because A doesn't alter the items itself, B fires immediately and its changes land before A's filter and view operators happen to run.
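The mechanism is ordinary Groovy reference semantics rather than anything Nextflow-specific. A minimal plain-Groovy sketch of the distinction between mutating a map in place and building a new one (variable names here are illustrative, not from the pipeline):

```groovy
def item = [b: false]

// In-place mutation, as in B's map{it.b = true}:
// the assignment changes the one shared map instance.
def mutated = { m -> m.b = true; m }(item)
assert item.b == true        // the original item was altered

// Copy-on-write alternative: `m + [b: true]` builds a new map,
// leaving the original untouched for anything else holding a reference.
def item2 = [b: false]
def copied = item2 + [b: true]
assert item2.b == false      // original unchanged
assert copied.b == true
```

This is why both items print as [b:true] in A's view even though A sits "before" B in the pipeline graph: the print happens after the shared maps have already been mutated.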
The context in which this occurs in practice is a QC-launching subworkflow that is called multiple times on the same channel. The channel items are hashmaps, each of which may carry several keys pointing at QCable stats files. Later subworkflows may add a key and a filename as setup for calling a process that will produce the stats file itself. But the earlier QC subworkflow (analogous to A, above) can see that the key and filename were added by the later subworkflow (analogous to B), so it tries to pass that not-yet-created stats file to its QC process call, which produces an error.
I have workarounds, but it seems like there ought to be a more elegant solution. Using the toy example above, what would be the right way to achieve the intended behavior, where A displays [b:false] by filtering out the empty hashmap and calling view before B runs?