I was wondering how to join two channels based on only one field of the meta map, e.g. if I have a channel like:
[[id:target, group:target, control:input, single_end:true], genome.target.dedup.bam]
[[id:input, group:input, control:, single_end:true], genome.input.dedup.bam]
and I want to do a “left join” where I join the channel to itself based on id == control, so that my output channel is:
[[id:target, group:target, control:input, single_end:true], genome.target.dedup.bam, genome.input.dedup.bam]
The second entry from the original channel would be ignored because it has no control value.
Is this possible? Or is there a more idiomatic Nextflow pattern for doing that?
One approach would be to split your channel into two separate ones (e.g. based on whether control is defined or not), and then temporarily move the key you want to join on to be the first element of each tuple: meta.control for the samples that have a control, and meta.id for the ones that don't. Then you can join on these keys and reformat the result so that the metadata and BAM files are in the order you need.
// Recreate the example channel
def meta_target = [id: "target", group: "target", control: "input", single_end: true]
def bam_target = file("genome.target.dedup.bam")
def meta_input = [id: "input", group: "input", control: null, single_end: true]
def bam_input = file("genome.input.dedup.bam")
def ch_bam = Channel.of([meta_target, bam_target], [meta_input, bam_input])
// Split channel based on whether control is defined
ch_bam
    .branch { meta, bam ->
        control: meta.control
        no_control: !meta.control
    }
    .set { result }
// Move the key you want to join on to be the first element
ch_control = result.control.map { meta, bam -> [meta.control, meta, bam] }
ch_no_control = result.no_control.map { meta, bam -> [meta.id, meta, bam] }
// Join the channels, and reformat the output to drop the join key, and keep only the first metadata
ch_bam_join = ch_control
    .join(ch_no_control)
    .map { key, meta_1, bam_1, meta_2, bam_2 -> [meta_1, bam_1, bam_2] }
ch_bam_join.view()
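One thing to watch for: join is designed for keys that are unique on each side, so if several targets share the same control it may not pair all of them. A possible variant, sketched here under the assumption that ch_bam emits [meta, bam] tuples as in the example above, is to use combine with the by option instead, which pairs every target with the control that has the matching key. The names ch_targets, ch_controls and ch_bam_paired are just placeholders for illustration.

```nextflow
// Samples that declare a control, keyed by the id of that control
ch_targets = ch_bam
    .filter { meta, bam -> meta.control }
    .map { meta, bam -> [meta.control, meta, bam] }

// Every sample is a candidate control, keyed by its own id
// (only the BAM is kept, since the control's metadata is dropped anyway)
ch_controls = ch_bam
    .map { meta, bam -> [meta.id, bam] }

// combine(by: 0) pairs each target with the control sharing its key,
// then the key is dropped to get [meta, bam, control_bam]
ch_bam_paired = ch_targets
    .combine(ch_controls, by: 0)
    .map { key, meta, bam, control_bam -> [meta, bam, control_bam] }

ch_bam_paired.view()
```

Unlike the branch-based version, this does not require the control sample itself to have an empty control field, which may or may not be what you want. Targets whose control id does not match any sample are silently dropped in both versions.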