Join channels based on meta field

Hi

I was wondering how to join two channels based on only one field of the meta map, e.g. if I have a channel like:
[[id:target, group:target, control:input, single_end:true], genome.target.dedup.bam]
[[id:input, group:input, control:, single_end:true], genome.input.dedup.bam]

and I want to do a “left join”, joining the channel to itself where `id == control`, so that my output channel is:
[[id:target, group:target, control:input, single_end:true], genome.target.dedup.bam, genome.input.dedup.bam]
The second entry from the original channel would be ignored because it has no control value.

Is this possible? Or is there a more idiomatic Nextflow pattern for doing this?

One approach would be to split your channel into two separate ones (e.g. based on whether `control` is defined), and then temporarily move the key you want to join on to the first element (`control` for the targets, `id` for the controls). You can then join on these keys and reformat the result so the metadata and BAM files are in the order you need.

    // Recreate the example channel
    def meta_target = [id: "target", group: "target", control: "input", single_end: true]
    def bam_target  = file("genome.target.dedup.bam")
    def meta_input  = [id: "input", group: "input", control: null, single_end: true]
    def bam_input   = file("genome.input.dedup.bam")
    def ch_bam      = Channel.of([meta_target, bam_target], [meta_input, bam_input])

    // Split channel based on whether control is defined
    ch_bam
        .branch { meta, bam -> 
            control:     meta.control
            no_control: !meta.control
        }
        .set { result }

    // Move the key you want to join on to be the first element
    ch_control    = result.control.map    { meta, bam -> [meta.control, meta, bam ]}
    ch_no_control = result.no_control.map { meta, bam -> [meta.id, meta, bam ]}

    // Join the channels, and reformat the output to drop the join key, and keep only the first metadata
    ch_bam_join = ch_control
        .join(ch_no_control)
        .map { key, meta_1, bam_1, meta_2, bam_2 -> [meta_1, bam_1, bam_2] }
    ch_bam_join.view()
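One thing to watch out for, as far as I understand `join`'s semantics: it pairs matching keys one-to-one, so if several targets point to the same control, only the first one gets paired and the rest are dropped. If that can happen in your data, here is a minimal sketch that uses `combine` with `by: 0` instead, which emits every pairing sharing the key. It assumes a hypothetical second target (`target_2`) that shares the `input` control; the variable names are illustrative only, not part of the approach above.

```nextflow
// Sketch: many-to-one pairing of target BAMs with their control BAM.
// Assumption: "target_2" is a hypothetical extra sample sharing the "input" control.
workflow {
    ch_bam = Channel.of(
        [[id: "target",   group: "target", control: "input", single_end: true], file("genome.target.dedup.bam")],
        [[id: "target_2", group: "target", control: "input", single_end: true], file("genome.target_2.dedup.bam")],
        [[id: "input",    group: "input",  control: null,    single_end: true], file("genome.input.dedup.bam")]
    )

    // Targets (control defined), keyed by the control they point to
    ch_target = ch_bam
        .filter { meta, bam -> meta.control }
        .map    { meta, bam -> [meta.control, meta, bam] }

    // Controls (no control defined), keyed by their own id; meta is not needed here
    ch_ctrl = ch_bam
        .filter { meta, bam -> !meta.control }
        .map    { meta, bam -> [meta.id, bam] }

    // combine(by: 0) emits [key, meta, bam, ctrl_bam] for every target/control
    // pair sharing the key, so one control can serve several targets
    ch_target
        .combine(ch_ctrl, by: 0)
        .map { key, meta, bam, ctrl_bam -> [meta, bam, ctrl_bam] }
        .view()
}
```

Saved as a standalone `main.nf`, this should print one tuple per target, each with the input BAM appended. The trade-off is that `combine` does not report unmatched keys, so a target whose control is missing is silently dropped, much like the default `join`.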

Very helpful, thank you!
