Issue grouping by id

I am trying to group technical replicates for further analysis. If I do:

    // Merge technical replicates of the same type
    .map {
        meta, bam ->
        fmeta = meta + [ id: meta.sample + '_' + meta.type ]
        [fmeta, bam] }.view()

I get what I am supposed to (an updated id):

[[id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false],test1_replicate2_signal.bam]
[[id:test1_background, sample:test1, replicate:replicate2, type:background, single_end:false], test1_replicate2_background.bam]
[[id:test1_signal, sample:test1, replicate:replicate1, type:signal, single_end:false],test1_replicate1_signal.bam]
[[id:test1_background, sample:test1, replicate:replicate1, type:background, single_end:false], test1_replicate1_background.bam]

But then, following the example in the nascent pipeline, if I try to group by this new id:

    .map {
        meta, bam ->
        fmeta = meta + [ id: meta.sample + '_' + meta.type ]
        [fmeta, bam] }
    .groupTuple(by: [0])
    .map { it ->  [ it[0], it[1].flatten() ] }

I seem to get a very similar thing:

[[id:test1_background, sample:test1, replicate:replicate2, type:background, single_end:false], [test1_replicate2_background.bam]]
[[id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false],
[[id:test1_signal, sample:test1, replicate:replicate1, type:signal, single_end:false],
[[id:test1_background, sample:test1, replicate:replicate1, type:background, single_end:false], [test1_replicate1_background.bam]]
-[nf-core/eclipseq] Pipeline completed successfully-

I don’t see the grouping working!! It should group by the id, all the test1_signal together and the test1_background together!! The outdir/samtools/merge directory looks like this:

(env_nf) bash-4.2$ ls -lt test/samtools/*.bam
-rw-r--r-- 1 rbarrant pi-jdragon 8251 May 27 18:43 test/samtools/test1_signal.bam
-rw-r--r-- 1 rbarrant pi-jdragon 8699 May 27 18:43 test/samtools/test1_background.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7920 May 27 18:42 test/samtools/test1_replicate1_background_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7822 May 27 18:42 test/samtools/test1_replicate1_signal_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7947 May 27 18:42 test/samtools/test1_replicate2_background_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7466 May 27 18:42 test/samtools/test1_replicate2_signal_genome.bam

I don’t think this is merging the BAM files properly, is it?

Let’s inspect the first element of your channel.


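When you use groupTuple with the index set to 0, you are telling it to group the elements of the channel by the entire first item of each list, which here is the whole meta map, e.g. [id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false]. Two maps only count as the same key when every entry matches. Your replicates now share the same id, but their replicate values still differ, so each meta map is a distinct key and nothing gets grouped.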
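To make the key comparison concrete, here is a minimal plain-Groovy sketch (no Nextflow needed) using two meta maps copied from your output. The variable names `a` and `b` are only for illustration.

```groovy
// Two meta maps from the question: same id, different replicate
def a = [id: 'test1_signal', sample: 'test1', replicate: 'replicate1', type: 'signal', single_end: false]
def b = [id: 'test1_signal', sample: 'test1', replicate: 'replicate2', type: 'signal', single_end: false]

// The ids match, but the maps as a whole do not,
// so they are different keys when grouping on the whole map
assert a.id == b.id
assert a != b

// Without the replicate entry the two maps compare equal
assert a.findAll { it.key != 'replicate' } == b.findAll { it.key != 'replicate' }
```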

In such circumstances, what you usually need to do is copy the value you want to group on (here meta.id) into its own position in the list, so that you can group on that value alone. Check my snippet below:

// Reproducing your input channel
Channel
  .of([[id:'test1_signal', sample:'test1', replicate:'replicate2', type:'signal', single_end:'false'], file('test1_replicate2_signal.bam')],
      [[id:'test1_background', sample:'test1', replicate:'replicate2', type:'background', single_end:'false'], file('test1_replicate2_background.bam')],
      [[id:'test1_signal', sample:'test1', replicate:'replicate1', type:'signal', single_end:'false'], file('test1_replicate1_signal.bam')],
      [[id:'test1_background', sample:'test1', replicate:'replicate1', type:'background', single_end:'false'], file('test1_replicate1_background.bam')])
  .set { my_ch }

my_ch
  .map { meta, paths -> [meta.id, [meta, paths]] } // cloning the id to index 0
  .groupTuple(by: 0)

The output:

You can easily get rid of the “extra” id later if you want with something like:

... { extra_id, grouped_items -> grouped_items }.set { my_ch }
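Another option, if you would rather keep the meta map itself as the grouping key, is to drop the entries that differ between replicates before calling groupTuple. The sketch below assumes `replicate` is the only such key in your meta map. The input ids and the inline `Channel.of` are hypothetical stand-ins for your real BAM channel.

```groovy
// Hypothetical stand-in for the real BAM channel
Channel
    .of(
        [[id: 'test1_replicate1_signal', sample: 'test1', replicate: 'replicate1', type: 'signal', single_end: false], file('test1_replicate1_signal.bam')],
        [[id: 'test1_replicate2_signal', sample: 'test1', replicate: 'replicate2', type: 'signal', single_end: false], file('test1_replicate2_signal.bam')],
        [[id: 'test1_replicate1_background', sample: 'test1', replicate: 'replicate1', type: 'background', single_end: false], file('test1_replicate1_background.bam')],
        [[id: 'test1_replicate2_background', sample: 'test1', replicate: 'replicate2', type: 'background', single_end: false], file('test1_replicate2_background.bam')]
    )
    .map { meta, bam ->
        // Remove the per-replicate entry, then overwrite the id,
        // so that replicates of the same sample and type share an identical meta map
        def fmeta = meta.findAll { it.key != 'replicate' } + [id: meta.sample + '_' + meta.type]
        [fmeta, bam]
    }
    .groupTuple(by: [0])
    .view()
```

With this, each emitted item should hold one meta map followed by the list of BAM files for that sample and type. The trade-off is that the replicate information is no longer carried in the meta map after grouping.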
