Issue grouping by id

ramirobarrantes · May 27, 2024, 10:28pm

I am trying to group technical replicates for further analysis. If I do:

    // Merge technical replicates of the same type
    BAM_MARKDUPLICATES_PICARD.out.bam
        .map { meta, bam ->
            def fmeta = meta + [ id: meta.sample + '_' + meta.type ]
            [ fmeta, bam ]
        }
        .view()

I get what I expected (an updated id):

[[id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false],test1_replicate2_signal.bam]
[[id:test1_background, sample:test1, replicate:replicate2, type:background, single_end:false], test1_replicate2_background.bam]
[[id:test1_signal, sample:test1, replicate:replicate1, type:signal, single_end:false],test1_replicate1_signal.bam]
[[id:test1_background, sample:test1, replicate:replicate1, type:background, single_end:false], test1_replicate1_background.bam]

But then, following the example in the nascent pipeline, if I try to group by this new id:

    BAM_MARKDUPLICATES_PICARD.out.bam
        .map { meta, bam ->
            def fmeta = meta + [ id: meta.sample + '_' + meta.type ]
            [ fmeta, bam ]
        }
        .groupTuple(by: [0])
        .map { it -> [ it[0], it[1].flatten() ] }
        .view()

I get essentially the same output:

[[id:test1_background, sample:test1, replicate:replicate2, type:background, single_end:false], [test1_replicate2_background.bam]]
[[id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false], [test1_replicate2_signal.bam]]
[[id:test1_signal, sample:test1, replicate:replicate1, type:signal, single_end:false], [test1_replicate1_signal.bam]]
[[id:test1_background, sample:test1, replicate:replicate1, type:background, single_end:false], [test1_replicate1_background.bam]]
-[nf-core/eclipseq] Pipeline completed successfully-

The grouping doesn't seem to work! It should group by id, putting all the test1_signal files together and all the test1_background files together. The outdir/samtools/merge directory looks like this:


(env_nf) bash-4.2$ ls -lt test/samtools/*.bam
-rw-r--r-- 1 rbarrant pi-jdragon 8251 May 27 18:43 test/samtools/test1_signal.bam
-rw-r--r-- 1 rbarrant pi-jdragon 8699 May 27 18:43 test/samtools/test1_background.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7920 May 27 18:42 test/samtools/test1_replicate1_background_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7822 May 27 18:42 test/samtools/test1_replicate1_signal_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7947 May 27 18:42 test/samtools/test1_replicate2_background_genome.bam
-rw-r--r-- 1 rbarrant pi-jdragon 7466 May 27 18:42 test/samtools/test1_replicate2_signal_genome.bam

I don't think this is merging the BAM files properly, is it?

mribeirodantas · May 27, 2024, 10:56pm

Let’s inspect the first element of your channel.

[
  [
    id:test1_signal,
    sample:test1,
    replicate:replicate2,
    type:signal,
    single_end:false
  ],
  test1_replicate2_signal.bam
]

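When you use groupTuple with by: 0, you are telling it to group the elements of this channel by the entire first item of each tuple, which here is the whole meta map [id:test1_signal, sample:test1, replicate:replicate2, type:signal, single_end:false]. Two tuples only land in the same group if these maps are equal in every key, and yours still differ in replicate, so each BAM file ends up in a group of its own.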
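A quick way to see this is to compare two of the remapped meta maps directly. This is a minimal sketch in plain Groovy (which a Nextflow script accepts); the values are copied from the .view() output above, and the variable names m1 and m2 are just illustrative:

```nextflow
// Two meta maps copied from the .view() output above (variable names are illustrative)
def m1 = [id: 'test1_signal', sample: 'test1', replicate: 'replicate1', type: 'signal', single_end: false]
def m2 = [id: 'test1_signal', sample: 'test1', replicate: 'replicate2', type: 'signal', single_end: false]

// Same id...
assert m1.id == m2.id
// ...but the maps as a whole are not equal, because 'replicate' differs.
// groupTuple(by: 0) compares the whole map, so these two tuples end up in separate groups.
assert m1 != m2

println "same id: ${m1.id == m2.id}, same map: ${m1 == m2}"
```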

In such cases, what you usually need to do is copy the value you're interested in (here meta.id) to the front of the tuple so that you can group on it. Check my snippet below:

// Reproducing your input channel
Channel
  .of([[id:'test1_signal', sample:'test1', replicate:'replicate2', type:'signal', single_end:'false'], file('test1_replicate2_signal.bam')],
      [[id:'test1_background', sample:'test1', replicate:'replicate2', type:'background', single_end:'false'], file('test1_replicate2_background.bam')],
      [[id:'test1_signal', sample:'test1', replicate:'replicate1', type:'signal', single_end:'false'], file('test1_replicate1_signal.bam')],
      [[id:'test1_background', sample:'test1', replicate:'replicate1', type:'background', single_end:'false'], file('test1_replicate1_background.bam')])
  .set { my_ch }

my_ch
  .map { meta, paths -> [meta.id, [meta, paths]] } // cloning meta.id to index 0
  .groupTuple(by: 0)
  .take(1)
  .view()

The output:

You can easily get rid of the “extra” id later if you want with something like:

...
  my_ch.map { extra_id, grouped_items -> grouped_items }.set { my_ch }
...
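An alternative that avoids the extra key entirely (a sketch, not part of the answer above) is to remove the per-replicate field from the meta map before grouping, so that the whole map becomes a valid grouping key. The helper name replicateGroupMeta is hypothetical, and the input channel reuses the reproduction above with single_end as a boolean:

```nextflow
// Hypothetical helper: drop the 'replicate' key and rebuild id as <sample>_<type>,
// so technical replicates of the same sample and type get identical meta maps.
def replicateGroupMeta(Map meta) {
    def base = meta.findAll { k, v -> k != 'replicate' }
    return base + [ id: meta.sample + '_' + meta.type ]
}

workflow {
    Channel
        .of(
            [[id:'test1_signal', sample:'test1', replicate:'replicate2', type:'signal', single_end:false], file('test1_replicate2_signal.bam')],
            [[id:'test1_background', sample:'test1', replicate:'replicate2', type:'background', single_end:false], file('test1_replicate2_background.bam')],
            [[id:'test1_signal', sample:'test1', replicate:'replicate1', type:'signal', single_end:false], file('test1_replicate1_signal.bam')],
            [[id:'test1_background', sample:'test1', replicate:'replicate1', type:'background', single_end:false], file('test1_replicate1_background.bam')]
        )
        .map { meta, bam -> [ replicateGroupMeta(meta), bam ] }
        // The maps no longer differ between replicates, so each sample/type
        // collapses into a single tuple of the form [meta, [bam, bam, ...]]
        .groupTuple(by: 0)
        .view()
}
```

Note that groupTuple without a size only emits its groups once the whole input channel has completed; if the number of replicates per group is known, wrapping the key in groupKey(key, size) lets each group be emitted as soon as it is full.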

system · June 3, 2024, 10:57pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
