Collapse sample names in one table but not another


I am testing nf-core/sarek with the test_full profile (somatic). Everything works well, but I have an issue with the General Stats table.

  • With the default MultiQC config, the table is quite sparse and therefore difficult for report readers to use.
  • I tried cleaning filenames, but this leads to samples being overwritten in bcftools stats.
  • Would it make sense to remove those stats from the General Stats table and create a separate general stats table in the bcftools section, just like VEP?

Thank you very much, I truly appreciate your support!

Hi @Hanh_Hoang,

Thanks for posting and welcome to the community! You’ve started us off with a really excellent question :smile:

To summarise (please correct me if I’m wrong), the problem here is that you want table-specific sample name cleaning: to clean and collapse sample names in General Stats, but not in other tables.
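To make the idea concrete, here is a minimal sketch of what table-specific cleaning could mean. It is not MultiQC code: the table ids, the `PER_TABLE_EXTS` structure, the `clean_for_table` helper and the sample name are all hypothetical, chosen only to illustrate the behaviour being asked for.

```python
# Hypothetical per-table rules: table id -> patterns at which to cut the name.
# Neither the table ids nor this structure come from MultiQC.
PER_TABLE_EXTS = {
    "general_stats": [".md", ".recal"],
}


def clean_for_table(name, table_id, rules=PER_TABLE_EXTS):
    """Cut `name` at the first occurrence of each pattern listed for this table.

    Tables without an entry in `rules` keep the original name.
    """
    for ext in rules.get(table_id, []):
        name = name.split(ext, 1)[0]
    return name


# Hypothetical sample name: cleaned in General Stats, untouched elsewhere.
in_general_stats = clean_for_table("HCC1395N.recal", "general_stats")
in_other_table = clean_for_table("HCC1395N.recal", "some_other_table")
print(in_general_stats, in_other_table)
```

The point of the design is that the rules are looked up per table, so collapsing names in one table cannot overwrite rows in another.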

This is not currently possible within MultiQC, but it could be interesting to do. I think it’s basically the same as the ancient GitHub issue #542 from way back in 2017:

I suspect that this would be the best solution :point_up_2:, @vlad.savelyev maybe we can take a fresh look at this and see if we can move it up the roadmap a little.

Another approach would be to look at how this data is getting into the report in the first place, from the nf-core/sarek pipeline. @maxulysse / @FriederikeHanssen - have you guys had any similar requests in the past, or any ideas on this topic?


I think it’s a good idea, and I’m all for improving the MultiQC reports however we can.

I just made an issue for the simpler per-table sample name cleaning idea here:

Hi Phil,

Thank you for your really useful suggestion. I wonder whether sample_merge_groups would lead to overwriting in the General Stats table? For a bit more context, all my screenshots were from the General Stats table.

In addition, cleaning file names does affect the other sections of the report (it reduces the number of samples by overwriting them; I think your suggestion could help with this). Please take a look at the default and cleaned-filename reports here

I kinda wonder why bcftools stats is in the General Stats table, while VEP is not and has its own general stats table in the VEP section.

Also, should I do the same with bcftools as with VEP? If I do, the rows with sample names like HCC1395T_vs_HCC1395N.strelka.somatic_snvs, which come from bcftools, may no longer appear in the General Stats table.


Then I will only need to clean the following (although .md and .recal also lead to overwriting :slightly_frowning_face:):

  - "_val"
  - "_1"
  - "_2"
  - ".md"
  - ".recal"
  - "-1"
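A quick way to check which of these patterns cause overwriting is to clean a list of names and look for collisions. The sketch below is a simplified model, not MultiQC's actual code: it assumes each pattern cuts the name at its first occurrence, and every sample name except the Strelka one is hypothetical.

```python
from collections import defaultdict

# The patterns from the list above.
CLEAN_EXTS = ["_val", "_1", "_2", ".md", ".recal", "-1"]


def clean_name(name, exts=CLEAN_EXTS):
    """Cut the name at the first occurrence of each pattern (simplified model)."""
    for ext in exts:
        name = name.split(ext, 1)[0]
    return name


def find_collisions(names, exts=CLEAN_EXTS):
    """Group original names by cleaned name; keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[clean_name(name, exts)].append(name)
    return {cleaned: originals for cleaned, originals in groups.items() if len(originals) > 1}


# Hypothetical input names, apart from the Strelka one quoted in the thread.
names = [
    "HCC1395N.md",
    "HCC1395N.recal",
    "HCC1395T.md",
    "HCC1395T.recal",
    "HCC1395T_vs_HCC1395N.strelka.somatic_snvs",
]

collisions = find_collisions(names)
for cleaned, originals in collisions.items():
    print(f"{cleaned} <- {originals}")
```

Any cleaned name listed with more than one original is a place where one row would replace another if both report the same columns.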

Please correct me if I misunderstand something! Thank you very much!

Aha, I didn’t realise this - thank you for clarifying! Ok, then issue #2097 will not help you :disappointed: (at least, not with respect to Bcftools data being overwritten in the table).

This is a good point - that would be another easy fix at MultiQC level, to move that info into a separate table. The reason they are currently in the General Stats table is so that they sit alongside other “general” stats from other tools, for comparison (eg. do samples with high % duplicates have low SNP counts, or whatever). But in this case, paired with #2097, it’d be easier to have them in a separate table.
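The overwriting problem can be illustrated with a small sketch of rows being merged into one shared table after their names have been collapsed. This is only a model under stated assumptions, not how MultiQC is implemented: the `merge_rows` helper, the column names and the numbers are all made up for illustration.

```python
def merge_rows(rows):
    """Merge (sample_name, {column: value}) pairs into one table.

    Returns the merged table and a list of (sample, column) pairs whose
    earlier value was replaced by a later one.
    """
    table = {}
    overwritten = []
    for sample, cols in rows:
        merged = table.setdefault(sample, {})
        for col, value in cols.items():
            if col in merged:
                overwritten.append((sample, col))
            merged[col] = value  # last writer wins
    return table, overwritten


# Made-up rows after name collapsing: three sources now share one sample name.
rows = [
    ("HCC1395T", {"pct_duplication": 12.0}),  # hypothetical preprocessing column
    ("HCC1395T", {"variants": 100}),          # hypothetical count from one VCF
    ("HCC1395T", {"variants": 250}),          # same column from another VCF
]

table, overwritten = merge_rows(rows)
print(table)
print(overwritten)
```

Columns from different tools combine cleanly under the collapsed name, but two files that report the same column compete for one cell, which is why a separate table for the per-file stats is the simpler option here.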

We could think about moving them, or even having a module-specific config flag to choose whether they go in General Stats (as now, default behaviour) or a separate table instead (opt-in). @maxulysse / @FriederikeHanssen - any opinions on this behaviour for Bcftools specifically?

Sincerely thank you!

Quote: that would be another easy fix at MultiQC level, to move that info into a separate table.

Could you please specify how to do that? Thank you very much!

This is a change to core MultiQC module code, so for this please submit a new issue on the MultiQC GitHub repository requesting the change.

Ahh I got it, thank you! (Sorry, I was thinking that it was something that could be done with a config setting, like selecting which tools' stats are added to General Stats.)

I think this is not a common need, and it might not be a problem if just one or two tools are run. So maybe I will try customizing a bit on my own instead of creating a request!

Once again, sincerely thank you and the Seqera Lab Team for the timely support! :smiling_face:

Hey! Apologies, I didn’t get any notifications about this thread :scream:. In general, I have no strong opinion about where to put the bcftools stats table. I agree that the general table at the top is pretty hard to read and it would be nice to split it up a bit. I don’t know if we want to collapse multiple variant callers, but at least splitting up preprocessing and variant calling into several stats tables would probably already improve readability a lot.