Thanks for posting and welcome to the community! You’ve started us off with a really excellent question.
To summarise (please correct me if I’m wrong), the problem here is that you want table-specific sample name cleaning: sample names should be cleaned and collapsed in General Stats, but not in other tables.
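To make that concrete, here is a rough Python sketch of the behaviour as I understand it. It is plain Python, not MultiQC code: the `clean_name` helper, the simple suffix stripping, the tool keys and all sample names and numbers are assumptions made up for illustration.

```python
# Illustrative only: this is NOT MultiQC's API or its real cleaning logic.
# Sample names, tool keys and metric values are hypothetical.

def clean_name(name, suffixes):
    """Strip any of the given suffixes from the end of a name, repeatedly."""
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            # Skip empty suffixes so the loop always terminates
            if suffix and name.endswith(suffix):
                name = name[: -len(suffix)]
                changed = True
    return name


# Per-tool tables keep the raw, uncleaned sample names
per_tool = {
    "fastqc": {"sampleA_1": {"pct_gc": 41.0}},
    "markduplicates": {"sampleA.md": {"pct_dup": 12.5}},
}
suffixes = ["_1", ".md"]

# General Stats uses cleaned names, so rows from different tools collapse
general_stats = {}
for tool, samples in per_tool.items():
    for raw_name, metrics in samples.items():
        row = general_stats.setdefault(clean_name(raw_name, suffixes), {})
        for metric, value in metrics.items():
            row[f"{tool}.{metric}"] = value

print(general_stats)
```

In this toy version the two rows end up as a single `sampleA` row in the combined table, while `per_tool` still holds `sampleA_1` and `sampleA.md` untouched.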
This is not currently possible within MultiQC, but it could be interesting to do. I think it’s basically the same as the ancient GitHub issue #542 from way back in 2017:
I suspect that this would be the best solution. @vlad.savelyev, maybe we can take a fresh look at this and see if we can move it up the roadmap a little.
Another approach would be to look at how this data is getting into the report in the first place, from the nf-core/sarek pipeline. @maxulysse / @FriederikeHanssen - have you guys had any similar requests in the past, or any ideas on this topic?
Thank you for your really useful suggestion. I wonder whether sample_merge_groups would lead to overwriting in the general stats table? For a bit more context, all my screenshots were from the general stats table.
In addition, cleaning file names does affect the other sections of the report (samples are reduced and overwritten; I think that your suggestion could help with this). Please kindly take a look at the default and cleaned file name reports here
I kinda wonder why bcftools stats is in the general stats table, while VEP is not and has its own general stats table in the VEP section.
Also, should I do the same with bcftools as with VEP? If I do so, the row with a sample name like HCC1395T_vs_HCC1395N.strelka.somatic_snvs, which comes from bcftools, may no longer appear in the general stats table.
Then I will only need to clean the following (though .md and .recal also lead to overwriting):
- "_val"
- "_1"
- "_2"
- ".md"
- ".recal"
- "-1"
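To show what I mean by overwriting, here is a small Python sketch that strips the suffixes above and reports which raw names end up with the same cleaned name. It is only my own illustration, not how MultiQC cleans names internally; the helper functions and the `.md` / `.recal` example names are assumptions.

```python
from collections import defaultdict

# Suffixes from the list above
SUFFIXES = ["_val", "_1", "_2", ".md", ".recal", "-1"]


def strip_suffixes(name, suffixes=SUFFIXES):
    """Keep removing listed suffixes from the end until none match."""
    while True:
        for s in suffixes:
            if s and name.endswith(s):
                name = name[: -len(s)]
                break
        else:
            # No suffix matched on this pass, so the name is fully cleaned
            return name


def find_collisions(raw_names):
    """Group raw names by cleaned name and keep groups with more than one member."""
    groups = defaultdict(list)
    for raw in raw_names:
        groups[strip_suffixes(raw)].append(raw)
    return {clean: raws for clean, raws in groups.items() if len(raws) > 1}


# Hypothetical raw names, loosely modelled on the ones in my report
raw_names = [
    "HCC1395N.md",
    "HCC1395N.recal",
    "HCC1395T_vs_HCC1395N.strelka.somatic_snvs",
]
collisions = find_collisions(raw_names)
print(collisions)
```

A collision here only marks a candidate: the rows would actually overwrite each other only if both report the same metric in the same table.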
Please correct me if I misunderstand something! Thank you very much!
Aha, I didn’t realise this - thank you for clarifying! Ok, then issue #2097 will not help you (at least, not with respect to Bcftools data being overwritten in the table).
This is a good point - that would be another easy fix at MultiQC level, to move that info into a separate table. The reason they are there now is to have these statistics in the General Stats table alongside other “general” stats from other tools, for comparison (eg. do samples with high % duplicates have low SNP counts, or whatever). But in this case, paired with #2097, it’d be easier to have them in a separate table.
We could think about moving them, or even having a module-specific config flag to choose whether they go in General Stats (as now, default behaviour) or a separate table instead (opt-in). @maxulysse / @FriederikeHanssen - any opinions on this behaviour for Bcftools specifically?
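As a very rough sketch of what such an opt-in could look like, in plain Python: the `separate_table` flag, the `build_tables` function and the table keys are all hypothetical and do not exist in MultiQC today, and the sample name and numbers are made up.

```python
# Hypothetical sketch: neither the flag nor this function exists in MultiQC.

def build_tables(bcftools_rows, other_rows, separate_table=False):
    """Place Bcftools columns in General Stats (default) or in their own table."""
    general = {name: dict(cols) for name, cols in other_rows.items()}
    tables = {"general_stats": general}
    if separate_table:
        # Opt-in: Bcftools rows keep their own table and sample names
        tables["bcftools_stats"] = {n: dict(c) for n, c in bcftools_rows.items()}
    else:
        # Default: merge Bcftools columns into the shared General Stats rows
        for name, cols in bcftools_rows.items():
            general.setdefault(name, {}).update(cols)
    return tables


other = {"sampleA": {"pct_dup": 12.5}}
bcftools = {"sampleA": {"snp_count": 1000}}

default_tables = build_tables(bcftools, other)
opt_in_tables = build_tables(bcftools, other, separate_table=True)
```

With the default, `sampleA` has both columns in one row; with the opt-in, General Stats only holds `pct_dup` and the SNP count lives in the separate table.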
Ahh I got it, thank you! (Sorry, I was thinking that it was something that could be done by setting config, like being able to select which tools’ stats get added to general stats).
I think this is not a common need, and it might not be a problem if just one or two tools are run. So maybe I will try customizing a bit on my own instead of creating a request!
Once again, sincerely thank you and Seqera Lab Team for the timely support!
Hey! Apologies, I didn’t get any notifications about this thread. In general, I have no strong opinion about where to put the bcftools stats table. I agree that the general table at the top is pretty hard to read and it would be nice to split it up a bit. I don’t know if we want to collapse multiple variant callers, but at least splitting up preprocessing and variant calling into several stats tables would probably already improve readability a lot.