Reports with large sample numbers

A general question.

I’ve been using MultiQC for a few years, and am now dealing with larger sample numbers. While MultiQC supports both flat and interactive plots, each has its drawbacks. Flat plots lack the interactivity that MultiQC offers, making them hard to review. On the other hand, interactive plots can be slow and cumbersome when dealing with multiple modules and hundreds to thousands of samples.

Does anyone have tips or strategies for presenting results from large sample sets more effectively in MultiQC? I’m interested in hearing about any creative solutions from those with more experience.

There aren’t a lot of great solutions to this that I know of - as you say, both approaches involve compromises. One option you haven’t mentioned is taking the parsed data that MultiQC writes to the multiqc_data directory and pulling it into custom scripts, notebooks, etc. to generate your own bespoke plots for presentation. This gives you a bit more flexibility in how you display outputs. It’s a bit of a pain to have to write your own script though, and it kind of goes against the whole point of MultiQC.
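As a rough sketch of that approach (not an official MultiQC recipe): the snippet below reads one column from a tab-separated table such as multiqc_data/multiqc_general_stats.txt and lists only the samples that sit far from the median, so that with thousands of samples you review or plot a handful rather than all of them. The function names `load_metric` and `flag_outliers`, the `percent_duplicates` column name, and the 3-MAD cutoff are all assumptions for illustration; column names differ between modules and MultiQC versions, so check the header of your own file.

```python
import csv
import statistics
from pathlib import Path


def load_metric(tsv_path, metric, sample_col="Sample"):
    """Return {sample: float} for one column of a tab-separated table."""
    values = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                values[row[sample_col]] = float(row.get(metric, ""))
            except (TypeError, ValueError):
                continue  # skip samples with missing or non-numeric values
    return values


def flag_outliers(values, n_mads=3.0):
    """Return samples more than n_mads median absolute deviations from the median."""
    if not values:
        return {}
    median = statistics.median(values.values())
    mad = statistics.median(abs(v - median) for v in values.values())
    if mad == 0:
        # Degenerate case: most samples are identical, so flag anything that differs.
        return {s: v for s, v in values.items() if v != median}
    return {s: v for s, v in values.items() if abs(v - median) / mad > n_mads}


if __name__ == "__main__":
    # Hypothetical path and column name: adjust both to match your own report.
    table = Path("multiqc_data/multiqc_general_stats.txt")
    metric = "percent_duplicates"
    if table.exists():
        flagged = flag_outliers(load_metric(table, metric))
        for sample, value in sorted(flagged.items()):
            print(f"{sample}\t{value}")
```

The median and MAD are used here instead of the mean and standard deviation because a few badly failed samples would otherwise shift the threshold. The flagged subset can then be passed to whatever plotting library you prefer.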

Dealing with huge numbers of samples was one of the motivations for starting the MegaQC project long ago.

That project has largely stalled though and is not super usable. I’m hoping that we can build something new before long (2024?) which will meet this need.

