Recently I ran nf-core/rnaseq on my RNA-seq data. I was wondering whether there is a way to change how MultiQC sorts the data.
Here I have samples in a time series. The samples are sorted top-to-bottom and left-to-right alphanumerically, but I would like to sort them sensibly by time (i.e. 0h, 2h, 4h, etc. instead of 0h, 10h, 12h, etc.). Is there a way to achieve this, or to pass in a custom sort function for sample names?
Lazy answer: Honestly, the easiest fix is probably to rename your samples with a leading zero so that they sort alphabetically (e.g. 02h instead of 2h). This is especially true given that you’re running within the nf-core/rnaseq pipeline, which makes modification more awkward / undesirable.
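If you have a lot of samples, the renaming can be scripted. Here is a minimal sketch, assuming your time point appears as digits followed by "h" somewhere in the sample name. The function name `pad_timepoint` and the example names are made up for illustration, so adjust the pattern to your own naming scheme before applying it to your samplesheet.

```python
import re

def pad_timepoint(sample_name: str, width: int = 2) -> str:
    """Zero-pad a time point like '2h' to '02h' so plain alphabetical sorting works.

    Hypothetical helper: assumes the time point is a run of digits followed by 'h'
    and not followed by another letter or digit.
    """
    pattern = r"(?<!\d)(\d+)h(?![A-Za-z0-9])"
    return re.sub(pattern, lambda m: m.group(1).zfill(width) + "h", sample_name)

# Made-up sample names for illustration only.
names = ["ctrl_0h", "ctrl_10h", "ctrl_12h", "ctrl_2h", "ctrl_4h"]
padded = sorted(pad_timepoint(n) for n in names)
print(padded)
```

Pick a `width` that covers your longest time point (for example 3 if the series goes past 99h), otherwise the alphabetical order breaks again.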
Note that there is a button above the plot to switch between Sorted by sample and Clustered. Assuming some biological similarity, I’d kind of hope that the clustered view might make more sense in your case.
That said, let’s try to look for some sorting logic...
I can’t see any sorting of samples within the pipeline. It seems to just cat all the files together, so the order within that file is likely random / POSIX (you can check that source file to confirm):
I also don’t think that we sort within the custom content module for heatmaps. I guess that the only sort we do is here in the heatmap code:
So yeah, I don’t think that there is any way to customise this currently, sorry. Feel free to put in a GitHub issue requesting it as a new feature.
We do natural sample name sorting (i.e. 1, 2, 10, 20 instead of 1, 10, 2, 20) for other plot types, so it would be very straightforward to do that for heatmap. Will do that update!
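For anyone curious what natural sorting means in practice, here is a rough sketch of the general technique. It is not the actual MultiQC code, and `natural_key` is just an illustrative name: the sample name is split into digit and non-digit runs, and the digit runs are compared as integers rather than as text.

```python
import re

def natural_key(name: str) -> list:
    """Sort key that compares embedded numbers numerically (illustrative only)."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so every odd index holds a run of digits.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

samples = ["0h", "10h", "12h", "2h", "4h"]
ordered = sorted(samples, key=natural_key)
print(ordered)
```

With a key like this, the time series from the original question comes out in chronological order without renaming any samples.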