How to process multiple runs of featureCounts in the same directory where each run used a different GTF file

Hello, I ran two featureCounts commands in the same directory, using the same BAM files but a different GTF file for each run. As a result, I have featurecounts.tsv and featurecounts_smORF.tsv. The two sample names are identical in both output files, so when I run MultiQC it only shows results for featurecounts.tsv and not featurecounts_smORF.tsv. How can I get MultiQC to show results for both files?

I don’t want to change the sample names in each output file or move the files into their own directories. I tried various params like fullnames and fn_as_s_name, but they didn’t work.

This would have been my suggestion. Are you able to attach some files for us to try to replicate locally please?

I’m getting errors uploading the TSV and other feature count files. How can I share them with you?

I’ve just updated the forum to allow .tsv files. But if there’s anything else / generally speaking, just zip / tar / compress the files and attach. That should be allowed.

Hi Phil,

Attached is the tar file. I changed the featureCounts files to remove identifying information and truncated them to make them smaller. See the README.txt for version info and command lines. I had to delete the MultiQC HTML files because the upload size was too big.
featurecounts_test.tar (200 KB)

Thank you,

Paul

Please ignore the “multiqc_data” folder. I forgot to delete it.


Thanks for this, it helps. Let me explain what’s going on here.

The setup is that you have two featureCounts .summary files, each with two samples:

  • featurecounts.tsv.summary
    • fp_3hpi
    • fp_uninf_1
  • featurecounts_smORF.tsv.summary
    • fp_3hpi
    • fp_uninf_1

You want a final report with 4 sets of stats.

First, the default behaviour. The MultiQC featureCounts module takes sample names from the header row of the .summary file, which lists the input files. This generates fp_3hpi and fp_uninf_1 in both cases, so, as you say, the results from the second file overwrite the first.

Trying with --fn_as_s_name does actually work as expected, but ends with similar behaviour. Instead of using the sample names from the header, MultiQC uses the log filenames. This means that the two log files now generate separate sets of stats. However, within each file both columns get the same name, so the two columns overwrite each other and you end up with just two samples, called featurecounts and featurecounts_smORF.
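To make the two collisions concrete, here is a toy model in Python. It is only a sketch of the naming logic described above, not MultiQC's actual code, and the counts are made-up placeholder numbers. The `collect` function and its arguments are hypothetical names for illustration.

```python
# Toy model of the sample-name collision. This is NOT MultiQC's real code.
# The "assigned read" counts below are made-up placeholder values.
summaries = {
    "featurecounts": {"fp_3hpi": 100, "fp_uninf_1": 200},
    "featurecounts_smORF": {"fp_3hpi": 10, "fp_uninf_1": 20},
}


def collect(summaries, fn_as_s_name=False):
    """Store one entry per sample name; a repeated name replaces the old entry."""
    report = {}
    for log_name, columns in summaries.items():
        for column_name, assigned in columns.items():
            # Default: name comes from the header column.
            # With fn_as_s_name: name comes from the log filename.
            key = log_name if fn_as_s_name else column_name
            report[key] = assigned
    return report


default_report = collect(summaries)
fn_report = collect(summaries, fn_as_s_name=True)

# Either way only two of the four expected sets of stats survive.
print(sorted(default_report))
print(sorted(fn_report))
```

In both modes the report ends up with two entries rather than four, because the key used for storage is not unique across the combination of file and column.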

What you need is a combination of both of these features: sample names from the summary header and sample names from the log filename. Unfortunately, there is no native way to do this in MultiQC that I can think of. By far the easiest approach is to move the log files into separate subdirectories and then use the --dirs / --dirs-depth options.
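A minimal sketch of the subdirectory approach, assuming the file names from this thread. It copies rather than moves the files, so the originals stay where they are. The `multiqc_input` directory name and the `split_into_subdirs` helper are hypothetical, and the MultiQC command in the trailing comment is an assumed invocation that you may need to adjust.

```python
import shutil
from pathlib import Path


def split_into_subdirs(run_dir, prefixes, out_name="multiqc_input"):
    """Copy each run's featureCounts outputs into its own subdirectory.

    Creates <run_dir>/<out_name>/<prefix>/ for every prefix and copies
    <prefix>.tsv and <prefix>.tsv.summary into it when they exist.
    Returns the list of copied file paths.
    """
    run_dir = Path(run_dir)
    copied = []
    for prefix in prefixes:
        target = run_dir / out_name / prefix
        target.mkdir(parents=True, exist_ok=True)
        for name in (f"{prefix}.tsv", f"{prefix}.tsv.summary"):
            src = run_dir / name
            if src.is_file():
                copied.append(Path(shutil.copy2(src, target / name)))
    return copied


# Example usage (run from the directory holding the featureCounts outputs):
#   split_into_subdirs(".", ["featurecounts", "featurecounts_smORF"])
#
# Then point MultiQC at the new directory only, so the originals are not
# picked up a second time (assumed invocation):
#   multiqc --dirs --dirs-depth 1 multiqc_input
```

With --dirs, MultiQC prepends the directory name to each sample name, so the four sets of stats should end up with distinct names.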

If this really isn’t an option, then say so and I can have a think about other ways to achieve the same effect.

Phil

Hi Phil,

Thank you for the detailed description. I thought about putting them into separate directories, but I was hoping there was some custom MultiQC config that could be written. I’ll just copy them into different directories.

Thank you again
