How to process multiple runs of featurecounts in the same directory where each run used a different gtf file

pcantalupo · June 14, 2024, 6:20pm

Hello, I ran two featureCounts commands in the same directory on the same BAM files, each with a different GTF file, so I have featurecounts.tsv and featurecounts_smORF.tsv. The sample names are identical in the two output files, so when I run MultiQC it only shows results for featurecounts.tsv and not featurecounts_smORF.tsv. How can I get MultiQC to show results for both files?

I don’t want to change the sample names in each output file or move the files into their own directories. I tried various params like fullnames and fn_as_s_name, but they didn’t work.

ewels · June 14, 2024, 10:40pm

This would have been my suggestion. Are you able to attach some files for us to try to replicate locally please?

pcantalupo · June 14, 2024, 11:43pm

I’m getting errors uploading the TSV and other feature count files. How can I share them with you?

ewels · June 15, 2024, 4:28am

I’ve just updated the forum to allow .tsv files. But if there’s anything else / generally speaking, just zip / tar / compress the files and attach. That should be allowed.

pcantalupo · June 15, 2024, 11:02am

Hi Phil,

Attached is the tar file. I changed the featurecounts files to remove identifying information and truncated them to make them smaller. See the README.txt for version info and command lines. I had to delete the multiqc html files because the upload size was too big.
featurecounts_test.tar (200 KB)

Thank you,

Paul

pcantalupo · June 15, 2024, 11:03am

Please ignore the “multiqc_data” folder. I forgot to delete it.

ewels · June 15, 2024, 6:13pm

Thanks for this, it helps. Let me explain what’s going on here.

The setup is that you have two featureCounts .summary files, each with two samples:

featurecounts.tsv.summary
- fp_3hpi
- fp_uninf_1
featurecounts_smORF.tsv.summary
- fp_3hpi
- fp_uninf_1

You want a final report with 4 sets of stats.

First, the default behaviour. The MultiQC featureCounts module takes sample names from the header row of the .summary file, which lists the input files. This generates fp_3hpi and fp_uninf_1 in both cases, so as you say - the results from the second file overwrite the first.

Trying with --fn_as_s_name does actually work as expected, but ends with similar behaviour. Instead of using the sample names from the header, MultiQC uses the log filenames. This means that the two log files now generate separate sets of stats, but both columns within each file get the same name. The two columns overwrite each other in both files, and you end up with just two samples, called featurecounts and featurecounts_smORF.
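
To make the two collisions concrete, here is a toy model in Python. It is only a sketch of the naming logic described above, not MultiQC's actual code: the functions `sample_names` and `collect` are hypothetical, the `.bam` column names are assumed (the real header holds whatever input paths featureCounts was given), and the counts are made-up placeholders.

```python
from pathlib import Path

# Made-up placeholder data: {summary filename: {header column: assigned count}}.
# The .bam names are assumptions; only the sample and file names come from the thread.
REPORTS = {
    "featurecounts.tsv.summary": {"fp_3hpi.bam": 10, "fp_uninf_1.bam": 20},
    "featurecounts_smORF.tsv.summary": {"fp_3hpi.bam": 1, "fp_uninf_1.bam": 2},
}


def sample_names(summary_filename, header_columns, fn_as_s_name=False):
    """Return one sample name per header column (simplified stand-in)."""
    if fn_as_s_name:
        # Every column in the file is named after the log file itself.
        log_name = Path(summary_filename).name.split(".")[0]
        return [log_name for _ in header_columns]
    # Default: name each column after its input file, minus directory and extension.
    return [Path(column).stem for column in header_columns]


def collect(reports, fn_as_s_name=False):
    """Merge all reports into one dict keyed by sample name."""
    merged = {}
    for filename, columns in reports.items():
        names = sample_names(filename, list(columns), fn_as_s_name)
        for name, value in zip(names, columns.values()):
            merged[name] = value  # an existing key is silently replaced
    return merged
```

With the placeholder data, `collect(REPORTS)` keeps only the keys `fp_3hpi` and `fp_uninf_1` (values from the second file), while `collect(REPORTS, fn_as_s_name=True)` keeps only `featurecounts` and `featurecounts_smORF` (values from the last column of each file). Either way, four sets of stats collapse to two.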

What you need is a combination of both of these features - sample names from the summary header and sample names from the log filename. Unfortunately, there is no native way to do this in MultiQC that I can think of. By far the easiest is to move the log files into separate subdirectories and then use the --dirs / --dirs-depth options.
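
As a rough sketch of the subdirectory approach, the Python helper below copies each .summary file into its own folder under a staging directory, leaving the originals untouched. Everything named here is an assumption for illustration: the helper `stage_summaries`, the staging directory `multiqc_input`, and the subdirectory labels `standard` and `smORF`. The depth value of 1 in the comment is also an assumption; check `multiqc --help` for your version.

```python
import shutil
from pathlib import Path

# Hypothetical labels (subdirectory names) mapped to the summary files from this thread.
GROUPS = {
    "standard": "featurecounts.tsv.summary",
    "smORF": "featurecounts_smORF.tsv.summary",
}


def stage_summaries(src_dir, dest_root, groups):
    """Copy each summary file into dest_root/<label>/ and return the new paths."""
    src_dir, dest_root = Path(src_dir), Path(dest_root)
    staged = []
    for label, filename in groups.items():
        target_dir = dest_root / label
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps file metadata; the original stays where it is.
        staged.append(Path(shutil.copy2(src_dir / filename, target_dir / filename)))
    return staged


if __name__ == "__main__":
    for path in stage_summaries(".", "multiqc_input", GROUPS):
        print(path)
    # Then run MultiQC on the staging directory only, for example:
    #   multiqc --dirs --dirs-depth 1 multiqc_input
```

Pointing MultiQC at the staging directory rather than the original one avoids picking up the un-prefixed originals a second time. The directory name should then be prepended to each sample name, which keeps the four sets of stats apart.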

If this really isn’t an option then say and I can have a think of other ways to achieve the same effect.

Phil

pcantalupo · June 15, 2024, 8:27pm

Hi Phil,

Thank you for the detailed description. I thought about putting them into separate directories, but I was hoping there was some custom MultiQC config that could be written. I’ll just copy them into different directories.

Thank you again
