Samtools coverage output not found by multiqc

ErinKinghorn · June 21, 2024, 7:16am

Hello! I have been running samtools coverage on 25 different cram files. However, neither the stdout output nor the .txt files are being picked up by MultiQC. I have tried to figure out whether this is an error on my part with the file extensions that MultiQC prefers, but I am not able to find anything online.

Any assistance would be greatly appreciated.

Thanks
Erin

vlad.savelyev · June 21, 2024, 7:15pm

Hi @ErinKinghorn, would you be able to share the outputs from samtools coverage, so we can reproduce and investigate the problem?

ewels · June 24, 2024, 7:51am

You may find the MultiQC samtools documentation helpful:

You can see in the file search patterns at the bottom of that page that the samtools coverage module doesn’t look at filenames (or extensions) but rather looks for this string within the first 10 lines:

#rname\tstartpos\tendpos\tnumreads\tcovbases\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq
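
As a quick sanity check before running MultiQC, the following is a minimal sketch of the same idea: look for the header string in the first 10 lines of a file. It is not MultiQC's own search code; the function name `has_coverage_header` and the `num_lines` parameter are made up for illustration, and it assumes the file is plain text.

```python
# Header line written by `samtools coverage`; "\t" here is a real tab character.
COVERAGE_HEADER = (
    "#rname\tstartpos\tendpos\tnumreads\tcovbases"
    "\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq"
)


def has_coverage_header(path, num_lines=10):
    """Return True if the header appears within the first `num_lines` lines.

    Hypothetical helper for illustration only, not part of MultiQC.
    """
    with open(path, "r", errors="replace") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= num_lines:
                break
            if COVERAGE_HEADER in line:
                return True
    return False
```

If this returns False for a file you expect to be picked up, the header is probably missing, pushed below line 10 by other log output, or written with spaces instead of tabs.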

There’s also general troubleshooting documentation available for when MultiQC can’t find any outputs from a tool:

If you attach a log file or two we can try to look into the problem our side. If the file extensions aren’t allowed in the forum, just create a zip archive.

Phil

ErinKinghorn · June 26, 2024, 9:26am

Hi All

Thank you for your responses. Please find attached the .log file that is output when I run samtools coverage. It put the information for all samples into one .log file, but I can split it per sample if needed. It looks like the column headings are, for example, startpos instead of tstartpos as you highlighted above. Is this a problem on my end?

multiqc_prep.log (372.1 KB)

Thank you!
Erin

ewels · June 26, 2024, 10:06am

It’s \t then startpos - the \t is an escape sequence for a <tab> character. So don’t worry, the pattern is the same as the header in your file.

Was it samtools itself that did this, or you / the way that you ran samtools? I think you’ll need to split them into separate files. If nothing else, that will be required to give each sample a different sample identifier. Otherwise there’s no way to know what coverage data applies to which sample.

When I try running MultiQC with your example .log file it finds it fine, but then crashes:

╭───────────────────────────────────── Oops! The 'samtools' MultiQC module broke... ─────────────────────────────────────╮
│ Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues                                        │
│ Please attach a file that triggers the error. The last file found was: ./multiqc_prep.log                              │
│                                                                                                                        │
│ Traceback (most recent call last):                                                                                     │
│   File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/core/exec_modules.py", line 71, in exec_modules                    │
│     these_modules: Union[BaseMultiqcModule, List[BaseMultiqcModule]] = module_initializer()                            │
│                                                                        ^^^^^^^^^^^^^^^^^^^^                            │
│   File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/samtools.py", line 65, in __init__                │
│     n["coverage"] = self.parse_samtools_coverage()                                                                     │
│                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                     │
│   File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/coverage.py", line 18, in parse_samtools_coverage │
│     metrics_by_chrom = parse_single_report(f)                                                                          │
│                        ^^^^^^^^^^^^^^^^^^^^^^                                                                          │
│   File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/coverage.py", line 236, in parse_single_report    │
│     startpos=int(startpos),                                                                                            │
│              ^^^^^^^^^^^^^                                                                                             │
│ ValueError: invalid literal for int() with base 10: 'startpos'                                                         │
│                                                                                                                        │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

This is because the repeated header lines confuse the module parsing. I will put in a pull-request to make the module tolerate this kind of thing without a hard crash.

So based on my testing, if you split the file into separate per-sample files, with no concatenation, I think it should work.
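
For anyone who already has a concatenated log, here is a rough sketch of one way to split it, assuming each sample's block starts with the samtools coverage header line. The function name `split_concatenated_coverage` and the `chunk_NNN_coverage.txt` output names are hypothetical. The header carries no sample name, so the pieces would still need renaming by hand (or samtools coverage re-run per sample) for MultiQC to show useful sample identifiers.

```python
from pathlib import Path

# Start of the header line written by `samtools coverage` (real tab characters).
HEADER_PREFIX = "#rname\tstartpos\tendpos"


def split_concatenated_coverage(log_path, out_dir):
    """Write one file per header-delimited block and return the paths written.

    Hypothetical helper for illustration only. Lines before the first
    header are dropped; everything else is copied unchanged.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    current = None
    with open(log_path, "r") as source:
        try:
            for line in source:
                if line.startswith(HEADER_PREFIX):
                    # A new header starts a new per-sample file.
                    if current is not None:
                        current.close()
                    out_path = out_dir / f"chunk_{len(written) + 1:03d}_coverage.txt"
                    current = open(out_path, "w")
                    written.append(out_path)
                if current is not None:
                    current.write(line)
        finally:
            if current is not None:
                current.close()
    return written
```

Usage would be something like `split_concatenated_coverage("multiqc_prep.log", "split_coverage")`, then pointing MultiQC at the output directory only, so the original concatenated log is not picked up again.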

Could you paste the log output from MultiQC that you get when running it? What version of MultiQC are you using?

Phil

ErinKinghorn · June 26, 2024, 10:25am

Hi Phil!

Thanks for your response!
I did split them into different files and it still didn’t work!
Please see one attached here

SC_AGVPKS5573913_coverage.txt (198.8 KB)

I’m using multiqc/1.17. And I don’t get a log output from MultiQC when I run it. You should be able to see in the screenshot attached - this is the only output I get from MultiQC.

Erin

vlad.savelyev · June 26, 2024, 10:59am

Hi Erin! Parsing of samtools coverage output was only added in MultiQC v1.21. Would you be able to update MultiQC on your system?

pip install --upgrade multiqc
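
To confirm which version is actually installed in the environment you are running from, `multiqc --version` on the command line is the simplest check. The same comparison can be sketched in Python as below. The helper names `parse_version` and `is_new_enough` are made up for illustration, and the parsing is deliberately simple: it only reads the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

# First release with samtools coverage parsing, as stated in this thread.
MIN_VERSION = (1, 21)


def parse_version(text):
    """Return the leading numeric parts, e.g. '1.22.3' -> (1, 22, 3).

    Parsing stops at the first non-numeric part, so '1.22.dev0' -> (1, 22).
    """
    parts = []
    for piece in text.lstrip("v").split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_new_enough(version_text, minimum=MIN_VERSION):
    """True if the version is at least `minimum`."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        installed = version("multiqc")
    except PackageNotFoundError:
        print("multiqc is not installed in this Python environment")
    else:
        print(installed, "OK" if is_new_enough(installed) else "too old")
```

On shared systems with environment modules (for example `multiqc/1.17`), the version reported here depends on which module or virtual environment is loaded at the time.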

ewels · June 26, 2024, 11:18am

Pull request to throw warnings instead of the hard crash here:

ErinKinghorn · June 28, 2024, 9:25am

Hi!

I have updated my MultiQC and tried running it both on the concatenated file and in a directory containing multiple reports per sample. This is the error I received using the file above:

Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues                                                                    │
│ Please attach a file that triggers the error. The last file found was: ./multiqc_prep.log                                                          │
│                                                                                                                                                    │
│ Traceback (most recent call last):                                                                                                                 │
│   File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/core/exec_modules.py", line 71, in exec_modules                    │
│     these_modules: Union[BaseMultiqcModule, List[BaseMultiqcModule]] = module_initializer()                                                        │
│                                                                        ^^^^^^^^^^^^^^^^^^^^                                                        │
│   File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/samtools.py", line 65, in __init__                │
│     n["coverage"] = self.parse_samtools_coverage()                                                                                                 │
│                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                 │
│   File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/coverage.py", line 18, in parse_samtools_coverage │
│     metrics_by_chrom = parse_single_report(f)                                                                                                      │
│                        ^^^^^^^^^^^^^^^^^^^^^^                                                                                                      │
│   File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/coverage.py", line 235, in parse_single_report    │
│     startpos=int(startpos),                                                                                                                        │
│              ^^^^^^^^^^^^^                                                                                                                         │
│ ValueError: invalid literal for int() with base 10: 'startpos'

ewels · June 28, 2024, 12:07pm

MultiQC will not work on a concatenated file. You must run it on individual outputs.

The pull-request I linked above is not merged yet and not released. Even when it’s available in a release, it doesn’t change this behaviour - it just throws lots of error messages instead of crashing.

What happens when you try this? If you get the same error as before can you try to narrow down which files cause it? I would be highly suspicious that some concatenated / modified files remain.
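
One way to narrow this down, sketched under the assumption that a problem file is one containing the samtools coverage header more than once: walk the directory MultiQC is run on and count header lines per file. The function names `count_headers` and `find_suspect_files` are hypothetical, and the scan reads every file in full, so it may be slow on directories that also hold large CRAM or BAM files.

```python
from pathlib import Path

# Start of the header line written by `samtools coverage` (real tab characters).
HEADER_PREFIX = "#rname\tstartpos\tendpos"


def count_headers(path):
    """Count lines in `path` that start with the coverage header."""
    with open(path, "r", errors="replace") as handle:
        return sum(1 for line in handle if line.startswith(HEADER_PREFIX))


def find_suspect_files(directory):
    """Map each file with more than one header line to its header count.

    Hypothetical helper for illustration only, not part of MultiQC.
    """
    suspects = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            headers = count_headers(path)
            if headers > 1:
                suspects[path] = headers
    return suspects


if __name__ == "__main__":
    for path, headers in find_suspect_files(".").items():
        print(f"{path}: {headers} header lines")
```

Any file listed is likely a leftover concatenated log and should be moved out of the directory, or split, before running MultiQC again.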

Phil
