Hello! I have been running samtools coverage on 25 different CRAM files. However, neither stdout nor the .txt files are being picked up by MultiQC. I have tried to work out whether this is an error on my part with the file extensions MultiQC expects, but I haven’t been able to find anything online.
You may find the MultiQC samtools documentation helpful:
You can see in the file search patterns at the bottom of that page that the samtools coverage module doesn’t look at filenames (or extensions) but rather looks for this string within the first 10 lines:
There’s also general troubleshooting documentation available for when MultiQC can’t find any outputs from a tool:
If you attach a log file or two, we can try to look into the problem on our side. If the file extensions aren’t allowed on the forum, just create a zip archive.
Thank you for your responses. Please find attached the .log file that is output when I run samtools coverage. It put the information for all samples into one .log file, but I can split it per sample if needed. It looks like the column headings are, for example, startpos instead of tstartpos as you highlighted above. Is this a problem on my end?
It’s \t followed by startpos: the \t is an escape sequence for a <tab> character. So don’t worry, the pattern is the same as the header in your file.
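To illustrate the escape sequence, here is a small Python sketch of a content check like the one described in the docs: a plain substring test over the first 10 lines. It assumes the pattern is the full header line that samtools coverage writes (the exact MultiQC pattern string is on the documentation page, not reproduced here), and header_matches is a hypothetical helper, not MultiQC’s own code.

```python
import io
from typing import TextIO

# Assumed pattern: the tab-separated header line written by samtools coverage.
# In Python source, "\t" is an escape sequence that becomes a single tab character.
COVERAGE_HEADER = (
    "#rname\tstartpos\tendpos\tnumreads\tcovbases"
    "\tcoverage\tmeandepth\tmeanbaseq\tmeanmapq"
)


def header_matches(handle: TextIO, pattern: str = COVERAGE_HEADER, num_lines: int = 10) -> bool:
    """Hypothetical helper: return True if `pattern` occurs within the first `num_lines` lines."""
    for i, line in enumerate(handle):
        if i >= num_lines:
            break  # only the first `num_lines` lines are searched
        if pattern in line:
            return True
    return False


if __name__ == "__main__":
    # A file whose columns are separated by real tab characters, as samtools writes them
    example = COVERAGE_HEADER + "\nchr1\t1\t1000\t10\t500\t50.0\t1.5\t30.0\t60.0\n"
    print(header_matches(io.StringIO(example)))  # True
    print("\\tstartpos" in example)  # False: the file has no literal backslash followed by "t"
```

When the file is viewed in an editor, the tab just looks like whitespace, which is why the header reads as plain "startpos".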
Did samtools itself do this, or was it you / the way that you ran samtools? I think you’ll need to split them into separate files. If nothing else, that is required to give each sample its own sample identifier; otherwise there’s no way to know which coverage data belongs to which sample.
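If re-running is not convenient, a concatenated file could in principle be split back up at each header line. This is a minimal Python sketch, assuming every report starts with a line beginning "#rname<tab>" and that you know the sample names in the order the outputs were concatenated; split_reports and write_per_sample are hypothetical helpers, and the ".coverage.txt" filename suffix is an arbitrary choice.

```python
from pathlib import Path
from typing import Iterable, List

HEADER_PREFIX = "#rname\t"  # assumed start of the samtools coverage header line


def split_reports(lines: Iterable[str]) -> List[List[str]]:
    """Group the lines of a concatenated file into one list per report."""
    reports: List[List[str]] = []
    for line in lines:
        if line.startswith(HEADER_PREFIX):
            reports.append([])  # each header marks the start of the next report
        if reports:
            reports[-1].append(line)  # lines before the first header are ignored
    return reports


def write_per_sample(concat_path: Path, sample_names: List[str], out_dir: Path) -> List[Path]:
    """Hypothetical helper: write each report to its own file, named after its sample.

    Assumes `sample_names` is in the same order as the reports were concatenated.
    """
    with concat_path.open() as fh:
        reports = split_reports(fh)
    if len(reports) != len(sample_names):
        raise ValueError(f"found {len(reports)} reports but {len(sample_names)} sample names")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, lines in zip(sample_names, reports):
        # MultiQC usually derives the sample name from the filename, so one file per sample
        out_path = out_dir / f"{name}.coverage.txt"
        out_path.write_text("".join(lines))
        written.append(out_path)
    return written
```

Usage would look something like write_per_sample(Path("multiqc_prep.log"), ["sample01", "sample02"], Path("coverage_per_sample")), with your real sample names. Because the split depends on that ordering being right, running samtools coverage once per CRAM and writing each result to its own file is the safer fix.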
When I try running MultiQC with your example .log file, it finds the file fine but then crashes:
╭───────────────────────────────────── Oops! The 'samtools' MultiQC module broke... ─────────────────────────────────────╮
│ Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues │
│ Please attach a file that triggers the error. The last file found was: ./multiqc_prep.log │
│ │
│ Traceback (most recent call last): │
│ File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/core/exec_modules.py", line 71, in exec_modules │
│ these_modules: Union[BaseMultiqcModule, List[BaseMultiqcModule]] = module_initializer() │
│ ^^^^^^^^^^^^^^^^^^^^ │
│ File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/samtools.py", line 65, in __init__ │
│ n["coverage"] = self.parse_samtools_coverage() │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/coverage.py", line 18, in parse_samtools_coverage │
│ metrics_by_chrom = parse_single_report(f) │
│ ^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/Users/ewels/GitHub/MultiQC/MultiQC/multiqc/modules/samtools/coverage.py", line 236, in parse_single_report │
│ startpos=int(startpos), │
│ ^^^^^^^^^^^^^ │
│ ValueError: invalid literal for int() with base 10: 'startpos' │
│ │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
This is because the repeated header lines confuse the module’s parser: the second header is read as a data row, so int('startpos') fails. I will put in a pull request to make the module tolerate this kind of input without a hard crash.
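As a rough illustration of what "tolerating" repeated headers could mean, here is a Python sketch of a parser that skips header lines and malformed rows with a warning instead of raising. It is hypothetical and is not MultiQC’s parser or the code in the pull request; the column list assumes the standard samtools coverage header.

```python
import logging
from typing import Dict, Iterable

log = logging.getLogger(__name__)

# Assumed samtools coverage columns, in header order
FIELDS = ["rname", "startpos", "endpos", "numreads", "covbases",
          "coverage", "meandepth", "meanbaseq", "meanmapq"]


def parse_coverage_lines(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical tolerant parser: warn and skip bad lines instead of crashing."""
    metrics: Dict[str, Dict[str, float]] = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if lineno > 1:
                # A header after line 1 suggests several reports were concatenated
                log.warning("Repeated header at line %d; file may be concatenated", lineno)
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            log.warning("Skipping line %d: expected %d columns, got %d",
                        lineno, len(FIELDS), len(values))
            continue
        rname, *numbers = values
        try:
            # All numeric columns stored as floats for simplicity
            row = {name: float(v) for name, v in zip(FIELDS[1:], numbers)}
        except ValueError:
            log.warning("Skipping line %d: non-numeric value", lineno)
            continue
        if rname in metrics:
            # Same contig seen again: in a concatenated file this is another sample's row
            log.warning("Duplicate contig %s at line %d; keeping the first", rname, lineno)
            continue
        metrics[rname] = row
    return metrics
```

The duplicate-contig branch shows why tolerance alone doesn’t rescue a concatenated file: rows for the same chromosome from different samples collide, and nothing in the file says which sample each one came from.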
So from my testing, if you split the file up into multiple files instead of concatenating them, I think it should work.
Could you paste the log output from MultiQC that you get when running it? What version of MultiQC are you using?
I’m using multiqc/1.17, and I don’t get any log output from MultiQC when I run it. As you can see in the attached screenshot, this is the only output I get from MultiQC.
I have updated MultiQC and tried running it both on the concatenated file and on a directory containing multiple reports per sample. This is the error I received using the file above:
Please copy this log and report it at https://github.com/MultiQC/MultiQC/issues │
│ Please attach a file that triggers the error. The last file found was: ./multiqc_prep.log │
│ │
│ Traceback (most recent call last): │
│ File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/core/exec_modules.py", line 71, in exec_modules │
│ these_modules: Union[BaseMultiqcModule, List[BaseMultiqcModule]] = module_initializer() │
│ ^^^^^^^^^^^^^^^^^^^^ │
│ File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/samtools.py", line 65, in __init__ │
│ n["coverage"] = self.parse_samtools_coverage() │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/coverage.py", line 18, in parse_samtools_coverage │
│ metrics_by_chrom = parse_single_report(f) │
│ ^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/software/bio/multiqc/1.22.3/.venv/lib/python3.12/site-packages/multiqc/modules/samtools/coverage.py", line 235, in parse_single_report │
│ startpos=int(startpos), │
│ ^^^^^^^^^^^^^ │
│ ValueError: invalid literal for int() with base 10: 'startpos'
MultiQC will not work on a concatenated file. You must run it on individual outputs.
The pull request I linked above has not been merged or released yet. Even once it is in a release, it doesn’t change this behaviour: it just logs lots of error messages instead of crashing.
What happens when you try this? If you get the same error as before, can you try to narrow down which files cause it? I strongly suspect that some concatenated or modified files are still present.
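One way to narrow it down is to scan the input directory for any file that contains more than one samtools coverage header line. This Python sketch assumes the header starts with "#rname<tab>" and skips CRAM/BAM files and their indexes; count_headers and find_concatenated are hypothetical helpers, not part of MultiQC.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

HEADER_PREFIX = "#rname\t"  # assumed start of the samtools coverage header line
SKIP_SUFFIXES = {".cram", ".crai", ".bam", ".bai"}  # binary alignment files and indexes


def count_headers(lines: Iterable[str]) -> int:
    """Count samtools coverage header lines; more than one suggests concatenation."""
    return sum(1 for line in lines if line.startswith(HEADER_PREFIX))


def find_concatenated(root: Path) -> List[Tuple[Path, int]]:
    """Hypothetical helper: list files under `root` that contain more than one coverage header."""
    suspects = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix in SKIP_SUFFIXES:
            continue
        try:
            # errors="replace" avoids crashing on stray non-UTF-8 bytes
            with path.open(errors="replace") as fh:
                n_headers = count_headers(fh)
        except OSError:
            continue  # unreadable file: skip it
        if n_headers > 1:
            suspects.append((path, n_headers))
    return suspects


if __name__ == "__main__":
    search_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path, n in find_concatenated(search_dir):
        print(f"{path}: {n} header lines")
```

Saved as a script, it would be run against the same directory you pass to MultiQC; any file it lists should be removed or replaced with per-sample outputs before re-running MultiQC.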