Unicode characters (UTF-8 BOM) in TSV export samtools-flagstat-dp.tsv

I’ve been given a one-page HTML MultiQC (v1.23) report and I exported the samtools-flags table as TSV. It looks like the output file is encoded as UTF-8 with a BOM.

$ head -n1 samtools-flagstat-dp.tsv | cut -f1 |  file -
/dev/stdin: Unicode text, UTF-8 (with BOM) text

$ head -n1 samtools-flagstat-dp.tsv | cut -f1 |  hexdump -C
00000000  ef bb bf 53 61 6d 70 6c  65 0a                    |...Sample.|
0000000a
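
Those first three bytes, ef bb bf, are the UTF-8 byte-order mark. The same check can be done from R (a minimal sketch, using the export filename above):

con <- file("samtools-flagstat-dp.tsv", "rb")
first_bytes <- readBin(con, "raw", n = 3)            # read the first three bytes
close(con)
identical(first_bytes, as.raw(c(0xef, 0xbb, 0xbf)))  # TRUE when a BOM is present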

Is it a bug or is it a feature? 🙂

EDIT: Furthermore, it looks like the export function adds an extra trailing TAB to the data, and R doesn’t like this.

Asking because when I try to load the table in R, R ‘shifts’ the columns. I tried:

read.csv("samtools-flagstat-dp.tsv", header = TRUE, sep = "\t", fileEncoding = "UTF-8-BOM")
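
The shift matches read.table’s documented behaviour: if the header row has one fewer field than the data rows, which is exactly what a trailing TAB on each data line produces, the first column is used as row names and everything shifts left. A possible workaround (a minimal sketch, assuming the export appends exactly one trailing TAB per line) is to strip the BOM and the TAB before parsing:

lines <- readLines("samtools-flagstat-dp.tsv", encoding = "UTF-8")
lines[1] <- sub("^\ufeff", "", lines[1])  # drop the UTF-8 BOM, if present
lines <- sub("\t$", "", lines)            # drop a single trailing TAB per line
df <- read.delim(text = lines, check.names = FALSE)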

Thanks.
P

I exported the samtools-flags table as TSV.

Can you elaborate on this please? Exported from the report toolbox? Or copied to the clipboard? Do you have an example report?

It looks like the output file is encoded as UTF-8 with a BOM.

The code exports from the toolbox as UTF-8, so that part is expected. Is that a problem?

It looks like the export function adds an extra trailing TAB to the data

That’s unexpected, but in a quick search I wasn’t able to track down where it might be coming from… An example report to validate against would help.
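
In the meantime, a quick way to confirm the trailing TAB on your side (a sketch against the same file):

lines <- readLines("samtools-flagstat-dp.tsv", encoding = "UTF-8")
sum(grepl("\t$", lines))  # lines ending in a TAB...
length(lines)             # ...out of the total number of lines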