Unicode characters (UTF-8 BOM) in TSV export samtools-flagstat-dp.tsv

I’ve been given a one-page HTML MultiQC (v1.23) report and I exported the samtools-flags table as TSV. It looks like the output file is encoded as UTF-8 with a BOM.

$ head -n1 samtools-flagstat-dp.tsv | cut -f1 |  file -
/dev/stdin: Unicode text, UTF-8 (with BOM) text

$ head -n1 samtools-flagstat-dp.tsv | cut -f1 |  hexdump -C
00000000  ef bb bf 53 61 6d 70 6c  65 0a                    |...Sample.|
0000000a
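
Those first three bytes, ef bb bf, are the UTF-8 byte-order mark. The same check can be done from R (a minimal sketch, using the export filename above):

con <- file("samtools-flagstat-dp.tsv", "rb")
first_bytes <- readBin(con, "raw", n = 3)            # read the first three bytes
close(con)
identical(first_bytes, as.raw(c(0xef, 0xbb, 0xbf)))  # TRUE when a BOM is present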

Is it a bug or is it a feature? 🙂

EDIT: Furthermore, it looks like the export function adds an extra trailing TAB to the data, and R doesn’t like this.

Asking because when I try to load the table in R, R ‘shifts’ the columns. I tried:

read.csv("samtools-flagstat-dp.tsv", header = TRUE, sep = "\t", fileEncoding = "UTF-8-BOM")
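
The shift matches read.table’s documented behaviour: if the header row has one fewer field than the data rows, which is exactly what a trailing TAB on each data line produces, the first column is used as row names and everything shifts left. A possible workaround (a minimal sketch, assuming the export appends exactly one trailing TAB per line) is to strip the BOM and the TAB before parsing:

lines <- readLines("samtools-flagstat-dp.tsv", encoding = "UTF-8")
lines[1] <- sub("^\ufeff", "", lines[1])  # drop the UTF-8 BOM, if present
lines <- sub("\t$", "", lines)            # drop a single trailing TAB per line
df <- read.delim(text = lines, check.names = FALSE)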

Thanks.
P

I exported the samtools-flags table as TSV.

Can you elaborate on this please? Exported from the report toolbox? Or copied to the clipboard? Do you have an example report?

It looks like the output file is encoded as UTF-8 with a BOM.

The code exports from the toolbox as UTF-8, so that part is expected. Is that a problem?

It looks like the export function adds an extra trailing TAB to the data

That’s unexpected, but in a quick search I wasn’t able to track down where it might be coming from… An example report to validate against would help.
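
In the meantime, a quick way to confirm the trailing TAB on your side (a sketch against the same file):

lines <- readLines("samtools-flagstat-dp.tsv", encoding = "UTF-8")
sum(grepl("\t$", lines))  # lines ending in a TAB...
length(lines)             # ...out of the total number of lines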