How do I make categorical values from integers?

Hi, I have a custom table with MLST results. MLST results include sequence types, which look numerical but are actually strings. Can I put in some custom YAML so that the values are not highlighted in the report? Thank you for any insights!

File    mlst_scheme     7-gene_ST       locus1  locus2  locus3  locus4  locus5  locus6  locus7
SRR28368091.shovill.skesa.fasta ecoli   21      adk(16) fumC(4) gyrB(12)        icd(16) mdh(9)       purA(7) recA(7)
SRR28368090.shovill.skesa.fasta ecoli   993     adk(10) fumC(7) gyrB(154)       icd(8)  mdh(12)      purA(8) recA(2)


Hi @lskatz!

Yes you can. You’ll need a table config for that header, then set the scale to False.

Assuming that you’re doing this with Custom content, you can see the docs for custom content config and table config. You can also find an example of a custom content table with an associated config file here.

I hope that helps!

Phil

Thank you for your help!


Is it possible to format with s instead of f? I want to treat it like a string. I tried this and it worked, but it still introduces a thousands separator:

#headers:
#  7-gene_ST:
#    scale: false

This reverts it to the default formatting, with a decimal digit:

#headers:
#  7-gene_ST:
#    scale: false
#    format: "{:,s}"

Hi @lskatz! "{:,s}" is not a valid Python format string. MultiQC internally just calls header["format"].format(value) and, if this fails, silently falls back to the default formatting. (We should probably validate formats and print linting errors for incorrect ones. I created an issue for that: Validate incorrect format strings in table headers · Issue #2524 · MultiQC/MultiQC · GitHub.)

To answer the original question: if I understand Phil’s answer correctly, simply adding scale: False won’t do anything, and adding format: "{:s}" would result in an error, because by the time the format is applied MultiQC has already “guessed” the column to be numeric and cast it to float.
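A minimal Python sketch of why both string formats end up with the default rendering once the value is a float. The `render` helper and `DEFAULT_FORMAT` are hypothetical stand-ins, not MultiQC's actual code. The default of "{:,.1f}" is an assumption inferred from the thousands separator and single decimal digit described in this thread, and 1993 is a made-up sequence type, chosen only because it is large enough to show the separator.

```python
# Assumed default format, inferred from the behaviour described in this thread.
DEFAULT_FORMAT = "{:,.1f}"


def render(value, fmt):
    """Hypothetical stand-in: apply fmt, fall back silently if it is invalid."""
    try:
        return fmt.format(value)
    except (ValueError, TypeError):
        # The 's' format code is not valid for floats, so we end up here.
        return DEFAULT_FORMAT.format(value)


st = 1993.0  # made-up sequence type, already cast to float

print(render(st, "{:,s}"))  # invalid format, default formatting is used
print(render(st, "{:s}"))   # 's' is rejected for a float, default formatting is used
```

Both calls print the value with a thousands separator and one decimal digit, which matches what was reported above.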

One way to work around this is to specify the format "{:.0f}", which tells MultiQC to render the value as a number with zero decimal places. This worked for me.

Make sure to unset only_defined_headers, otherwise it will print only the columns listed in headers:

#pconfig:
#  only_defined_headers: false
#headers:
#  7-gene_ST:
#    scale: false
#    format: "{:.0f}"
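To check what the "{:.0f}" format does to sequence types before putting it in the config, you can try it in plain Python. This is only a sketch of the format string itself, not of MultiQC: 21 and 993 come from the table above, and 1993 is a made-up value added to show that no thousands separator appears.

```python
# Sequence types as floats, which is how a numeric-looking column is treated.
values = [21.0, 993.0, 1993.0]  # 1993.0 is a made-up example

# "{:.0f}" has no "," flag and zero decimal places.
rendered = ["{:.0f}".format(v) for v in values]

print(rendered)  # ['21', '993', '1993']
```

Note that the values are still numbers underneath, so this only changes how they are displayed.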