Pandoc fails to convert HTML to PDF in the presence of a Unicode ≥ sign

I’m trying to convert my MultiQC report to PDF using pandoc as described here. My report has Picard WGS metrics, so a ≥ sign appears in a column header of the General Statistics table.

I tried using the --pdf option as described in the link above. The argument was accepted, but the program makes no mention of attempting a PDF conversion, and no PDF is produced.

I tried calling pandoc on the html output using the following command:

pandoc multiqc_report.html -t pdf -o multiqc_report.pdf  --standalone --pdf-engine=xelatex 

but got the following warnings:

[WARNING] Missing character: There is no ≥ (U+2265) (U+2265) in font [lmroman10-regular]:mapping=t
[WARNING] Missing character: There is no ≥ (U+2265) (U+2265) in font [lmroman10-regular]:mapping=t

Any pointers or suggestions for how to get a PDF would be much appreciated.

(Pre-empting @ewels’ question: my client prefers a single-page PDF that doesn’t depend on screen size etc.)

MultiQC has a --pdf option. Have you tried that?

Thanks @mahesh.binzerpanchal. Yes, I did. It didn’t seem to have any effect on the logs or the output.

A guess at a quick fix is to replace the Unicode character with its HTML-encoded equivalent in the HTML:

sed -i 's/≥/\&ge;/g' multiqc_report.html
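
If you would rather not edit the report in place, a non-destructive variant could look like the sketch below. The function name `encode_ge` and the output file name are made up for illustration, and I haven't tested whether the PDF engine then renders the character, so treat this as a starting point only:

```shell
#!/bin/sh
# Hypothetical helper: write a copy of an HTML file in which every literal
# "≥" is replaced by the HTML entity "&ge;". The input file is left untouched.
encode_ge() {
  # "\&" is required: a bare "&" in a sed replacement means "the matched text".
  sed 's/≥/\&ge;/g' "$1" > "$2"
}

# Example usage (file names are placeholders):
#   encode_ge multiqc_report.html multiqc_report_encoded.html
#   grep -c '≥' multiqc_report_encoded.html   # counts lines still containing ≥
```

Writing to a new file also avoids `sed -i`, whose syntax differs between GNU and BSD/macOS sed.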

Longer term, the proper fix would be to HTML-encode table headers, and probably table cell contents, when rendering the report. However, I’m hesitant to do this as it could break things in other ways (e.g. if anyone is using emoji in table headers??). Also, PDF report generation is very rarely used and we’ll likely deprecate official support for it completely in the near-ish future.

Let us know if the above works!

Phil
