LLM summaries for multiqc reports

nick-youngblut · August 3, 2024, 7:05pm

I can’t share my code (yet), but I just wanted to point out a method that works for generating LLM summaries of all tables/plots in a multiqc report and adding the summary to the top of the report. Any feedback would be appreciated. The workflow:

Load the input data via multiqc.parse_logs
Create the html report via multiqc.write_report
Parse relevant tables from the html report via BeautifulSoup (soup.find('table', id=table_id))
Convert each table to markdown
Load plots and PNGs via multiqc.list_plots + reading in as base64 strings
Create a contents object of the markdown tables and images
Create a system prompt tailored to the input type (e.g., fastqc or rnaseq)
Send the contents and system prompt LLM (gpt-4o or gpt-4o-mini)
Parse the response as markdown via the markdown package
Convert the response to html (a new div via Beautifulsoup)
Insert the new div after the analysis_dirs_wrapper div

The workflow is working quite well. For instance, the LLM (if prompted correctly) can spot errors in the data (e.g., demultiplexing errors or the incorrect reference used when mapping reads) and thus help users quickly identify problems and possible solutions.

ewels · August 6, 2024, 7:06pm

@nick-youngblut sounds awesome! Would love to see some code when possible

Any chance you could drop an example report in the mean time to see the end result?

nick-youngblut · August 7, 2024, 3:22pm

I can’t share all of the data, but here’s a summary of bcl-convert output, in which one sample did not demux (wrong indices provided).

nick-youngblut · August 7, 2024, 3:24pm

I should note that the code is pretty simple: it’s just a python package with 12 functions and a few 100 lines of code.

ewels · January 23, 2025, 4:13pm

Hi @nick-youngblut,

Just wanted to circle back here - your original post was great timing as we were starting to think about this functionality ourselves for MultiQC. This has now been written and released in MultiQC v1.27 with integrated AI summaries. You can see an overview in today’s blog post here:

I hope this is helpful! I’d love to hear any feedback, especially given that you’ve been playing around with similar!

Phil

nick-youngblut · January 24, 2025, 4:07pm

That’s great to hear! Congrats on implementing the feature! I saw your demo at the last Nextflow conference. AI summaries looked great! I’m excited to try them out.

I’ve started to look into using LLMs to automatically handle failed jobs (e.g., retry with more memory, time, or other resources… or fail) by leveraging task.previousException, task.previousTrace, etc. Any thoughts on this approach?

nick-youngblut · January 24, 2025, 5:29pm

A bit of a problem when using: quay.io/biocontainers/multiqc:1.27--pyhdfd78af_0

write_results | AI summary requested through `config.ai_summary`, but required dependencies are not installed. Install them with `pip install "multiqc[openai]"`

It appears that there is no multiqc-openai biocontainer or openai tag for the multiqc biocontainer.

ewels · February 6, 2025, 8:43pm

Thanks for flagging - we’re about to release a patch release v1.27.1 that simplifies the dependencies and gets rid of this, so stuff should be working via bioconda again shortly…

Topic		Replies	Views
Combining several MultiQC reports into one Ask for help multiqc	4	530	February 15, 2024
Better practices using MultiQC in Quarto Notebooks for sets of images Ask for help multiqc	8	129	March 28, 2025
Retrieve information from .command.log and add as table Ask for help multiqc	2	256	December 19, 2023
Custom content Ask for help multiqc	6	166	June 24, 2024
Using MultiQC from Python to add to an existing report Ask for help multiqc	1	17	July 22, 2025

LLM summaries for multiqc reports

Related topics