LLM summaries for multiqc reports

I can’t share my code (yet), but I just wanted to point out a method that works for generating LLM summaries of all tables/plots in a multiqc report and adding the summary to the top of the report. Any feedback would be appreciated. The workflow:

  • Load the input data via multiqc.parse_logs
  • Create the html report via multiqc.write_report
  • Parse relevant tables from the html report via BeautifulSoup (soup.find('table', id=table_id))
  • Convert each table to markdown
  • Load plots via multiqc.list_plots and read the PNGs in as base64 strings
  • Create a contents object of the markdown tables and images
  • Create a system prompt tailored to the input type (e.g., fastqc or rnaseq)
  • Send the contents and system prompt to the LLM (gpt-4o or gpt-4o-mini)
  • Parse the response as markdown via the markdown package
  • Convert the response to html (a new div via BeautifulSoup)
  • Insert the new div after the analysis_dirs_wrapper div
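
A minimal sketch of the table-to-markdown and contents steps, not the actual package code. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser. The names TableParser, table_to_markdown and build_contents are hypothetical, and the contents layout assumes the OpenAI Chat Completions content-parts format (text parts plus base64 data-URL images).

```python
import base64
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of the <table> whose id matches table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.rows = []
        self._in_table = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])
        elif self._in_table and tag in ("th", "td"):
            if not self.rows:  # tolerate a cell without an enclosing <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_table and self._in_cell:
            self.rows[-1][-1] += data


def table_to_markdown(html, table_id):
    """Return the matching table as a markdown table, or "" if not found."""
    parser = TableParser(table_id)
    parser.feed(html)
    # Normalise whitespace inside each cell and drop empty rows.
    rows = [[" ".join(c.split()) for c in r] for r in parser.rows if r]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def build_contents(tables_md, png_bytes_list):
    """Combine markdown tables and PNG images into one contents list."""
    contents = [{"type": "text", "text": md} for md in tables_md]
    for png in png_bytes_list:
        b64 = base64.b64encode(png).decode("ascii")
        contents.append(
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}}
        )
    return contents
```

The sketch treats the first row as the header and does not handle nested tables or pipe characters inside cells. The resulting list would go in the user message, with the system prompt sent separately.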

The workflow is working quite well. For instance, the LLM (if prompted correctly) can spot errors in the data (e.g., demultiplexing errors or the incorrect reference used when mapping reads) and thus help users quickly identify problems and possible solutions.

@nick-youngblut sounds awesome! Would love to see some code when possible :star_struck:

Any chance you could drop an example report in the meantime so we can see the end result?

I can’t share all of the data, but here’s a summary of bcl-convert output, in which one sample did not demux (wrong indices provided).

I should note that the code is pretty simple: it’s just a Python package with 12 functions and a few hundred lines of code.

Hi @nick-youngblut,

Just wanted to circle back here - your original post was great timing as we were starting to think about this functionality ourselves for MultiQC. This has now been written and released in MultiQC v1.27 with integrated AI summaries. You can see an overview in today’s blog post here:

I hope this is helpful! I’d love to hear any feedback, especially given that you’ve been playing around with something similar!

Phil

That’s great to hear! Congrats on implementing the feature! I saw your demo at the last Nextflow conference. AI summaries looked great! I’m excited to try them out.

I’ve started to look into using LLMs to automatically handle failed jobs (e.g., retry with more memory, time, or other resources… or fail) by leveraging task.previousException, task.previousTrace, etc. Any thoughts on this approach?

I ran into a problem when using the quay.io/biocontainers/multiqc:1.27--pyhdfd78af_0 container:

write_results | AI summary requested through `config.ai_summary`, but required dependencies are not installed. Install them with `pip install "multiqc[openai]"`

It appears that there is no multiqc-openai biocontainer or openai tag for the multiqc biocontainer.

Thanks for flagging - we’re about to release patch v1.27.1, which simplifies the dependencies and gets rid of this error, so things should be working via bioconda again shortly…
