LLM summaries for multiqc reports

I can’t share my code (yet), but I just wanted to point out a method that works for generating LLM summaries of all tables/plots in a multiqc report and adding the summary to the top of the report. Any feedback would be appreciated. The workflow:

  • Load the input data via multiqc.parse_logs
  • Create the html report via multiqc.write_report
  • Parse relevant tables from the html report via BeautifulSoup (soup.find('table', id=table_id))
  • Convert each table to markdown
  • Load plots and PNGs via multiqc.list_plots + reading in as base64 strings
  • Create a contents object of the markdown tables and images
  • Create a system prompt tailored to the input type (e.g., fastqc or rnaseq)
  • Send the contents and system prompt LLM (gpt-4o or gpt-4o-mini)
  • Parse the response as markdown via the markdown package
  • Convert the response to html (a new div via Beautifulsoup)
  • Insert the new div after the analysis_dirs_wrapper div

The workflow is working quite well. For instance, the LLM (if prompted correctly) can spot errors in the data (e.g., demultiplexing errors or the incorrect reference used when mapping reads) and thus help users quickly identify problems and possible solutions.

3 Likes

@nick-youngblut sounds awesome! Would love to see some code when possible :star_struck:

Any chance you could drop an example report in the mean time to see the end result?

I can’t share all of the data, but here’s a summary of bcl-convert output, in which one sample did not demux (wrong indices provided).

1 Like

I should note that the code is pretty simple: it’s just a python package with 12 functions and a few 100 lines of code.