I can’t share my code (yet), but I just wanted to point out a method that works for generating LLM summaries of all tables/plots in a multiqc report and adding the summary to the top of the report. Any feedback would be appreciated. The workflow:
Load the input data via multiqc.parse_logs
Create the html report via multiqc.write_report
Parse relevant tables from the html report via BeautifulSoup (soup.find('table', id=table_id))
Convert each table to markdown
Load plots and PNGs via multiqc.list_plots + reading in as base64 strings
Create a contents object of the markdown tables and images
Create a system prompt tailored to the input type (e.g., fastqc or rnaseq)
Send the contents and system prompt LLM (gpt-4o or gpt-4o-mini)
Parse the response as markdown via the markdown package
Convert the response to html (a new div via Beautifulsoup)
Insert the new div after the analysis_dirs_wrapper div
The workflow is working quite well. For instance, the LLM (if prompted correctly) can spot errors in the data (e.g., demultiplexing errors or the incorrect reference used when mapping reads) and thus help users quickly identify problems and possible solutions.
Just wanted to circle back here - your original post was great timing as we were starting to think about this functionality ourselves for MultiQC. This has now been written and released in MultiQC v1.27 with integrated AI summaries. You can see an overview in today’s blog post here:
I hope this is helpful! I’d love to hear any feedback, especially given that you’ve been playing around with similar!
That’s great to hear! Congrats on implementing the feature! I saw your demo at the last Nextflow conference. AI summaries looked great! I’m excited to try them out.
I’ve started to look into using LLMs to automatically handle failed jobs (e.g., retry with more memory, time, or other resources… or fail) by leveraging task.previousException, task.previousTrace, etc. Any thoughts on this approach?
A bit of a problem when using: quay.io/biocontainers/multiqc:1.27--pyhdfd78af_0
write_results | AI summary requested through `config.ai_summary`, but required dependencies are not installed. Install them with `pip install "multiqc[openai]"`
It appears that there is no multiqc-openai biocontainer or openai tag for the multiqc biocontainer.
Thanks for flagging - we’re about to release a patch release v1.27.1 that simplifies the dependencies and gets rid of this, so stuff should be working via bioconda again shortly…