I can’t share my code (yet), but I just wanted to point out a method that works for generating LLM summaries of all tables/plots in a multiqc report and adding the summary to the top of the report. Any feedback would be appreciated. The workflow:
- Load the input data via
multiqc.parse_logs
- Create the html report via
multiqc.write_report
- Parse relevant tables from the html report via BeautifulSoup (
soup.find('table', id=table_id)
) - Convert each table to markdown
- Load plots and PNGs via
multiqc.list_plots
+ reading in as base64 strings - Create a contents object of the markdown tables and images
- Create a system prompt tailored to the input type (e.g., fastqc or rnaseq)
- Send the contents and system prompt LLM (gpt-4o or gpt-4o-mini)
- Parse the response as markdown via the
markdown
package - Convert the response to html (a new div via Beautifulsoup)
- Insert the new div after the
analysis_dirs_wrapper
div
The workflow is working quite well. For instance, the LLM (if prompted correctly) can spot errors in the data (e.g., demultiplexing errors or the incorrect reference used when mapping reads) and thus help users quickly identify problems and possible solutions.