Integrate MultiQC more natively into the Nextflow backbone

Moving the conversation over from Twitter.

I’ve been contemplating the potential benefits of integrating MultiQC more natively into Nextflow, especially given its widespread use across pipelines and the fact that it has now joined the broader Seqera family. There are a few different levels of integration to consider:

  1. Simplified Wrapper Approach:
    One approach could involve just having Nextflow act as a simplified wrapper for MultiQC. This means that the user would only need to specify the output of a process as follows:
    output multiqc(<tool_name>)
    For example:
    output multiqc(fastp)
    Nextflow would then take care of ensuring that the relevant files required by MultiQC are added to the output channel.

  2. Dedicated MultiQC Channel:
    Another option is to create a dedicated channel for MultiQC reports. Nextflow could handle the mixing of reports after each process without requiring users to manipulate the channels themselves.

  3. Complete Integration:
    The most extreme level of integration would involve making MultiQC reports an integral part of the Nextflow execution, trace, and timeline reports. In this scenario, users would only need to add a common tool identifier to their processes for Nextflow to recognize and link to MultiQC. However, one potential challenge here is that Nextflow would need to call a MultiQC process at the end to generate the report. The ability to do this could vary depending on users’ setups (Docker vs Conda, local vs HPC vs cloud, etc.). Tools like Tower and Wave might help keep this consistent and provide appropriate configurations for the diversity of user environments.

It’s only a random thought, and I might be overthinking it, but I thought I would put it out there.

Cheers,


Option 1 sounds nice but I suspect that in practice it could often be difficult. It would require Nextflow to know MultiQC search patterns, which change over time with different versions of MultiQC - how would this be kept in sync with the (user-defined) version of MultiQC to be used at a later stage? Also it could be a fairly significant bit of code, as some search patterns require looking through file contents to find strings and so on. Not impossible, but I’m not completely sure that it’s worth the complexity for something that’s not super difficult to do currently (specify a filename for the channel).
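
To give a feel for what Nextflow would have to replicate for option 1, here is a rough Python sketch of search-pattern matching by filename and by file contents. The pattern table, the `example_tool` entry, and the function names are made up for illustration and are not copied from any MultiQC release. The `fn`, `contents` and `num_lines` keys are modelled on MultiQC's search-pattern configuration, but real patterns support more options and change between versions, which is exactly the sync problem.

```python
import fnmatch
from itertools import islice
from pathlib import Path

# Hypothetical pattern table for illustration only; real MultiQC patterns
# are defined per module and vary between MultiQC versions.
SEARCH_PATTERNS = {
    "fastp": {"fn": "*fastp.json"},
    "example_tool": {"contents": "Example Tool v", "num_lines": 5},
}


def matches(path, pattern):
    """Return True if the file satisfies every criterion in the pattern."""
    # Filename criterion: a cheap glob match on the file name.
    if "fn" in pattern and not fnmatch.fnmatch(path.name, pattern["fn"]):
        return False
    # Contents criterion: the file has to be opened and its head scanned.
    if "contents" in pattern:
        limit = pattern.get("num_lines", 1000)
        with path.open(errors="ignore") as handle:
            head = list(islice(handle, limit))
        if not any(pattern["contents"] in line for line in head):
            return False
    return True


def find_tool_files(directory, patterns=SEARCH_PATTERNS):
    """Map each tool name to the names of the files matching its pattern."""
    found = {tool: [] for tool in patterns}
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        for tool, pattern in patterns.items():
            if matches(path, pattern):
                found[tool].append(path.name)
    return found
```

Calling `find_tool_files("results/")` on a directory of process outputs would return something like a dict of tool name to matching file names. Even this toy version has to read file contents, and the pattern table would need to track whichever MultiQC version the user runs later.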

Option 2 sounds like the most practical of the three to me, and quite similar to the proposed “Channel topics” syntax for Nextflow:

Yes, the channel topics syntax would be the perfect solution to option 2. I would definitely support that. Thanks for linking me to the experimental feature.