How to implement multiqc?

complexgenome · May 21, 2024, 9:11pm

Hi Developers,

I’d like to add MultiQC to my pipeline. I’ve looked at the code in some public repositories, but I can’t wrap my head around it.

nf-core/rnaseq (local MultiQC module):

github.com/nf-core/rnaseq/blob/b89fac32650aacc86fcda9ee77e00612a1d77066/modules/local/multiqc/main.nf

process MULTIQC {
    label 'process_medium'

    conda "bioconda::multiqc=1.19"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/multiqc:1.19--pyhdfd78af_0' :
        'biocontainers/multiqc:1.19--pyhdfd78af_0' }"

    input:
    path multiqc_config
    path multiqc_custom_config
    path software_versions
    path workflow_summary
    path methods_description
    path logo
    path fail_trimming_summary
    path fail_mapping_summary
    path fail_strand_check
    path ('fastqc/raw/*')
    path ('fastqc/trim/*')

(File truncated.)

Here I don’t understand how so many input directories are passed to the process.
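For what it’s worth, those inputs appear to work like this: each path ('fastqc/raw/*') style declaration is a separate input that stages whatever list of files it receives into that subdirectory of the task folder. In the workflow, each input is fed with collect() followed by ifEmpty([]), so a skipped step sends an empty list instead of preventing the process from running. (As far as I can tell, rnaseq itself writes .collect{it[1]}.ifEmpty([]) because its channels carry [meta, file] pairs.) Below is a minimal sketch of that pattern. REPORT_INPUTS and makePlaceholder are hypothetical names, the process only lists what was staged instead of running MultiQC, and the placeholder files exist only so the example runs without real FastQC output:

```nextflow
// Hypothetical helper: writes a small placeholder so the example runs without real FastQC output
def makePlaceholder(String name) {
    def f = file("placeholder_inputs/${name}")
    f.parent.mkdirs()
    f.text = "placeholder\n"
    return f
}

// Hypothetical stand-in for a MultiQC process with one input per report type
process REPORT_INPUTS {
    input:
    path ('fastqc/raw/*')    // staged into ./fastqc/raw/ with original names
    path ('fastqc/trim/*')   // staged into ./fastqc/trim/ with original names

    output:
    stdout

    script:
    """
    # List the staged files instead of running MultiQC
    find -L . -mindepth 2 -type f | sort
    """
}

workflow {
    ch_fastqc_raw  = Channel.of('s1_raw_fastqc.zip', 's2_raw_fastqc.zip').map { makePlaceholder(it) }
    ch_fastqc_trim = Channel.empty()   // e.g. trimming was skipped in this run

    // collect() turns each channel into one list; ifEmpty([]) sends an empty list
    // when nothing was produced, so the process still runs exactly once
    REPORT_INPUTS(
        ch_fastqc_raw.collect().ifEmpty([]),
        ch_fastqc_trim.collect().ifEmpty([])
    )
    REPORT_INPUTS.out.view()
}
```

Run with nextflow run, it should list the two raw files under ./fastqc/raw/ and nothing under ./fastqc/trim/, because the trimming channel was empty.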

I also looked at their rnaseq workflow script:

github.com/nf-core/rnaseq/blob/b89fac32650aacc86fcda9ee77e00612a1d77066/workflows/rnaseq.nf

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    PRINT PARAMS SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

include { paramsSummaryLog; paramsSummaryMap; fromSamplesheet } from 'plugin/nf-validation'

def logo = NfcoreTemplate.logo(workflow, params.monochrome_logs)
def citation = '\n' + WorkflowMain.citation(workflow) + '\n'
def summary_params = paramsSummaryMap(workflow)

// Print parameter summary log to screen
log.info logo + paramsSummaryLog(workflow) + citation

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    VALIDATE INPUTS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

(File truncated.)

nf-core sarek:

github.com/nf-core/sarek/blob/b5b766d3b4ac89864f2fa07441cdc8844e70a79e/modules/nf-core/multiqc/main.nf

process MULTIQC {
    label 'process_single'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0' :
        'biocontainers/multiqc:1.21--pyhdfd78af_0' }"

    input:
    path  multiqc_files, stageAs: "?/*"
    path(multiqc_config)
    path(extra_multiqc_config)
    path(multiqc_logo)

    output:
    path "*multiqc_report.html", emit: report
    path "*_data"              , emit: data
    path "*_plots"             , optional:true, emit: plots
    path "versions.yml"        , emit: versions

(File truncated.)

Here I looked at the script/command. Most nf-core pipelines use the same MultiQC code.

Is there a more extensive example or tutorial on using MultiQC in Nextflow for paired samples, or for multi-omics data (WES, RNA) processed at the same time, that covers the case where one dataset is unavailable for a sample?

mribeirodantas · May 22, 2024, 2:37am

The fundamentals training material has a section on a simple RNA-seq pipeline. It includes an example that runs MultiQC on the outputs of two processes (two paths with files for MultiQC to report on). The reasoning is the same for any number of processes, and it’s also what the nf-core pipelines do: use the mix channel operator to combine all the output channels you want to pass to MultiQC, then run collect to turn everything into a single element (a list of paths), and pass that to MultiQC. If this isn’t clear, work through the simple RNA-seq section of the training material.
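Applied to a single MultiQC-style process, the mix-then-collect approach might look like the sketch below. It mirrors the single input of the nf-core module (path multiqc_files, stageAs: "?/*"), which stages each file into its own numbered folder so that reports with identical names from different samples don’t clash. LIST_STAGED and makePlaceholder are hypothetical names; the process only lists what was staged instead of running MultiQC, and the placeholder files stand in for real tool outputs:

```nextflow
// Hypothetical helper: writes a placeholder file so the example runs without real tool output
def makePlaceholder(String name) {
    def f = file("placeholder_inputs/${name}")
    f.parent.mkdirs()
    f.text = "placeholder\n"
    return f
}

// Hypothetical stand-in for MultiQC, using the same input declaration as the nf-core module
process LIST_STAGED {
    input:
    path multiqc_files, stageAs: "?/*"   // each file goes into ./1/, ./2/, ...

    output:
    stdout

    script:
    """
    # Task files (.command.*) sit at depth 1, staged reports at depth 2
    find -L . -mindepth 2 -type f | sort
    """
}

workflow {
    // In a real pipeline these would be tool outputs such as FastQC zips and fastp JSON files
    ch_fastqc = Channel.of('s1_fastqc.zip', 's2_fastqc.zip').map { makePlaceholder(it) }
    ch_fastp  = Channel.of('s1.fastp.json').map { makePlaceholder(it) }

    // mix() merges the channels; collect() gathers every file into one list
    ch_multiqc_files = Channel.empty()
        .mix(ch_fastqc)
        .mix(ch_fastp)

    LIST_STAGED(ch_multiqc_files.collect())
    LIST_STAGED.out.view()
}
```

Adding another tool means one more .mix(...) line. If one of the mixed channels is empty because that data isn’t available, it simply contributes nothing; only when every channel is empty does collect() emit nothing, and then the process doesn’t run.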

complexgenome · May 22, 2024, 3:20pm

Hi @mribeirodantas
I’ve already looked at the code in the links from my earlier post.

I don’t know how mix would be helpful here.

I have three channels:

ch_fastp_normal = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'normal', sequencing_type:'wes'], 
    tuple( file('normal_wes_fastq1.gz'),file('normal_wes_fastq2.gz')),file('1_normal.html'),file('1_normal.json') ],
    [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'normal', sequencing_type:'wes'], 
    tuple( file('normal_wes_fastq1.gz'),file('normal_wes_fastq2.gz')),file('1_normal.html'),file('1_normal.json') ]
)

ch_fastp_tumor = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'tumor', sequencing_type:'wes'], 
    tuple( file('tumor_wes_fastq1.gz'),file('tumor_wes_fastq2.gz')),file('1_tumor.html'),file('1_tumor.json') ],
    [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'tumor', sequencing_type:'wes'], 
    tuple( file('tumor_wes_fastq1.gz'),file('tumor_wes_fastq2.gz')),file('1_tumor.html'),file('1_tumor.json') ]
)

ch_rna_fastp_tumor = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'rna', sequencing_type:'wes'], 
    tuple( file('rna_fastq1.gz'),file('rna_fastq2.gz')),file('1_rna.html'),file('1_rna.json') ]
)

I mix them:

ch_fastp_normal.mix(ch_fastp_tumor).mix(ch_rna_fastp_tumor).groupTuple().collect().set{grouped_mixed}
grouped_mixed.view()

I get this output:

[['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'normal', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'tumor', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0487-T-01', 'tissue':'normal', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'rna', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/rna_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/rna_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_rna.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_rna.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0487-T-01', 'tissue':'tumor', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.json]]

What I want is one group per sample: RNA + WES tumor + WES normal, or WES tumor + WES normal where the RNA data are missing.
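The groupTuple() call above groups on the whole first element (the meta map), and because the tissue field differs between normal, tumor and RNA, every tuple ends up in a group of its own; collect() then merges those groups into the single flat list shown. One way to get one group per sample, sketched under the assumption that batch plus timepoint identifies a sample, is to re-key each item on just those fields and keep only the fastp JSON, which is the file MultiQC’s fastp module parses. sampleKey is a hypothetical helper, and the file names are placeholders (they don’t need to exist, because nothing is staged here):

```nextflow
// Hypothetical helper: the fields assumed to identify one sample across tissues/assays
def sampleKey(Map meta) {
    return [batch: meta.batch, timepoint: meta.timepoint]
}

workflow {
    // Same shape as the channels above: [meta, [R1, R2], html, json]; file names are placeholders
    ch_fastp_normal = Channel.of(
        [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'normal', sequencing_type:'wes'],
          [file('n0486_R1.fq.gz'), file('n0486_R2.fq.gz')], file('n0486.html'), file('n0486.json') ],
        [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'normal', sequencing_type:'wes'],
          [file('n0487_R1.fq.gz'), file('n0487_R2.fq.gz')], file('n0487.html'), file('n0487.json') ]
    )
    ch_fastp_tumor = Channel.of(
        [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'tumor', sequencing_type:'wes'],
          [file('t0486_R1.fq.gz'), file('t0486_R2.fq.gz')], file('t0486.html'), file('t0486.json') ],
        [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'tumor', sequencing_type:'wes'],
          [file('t0487_R1.fq.gz'), file('t0487_R2.fq.gz')], file('t0487.html'), file('t0487.json') ]
    )
    // RNA only exists for MM-0486-T-01
    ch_rna_fastp_tumor = Channel.of(
        [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'rna', sequencing_type:'wes'],
          [file('r0486_R1.fq.gz'), file('r0486_R2.fq.gz')], file('r0486.html'), file('r0486.json') ]
    )

    ch_fastp_normal
        .mix(ch_fastp_tumor, ch_rna_fastp_tumor)
        // keep only the fastp JSON and key each item on the sample, not the tissue
        .map { meta, reads, html, json -> [ sampleKey(meta), json ] }
        .groupTuple()
        .view()
}
```

This should emit two items: MM-0486-T-01 with three JSON files and MM-0487-T-01 with two, each list in arrival order. Leaving out collect() is what keeps the groups separate.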

Second, I don’t know how to pass this mixed channel to MultiQC, since the number of inputs can vary.
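On the variable number of inputs: a single path input can receive a list of any length, so each channel item can be [meta, list of reports] regardless of whether a sample has two or three reports. Below is a sketch of a per-sample MultiQC process. MULTIQC_PER_SAMPLE, reportName and makePlaceholder are hypothetical names, and the sketch assumes multiqc is available on the PATH (or through a container/conda directive you add). The placeholder JSON files are only useful with nextflow run main.nf -stub-run, because MultiQC would find nothing to report in them; with real data, pass the per-sample fastp JSON files instead:

```nextflow
// Hypothetical helper: one report name per sample (batch + timepoint)
def reportName(Map meta) {
    return "${meta.batch}_${meta.timepoint}_multiqc_report.html".toString()
}

// Hypothetical helper: placeholder inputs so the sketch can be tried with -stub-run
def makePlaceholder(String name) {
    def f = file("placeholder_inputs/${name}")
    f.parent.mkdirs()
    f.text = "placeholder\n"
    return f
}

// Hypothetical per-sample MultiQC process; assumes multiqc is on the PATH
process MULTIQC_PER_SAMPLE {
    tag "${meta.timepoint}"

    input:
    // reports is a list whose length may differ between samples (e.g. 2 or 3 files);
    // stageAs "?/*" gives each file its own numbered folder so equal names don't clash
    tuple val(meta), path(reports, stageAs: "?/*")

    output:
    tuple val(meta), path("*_multiqc_report.html"), emit: report

    script:
    """
    multiqc --force --filename ${reportName(meta)} .
    """

    stub:
    """
    touch ${reportName(meta)}
    """
}

workflow {
    // One item per sample: [meta, list of report files]; MM-0487-T-01 has no RNA report
    ch_per_sample = Channel.of(
        [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01'], ['n0486.json', 't0486.json', 'r0486.json'] ],
        [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01'], ['n0487.json', 't0487.json'] ]
    ).map { meta, names -> [ meta, names.collect { makePlaceholder(it) } ] }

    MULTIQC_PER_SAMPLE(ch_per_sample)
    MULTIQC_PER_SAMPLE.out.report.view()
}
```

With -stub-run, this should produce one report file per batch/timepoint, whether the sample had two inputs or three.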

mribeirodantas · May 22, 2024, 4:15pm

There’s already an nf-core module for MultiQC. Can’t you use it in your pipeline?
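Wiring the nf-core module into your own workflow might look roughly like this. The sketch assumes the module was installed with nf-core modules install multiqc (so it lives under modules/nf-core/multiqc/) and that it has the four inputs shown in the sarek copy above. Newer releases of the module declare additional inputs, so check the input: block of the version you install. Optional inputs are passed as [], the usual nf-core way of saying "no file". The Channel.fromPath glob and ch_other_tool_reports are hypothetical placeholders for your real tool-output channels:

```nextflow
// Assumes the module was installed with: nf-core modules install multiqc
// and has the four inputs of the sarek copy above (newer versions declare more)
include { MULTIQC } from './modules/nf-core/multiqc/main'

workflow {
    // Placeholder: replace with your real tool-output channels (fastp JSON, FastQC zip, ...)
    ch_fastp_json = Channel.fromPath('results/fastp/*.json')

    ch_multiqc_files = Channel.empty()
        .mix(ch_fastp_json)
        // .mix(ch_other_tool_reports)   // hypothetical: one mix() per additional tool

    MULTIQC(
        ch_multiqc_files.collect(),   // every report as a single list
        [],                           // multiqc_config: none
        [],                           // extra_multiqc_config: none
        []                            // multiqc_logo: none
    )
    MULTIQC.out.report.view()
}
```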

Also, you’re aware that MultiQC expects tool outputs, right? What you’re sharing looks like channels of input/custom files, which is not what MultiQC expects.
