How to implement MultiQC?

Hi Developers,

I’d like to add MultiQC to my pipeline. I’ve looked at the code of publicly available repositories, but I can’t wrap my head around it.

Local RNA-seq:

  • Here I can’t understand how so many input directories are passed to MultiQC.

I also looked at their RNA-seq script.

nf-core/sarek:

  • Here I looked at the script/command. Most nf-core pipelines use the same code for MultiQC.

Is there a more extensive code example or tutorial showing how to use MultiQC in Nextflow for paired samples, or for multi-omics data (WES, RNA) processed at the same time, that also covers the case where one particular dataset is unavailable?

In the fundamentals training material, there’s a section on a simple RNA-seq pipeline. In it, there’s an example of using MultiQC with the outputs of two processes (two paths with files for MultiQC to use in its report). The reasoning is the same for any number of processes, and it’s also what the nf-core pipelines do: use the mix channel operator to combine all output channels that contain what you want to pass to MultiQC, then use collect to turn everything into a single element (a list of paths), and pass that to MultiQC. If this isn’t clear, follow the simple RNA-seq pipeline section of the training material.
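A minimal sketch of that mix + collect pattern, assuming fastp JSON reports and FastQC zip files already exist under the hypothetical directories `results/fastp/` and `results/fastqc/` (in a real pipeline these two channels would be the outputs of your own processes). The `MULTIQC` process here is hand-written in the style of the training material, not the nf-core module:

```nextflow
// Hand-written MultiQC process (not the nf-core module)
process MULTIQC {
    publishDir 'results/multiqc', mode: 'copy'

    input:
    path '*'                      // any number of QC files, staged under their own names

    output:
    path 'multiqc_report.html'

    script:
    """
    multiqc .
    """
}

workflow {
    // Hypothetical locations standing in for real process outputs
    fastp_ch  = Channel.fromPath('results/fastp/*.json')
    fastqc_ch = Channel.fromPath('results/fastqc/*_fastqc.zip')

    // mix merges the channels; collect gathers every emitted file into ONE list,
    // so MULTIQC runs once over all reports
    MULTIQC( fastp_ch.mix(fastqc_ch).collect() )
}
```

If one dataset is unavailable, its channel is simply empty and contributes nothing to the mix, so the same code handles a variable number of inputs. Note that `path '*'` stages files under their original names, so two inputs with the same file name would clash; the nf-core MultiQC module avoids this by declaring its input with `stageAs: "?/*"`.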

Hi @mribeirodantas
I’ve already looked at the code in the links from my earlier post.

I don’t know how mix would be helpful here.

I have three channels:

ch_fastp_normal = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'normal', sequencing_type:'wes'], 
    tuple( file('normal_wes_fastq1.gz'),file('normal_wes_fastq2.gz')),file('1_normal.html'),file('1_normal.json') ],
    [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'normal', sequencing_type:'wes'], 
    tuple( file('normal_wes_fastq1.gz'),file('normal_wes_fastq2.gz')),file('1_normal.html'),file('1_normal.json') ]
)

ch_fastp_tumor = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'tumor', sequencing_type:'wes'], 
    tuple( file('tumor_wes_fastq1.gz'),file('tumor_wes_fastq2.gz')),file('1_tumor.html'),file('1_tumor.json') ],
    [ [batch:'SEMA-MM-001', timepoint:'MM-0487-T-01', tissue:'tumor', sequencing_type:'wes'], 
    tuple( file('tumor_wes_fastq1.gz'),file('tumor_wes_fastq2.gz')),file('1_tumor.html'),file('1_tumor.json') ]
)

ch_rna_fastp_tumor = Channel.of(
    [ [batch:'SEMA-MM-001', timepoint:'MM-0486-T-01', tissue:'rna', sequencing_type:'wes'], 
    tuple( file('rna_fastq1.gz'),file('rna_fastq2.gz')),file('1_rna.html'),file('1_rna.json') ]
)

I mix them:

ch_fastp_normal.mix(ch_fastp_tumor).mix(ch_rna_fastp_tumor).groupTuple().collect().set{grouped_mixed}
grouped_mixed.view()

I get this output:

[['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'normal', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'tumor', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0487-T-01', 'tissue':'normal', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/normal_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_normal.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0486-T-01', 'tissue':'rna', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/rna_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/rna_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_rna.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_rna.json],
['batch':'SEMA-MM-001', 'timepoint':'MM-0487-T-01', 'tissue':'tumor', 'sequencing_type':'wes'],
[[/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq1.gz,
/mnt/data1/users//nextflow/learn_nextflow/tumor_wes_fastq2.gz]],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.html],
[/mnt/data1/users//nextflow/learn_nextflow/1_tumor.json]]

What I’d like instead is one group per sample: RNA + WES tumor + WES normal, or WES tumor + WES normal when the RNA data are missing.
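The grouping above fails because `groupTuple()` uses the first element of each item as the key, and here that is the whole meta map. The maps differ in `tissue`, so no two items share a key and each one ends up in its own group; `collect()` then flattens all of them into the single list shown in the output. A minimal sketch that keys on `meta.timepoint` instead, assuming that field identifies one sample across the WES normal, WES tumor and RNA records, and keeping only the fastp JSON reports (`ch_reports_per_sample` is a hypothetical name):

```nextflow
// Re-key every record on the sample identifier instead of the full meta map.
// Assumption: meta.timepoint is shared by the normal, tumor and RNA records of a sample.
ch_fastp_normal
    .mix(ch_fastp_tumor, ch_rna_fastp_tumor)
    .map { meta, reads, html, json -> tuple(meta.timepoint, json) }  // keep only the fastp JSON
    .groupTuple()                                                     // -> [timepoint, [json, json, ...]]
    .set { ch_reports_per_sample }

ch_reports_per_sample.view()
```

With the channels above, this should give one item per timepoint: three JSON files for MM-0486-T-01 and two for MM-0487-T-01, which has no RNA record. A missing dataset therefore just produces a shorter list rather than a special case.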

Second, I don’t know how to pass this mixed channel to MultiQC, since the number of inputs can vary.
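A `path` input does not need a fixed arity: declared inside a `tuple`, it accepts a list of any length and stages every file in it into the task directory. A minimal sketch of a per-sample MultiQC process, assuming a channel shaped like `[sample_id, [report, report, ...]]`; the process name `MULTIQC_PER_SAMPLE` and the demo file names are hypothetical, and the demo files would need to exist for staging to succeed:

```nextflow
process MULTIQC_PER_SAMPLE {
    tag "${sample_id}"
    publishDir 'results/multiqc', mode: 'copy'

    input:
    tuple val(sample_id), path(reports)   // reports: a list of any length

    output:
    path "${sample_id}_multiqc_report.html"

    script:
    """
    multiqc . --filename ${sample_id}_multiqc_report.html
    """
}

workflow {
    // Hypothetical demo channel: one sample with three reports, one with two
    ch_demo = Channel.of(
        tuple('MM-0486-T-01', [file('1_normal.json'), file('1_tumor.json'), file('1_rna.json')]),
        tuple('MM-0487-T-01', [file('2_normal.json'), file('2_tumor.json')])
    )
    MULTIQC_PER_SAMPLE(ch_demo)
}
```

Because MultiQC scans the whole working directory (`multiqc .`), the process does not need to know how many files it received; it runs once per sample and names each report after the sample.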

There is already an nf-core module for MultiQC. Can’t you use it in your pipeline?

Also, you’re aware that MultiQC expects tool outputs, right? What you’re sharing seems to be channels with input/custom files (metadata maps and FASTQ files), which is not what MultiQC expects.
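In practice, this means dropping the meta maps and FASTQ files before collecting and passing on only the reports the tools produced. Here that is the fastp JSON, which is what MultiQC’s fastp module reads. A minimal sketch using the channels from the question (`ch_multiqc_files` is a hypothetical name):

```nextflow
// Keep only tool outputs; meta maps and FASTQs are not MultiQC inputs
ch_fastp_normal
    .mix(ch_fastp_tumor, ch_rna_fastp_tumor)
    .map { meta, reads, html, json -> json }
    .collect()                     // one list with every fastp JSON, however many exist
    .set { ch_multiqc_files }
```

`ch_multiqc_files` can then be passed to a MultiQC process, or as the first input (`multiqc_files`) of the nf-core MULTIQC module. In this example both normal samples produce a file named `1_normal.json`, so a process that stages its inputs with a plain `path '*'` would hit a file name collision. The nf-core module’s `stageAs: "?/*"` input avoids that.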