MultiQC fastp module issue with input filenames

ksankit4444 · October 5, 2025, 4:19am

Hello Seqera community,

I’ve been working on a Nextflow RNA-Seq pipeline on AWS Batch and encountered an issue with MultiQC’s fastp module that I wanted to discuss before proposing any changes.

The Problem:
When processing multiple samples in parallel where input fastq files have identical names (e.g., forward.fastq/reverse.fastq across samples), MultiQC only captures one fastp report instead of aggregating all samples.

I looked into the MultiQC source and found that the fastp module uses the input filenames (‘-i’, ‘-I’, ‘--in1’, ‘--in2’) as detection keys when no YAML configuration file is provided. When these names are identical across samples, each subsequent report overwrites the previous one in the data dictionary.
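To make the collision concrete, here is a minimal sketch of the behaviour described above. It is not MultiQC's actual code: the regex, the helper name and the example commands are all made up for illustration, and it only assumes that the sample name is derived from the first input filename in the fastp command line.

```python
import re

# Hypothetical pattern, loosely modelled on the behaviour described above.
# This is NOT the regex used in the MultiQC source.
INPUT_FLAG = re.compile(r"(?:^|\s)(?:-i|--in1)[\s=]+(\S+)")


def sample_name_from_command(command: str) -> str:
    """Derive a sample name from the first input file of a fastp command."""
    match = INPUT_FLAG.search(command)
    if match is None:
        raise ValueError("no input file found in command")
    filename = match.group(1).split("/")[-1]  # drop the directory part
    return filename.split(".")[0]             # drop the extension(s)


# Two made-up samples whose input files share the same generic names.
commands = [
    "fastp -i sampleA/forward.fastq -I sampleA/reverse.fastq -o A_1.trim.fastq -O A_2.trim.fastq",
    "fastp -i sampleB/forward.fastq -I sampleB/reverse.fastq -o B_1.trim.fastq -O B_2.trim.fastq",
]

reports = {}
for index, command in enumerate(commands):
    # Same key both times, so the second report replaces the first.
    reports[sample_name_from_command(command)] = {"report_index": index}

print(reports)
```

Both commands resolve to the same key, so only the last report survives in the dictionary.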

Proposed Solution:
I’m considering modifying the detection logic to use the output filenames (‘-o’, ‘-O’, ‘--out1’, ‘--out2’) instead, since output files make way more sense for sample naming than input files. Inputs usually come with generic names from sequencing facilities, which don’t tell you much. Outputs, on the other hand, are what users actually name to reflect the real sample and the analysis they care about. They’re unique to each run and avoid collisions. Since outputs are what move forward into downstream analysis and reports, it’s logical to base sample names on them, not on the raw input files.
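Roughly, the idea looks like the sketch below. This is an illustration only, not the actual patch: the regex, the helper name and the example commands are hypothetical, and it assumes the output files were given sample-specific names.

```python
import re

# Hypothetical pattern keyed on the first output file instead of the input.
# Illustrative only; not taken from the MultiQC source.
OUTPUT_FLAG = re.compile(r"(?:^|\s)(?:-o|--out1)[\s=]+(\S+)")


def sample_name_from_output(command: str) -> str:
    """Derive a sample name from the first output file of a fastp command."""
    match = OUTPUT_FLAG.search(command)
    if match is None:
        raise ValueError("no output file found in command")
    filename = match.group(1).split("/")[-1]  # drop the directory part
    return filename.split(".")[0]             # drop the extension(s)


# Made-up commands: identical input names, sample-specific output names.
commands = [
    "fastp -i sampleA/forward.fastq -I sampleA/reverse.fastq -o A_1.trim.fastq -O A_2.trim.fastq",
    "fastp -i sampleB/forward.fastq -I sampleB/reverse.fastq -o B_1.trim.fastq -O B_2.trim.fastq",
]

reports = {}
for index, command in enumerate(commands):
    reports[sample_name_from_output(command)] = {"report_index": index}

print(sorted(reports))
```

With these commands each report gets its own key. The same approach would collide again if the output files were generically named, so it only helps when outputs carry the sample identity.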

I’ve locally tested a fix that changes the regex pattern to look for output options, and it resolved the issue in my pipeline.

Thanks for your feedback!

ewels · October 5, 2025, 5:11am

Hi @ksankit4444,

Sorry to hear that you’re having problems and thanks for your post! Having clashing sample names is a very common problem for MultiQC users and not an easy one to fix. Using output file names is a solid idea, but I’m afraid that it’s not something that I would like to implement for now. Whilst it solves your issue, it’s not a universal solution - in other cases, folks may use generic names for intermediate files during processing and this would break their reports, for example. In short, MultiQC has so many users that there is very rarely a common usage pattern that you can rely on. MultiQC uses input files because we aim to share sample names across tools, and the input file names are usually the most “pure”, which gives the best chance of this.

This said, you do have several options at your disposal. The docs have a section on clashing sample names, and you can also tell it to use log file names instead of input file names. If you have uninformative identifiers from your sequencing core then you can provide sample name replacement. If you really want to use the output file names then you can write a custom script to run MultiQC and have complete control for customisation of every aspect.
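As a rough sketch of the sample name replacement route, the snippet below writes a two-column, tab-separated file mapping facility identifiers to meaningful names. The identifiers, the file name and the helper function are hypothetical, and the assumption that MultiQC accepts such a file through its --replace-names option should be checked against the docs for your version.

```python
import csv
import tempfile
from pathlib import Path


def write_replace_names(mapping: dict, path: Path) -> Path:
    """Write one 'old name<TAB>new name' row per sample."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for old_name, new_name in mapping.items():
            writer.writerow([old_name, new_name])
    return path


# Hypothetical sequencing-core identifiers and the names wanted in the report.
mapping = {
    "XYZ123_L001": "control_rep1",
    "XYZ124_L001": "treated_rep1",
}

# Written to a temporary directory here; any path would do.
tsv_path = write_replace_names(mapping, Path(tempfile.mkdtemp()) / "replace_names.tsv")
print(tsv_path.read_text())
```

The resulting file would then be passed to MultiQC when generating the report. Note that this only renames samples that are already distinct: it does not separate reports that have collapsed onto the same name.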

I hope that helps!

Phil

ksankit4444 · October 5, 2025, 5:26pm

Hi Phil,

Thank you so much for the detailed explanation and for pointing me to the right resources.

On a side note, I also wanted to mention that your Nextflow tutorial videos on YouTube were incredibly helpful for me when I was getting started. They really made the concepts click.

Thanks again for all your great work and contributions to the community!

system · November 5, 2025, 4:27pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

ewels · November 5, 2025, 4:27pm

Glad it was helpful, thanks for the kind words
