Parsing both lane split and full run sample names on fastp

Tony_Brooks · April 23, 2024, 4:56pm

Hello
We run our NovaSeq in both XP mode (split by lane) and standard mode (same samples over all lanes). This results in filenames in two basic syntaxes from BCL Convert:

SampleName_S123_R1_001.fastq.gz
SampleName_S123_L002_R1_001.fastq.gz

For some reason, I cannot get the sample name cleaning to work for fastp for the latter format. The non-lane samples work as expected, but in every case for files with the lane info, fastp puts the metrics on a separate line in the table with the “SampleName_S123_L002_R1_001” sample name. All other tools in the table are using “SampleName”. This is just an issue with fastp. I am using a custom config file provided with the -c flag when running MultiQC.

fastp is set to write its output to a file called SampleName.fastp.json, which it does for both input filename formats.

I have tried including

use_filename_as_sample_name:
  - fastp

in the config file. This should work as the SampleName.fastp.json is the same regardless of input filename.

I have also tried adding a custom extra_fn_clean_exts entry with a regex (which correctly identifies the text to clean on both filename types when checked with regex101.com):

extra_fn_clean_exts:
  - type: regex
    pattern: "_S[0-9]+[_L[0-9]+]?_R[1-2]_001"
    module: fastp

What stupid thing am I missing here?

Thanks in advance

ewels · April 24, 2024, 6:36am

Hi @Tony_Brooks,

Please can you attach an example file which we can replicate this with?

Thanks!

Phil

Tony_Brooks · May 1, 2024, 5:24pm

Test.zip (643.5 KB)

I have attached two .fastp.json files generated from files with both types of filename (same FASTQ data, just renamed). I have also attached metrics from Picard CollectRnaSeqMetrics and my config.yaml file. In the resultant report, Sample B lines up in the table, but Sample A is split.

ewels · May 6, 2024, 8:19pm

Hi @Tony_Brooks,

Thanks for this!

Couple of quick things to note:

- The report you’ve generated was created with MultiQC v1.12, which is pretty old now: it was released over 2 years ago.
  - The latest version is v1.21. I’d recommend always updating to the latest version if you ever hit problems, as we are constantly fixing bugs and things are often already resolved.
- You seem to have created a multiqc_config.yaml file based on the defaults from MultiQC with all attributes specified.
  - I’d recommend against doing this. It effectively stops us from being able to ship config changes for you in MultiQC and can lead to unexpected behaviour. Better to only specify the attributes that you want to change (you can always leave the others there if you want, just comment them out with a #).
  - The config has a defined order of parsing, but by putting everything in this file you’re making all the defaults have top priority, which could cause problems.

It’s difficult to bugtest with all the config stuff there, so I tried on your logs without any config at all. As expected, I get the following:

SampleA
SampleA_S2__L001_R1_001
SampleB
SampleB_S2_R1_001

I made a minimal config with just your snippet above:

extra_fn_clean_exts:
  - type: regex
    pattern: "_S[0-9]+[_L[0-9]+]?_R[1-2]_001"
    module: fastp

That gives me the following:

SampleA
SampleB
SampleB_S2_R1_001

So Sample A is correctly collapsed but not Sample B. That’s expected, because it doesn’t have a Lane number in it.
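
For anyone reading later, here is a minimal Python sketch of why the posted pattern behaves this way. It assumes the regex type simply removes whatever the pattern matches, and it uses Python's re module because MultiQC is written in Python. The square brackets around the lane part do not make it optional: "[_L[0-9]+" is read as a single character class (underscore, "L", "[", or a digit) that must match at least one character. The GROUPED pattern is a hypothetical alternative that uses parentheses instead, and it assumes single underscores between the filename fields.

```python
import re
import warnings

with warnings.catch_warnings():
    # Python flags the "[" inside the character class as a possible nested set.
    warnings.simplefilter("ignore", FutureWarning)
    # Pattern exactly as posted: "[_L[0-9]+" is ONE character class, repeated,
    # followed by an optional literal "]".
    POSTED = re.compile(r"_S[0-9]+[_L[0-9]+]?_R[1-2]_001")

# Hypothetical alternative (not from the thread): a real optional group for the lane.
GROUPED = re.compile(r"_S[0-9]+(_L[0-9]+)?_R[1-2]_001")


def clean(pattern, name):
    """Remove every match of the pattern from the sample name."""
    return pattern.sub("", name)


names = [
    "SampleA_S2_L001_R1_001",   # lane present
    "SampleB_S2_R1_001",        # no lane, single-digit sample number
    "SampleName_S123_R1_001",   # no lane, multi-digit sample number
]
for name in names:
    print(f"{name}: posted={clean(POSTED, name)} grouped={clean(GROUPED, name)}")
```

With the posted pattern, SampleB_S2_R1_001 is left untouched because nothing is available for the character class to consume, whereas SampleName_S123_R1_001 is cleaned because the class can take the trailing "3". That may be why a check against the S123 example looked fine. The grouped pattern cleans all three names.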

I then tried again with the simplest config that I can think of:

extra_fn_clean_exts:
  - _S

And that correctly collapses the names:

SampleA
SampleB

All of these sample name patterns are working as I’d expect, so I think we’re ok here. I guess that there is some issue with config options fighting in your mega config, which might be causing your issue.

Let me know if you still have problems once you’ve cut your config down to just the things you want to change and I can take another look.

Phil
