Input file name collision - There are multiple input files

complexgenome · April 16, 2024, 7:07pm

Hi there,

I created a channel using:

mutation_burden.out.mutation_burden_txt | collect(flat: false) | transpose() | buffer(skip: 1) | map { it[0] } | set { mutation_burden_collected_files }

The file names are all the same. The collected file channel looks like this:

[/mnt/data1/users/sanjeev/nextflow/meta_structure/work/ab/7bb0748ddba567583c4236c0d33d1f/mutation_burden.txt, /mnt/data1/users/sanjeev/nextflow/meta_structure/work/88/47dba5991e1192651aa9aec4f7b6ef/mutation_burden.txt, /mnt/data1/users/sanjeev/nextflow/meta_structure/work/56/f9becc216df3f672527dea1170f756/mutation_burden.txt]

I then pass them into a process:

input:
    path(mut_files, stageAs: 'temp_merge_mut_burden/*')

When I pass this channel into the process, I get the following error:

Error:

ERROR ~ Error executing process > 'wes:merge_mutation_burden (1)'

Caused by:
Process wes:merge_mutation_burden input file name collision -- There are multiple input files for each of the following file names: temp_merge_mut_burden/mutation_burden.txt

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details

Dummy channel:


ch_mutation = channel.of(
    [[batch: "SEMA-MM-003", timepoint: "MM-0256-T-03", tissue: "normal", sequencing_type: "wes"], file("HRDresults.txt")],
    [[batch: "SEMA-MM-004", timepoint: "MM-4607-T-01", tissue: "normal", sequencing_type: "wes"], file("HRDresults.txt")],
    [[batch: "SEMA-MM-002", timepoint: "MM-2692-T-01", tissue: "normal", sequencing_type: "wes"], file("HRDresults.txt")]
)

ch_mutation | collect(flat: false) | transpose() | buffer(skip: 1) | map { it[0] } | view

I cannot change the output file name HRDresults.txt as it’s hard-coded.

How can I resolve this collision when taking the files as input?

mribeirodantas · April 16, 2024, 7:11pm

Nextflow is complaining about input files, not output files. You’re passing many mutation_burden.txt files to the following process. Instead, make the previous process have different output filenames, e.g. mutation_burden_someIDhere.txt, so that when they’re symlinked to the work directory of the following process, there will be no filename collision.
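As a rough sketch of that renaming, assuming the upstream process receives a meta map with a unique timepoint field (as in the dummy channel above) and that the in-house script always writes mutation_burden.txt: the input declaration and the in_house_script.sh command are hypothetical placeholders, not the actual pipeline code.

```nextflow
process mutation_burden {
    input:
    // Hypothetical input shape: a meta map plus one input file
    tuple val(meta), path(input_file)

    output:
    // Each task now emits a uniquely named file
    tuple val(meta), path("mutation_burden_${meta.timepoint}.txt"), emit: mutation_burden_txt

    script:
    """
    # Placeholder for the in-house script that hard-codes its output name
    in_house_script.sh ${input_file}

    # Rename the hard-coded output so downstream staging does not collide
    mv mutation_burden.txt mutation_burden_${meta.timepoint}.txt
    """
}
```

The script itself is left untouched here; only the process wrapper renames the file after it has been written.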

complexgenome · April 16, 2024, 7:18pm

@mribeirodantas
I understand the issue: the file names are the same, and Nextflow complains about that.

Is there a way to address this without touching the output file name in the previous step?

mribeirodantas · April 16, 2024, 7:30pm

Yes, by changing the input file names in the next step with stageAs, but it’s more complicated to do it that way. Is there a good reason for not renaming the output files in the previous step?

complexgenome · April 16, 2024, 7:38pm

OK. Let me see what can be done.
It’s an output of an in-house script that I didn’t write.

muffato · April 18, 2024, 3:08pm

@complexgenome

Can you also try this stageAs pattern?

    path(mut_files, stageAs: 'temp_merge_mut_burden??/*')

This will stage each input file in its own numbered directory, so files that share the same name should no longer collide.
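For illustration, here is a minimal sketch of how the receiving process might look with that pattern. The output file name and the cat command are assumptions made for the example; the real merge step is not shown in this thread.

```nextflow
process merge_mutation_burden {
    input:
    // Each file is staged in its own indexed subdirectory
    // (for example temp_merge_mut_burden01/, temp_merge_mut_burden02/, ...)
    path(mut_files, stageAs: 'temp_merge_mut_burden??/*')

    output:
    // Hypothetical output name, used only for this sketch
    path('merged_mutation_burden.txt')

    script:
    """
    # The files keep their original name, one per staged directory
    cat temp_merge_mut_burden*/mutation_burden.txt > merged_mutation_burden.txt
    """
}
```

With two question marks the index is zero-padded to two digits, so this form is intended for up to 99 input files.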

