Input file name collision - There are multiple input files

Hi there,

I created a channel using:

mutation_burden.out.mutation_burden_txt | collect(flat: false) | transpose()| buffer(skip: 1)| map { it[0] } | set { mutation_burden_collected_files}

The file names are all the same. The collected channel looks like this:

[/mnt/data1/users/sanjeev/nextflow/meta_structure/work/ab/7bb0748ddba567583c4236c0d33d1f/mutation_burden.txt, /mnt/data1/users/sanjeev/nextflow/meta_structure/work/88/47dba5991e1192651aa9aec4f7b6ef/mutation_burden.txt, /mnt/data1/users/sanjeev/nextflow/meta_structure/work/56/f9becc216df3f672527dea1170f756/mutation_burden.txt]

I then pass them into a process with this input declaration:

    path(mut_files, stageAs: 'temp_merge_mut_burden/*')

When I pass this channel into the process, I get the following error:


ERROR ~ Error executing process > ‘wes:merge_mutation_burden (1)’

Caused by:
Process wes:merge_mutation_burden input file name collision – There are multiple input files for each of the following file names: temp_merge_mut_burden/mutation_burden.txt

Tip: you can replicate the issue by changing to the process work dir and entering the command bash

– Check ‘.nextflow.log’ file for details

Dummy channel:

ch_mutation = channel.of(
    [[batch:"SEMA-MM-003", timepoint:"MM-0256-T-03", tissue:"normal", sequencing_type:"wes"], file("HRDresults.txt")],
    [[batch:"SEMA-MM-004", timepoint:"MM-4607-T-01", tissue:"normal", sequencing_type:"wes"], file("HRDresults.txt")],
    [[batch:"SEMA-MM-002", timepoint:"MM-2692-T-01", tissue:"normal", sequencing_type:"wes"], file("HRDresults.txt")]
)

ch_mutation | collect(flat: false) | transpose()| buffer(skip: 1)| map { it[0] } | view

I cannot change the output file name HRDresults.txt as it’s hard-coded.

How do I resolve this when taking them as an input?

Nextflow is complaining about input files, not output files. You’re passing many mutation_burden.txt files to the following process. Instead, make the previous process have different output filenames, e.g. mutation_burden_someIDhere.txt, so that when they’re symlinked to the work directory of the following process, there will be no filename collision.
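
If the script that writes the file cannot be edited, the rename can still happen inside the upstream process, right after the script runs. A rough sketch, assuming the process receives a meta map with a unique `timepoint` key (as in the dummy channel above); the input `vcf` and the command `in_house_script.sh` are placeholders, not names from the actual pipeline:

```nextflow
// Sketch only: the input and the script call are hypothetical placeholders.
process mutation_burden {
    input:
    tuple val(meta), path(vcf)

    output:
    // A unique name per sample, so the files no longer collide downstream
    tuple val(meta), path("mutation_burden_${meta.timepoint}.txt"), emit: mutation_burden_txt

    script:
    """
    # Placeholder for the in-house script that writes the hard-coded mutation_burden.txt
    in_house_script.sh ${vcf}

    # Rename the fixed output so each task emits a distinct file name
    mv mutation_burden.txt mutation_burden_${meta.timepoint}.txt
    """
}
```

This leaves the in-house script untouched and only works if `meta.timepoint` (or whichever key is used) is unique across samples.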

I understand that the file names are the same and why Nextflow complains.

Is there a way to address this without touching the output file name in the previous step?

Yes, by changing the input file names in the next step with stageAs, but doing it that way is more complicated. Is there a good reason for not renaming the output files in the previous step?

OK. Let me see what can be done.
It’s an output of an in-house script that I didn’t write.


Can you also try this stageAs pattern?

    path(mut_files, stageAs: 'temp_merge_mut_burden??/*')

This will stage each input file in its own numbered directory, so files that share the same name should no longer collide.
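
For context, here is a minimal sketch of how the receiving process might look with that pattern. The process name comes from the error message above; the output file name and the `cat` merge are assumptions standing in for whatever the real merge step does:

```nextflow
// Sketch only: the merge command and output name are hypothetical.
process merge_mutation_burden {
    input:
    // Each '?' is replaced by a digit of the file's index, so every file
    // is staged in its own directory and keeps its original name
    path(mut_files, stageAs: 'temp_merge_mut_burden??/*')

    output:
    path 'merged_mutation_burden.txt'

    script:
    """
    # Glob over the numbered staging directories rather than relying on one fixed path
    cat temp_merge_mut_burden*/mutation_burden.txt > merged_mutation_burden.txt
    """
}
```

It would be called with the collected channel, e.g. `merge_mutation_burden(mutation_burden_collected_files)`. A plain `cat` repeats any header line once per file, so the real merge command will probably need to handle that.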
