Duplicate file names

I have a pipeline where I map the same file against two different indices, which produces two BAM files. I give them the same name but put them in different directories. It looks something like this:


bam_index1 = COPTR_MAP1(ch_reads,[[id:'index1'],file(params.index1)])
bam_index2 = COPTR_MAP2(ch_reads,[[id:'index2'],file(params.index2)])

bam_index1.bam.join(bam_index2.bam)
    .groupTuple(
         by: [0]
    )
    .map { result ->
        [result[0],[result[1][0],result[2][0]]]
    }
    .set{ch_merged}

The problem is that the resulting channel looks something like this:

[[id:ERR10889327, single_end:false], [/gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/e1/267a274ba0b7b92d2506879a156de5/ERR10889327.bam, /gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/0c/ae859977b9990136da664363126406/ERR10889327.bam]]
[[id:ERR10889525, single_end:false], [/gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/de/57db29853127258e6afcad4c3acf5e/ERR10889525.bam, /gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/62/1652359efc2ff4578a3bdbc060c040/ERR10889525.bam]]

The next process, COPTR_MERGE, which merges the BAM files, fails when the names are the same:

ERROR ~ Error executing process > 'COPTR_MERGE (2)'
Caused by:
Process COPTR_MERGE input file name collision -- There are multiple input files for each of the following file names: ERR10889525.bam

In terms of best practices, how should this be solved? Should I change the name in the previous process (COPTR_MAP) or inside COPTR_MERGE? Are there any examples of doing this (i.e. renaming files when their names are the same)?

I can think of a few options but don’t want my second nf-core module to look too ugly! My current thought is to iterate over all the BAMs inside the COPTR_MERGE module and add a number to each file name.

You can solve this in many different ways, including in the next process where the name collision is happening. However, as this is a module and people will reuse it, I’d rather have it cleaned up in the module itself.

Based on that, I recommend not giving files the same name, even if they’re in different directories, if they’re going to be used by another process.
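
One way to keep the names distinct at the source is to give each aliased mapping process its own output prefix in the pipeline config. This is only a sketch: it assumes COPTR_MAP follows the usual nf-core pattern of naming its BAM from task.ext.prefix (falling back to meta.id), and that meta inside the closure is the reads’ meta map. The .index1 and .index2 suffixes are made up for illustration.

```nextflow
// Hypothetical fragment for the pipeline's modules config.
// Assumes the module script contains something like:
//   def prefix = task.ext.prefix ?: "${meta.id}"
// and writes its output to "${prefix}.bam".
process {
    withName: 'COPTR_MAP1' {
        // e.g. ERR10889327.index1.bam
        ext.prefix = { "${meta.id}.index1" }
    }
    withName: 'COPTR_MAP2' {
        // e.g. ERR10889327.index2.bam
        ext.prefix = { "${meta.id}.index2" }
    }
}
```

With distinct prefixes, the two BAMs for a sample no longer share a file name when they reach COPTR_MERGE.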

Ideally, modify COPTR_MERGE to stage each file in its own folder, as there’s no guarantee what the inputs will be like (ideally the workflow developer will set prefix so the BAMs are uniquely named, but one cannot rely on that). You do this with the stageAs: option of the path input qualifier.

See modules/nf-core/biobambam/bamsormadup/main.nf in the nf-core/modules repository on GitHub (at commit 033f2f25fa14ea81a4b93502d1dc6c2caf21cc92) for an example.
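
A minimal sketch of what that input declaration could look like. MERGE_BAMS and staged.txt are hypothetical stand-ins, since the real COPTR_MERGE script isn’t shown in this thread; the only part that matters is the stageAs pattern.

```nextflow
// Hypothetical stand-in for COPTR_MERGE; only the input declaration matters.
process MERGE_BAMS {
    input:
    // "?/*" stages every input file in its own numbered directory
    // (1/, 2/, ...) and keeps the original file name inside it, so two
    // files that are both called ERR10889525.bam no longer collide.
    tuple val(meta), path(bams, stageAs: "?/*")

    output:
    tuple val(meta), path("staged.txt")

    script:
    // ${bams} expands to the staged relative paths, separated by spaces.
    """
    ls -1 ${bams} > staged.txt
    """
}
```

Calling it as MERGE_BAMS(ch_merged) with the channel from the question should then stage the two files as 1/ERR10889525.bam and 2/ERR10889525.bam inside the task directory.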
