I have a pipeline where I map the same reads against two different indices. This produces two BAM files that get the same file name but are written to different directories. The relevant part looks something like this:
```groovy
bam_index1 = COPTR_MAP1(ch_reads, [[id: 'index1'], file(params.index1)])
bam_index2 = COPTR_MAP2(ch_reads, [[id: 'index2'], file(params.index2)])

bam_index1.bam
    .join(bam_index2.bam)
    .groupTuple(by: [0])
    .map { result ->
        [result[0], [result[1][0], result[2][0]]]
    }
    .set { ch_merged }
```
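For reference, `join` already emits one tuple per matching key, so the `groupTuple` step (and the `[0]` indexing it forces in the `map`) is probably not needed here. Below is a minimal sketch of the same channel built without it. It assumes each sample's `meta` appears exactly once in each of the two `bam` channels. It does not change the file names, so the collision described below still happens.

```groovy
// join pairs items by their first element (meta) and emits
// [meta, bam_from_index1, bam_from_index2] for each matching sample
bam_index1.bam
    .join(bam_index2.bam)
    // regroup the two BAMs into a single list for the merge process
    .map { meta, bam1, bam2 -> [ meta, [ bam1, bam2 ] ] }
    .set { ch_merged }
```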
The problem is that, for every sample, the resulting `ch_merged` channel contains two BAMs with identical file names:
```
[[id:ERR10889327, single_end:false], [/gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/e1/267a274ba0b7b92d2506879a156de5/ERR10889327.bam, /gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/0c/ae859977b9990136da664363126406/ERR10889327.bam]]
[[id:ERR10889525, single_end:false], [/gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/de/57db29853127258e6afcad4c3acf5e/ERR10889525.bam, /gpfs1/home/r/b/rbarrant/projects/coptrPipeline/deleteme_work/62/1652359efc2ff4578a3bdbc060c040/ERR10889525.bam]]
```
The downstream process, COPTR_MERGE, which merges the BAM files, then fails because the names collide:
```
ERROR ~ Error executing process > 'COPTR_MERGE (2)'

Caused by:
  Process `COPTR_MERGE` input file name collision -- There are multiple input files for each of the following file names: ERR10889525.bam
```
What is the best-practice way to solve this? Should I rename the files in the upstream process (COPTR_MAP) or inside COPTR_MERGE? Are there any examples of this pattern, i.e. renaming files when their names collide?
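For the first option (renaming upstream in COPTR_MAP), here is a minimal sketch that follows the usual nf-core `ext.prefix` convention. It rests on two assumptions. First, COPTR_MAP1 and COPTR_MAP2 are the same COPTR_MAP module included under two aliases. Second, that module names its output `${prefix}.bam` using `def prefix = task.ext.prefix ?: "${meta.id}"`. The `.index1`/`.index2` suffixes are purely illustrative. Because `meta` itself is unchanged, the `join` on `meta` should still pair the two BAMs.

```groovy
// conf/modules.config (sketch)
// Assumes COPTR_MAP1/COPTR_MAP2 are aliases of one COPTR_MAP module whose
// script names its BAM with: def prefix = task.ext.prefix ?: "${meta.id}"
process {
    withName: 'COPTR_MAP1' {
        // the closure is evaluated per task, so meta is that task's sample
        ext.prefix = { "${meta.id}.index1" }
    }
    withName: 'COPTR_MAP2' {
        ext.prefix = { "${meta.id}.index2" }
    }
}
```

This leaves COPTR_MERGE untouched. The trade-off is that the merge only works if whoever runs the pipeline keeps the two prefixes distinct.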
I can think of a few options, but I don't want my second nf-core module to look too ugly! My current idea is to iterate over all the BAMs inside the COPTR_MERGE module and add a number to each file name.
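The numbering idea can probably be delegated to Nextflow instead of a loop in the script. As far as I understand, the collision is reported while Nextflow stages the task's inputs, before the script block runs, so renaming inside the script would never be reached. The `stageAs` option on a `path` input can do the numbering during staging. With the pattern `"?/*"`, each BAM should be staged inside its own numbered subdirectory (`1/ERR10889525.bam`, `2/ERR10889525.bam`) with its original name kept. The sketch below is not the published module. The `coptr merge` argument order (input BAMs, then the output BAM) is an assumption to check against `coptr merge --help`.

```groovy
// Sketch of COPTR_MERGE (not the published nf-core module)
process COPTR_MERGE {
    input:
    // "?/*" stages each file in a numbered subdirectory (1/, 2/, ...) and keeps
    // its original name, so two files called ERR10889525.bam no longer collide
    tuple val(meta), path(bams, stageAs: "?/*")

    output:
    // top-level glob only; the staged inputs live in subdirectories
    tuple val(meta), path("*.bam"), emit: bam

    script:
    def prefix = task.ext.prefix ?: "${meta.id}"
    // ${bams} should expand to the staged paths, e.g. "1/ERR10889525.bam 2/ERR10889525.bam"
    // assumed CLI: coptr merge <in-bam> ... <in-bam> <out-bam>
    """
    coptr merge ${bams} ${prefix}.bam
    """
}
```

With this approach the COPTR_MAP outputs keep their names. COPTR_MERGE then accepts any number of BAMs regardless of how they are named, which is roughly what the per-file numbering is meant to achieve.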