We are trying to set up a process that uses FastQ Screen to map a sample to a number of references for contamination screening. We want to be able to provide several references (each with a name, path and preferred aligner), have their paths resolved and symlinked into the work dir of the process, and written to a config of the form
but we can't expand it to handle more than one entry, e.g. adding ["Scerevisiae","s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/NCBI/build3.1/Sequence/Bowtie2Index/","bowtie2"] to the list of lists.
This is a limitation of the process input/output syntax. You can have a path, a path list, a tuple with path elements, and even a tuple with path list elements, but you can't have a list of tuples containing paths, which seems to be what you want.
To make this work, you'll need to transpose the way you provide inputs to the process:
process TEST {
    input:
    val(db_names)
    path(db_paths, name: "db_path*")
    val(aligners)

    // ...
}
You can then use multiMap (or just map) to split your ch_db into the three process inputs:
ch_db_multi = ch_db.multiMap { dbs ->
    db_name: dbs.collect { db -> db[0] }
    db_path: dbs.collect { db -> db[1] }
    aligner: dbs.collect { db -> db[2] }
}

TEST (
    ch_db_multi.db_name,
    ch_db_multi.db_path,
    ch_db_multi.aligner,
)
process OptionC {
    input:
    tuple val(db_names), path(db_paths, name: "db_path*"), val(aligners)

    script:
    """
    read -a species_array <<< '${db_names.join(' ')}'
    read -a db_paths_array <<< '${db_paths.join(' ')}'
    read -a tools_array <<< '${aligners.join(' ')}'

    for i in "\${!species_array[@]}"; do
        echo -e "DATABASE\t\${species_array[i]}\t\${db_paths_array[i]}\t\${tools_array[i]}" >> "fastq_screen.conf"
    done

    ls -lh
    """
}
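To check the config-writing loop outside of Nextflow, it can be run as a standalone bash sketch. The values below are hypothetical stand-ins for what Nextflow would interpolate into the script block: "Hsapiens" is an invented second reference name, and db_path1/db_path2 are the names I'd assume the staged paths get from name: "db_path*". The write_conf function is just a helper name for this illustration, not part of any tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the space-joined strings that Nextflow
# would interpolate via ${db_names.join(' ')} and friends.
db_names='Hsapiens Scerevisiae'
db_paths='db_path1 db_path2'   # assumed staged names
aligners='bowtie2 bowtie2'

# write_conf OUT NAMES PATHS ALIGNERS
# Splits each space-separated string into an array and writes one
# tab-separated DATABASE line per index to OUT.
write_conf() {
    local out=$1
    local -a species_array db_paths_array tools_array
    read -r -a species_array <<< "$2"
    read -r -a db_paths_array <<< "$3"
    read -r -a tools_array <<< "$4"

    : > "$out"   # start from an empty file
    for i in "${!species_array[@]}"; do
        printf 'DATABASE\t%s\t%s\t%s\n' \
            "${species_array[i]}" "${db_paths_array[i]}" "${tools_array[i]}" >> "$out"
    done
}

write_conf fastq_screen.conf "$db_names" "$db_paths" "$aligners"
cat fastq_screen.conf
```

This prints one DATABASE line per reference. Note that the approach relies on word splitting, so a reference name or aligner containing a space would shift the columns.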