Hello,
I have a process that runs per chromosome and outputs a tuple in the following format: `tuple val(chr_id), path(chr_vcf)`. In the next process I want to concatenate all my VCFs into a whole-genome VCF, but I want to make sure my chromosomes are in the correct order.
I managed to collect the tuples and sort them by the key `chr_id` using either `collect(flat: false, sort: {it[0]})` or `toSortedList{ a, b -> a[0] <=> b[0] }`, but then I struggle with the input cardinality, or with getting a list of only the VCF paths for the next process.
A more concrete example:
```groovy
Channel.of(["1", "test_1.vcf.gz"], ["MU123", "test_MU123.vcf.gz"], ["JA01.1", "test_JA01.1.vcf.gz"], ["2", "test_2.vcf.gz"], ["10", "test_10.vcf.gz"], ["MT", "test_MT.vcf.gz"])
    .view()
    .set { my_ch }

// sorting works
my_ch.collect(flat: false, sort: { it[0] })
    .view()

// sorting works
my_ch.toSortedList { a, b -> a[0] <=> b[0] }
    .view()

// sorting works, but how do I get a list of only the paths? => this does nothing
my_ch.collect(flat: false, sort: { it[0] })
    .map { it[][1] }
    .view()

process CONCAT_TEST {
    input:
    // or how do I manage the input cardinality?
    tuple val(chr_id), path(chr_vcfs)

    script:
    """
    bcftools concat ...
    """
}

workflow {
    CONCAT_TEST(my_ch.collect(flat: false, sort: { it[0] }))
}
```
Output:

```
Launching `example_nf.nf` [distracted_legentil] DSL2 - revision: 3470727da3
[1, test_1.vcf.gz]
[MU123, test_MU123.vcf.gz]
[JA01.1, test_JA01.1.vcf.gz]
[2, test_2.vcf.gz]
[10, test_10.vcf.gz]
[MT, test_MT.vcf.gz]
// collect:
[[1, test_1.vcf.gz], [10, test_10.vcf.gz], [2, test_2.vcf.gz], [JA01.1, test_JA01.1.vcf.gz], [MT, test_MT.vcf.gz], [MU123, test_MU123.vcf.gz]]
// toSortedList:
[['1', 'test_1.vcf.gz'], ['10', 'test_10.vcf.gz'], ['2', 'test_2.vcf.gz'], ['JA01.1', 'test_JA01.1.vcf.gz'], ['MT', 'test_MT.vcf.gz'], ['MU123', 'test_MU123.vcf.gz']]
WARN: Input tuple does not match tuple declaration in process `CONCAT_TEST` -- offending value: [[1, test_1.vcf.gz], [10, test_10.vcf.gz], [2, test_2.vcf.gz], [JA01.1, test_JA01.1.vcf.gz], [MT, test_MT.vcf.gz], [MU123, test_MU123.vcf.gz]]
ERROR ~ Error executing process > 'CONCAT_TEST'
Caused by:
Not a valid path value: '10'
```
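For illustration, here is a minimal plain-Groovy sketch of the two list steps described above, run on the same toy pairs: sort the `[chr_id, vcf]` pairs, then keep only the second element of each pair. Two things seem worth noting. First, `a[0] <=> b[0]` compares the ids as strings, which is why `10` sorts before `2` in the output above. Second, the WARN line suggests the whole collected list arrives as a single value, so the tuple declaration appears to bind the first pair to `chr_id` and the second pair `[10, test_10.vcf.gz]` to the path input, hence `Not a valid path value: '10'`. The comparator below (`chrComparator` is a hypothetical name, not a Nextflow built-in) assumes numeric ids should come first in numeric order and all other ids after them in string order. The order you actually need probably comes from your reference genome (e.g. its `.fai` index), so treat this ordering as an assumption.

```groovy
// Hypothetical comparator (an assumption, not a Nextflow built-in):
// purely numeric ids first, compared as numbers; all other ids after, compared as strings.
def chrComparator = { a, b ->
    String ka = a[0].toString()
    String kb = b[0].toString()
    boolean na = ka.isInteger()
    boolean nb = kb.isInteger()
    if (na && nb) return ka.toInteger() <=> kb.toInteger()
    if (na) return -1   // numeric ids go before non-numeric ones
    if (nb) return 1
    return ka <=> kb    // plain string order for the rest (JA01.1, MT, MU123)
}

// Same toy [chr_id, vcf] pairs as in the channel example
def pairs = [
    ["1", "test_1.vcf.gz"], ["MU123", "test_MU123.vcf.gz"], ["JA01.1", "test_JA01.1.vcf.gz"],
    ["2", "test_2.vcf.gz"], ["10", "test_10.vcf.gz"], ["MT", "test_MT.vcf.gz"]
]

// sort(false, ...) returns a sorted copy and leaves `pairs` untouched;
// collect { it[1] } keeps only the file name from each pair
def sortedVcfs = pairs.sort(false, chrComparator).collect { it[1] }
println sortedVcfs
// prints [test_1.vcf.gz, test_2.vcf.gz, test_10.vcf.gz, test_JA01.1.vcf.gz, test_MT.vcf.gz, test_MU123.vcf.gz]
```

In the workflow itself, the same closure could presumably be passed to the channel operator, e.g. `my_ch.toSortedList(chrComparator).map { pairs -> pairs.collect { it[1] } }`, which would emit a single list of VCF paths; the process input would then be declared as `path chr_vcfs` instead of `tuple val(chr_id), path(chr_vcfs)`. This wiring is untested here, and with the toy channel the file names would probably also need to be wrapped in `file(...)` to be accepted as paths.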