Hi,
I am working on a Nextflow DSL2 pipeline and have been unable to use a function inside a .map{} operator to form a new channel. In DSL1 I could do this by returning a list of tuples from the function, but I cannot get the same approach to work in DSL2.
The current problem involves merging BAM files. A Python script writes a JSON file with contents like this:
{
  "bam_merge_list": [
    {
      "out_file": "133.019-001_001_000.merged.bam",
      "in_file_list": [
        "133.019-001_001_000.bam",
        "133.019-002_001_000.bam",
        "133.019-003_001_000.bam",
        "133.019-004_001_000.bam"
      ]
    },
    {
      "out_file": "133.020-001_001_000.merged.bam",
      "in_file_list": [
        "133.020-001_001_000.bam",
        "133.020-002_001_000.bam",
        "133.020-003_001_000.bam",
        "133.020-004_001_000.bam"
      ]
    }, ...
A channel with the input files is formed in the workflow and passed to a function using the .map{} operator. The function reads the JSON file and builds a list of tuples, each with two elements: the out_file name and the list of input files. (The function also prepends the path to the input BAM files.) The returned list has the form
[ ["133.019-001_001_000.merged.bam", ["/data/133.019-001_001_000.bam", "/data/133.019-002_001_000.bam", "/data/133.019-003_001_000.bam", "/data/133.019-004_001_000.bam"]], ... ]
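To make the intended shape concrete, here is a minimal Python sketch of the same transformation. My actual function lives in the Nextflow workflow; the name build_merge_tuples, the shortened two-file example, and the /data directory are only illustrative assumptions.

```python
import json
from pathlib import PurePosixPath


def build_merge_tuples(json_text, bam_dir):
    """Return [[out_file, [input paths...]], ...] from the merge-list JSON text."""
    records = json.loads(json_text)["bam_merge_list"]
    merge_tuples = []
    for record in records:
        # Prepend the directory to every input BAM file name.
        in_paths = [
            str(PurePosixPath(bam_dir) / name) for name in record["in_file_list"]
        ]
        # First element: output name; second element: list of input paths.
        merge_tuples.append([record["out_file"], in_paths])
    return merge_tuples


# Shortened example record (two inputs instead of four).
example_json = """
{"bam_merge_list": [
  {"out_file": "133.019-001_001_000.merged.bam",
   "in_file_list": ["133.019-001_001_000.bam", "133.019-002_001_000.bam"]}
]}
"""

merge_tuples = build_merge_tuples(example_json, "/data")
print(merge_tuples)
```

Each inner list is what I would like a single downstream task to receive: one output name plus a variable-length list of inputs.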
I have tried various ways to use this as input to the downstream process without success. For example,
process xx {
    input:
    tuple val('out_file'), path('in_file_list')
    ...
}
Nothing that I’ve tried works for me. Incidentally, the number of input BAM files varies, as does the naming convention, which is why I use the JSON file to set up the channel.
I am hoping that I am missing a magical incantation that’s required to make this work.
I appreciate your consideration and guidance.
Thank you.
Ever grateful,
Brent
P.S. I stumbled on the Nextflow training section ‘Grouping and Splitting’. It looks like there may be a way to use the .splitJson() operator and, maybe, the .subMap() method, although it is not clear to me how I would use .subMap() in this case. As a start, I modified the Python script to write absolute paths to the input BAM files.
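For reference, the change to the Python script amounts to something like the sketch below. The function name absolutize_merge_list and the /data directory are placeholders for illustration, not the names in my real script.

```python
import json
import os


def absolutize_merge_list(merge_list, bam_dir):
    """Return a copy of the merge list with every input BAM name made absolute."""
    base = os.path.abspath(bam_dir)
    return {
        "bam_merge_list": [
            {
                "out_file": rec["out_file"],
                # Join each bare file name onto the absolute BAM directory.
                "in_file_list": [
                    os.path.join(base, name) for name in rec["in_file_list"]
                ],
            }
            for rec in merge_list["bam_merge_list"]
        ]
    }


# Shortened example record (two inputs instead of four).
merge_list = {
    "bam_merge_list": [
        {
            "out_file": "133.019-001_001_000.merged.bam",
            "in_file_list": ["133.019-001_001_000.bam", "133.019-002_001_000.bam"],
        }
    ]
}

absolute = absolutize_merge_list(merge_list, "/data")
# Text that would be written to the JSON file read by the workflow.
json_text = json.dumps(absolute, indent=2)
print(json_text)
```

With absolute paths in the JSON, the workflow side no longer needs to prepend a directory before converting each string to a file.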
P.P.S. This closure appears to work:
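def closure01 = { item ->
    // Keep only the two keys of interest from the parsed JSON record
    def meta = item.subMap('out_file', 'in_file_list')
    def out_name = item['out_file']
    // Convert each path string to a file object for the path input
    def in_file_list = []
    for (in_file in item['in_file_list']) {
        in_file_list.add(file(in_file))
    }
    // Emit [output name, list of input files]
    [out_name, in_file_list]
}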
so I believe that I can proceed.
Thank you.