Form channel content using a function in the .map{} operator

brgew · December 10, 2024, 6:18pm

Hi,

I am working on a Nextflow DSL2 pipeline and have been unable to use a function in a .map{} operator to form a new channel. In Nextflow DSL1 I did this by building a list of tuples and returning it from the function, but I have not been able to get the same approach to work in DSL2.

The current problem involves merging BAM files. A Python script writes a JSON file with the following contents:

{
  "bam_merge_list": [
    {
      "out_file": "133.019-001_001_000.merged.bam",
      "in_file_list": [
        "133.019-001_001_000.bam",
        "133.019-002_001_000.bam",
        "133.019-003_001_000.bam",
        "133.019-004_001_000.bam"
      ]
    },
    {
      "out_file": "133.020-001_001_000.merged.bam",
      "in_file_list": [
        "133.020-001_001_000.bam",
        "133.020-002_001_000.bam",
        "133.020-003_001_000.bam",
        "133.020-004_001_000.bam"
      ]
    }, ...

A channel with the input files is formed in the workflow and passed to a function using the .map{} operator. The function reads the JSON file and builds a list of tuples in which the first element is the out_file name and the second is the list of input files. (The function also prepends the directory path to each input BAM file name.) So the function returns a list of the form

[ ["133.019-001_001_000.merged.bam", ["/data/133.019-001_001_000.bam", "/data/133.019-002_001_000.bam", "/data/133.019-003_001_000.bam", "/data/133.019-004_001_000.bam"]], etc]
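A function like the one described above might look roughly like the following. This is a minimal sketch, not the original function: the name `buildMergeList`, the `dataDir` parameter, and the shortened sample JSON are hypothetical. It returns plain strings so it runs as an ordinary Groovy script; inside a Nextflow script each input path would normally be wrapped with `file()` so the process can stage it.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: turn the bam_merge_list JSON into a list of
// [out_file, [input paths]] tuples, prepending dataDir to each input name.
// Plain strings are returned here; in Nextflow you would wrap each path with file().
List<List> buildMergeList(String jsonText, String dataDir) {
    def parsed = new JsonSlurper().parseText(jsonText)
    parsed.bam_merge_list.collect { entry ->
        def inputs = entry.in_file_list.collect { name -> "${dataDir}/${name}".toString() }
        [entry.out_file, inputs]   // one tuple per merged output file
    }
}

// Shortened, hypothetical sample in the same shape as the JSON above
def json = '''
{
  "bam_merge_list": [
    {
      "out_file": "133.019-001_001_000.merged.bam",
      "in_file_list": ["133.019-001_001_000.bam", "133.019-002_001_000.bam"]
    },
    {
      "out_file": "133.020-001_001_000.merged.bam",
      "in_file_list": ["133.020-001_001_000.bam"]
    }
  ]
}
'''

def mergeList = buildMergeList(json, '/data')
mergeList.each { println it }
```

One detail worth checking in the workflow: because the function returns a *list* of tuples, `.map{}` emits that whole list as a single channel item. `.flatMap{}` instead emits each element of the returned list as its own item, which is the one-tuple-per-item shape that a `tuple val(...), path(...)` input declaration expects.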

I have tried various ways to use this as input to the downstream process without success. For example,

process xx {
  input:
  tuple val('out_file'), path('in_file_list')
...
}

Nothing that I’ve tried works. Incidentally, both the number of input BAM files and their naming convention vary, which is why I use the JSON file to set up the channel.

I am hoping that I am missing a magical incantation that’s required to make this work.

I appreciate your consideration and guidance.

Thank you.

Ever grateful,
Brent

P.S. I stumbled on the Nextflow training section ‘Grouping and Splitting’. It looks like there may be a way to use the .splitJson() operator and, perhaps, the .subMap() method, although it’s not clear to me how I might use .subMap() in this case. As a start, I modified the Python script to write absolute paths to the input BAM files.
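On the .subMap() question: `subMap()` is a standard Groovy Map method that returns a new map containing only the requested keys. One way it could fit here is the "meta map" convention common in nf-core pipelines, where a small map of metadata travels alongside the files in a `[meta, files]` tuple. The sketch below assumes each `bam_merge_list` entry has already been parsed into a Groovy map (the exact item shape emitted by `splitJson` is not verified here); the entry values are hypothetical sample data.

```groovy
// A single bam_merge_list entry as a Groovy map (hypothetical sample data)
def entry = [
    out_file    : '133.019-001_001_000.merged.bam',
    in_file_list: ['/data/133.019-001_001_000.bam', '/data/133.019-002_001_000.bam']
]

// subMap keeps only the listed keys and returns a new map
def meta = entry.subMap(['out_file'])

// [meta, files] tuple: a process could declare tuple val(meta), path(bams)
// and read meta.out_file in its script block
def mergeItem = [meta, entry.in_file_list]
println mergeItem
```

The advantage of this design is that further per-sample fields can be added to `meta` later without changing the process input declaration.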

P.P.S. This closure appears to work:

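def closure01 = { item ->
    // Map holding only the two keys of interest (not used further below)
    def meta = item.subMap('out_file', 'in_file_list')
    def out_name = item['out_file']
    def in_file_list = []
    for (in_file in item['in_file_list']) {
        in_file_list.add(file(in_file))   // file() turns each path string into a Path object
    }
    [out_name, in_file_list]
}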

so I believe that I can proceed.

Thank you.
