Sample names in value channel after collect() method

I was advised to create tuples containing the sample name and paths if there are processes that require more than one input queue channel. For example:

// Emitted queue channels
queue_channel_1:
[ sample_X, path_A ]
[ sample_Y, path_B ]

queue_channel_2:
[ sample_X, path_C, path_D ]
[ sample_Y, path_E, path_F ]

etc... 
// The quantity of samples is unknown

// Wait for all the processes to finish the queue channels
value_channel = queue_channel_1.join(queue_channel_2).collect()

// This can't be used as path input now, how can I use the paths as input for the next process? Because the quantity of samples is unknown, the size of the value channel is also unknown.
collected:
[ sample_X, path_A, path_C, path_D, sample_Y, path_B, path_E, path_F ]

I haven’t been able to figure out how to remove the sample names from the list of paths.

Update
I’m wondering if this will work: as a workaround, use a process that doesn’t run any real script and simply emits the paths without the sample names. This isn’t an elegant solution, and now I’m not sure whether putting the sample names in the tuples was the best practice.

input:
tuple val(sample_name), path(path_1)
tuple val(sample_name), path(path_2), path(path_3)

output:
path path_1, emit: path_1
path path_2, emit: path_2
path path_3, emit: path_3

Hi @Gableo,

I hope I can help with your question. If I understand correctly, the main problem you have is that you don’t know the number of input paths beforehand?

If this is the main issue, there are a couple of ways you could handle it.

Option 1 - Treating paths like a list and joining them

You could provide your paths as a list in the input and then use something like join to concatenate the different paths into a single string:

process processName {
    input:
    tuple val(sample_name), path(paths)

    script:
    def inputs = paths.join(" ")
    """
    your_command --inputs $inputs
    """
}

Option 2 - Treating paths like a list and using indices to access specific elements

process useSpecificPaths {
    input:
    tuple val(sample_name), path(paths)

    script:
    // Assuming you want the first and third paths
    def path1 = paths[0]
    def path3 = paths[2]
    """
    your_command --input1 $path1 --input3 $path3
    """
}

This should allow you to pass in a list of paths of unknown length and then use either all of them or a single one in your next process.

Let me know if this works or if there is a problem!

I tried this but it doesn’t work because the data structure of the tuple looks like this:

[ sample_name, path_1, path_2, ... ]

Nextflow complains if you only declare one path in the input as you have shown above.
Also, after collect() the sample names occur an unknown number of times in the list, because the quantity of samples is unknown:

[ sample_A, path_1, path_2, ..., sample_B, ... , sample_N, ..., ... ]

Nextflow will complain and throw a fatal error because sample_B is not a path.

I got around to trying the workaround I mentioned above and it did what I was expecting it to. I don’t think it’s the best practice though.

process Remove_Sample_Names_From_Queues {
  input:
  tuple val(sample_name), path(path_1)
  tuple val(sample_name), path(path_2), path(path_3)
  output:
  tuple path(path_1), emit: queue_chan_A
  tuple path(path_2), path(path_3), emit: queue_chan_B
  script:
  """
  #!/bin/bash
  echo "Removed sample names from queues."
  """
}

workflow {
  Remove_Sample_Names_From_Queues(
    queue_chan_A,
    queue_chan_B
  )
  Bac_Profile_Genome(
    Remove_Sample_Names_From_Queues.out.queue_chan_A.collect(),
    Remove_Sample_Names_From_Queues.out.queue_chan_B.collect()
  )
}

You can easily do that with channel operators such as map.

Channel
  .of(["sample_A", file('a1.txt'), file('b1.txt')],
      ["sample_B", file('a2.txt'), file('b2.txt'), file('c2.txt')])
  .map { values -> values[1..-1] }
  .view()

Output:
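
Applied to the joined channels from the original question, the same map step might look like the sketch below. The channel contents are hypothetical stand-ins (the file names are placeholders, not real inputs), and it assumes the sample name is always the first element of each tuple. Mapping before collect() drops the sample name from every tuple, so the collected value channel should hold only paths, however many samples there are.

```groovy
workflow {
    // Hypothetical stand-ins for the two keyed queue channels in the question
    queue_channel_1 = Channel.of(
        ['sample_X', file('path_A')],
        ['sample_Y', file('path_B')]
    )
    queue_channel_2 = Channel.of(
        ['sample_X', file('path_C'), file('path_D')],
        ['sample_Y', file('path_E'), file('path_F')]
    )

    value_channel = queue_channel_1
        .join(queue_channel_2)               // [ sample, path, path, path ] per sample
        .map { values -> values[1..-1] }     // drop the leading sample name
        .collect()                           // one flat list of paths, emitted once

    value_channel.view()
}
```

The order of the samples in the collected list is not guaranteed, since it depends on when each item arrives. The resulting value channel can then be passed to a process that declares a single path input, without the helper process.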