I was advised to create tuples containing the sample name and its paths whenever a process requires more than one input queue channel. For example:
// Emitted queue channels
queue_channel_1:
[ sample_X, path_A ]
[ sample_Y, path_B]
queue_channel_2:
[ sample_X, path_C, path_D ]
[ sample_Y, path_E, path_F ]
etc...
// The quantity of samples is unknown
// Wait for all the processes to finish the queue channels
value_channel = queue_channel_1.join(queue_channel_2).collect()
// This can't be used as a path input now. How can I use the paths as input for the next process? Because the number of samples is unknown, the size of the value channel is also unknown.
collected:
[ sample_X, path_A, path_C, path_D, sample_Y, path_B, path_E, path_F ]
I haven’t been able to figure out how to remove the sample names from the list of paths.
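One way to drop the sample names without an extra process is to strip them with map before collecting. Here is a minimal sketch. It assumes the two channels look like the example above, with strings standing in for real paths. tail() keeps every element after the first, so it works for any number of paths per sample. collect(), with its default flat: true, then merges the per-sample lists into one list of paths. The order of samples in the result follows the order in which join emits them, which may not match the input order.

```nextflow
workflow {
    // Placeholder channels shaped like the example above; strings stand in for paths
    queue_channel_1 = Channel.of( ['sample_X', 'path_A'], ['sample_Y', 'path_B'] )
    queue_channel_2 = Channel.of( ['sample_X', 'path_C', 'path_D'],
                                  ['sample_Y', 'path_E', 'path_F'] )

    paths_only = queue_channel_1
        .join(queue_channel_2)       // [sample, path, path, path] for each sample
        .map { row -> row.tail() }   // drop the first element, i.e. the sample name
        .collect()                   // wait for all samples; flat: true merges the lists

    paths_only.view()                // one list containing only the paths
}
```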
Update
I’m wondering whether this would work as a workaround: use a process that doesn’t run a script and simply emits the paths without the sample names. It isn’t an elegant solution, and now I’m not sure whether putting the sample names in the tuples was best practice.
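A script-less helper process is probably not needed. Operators such as map can drop the sample names. Keeping the name as the first tuple element up to that point is what lets join pair up files from the same sample. For the unknown-size problem, a single path input accepts a list of any length. The sketch below is a minimal illustration with several placeholders. MERGE_ALL, the data/*_{1,2}.txt glob and the cat command are all hypothetical. Channel.fromFilePairs stands in for an upstream step that emits [sample, [file, file]] tuples.

```nextflow
// Hypothetical downstream step that needs every file from every sample at once
process MERGE_ALL {
    input:
    path files                 // one task receives the whole list, whatever its length

    output:
    path 'merged.txt'

    script:
    // A list of staged files is interpolated as space-separated file names
    """
    cat ${files} > merged.txt
    """
}

workflow {
    // Stand-in for an upstream channel of [sample, [file, file]] tuples
    per_sample = Channel.fromFilePairs('data/*_{1,2}.txt')

    all_files = per_sample
        .map { sample, files -> files }   // keep only the paths
        .collect()                        // value channel: a single list of all files

    MERGE_ALL(all_files)
}
```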
I hope I can help with your question. If I understand correctly, the main problem you have is that you don’t know the number of input paths beforehand?
If this is the main issue, there are a couple of ways you could handle it.
Option 1 - Treating paths like a list and joining them
You could provide your paths as a list in the input and then use something like join to concatenate the different paths together:
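Here is a minimal sketch of this option that keeps one task per sample. It assumes each joined tuple is first reshaped from [sample, path, path, ...] into [sample, [path, path, ...]]. A tuple val(...), path(...) input then matches however many paths there are. PROCESS_SAMPLE, my_tool and the placeholder files are hypothetical, and the files must exist for Nextflow to stage them.

```nextflow
// Hypothetical per-sample step; my_tool stands in for the real command
process PROCESS_SAMPLE {
    input:
    tuple val(sample_name), path(paths)   // paths holds a list of any length

    output:
    tuple val(sample_name), path("${sample_name}.out")

    script:
    """
    my_tool ${paths} > ${sample_name}.out
    """
}

workflow {
    // Placeholder files shaped like the example in the question
    queue_channel_1 = Channel.of( ['sample_X', file('path_A')], ['sample_Y', file('path_B')] )
    queue_channel_2 = Channel.of( ['sample_X', file('path_C'), file('path_D')],
                                  ['sample_Y', file('path_E'), file('path_F')] )

    per_sample = queue_channel_1
        .join(queue_channel_2)                        // [sample, path, path, path]
        .map { row -> [ row.head(), row.tail() ] }    // [sample, [path, path, path]]

    PROCESS_SAMPLE(per_sample)
}
```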
I tried this, but it doesn’t work because the tuple’s data structure looks like this:
[ sample_name, path_1, path_2, ... ]
Nextflow complains if you declare only one path in the input, as you have shown above.
Also, sample_name occurs an unknown number of times in the tuple because the number of samples is unknown: