Forcing processes to wait for each other

I have a situation where 2 processes use a gpu but are separated by a process that does not. I don’t want them to be going at the same time and fighting over the gpu. I know about making fake state dependencies but I don’t think that’s what I need here because the outputs do depend on one another. Consider this

process gpu1 {
    input:
    path sample

    output:
    tuple val("${sample}"), path(result1)

    script:
    """
    gpu using process "${sample}" > result1
    """
}

process no_gpu {
    input:
    tuple val(sample), path(result1)

    output:
    tuple val("${sample}"), path(result2)

    script:
    """
    non-gpu using process "${result1}"> result2
    """
}

process gpu2 {
    input:
    tuple val(sample), path(result2)

    output:
    tuple val("${sample}"), path(result3)

    script:
    """
    gpu using process 2 "${result2}" > result3
    """
}

workflow {
    gpu1(some_input_ch)

    no_gpu(gpu1.out).collect() | gpu2
}

I’d like gpu2 to wait until all gpu1 are done. no_gpu is fast so I equated that gpu1 being done so I thought I would use collect to wait until all no_gpu are done but I see it outputs a value channel and then I get an error Input tuple does not match tuple declaration in process gpu2. What is the move here? Is collect not what I want? Or should I be trying to transform the output of collect into a tuple?

The collect operator (or some other equivalent channel operator) is what you’re looking for but, as you mentioned, you need to either change the next process to accept the new channel format, or use other channel operators to convert this output channel into the format that the next process is waiting for.

I see. What is the correct way to do that? If two samples are run through, I get a list of

[sample1, path/to/sample1, sample2, path/to/sample2]

Should I use collate like this?

workflow {
    gpu1(some_input_ch)

    no_gpu(gpu1.out).collect().collate(2) | gpu2
}

That results in a list of tuples

[[sample1, path/to/sample1], [sample2, path/to/sample2]]

but I don’t think that’s the correct way to do it. Indeed it causes an error. I think I need map or multiMap but I’m not sure how to apply them here.

I think you need a flatMap operator there. See the example below:

Channel
  .of([[1, 2], [3, 4]])
  .flatMap()
  .view()

Output:

Ah my mistake, the output is actually just a single list of 4 things. What I was having trouble with is how to get these things into pairs, namely this

[sample1, path/to/sample1, sample2, path/to/sample2]

into a channel that contains individual tuples as outputs, which is what the next process is expecting. Thank you for the support, one day I’ll be good enough to give back to the community.

1 Like

Try no_gpu(gpu1.out).collect(flat: false).flatMap().

1 Like

Yep, that did the job. I didn’t think to try flat: false.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.