Forcing processes to wait for each other

jfnjdoh · August 7, 2024, 6:08pm

I have a situation where 2 processes use a gpu but are separated by a process that does not. I don’t want them to be going at the same time and fighting over the gpu. I know about making fake state dependencies but I don’t think that’s what I need here because the outputs do depend on one another. Consider this

process gpu1 {
    input:
    path sample

    output:
    tuple val("${sample}"), path(result1)

    script:
    """
    gpu using process "${sample}" > result1
    """
}

process no_gpu {
    input:
    tuple val(sample), path(result1)

    output:
    tuple val("${sample}"), path(result2)

    script:
    """
    non-gpu using process "${result1}"> result2
    """
}

process gpu2 {
    input:
    tuple val(sample), path(result2)

    output:
    tuple val("${sample}"), path(result3)

    script:
    """
    gpu using process 2 "${result2}" > result3
    """
}

workflow {
    gpu1(some_input_ch)

    no_gpu(gpu1.out).collect() | gpu2
}

I’d like gpu2 to wait until all gpu1 are done. no_gpu is fast so I equated that gpu1 being done so I thought I would use collect to wait until all no_gpu are done but I see it outputs a value channel and then I get an error Input tuple does not match tuple declaration in process gpu2. What is the move here? Is collect not what I want? Or should I be trying to transform the output of collect into a tuple?

mribeirodantas · August 8, 2024, 3:28am

The collect operator (or some other equivalent channel operator) is what you’re looking for but, as you mentioned, you need to either change the next process to accept the new channel format, or use other channel operators to convert this output channel into the format that the next process is waiting for.

jfnjdoh · August 8, 2024, 12:46pm

I see. What is the correct way to do that? If two samples are run through, I get a list of

[sample1, path/to/sample1, sample2, path/to/sample2]

Should I use collate like this?

workflow {
    gpu1(some_input_ch)

    no_gpu(gpu1.out).collect().collate(2) | gpu2
}

That results in a list of tuples

[[sample1, path/to/sample1], [sample2, path/to/sample2]]

but I don’t think that’s the correct way to do it. Indeed it causes an error. I think I need map or multiMap but I’m not sure how to apply them here.

mribeirodantas · August 8, 2024, 2:43pm

I think you need a flatMap operator there. See the example below:

Channel
  .of([[1, 2], [3, 4]])
  .flatMap()
  .view()

Output:

jfnjdoh · August 8, 2024, 3:11pm

Ah my mistake, the output is actually just a single list of 4 things. What I was having trouble with is how to get these things into pairs, namely this

[sample1, path/to/sample1, sample2, path/to/sample2]

into a channel that contains individual tuples as outputs, which is what the next process is expecting. Thank you for the support, one day I’ll be good enough to give back to the community.

Alexander_Nater · August 9, 2024, 10:40am

Try no_gpu(gpu1.out).collect(flat: false).flatMap().

jfnjdoh · August 9, 2024, 12:29pm

Yep, that did the job. I didn’t think to try flat: false.

system · August 16, 2024, 12:30pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to wait until a process is complete? Ask for help	22	580	April 23, 2024
Sample names in value channel after collect() method Ask for help nextflow	4	268	January 6, 2024
How to use collect on two process and do a join? Ask for help nextflow	4	167	March 13, 2024
Input cardinality issue after collect Ask for help	5	178	February 22, 2024
Collecting output from different processes to run another process Ask for help	3	108	July 23, 2024

Forcing processes to wait for each other

Related topics