Increasing throughput of Nextflow in GCP

I have been running epi2me/wf-clone-validation in Google Cloud Batch, but I've been observing a linear relationship between the number of samples and the total runtime. My assumption was that running in GCP would allow for full horizontal scaling, and I'd like to confirm whether that's possible.

I’m launching the pipeline from a Google Cloud Run Job with the following resources:

  • 8 vCPU
  • 32 GB RAM

Running a single sample takes around an hour, whereas running 43 samples takes over 5 hours (I haven't let it run long enough to get an accurate duration). There is clearly some parallelism, but given that the samples are independent, I would expect to be able to process them all concurrently.

I tried to implement my own parallel processing using Python's subprocess module, but the pipeline detected that another run already held the lock file, so some samples ended up not being processed.

I also tried the --threads parameter, setting it to 8 to match my vCPU count, but this didn't improve throughput.

Is my ability to scale limited by the resource constraints of my Cloud Run Job? Or is there something else I'm missing that would let me fully horizontally scale this workflow and run 43 samples in roughly the same time as one?

N.B.: I'm also trying to achieve the same thing with epi2me/wf-amplicon.

Welcome to the community, Joe!

Can you provide some additional details about your setup? What's the exact CLI command you're running?

In the interim, I'm going to guess that you might be running Nextflow with the default local executor, where it only has access to the 8 vCPUs and 32 GB of RAM of the machine you launched it on. It's definitely possible to do "full horizontal scaling" on Google Cloud, but it requires some additional configuration.
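
A quick way to confirm is the executor line near the top of the run's console output (it's also captured in the .nextflow.log file). With the local executor it looks something like this (the number in brackets is just the task count):

```
executor >  local (43)
```

If that's what you're seeing, all 43 samples are competing for the launcher's 8 vCPUs, which would explain the roughly linear runtime.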

Switching to the Google Batch executor involves adding some credentials (so Nextflow can start instances to work on tasks) and configuring a few additional details, such as the project. Everything should be covered here.
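
As a rough sketch (the project ID, region, and bucket below are placeholders, so swap in your own), the relevant configuration looks something like this:

```groovy
// e.g. in nextflow.config, or a separate file passed with -c
process.executor = 'google-batch'           // submit each task as its own Google Cloud Batch job
workDir          = 'gs://my-bucket/nf-work' // work dir must be a GCS bucket your credentials can write to

google {
    project  = 'my-gcp-project'             // placeholder: your GCP project ID
    location = 'us-central1'                // placeholder: region where Batch jobs run
}
```

Credentials are usually picked up from the environment (for example the application default credentials of the service account attached to your Cloud Run Job, or GOOGLE_APPLICATION_CREDENTIALS pointing at a key file). With that in place, each task runs in its own Batch job, so the 43 samples can genuinely run side by side instead of queuing on the launcher's 8 vCPUs.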

Let us know if you need any extra help!