I have been running epi2me/wf-clone-validation in Google Cloud Batch, but I’m seeing a linear relationship between the number of samples and the total runtime. My assumption was that running in GCP would allow for full horizontal scaling, and I want to confirm whether that is possible.
I’m launching the pipeline from a Google Cloud Run Job with the following resources (launch command sketched below):
- 8 vCPU
- 32GB RAM
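For reference, the Cloud Run Job essentially invokes the workflow like this (simplified; the data path, bucket, and project names are placeholders):

```
nextflow run epi2me-labs/wf-clone-validation \
    --fastq /data/fastq_pass \
    -w gs://my-bucket/nextflow-work \
    -c batch.config
```

where `batch.config` points Nextflow at Google Cloud Batch:

```
process.executor = 'google-batch'
google.project   = 'my-project'
google.location  = 'us-central1'
```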
Running a single sample takes around an hour, whereas running 43 samples takes more than 5 hours (I haven’t let it run to completion, so I don’t have an exact duration). There is clearly some parallelism, but given that these samples are independent, I would expect to be able to process them fully concurrently.
I tried to implement my own parallelism by launching one pipeline run per sample with Python’s subprocess module (simplified sketch below), but the pipeline detected that another process already held the lock file, so some samples ended up not being processed.
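My attempt looked roughly like this (heavily simplified; the sample directory layout and paths are placeholders), with every run launched from the same working directory:

```python
import subprocess
from pathlib import Path

# Placeholder: one sub-directory of reads per sample/barcode.
SAMPLES_DIR = Path("/data/fastq_pass")

# Launch one wf-clone-validation run per sample, all from the same directory.
procs = []
for sample_dir in sorted(p for p in SAMPLES_DIR.iterdir() if p.is_dir()):
    cmd = [
        "nextflow", "run", "epi2me-labs/wf-clone-validation",
        "--fastq", str(sample_dir),
    ]
    procs.append(subprocess.Popen(cmd))

# Wait for every run; this is where some runs failed because of the lock file.
for proc in procs:
    proc.wait()
```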
I also tried setting the `--threads` parameter to 8 to match my vCPU count, but this didn’t improve throughput.
Is my ability to scale limited by the resource constraints of my Cloud Run Job, or is there something else I’m missing that would let me fully horizontally scale this workflow and run 43 samples in roughly the same time as a single sample?
N.B.: I’m also trying to achieve the same thing with epi2me/wf-amplicon.