Proper way to utilize >1 GPU in a single machine?

For nanopore basecalling, I typically set maxForks to 1 because GPU memory usage fluctuates, and it generally causes failures when another sample attempts to start basecalling only to find that the VRAM it thought was available is no longer free. Let's say I have a machine with two GPUs in it: what's the proper way to have sample 1 go to GPU 1 and sample 2 go to GPU 2? I was envisioning some system that labels samples as even or odd, sending all odd samples to GPU 1 and all even samples to GPU 2, but I wonder if there's an official way to do this?
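Concretely, something like the rough sketch below is what I was picturing, using task.index as a stand-in for the even/odd label (the process name and the dorado command are just placeholder assumptions, and with uneven runtimes two concurrent tasks could still land on the same GPU):

process basecall {
    maxForks 2    // at most two tasks running, one per GPU in the ideal case

    input:
    tuple val(sample_id), path(pod5_dir)

    output:
    tuple val(sample_id), path("${sample_id}.bam")

    script:
    // tasks alternate between GPU 1 and GPU 0 as task.index increments;
    // note this is not collision-proof when runtimes differ a lot
    def gpu = task.index % 2
    """
    export CUDA_VISIBLE_DEVICES=${gpu}
    dorado basecaller hac ${pod5_dir} > ${sample_id}.bam
    """
}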

The ideal scenario would be if your GPU-enabled application can simply detect and use all available GPUs (e.g. TensorFlow can do this). Then you can just send one task to each machine and let that task use all of the GPUs.

Failing that, you can use the accelerator directive with AWS Batch, Google Batch, and K8s, and they should be able to schedule individual GPUs for you. On HPC systems you might be able to use clusterOptions to schedule GPUs, as long as the scheduler knows how to use cgroups.
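For reference, both approaches can be expressed in nextflow.config along these lines (the process name, GPU count, and --gres syntax are placeholders you would adapt to your platform):

process {
    withName: 'basecall' {
        // AWS Batch / Google Batch / Kubernetes: request one GPU per task
        // (a GPU type can also be given, e.g. [request: 1, type: 'nvidia-tesla-t4'])
        accelerator = 1

        // SLURM alternative, if the cluster enforces GPU allocation:
        // clusterOptions = '--gres=gpu:1'
    }
}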

Failing that, it is very difficult to schedule GPUs properly with the local executor. It is generally much better to put your GPUs behind one of these GPU-aware executors and save yourself the headache.

I see. My organization isn't particularly computer savvy, so all I have are some desktop computers, each of which could have two GPUs in it. So I think I'm stuck with the local executor. I guess maybe I could make my own frankencluster out of some desktops, SLURM, and a laptop as the login node.

I have a similar issue, running Nextflow with the local executor and Docker on a single machine with 4 GPUs.

I can specify all GPUs using docker.runOptions, but if 3 jobs are running they all end up on GPU 0 rather than using 1 GPU each. This is expected, as the local executor is not aware of GPUs.
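For reference, that setup is roughly this line in nextflow.config:

docker.runOptions = '--gpus all'    // every task's container sees all 4 GPUs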

I found I can also select GPUs by ID at the process level using the containerOptions directive:

containerOptions "--gpus '\"device=${gpu_id}\"' "

(Resource constraints | Docker Docs).
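Putting it together, the per-process version looks roughly like the sketch below, where gpu_id is just a value supplied alongside each sample and the process name is a placeholder; how to assign that ID dynamically is the open question:

process basecall_pinned {
    // pin this task's container to a single device by ID
    containerOptions "--gpus '\"device=${gpu_id}\"'"

    input:
    tuple val(sample_id), path(reads), val(gpu_id)

    script:
    """
    nvidia-smi -L    # should list only the selected GPU
    """
}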

My proposed solution for dynamically allocating one GPU per process is to create a channel of the GPU IDs, have each process take an ID from the channel, and then somehow return the ID it used to the channel afterwards. There's no operator to insert a value into an existing channel, AFAIK.

The only way I can see to do this is to use Channel.watchPath('gpus/id_*') or similar, so that the process can take an ID value from the channel and then write/touch a file in the watched directory to add the ID back into the channel.

Maybe there is a native Groovy/Java atomic list I could use instead that might be simpler than using a Channel here.

Any suggestions appreciated.

For the local executor, we could probably add some custom logic that keeps track of which GPUs are in use and assigns them to GPU tasks by setting the NVIDIA_VISIBLE_DEVICES environment variable. That would at least work for NVIDIA GPUs, which are by far the most common.

You could also put this logic into a plugin as a custom “local gpu” executor. That might be a better way to experiment with this functionality.

In any case, the GPU scheduling should be handled by the scheduler rather than the pipeline code.