Managing GPU Resources in a Nextflow Pipeline on PBSPro

Hello again,

I’m running into an issue with my Nextflow pipeline: it needs to process multiple files, but only one GPU is available. When running the pipeline on a single file, everything works fine. However, with multiple files, I get the following error:

RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
  CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Steps Taken:

  1. Set CUDA_LAUNCH_BLOCKING=1 to force synchronous CUDA operations for easier debugging.
  2. Limited concurrency by setting maxForks = 1 for the GPU-intensive process in the Nextflow config (the relevant part of my config is sketched after this list).
  3. Adjusted the executor queue size (queueSize) to better control how many jobs are submitted to PBSPro at once.
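
For reference, here is roughly what the relevant part of my nextflow.config looks like. The process name gpu_inference and the queueSize value are placeholders for this post; my actual pipeline uses different names:

```groovy
// nextflow.config (sketch; 'gpu_inference' and queueSize value are placeholders)
process {
    executor = 'pbspro'

    withName: 'gpu_inference' {
        // run at most one task of this process at a time
        maxForks = 1
        // force synchronous CUDA calls so errors point at the right stack frame
        beforeScript = 'export CUDA_LAUNCH_BLOCKING=1'
    }
}

executor {
    // cap how many jobs Nextflow keeps queued on the scheduler at once
    queueSize = 5
}
```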

Questions:

  1. Additional Configuration or Scripts:
  • Are there any specific configurations or scripts I can implement to ensure exclusive GPU access for each job (for example, something along the lines of the clusterOptions sketch after this list)?
  2. Best Practices:
  • What are the best practices for managing GPU resources in a multi-file Nextflow pipeline on a PBSPro scheduler?
  3. Insights on Concurrent GPU Usage:
  • Despite the settings to limit concurrency, I still encounter the CUDA error. What could be causing multiple tasks to attempt to use the GPU simultaneously?
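
Regarding question 1, one thing I was considering is requesting the GPU explicitly from PBSPro via clusterOptions, along these lines. The exact select/ngpus resource string is a guess on my part and probably depends on how the cluster is configured:

```groovy
// sketch only; 'gpu_inference' and the resource string are placeholders
process {
    withName: 'gpu_inference' {
        // ask PBSPro to allocate one GPU to each submitted job
        // (the select/ngpus syntax may differ on other sites)
        clusterOptions = '-l select=1:ncpus=4:ngpus=1'
    }
}
```

Would something like this actually give each job exclusive access to the device, or is more needed on the scheduler side?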

Any guidance or recommendations would be greatly appreciated!

Thank you
