Hello again,
I’m running into a problem with my Nextflow pipeline: I need to process multiple files but have only one GPU available. Running the pipeline on a single file works fine, but with multiple files I get the following error:
```
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
Steps Taken:
- Set `CUDA_LAUNCH_BLOCKING=1` to force synchronous CUDA operations.
- Limited concurrency by setting `maxForks = 1` for the GPU-intensive process in the Nextflow config.
- Adjusted the queue size to better manage job submissions.
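For reference, the relevant parts of my config look roughly like this (the process name, queue name, and resource selection string are placeholders, not my exact values):

```groovy
// nextflow.config (sketch)
process {
    executor = 'pbspro'

    // "gpuTask" is an illustrative process name
    withName: 'gpuTask' {
        maxForks       = 1                      // run at most one GPU task at a time
        queue          = 'gpu'                  // illustrative queue name
        clusterOptions = '-l select=1:ngpus=1'  // ask PBSPro for one GPU
    }
}

executor {
    queueSize = 10   // cap the number of jobs submitted to the scheduler at once
}

env {
    CUDA_LAUNCH_BLOCKING = '1'   // synchronous CUDA calls for clearer stack traces
}
```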
Questions:
- Additional Configuration or Scripts:
- Are there any specific configurations or scripts I can implement to ensure exclusive GPU access for each job?
- Best Practices:
- What are the best practices for managing GPU resources in a multi-file Nextflow pipeline on a PBSPro scheduler?
- Insights on Concurrent GPU Usage:
- Despite limiting concurrency with these settings, I still encounter the CUDA error. What could be causing multiple tasks to attempt to use the GPU simultaneously?
Any guidance or recommendations would be greatly appreciated!
Thank you