Problems with GPU-enabled wave images

Nico_Trummer · January 9, 2025, 5:08pm

Hello,

I am trying to build a GPU-enabled image using wave. I encountered several problems:

Seqera containers (web UI) does not allow selecting a specific build

jaxlib ships multiple builds for each version: some with CUDA, some without. To make sure that a GPU (CUDA) build is installed, I need to pin a specific build. This is not possible with the web tool, so I stuck with the wave CLI. That is not a big problem, but I wanted to mention it. The yml file used with the CLI is shown here.
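To illustrate what pinning a build means here: a conda spec can carry a third field, `name=version=build`, and the build field may contain a glob. The sketch below checks whether a spec pins a CUDA build. It is only a rough illustration: the helper names, the version number, and the build strings are hypothetical, and it assumes the simple `name=version=build` form rather than full conda match-spec syntax.

```python
from typing import Optional


def build_string(spec: str) -> Optional[str]:
    """Return the build field of a simple `name=version=build` spec, else None."""
    parts = spec.strip().split("=")
    # Exactly three non-empty fields: name, version, build.
    if len(parts) == 3 and all(parts):
        return parts[2]
    return None


def pins_cuda_build(spec: str) -> bool:
    """True if the spec pins a build whose build string mentions CUDA."""
    build = build_string(spec)
    return build is not None and "cuda" in build.lower()


# Hypothetical specs; real build strings must be looked up on the channel.
print(pins_cuda_build("jaxlib=0.4.23=cuda120*"))  # build pinned to a CUDA variant
print(pins_cuda_build("jaxlib=0.4.23"))           # no build pinned, solver decides
```

Without the third field, the solver is free to pick a CPU-only build, which is why the web UI's name/version selection is not enough in this case.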

Installing cuda requires the virtual __cuda package

Conda virtual packages are explained here. As the wave servers do not have CUDA installed, it is necessary to set the CONDA_OVERRIDE_CUDA environment variable. This is possible neither with Seqera Containers nor in combination with --conda-file in the wave CLI (unless one uses a --conda-base-image that already sets the environment variable).
Thus, I created a Dockerfile that does this. It was not straightforward to figure out, but not a huge problem.
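The same mechanism can be sketched outside a Dockerfile: conda reads CONDA_OVERRIDE_CUDA from the environment of the solving process, so setting it before the solve makes the `__cuda` virtual package appear on a machine without a GPU driver. This is a minimal sketch, not the setup linked above; the helper names, the `environment.yml` path, and the CUDA version `"12.0"` are assumptions.

```python
import os
import subprocess
from typing import Mapping, Optional


def with_cuda_override(cuda_version: str,
                       base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with CONDA_OVERRIDE_CUDA set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    return env


def create_env(env_file: str, cuda_version: str = "12.0") -> None:
    """Create a conda environment while pretending the given CUDA version exists."""
    subprocess.run(
        ["conda", "env", "create", "--file", env_file],
        env=with_cuda_override(cuda_version),
        check=True,  # raise if conda exits with a non-zero status
    )


# Usage (requires conda on PATH; file name is a placeholder):
# create_env("environment.yml", cuda_version="12.0")
```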

Error 137 (out of memory)

You can see my final setup here. The wave build log can be accessed here. Apparently the build fails with exit code 137 (OOM). It is not uncommon for CUDA dependencies to be rather large (several gigabytes).

Wrapping up

I would love to see these fixed, but I understand if they are not top priority. GPU computing is on the rise, though, so these problems may become more prominent with time.

paolo · January 10, 2025, 1:39pm

I think it’s not an out-of-memory issue. More likely the build was killed because it reached the 15-minute build timeout.

If you authenticate your request by adding your Tower (aka Platform) token, you will get a 25-minute build timeout.
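For context on why exit code 137 is ambiguous: by shell convention, a status above 128 means the process was terminated by signal `status - 128`, and 137 corresponds to signal 9 (SIGKILL). Both the kernel OOM killer and a supervisor enforcing a timeout can send SIGKILL, so the code alone does not tell the two apart. A small sketch of the decoding, with a hypothetical helper name and assuming a POSIX system:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a process exit status using the shell's 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

Comparing the elapsed build time against the configured timeout, as done in this thread, is one way to distinguish the two causes.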

Nico_Trummer · January 11, 2025, 2:41pm

Hey,
this is now the build log with a platform token provided: log

It appears to fail in the same way, again after around 15:30, so apparently the additional time limit did not help.
