Hello,
I am trying to build a GPU-enabled image using wave. I encountered several problems:
Seqera containers (web UI) does not allow selecting a specific build
jaxlib has multiple builds for each version: some with CUDA, some without. To make sure that a GPU (CUDA) build is installed, I need to pin a specific build. This is not possible with the web tool, so I stuck with the wave CLI. That is not a big problem, but I wanted to mention it. The yml file used with the CLI is shown here.
Installing cuda requires the virtual __cuda package
Conda virtual packages are explained here. As the wave servers do not have CUDA installed, it is necessary to set the CONDA_OVERRIDE_CUDA environment variable. This is possible neither with Seqera Containers nor in combination with --conda-file in the wave CLI (unless one uses a --conda-base-image that already sets the environment variable).
Thus, I created a Dockerfile that sets the variable. This was not straightforward to figure out, but it is not a huge problem.
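For readers hitting the same issue, here is a minimal sketch of what such a Dockerfile can look like. It is not my exact file: the base image tag, the CUDA version "12.0", and the package specs are assumptions for illustration, and the `*cuda*` build-string pattern assumes the GPU builds of jaxlib carry "cuda" in their build string.

```dockerfile
# Sketch only: base image tag, CUDA version, and package specs are assumptions.
FROM mambaorg/micromamba:1.5.10-noble

# The build host has no GPU driver, so tell the solver to act as if
# CUDA 12.0 were present; this makes the __cuda virtual package available.
ENV CONDA_OVERRIDE_CUDA="12.0"

# "jaxlib=*=*cuda*" means: any version, any build whose build string contains "cuda".
RUN micromamba install -y -n base -c conda-forge "jaxlib=*=*cuda*" jax \
    && micromamba clean --all --yes
```

Note that `ENV` keeps the variable in the final image as well. If that is not wanted, it can instead be set inline for the single `RUN` step (`RUN CONDA_OVERRIDE_CUDA=12.0 micromamba install ...`).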
Error 137 (out of memory)
You can see my final setup here. The wave build log can be accessed here. The build apparently fails with exit code 137, which indicates that it ran out of memory (OOM). It is not uncommon for CUDA dependencies to be rather large (several gigabytes).
Wrapping up
I would love to see these issues fixed, but I understand if they are not top priority. GPU computing is on the rise, though, so these problems may become more prominent with time.