What are the best practices for using conda environments within singularity/apptainer containers with Nextflow?
For example, I build a container based on one of the conda-forge Docker base images, like condaforge/miniforge3:24.7.1-0.
For a while I just installed my packages into the base environment, but I recently hit a conflict with packages already in the base environment, so now I have to install my packages into a separate environment.
The issue now is that I somehow need to tell Nextflow to use the new environment.
I tried adding this to the end of my singularity.def file, but it didn’t help, because Nextflow calls Singularity with singularity exec, not singularity run.
I want to keep my Nextflow scripts agnostic to the container/package environment, so I don’t want to add conda-related commands to my Nextflow processes.
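One approach that should keep the Nextflow processes free of conda commands (a sketch, not something suggested in this thread) is to "activate" the environment in the definition file's %environment section, which Singularity/Apptainer sources at runtime for exec as well as run. The Python helper below only generates that block. It assumes conda lives under /opt/conda, as in the condaforge/miniforge3 images, and "myenv" is a hypothetical environment name.

```python
from pathlib import PurePosixPath


def environment_section(env_name: str, conda_root: str = "/opt/conda") -> str:
    """Build a %environment block that puts a conda env first on PATH.

    Assumes conda is installed under conda_root (as in the
    condaforge/miniforge3 images); env_name is a placeholder.
    """
    prefix = PurePosixPath(conda_root) / "envs" / env_name
    return "\n".join([
        "%environment",
        f"    export CONDA_PREFIX={prefix}",
        # $PATH is left literal so the container shell expands it at runtime.
        f"    export PATH={prefix}/bin:$PATH",
    ]) + "\n"


if __name__ == "__main__":
    # "myenv" is a hypothetical environment name.
    print(environment_section("myenv"), end="")
```

The printed lines would go into singularity.def next to the %post step that creates the environment. Note that this only puts the environment's binaries first on PATH; unlike a real conda activate, it does not run any activation scripts a package may ship, so tools that depend on those may still need extra variables set.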
Have you tried using Seqera containers? They make rapidly iterating on and modifying your container image much easier. You pick the conda packages you need, and it builds the image and gives you a permanent link that you can pull into Singularity.
Also, in case you’re not doing this already, it’s a good idea to use a separate container image for each Nextflow process instead of trying to fit all the tools into one image. That makes package conflicts far less likely.
Thanks for your comments, but they don’t really address my question. I don’t think Seqera containers are suitable for my use case, because I use some custom/proprietary programs and scripts that aren’t available in any public package repository. As far as I can tell, Seqera containers are for combining publicly available software?
Yes, sometimes I use different containers for different processes.
My issue is that one of my processes needs a package that, for some reason, conflicts with one of the dependencies pre-installed in the conda base environment. It’s not that my programs conflict with each other.
Is it possible to split up the process which has the conflicting tools?
Best practice is hard to pin down, but what I usually go with is one tool per process, so I don’t have to build my own containers unless it’s a non-conda tool. Sometimes, though, there are benefits to having multiple tools in one container, for example when one tool writes huge files and piping its output straight into another tool shrinks that output significantly.
Libraries used inside scripts are a harder problem, though, because the library calls are often interspersed. Sometimes it’s possible to refactor the script so that the use of each library is separated by some kind of intermediate output, which lets the parts run in separate processes.
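The benefit of keeping two tools in one container is that the second tool consumes the first one's output as a stream, so the large uncompressed file never touches the disk. In a shell that is just tool_a | gzip > out.gz (tool_a being a placeholder). The Python sketch below does the same with two subprocesses; both commands are stand-ins: the producer mimics a hypothetical tool printing a large, repetitive table, and the consumer is a tiny gzip compressor written in Python so the example doesn't depend on a particular binary being installed.

```python
import gzip
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical second "tool": reads stdin, writes gzip-compressed stdout.
COMPRESSOR = [
    sys.executable, "-c",
    "import gzip, shutil, sys\n"
    "with gzip.GzipFile(fileobj=sys.stdout.buffer, mode='wb') as gz:\n"
    "    shutil.copyfileobj(sys.stdin.buffer, gz)\n",
]


def pipe_to(producer: list[str], consumer: list[str], out_path: Path) -> tuple[int, int]:
    """Run `producer | consumer > out_path` without an intermediate file."""
    with open(out_path, "wb") as out:
        p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=out)
        p1.stdout.close()  # so p1 gets SIGPIPE if p2 exits early
        p2.wait()
        p1.wait()
    return p1.returncode, p2.returncode


if __name__ == "__main__":
    # Stand-in for a tool that writes a large, highly compressible file.
    producer = [sys.executable, "-c",
                "for i in range(100000): print('chr1', i, 'ACGT' * 10)"]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.txt.gz"
        print(pipe_to(producer, COMPRESSOR, out))
        with gzip.open(out, "rt") as fh:
            print(sum(1 for _ in fh), "lines")
```

In a Nextflow process the same idea is simply a shell pipe in the script block, which only works if both tools are available in that process's container.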
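A minimal sketch of that kind of refactor, assuming a script whose first half needs one library and whose second half needs another: each half becomes its own entry point, and the two communicate only through a plain TSV file, so each stage could run as a separate process with its own container. The function names and the sample/score columns are hypothetical, and the standard-library csv module stands in for the conflicting libraries.

```python
import csv
import tempfile
from pathlib import Path


def stage_one(records: list[tuple[str, float]], out_path: Path) -> None:
    """First half of the original script (the part needing library A).

    Writes its results to a plain TSV instead of passing Python objects on.
    """
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["sample", "score"])
        for sample, score in records:
            writer.writerow([sample, score])


def stage_two(in_path: Path) -> dict[str, float]:
    """Second half (the part needing library B); it only reads the TSV."""
    with open(in_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["sample"]: float(row["score"]) for row in reader}


if __name__ == "__main__":
    # In a pipeline, each call below would be its own process/container,
    # with scores.tsv passed between them as an intermediate output.
    with tempfile.TemporaryDirectory() as tmp:
        tsv = Path(tmp) / "scores.tsv"
        stage_one([("s1", 0.5), ("s2", 1.25)], tsv)
        print(stage_two(tsv))
```

A text format like TSV is used here because either stage can read or write it without importing the other stage's libraries.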
If there’s some public code, it might be easier to give more concrete advice on.