General question on cached Nextflow containers

I have a very general question. Many people at our organization use the HPC, and dozens of them run nf-core/Nextflow pipelines. We are considering creating a general cache directory, set via NXF_APPTAINER_CACHEDIR and accessible to everyone, and telling people to use it. The idea is that all the container images would then live in one place instead of each user keeping their own copies. Does this sound like a good idea, or more like opening a can of worms?

Great question, Ramiro!

My recommendation would be to use both the cache directory and the library directory variables:

  • Use NXF_APPTAINER_CACHEDIR (a per-user location with read+write access) for each user's own containers.
  • Use NXF_APPTAINER_LIBRARYDIR (a system-wide location where users have read-only access) for cluster-wide containers.
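A minimal Python sketch of that split, assuming a hypothetical shared path such as /shared/apptainer/library owned by an admin account and a hypothetical per-user cache under each home directory; the helper names are illustrative, not part of Nextflow. It applies the permission bits described above (admin writes the library, everyone else only reads it; each user's cache is private) and returns the two environment variables a user would export before running Nextflow. On a cluster, both locations also need to be on a filesystem visible from the compute nodes.

```python
import os
import stat
from pathlib import Path


def setup_library_dir(library_dir: Path) -> None:
    """Run as the admin account: owner may write, others may only read/traverse."""
    library_dir.mkdir(parents=True, exist_ok=True)
    # chmod explicitly, because mkdir's mode argument is filtered by the umask
    os.chmod(library_dir, 0o755)


def setup_user_cache(cache_dir: Path) -> None:
    """Run as each user: a private cache only that user can read and write."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(cache_dir, 0o700)


def others_can_write(path: Path) -> bool:
    """True if group members or other users have write permission on path."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def apptainer_env(library_dir: Path, cache_dir: Path) -> dict[str, str]:
    """Environment variables to export before launching Nextflow."""
    return {
        "NXF_APPTAINER_LIBRARYDIR": str(library_dir),
        "NXF_APPTAINER_CACHEDIR": str(cache_dir),
    }


if __name__ == "__main__":
    # Print shell export lines for a user's profile (paths are hypothetical)
    env = apptainer_env(
        Path("/shared/apptainer/library"),
        Path.home() / ".nxf-apptainer-cache",
    )
    for key, value in env.items():
        print(f"export {key}={value}")
```

The explicit chmod calls matter because a permissive umask on the admin account could otherwise leave the shared library group-writable, which would defeat the point of a read-only, curated location.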

Nextflow checks NXF_APPTAINER_LIBRARYDIR first, and if the .sif file isn't there, it checks NXF_APPTAINER_CACHEDIR. If the image exists in neither location, Nextflow pulls it from its remote source (e.g. wherever.com/my-container.sif) and writes it to NXF_APPTAINER_CACHEDIR. I think it makes more sense for sysadmins to manually copy common containers into the LIBRARYDIR when they see many people using them.
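The lookup order can be modelled in a few lines of Python. This is a simplified illustration of the behaviour described above, not Nextflow's implementation: resolve_image and the pull callback are hypothetical names, the image file name is passed in ready-made, and error handling plus anything else the real code does is omitted.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_image(
    file_name: str,
    library_dir: Optional[Path],
    cache_dir: Path,
    pull: Callable[[Path], None],
) -> Path:
    """Return a local image path, downloading into the cache only if needed."""
    # 1. Shared, read-only library: checked first when configured
    if library_dir is not None:
        shared = library_dir / file_name
        if shared.exists():
            return shared
    # 2. Per-user, writable cache
    cached = cache_dir / file_name
    if cached.exists():
        return cached
    # 3. Missing everywhere: pull into the cache, never into the library
    pull(cached)
    return cached
```

Because a library hit returns before any write happens, users only ever need read access to the shared directory; every download lands in their own cache.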

You can list the containers used by your most common nf-core workflows (using nextflow inspect), pull them, and write them into NXF_APPTAINER_LIBRARYDIR as an elevated user, so that other users don't have to download them.
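A hedged Python sketch of that workflow. It assumes nextflow inspect prints JSON with a top-level processes list whose entries carry a container field (check this against your Nextflow version), and it approximates the file name Nextflow looks for by dropping the URI scheme, replacing / and : with -, and appending .img; compare the result with files already sitting in an existing cache before relying on it. The helper names, script name, and library path are hypothetical. The script only prints apptainer pull commands, so an admin can review them before running anything.

```python
import json
import re
import shlex
import sys
from pathlib import Path


def image_file_name(container: str) -> str:
    """Approximate Nextflow's local file name for an image (assumption)."""
    name = re.sub(r"^[a-z]+://", "", container)  # drop docker://, https://, ...
    return name.replace("/", "-").replace(":", "-") + ".img"


def pull_uri(container: str) -> str:
    """Apptainer needs an explicit scheme; bare registry names are Docker images."""
    return container if "://" in container else f"docker://{container}"


def unique_containers(inspect_json: str) -> list[str]:
    """Each distinct container in `nextflow inspect` output, in first-seen order."""
    seen: list[str] = []
    for process in json.loads(inspect_json).get("processes", []):
        container = process.get("container")
        if container and container not in seen:
            seen.append(container)
    return seen


def pull_commands(inspect_json: str, library_dir: Path) -> list[str]:
    """One `apptainer pull <dest> <uri>` command per distinct container."""
    return [
        f"apptainer pull {shlex.quote(str(library_dir / image_file_name(c)))} "
        f"{shlex.quote(pull_uri(c))}"
        for c in unique_containers(inspect_json)
    ]


if __name__ == "__main__":
    # Usage: python make_pulls.py containers.json /shared/apptainer/library
    if len(sys.argv) == 3:
        text = Path(sys.argv[1]).read_text()
        for cmd in pull_commands(text, Path(sys.argv[2])):
            print(cmd)
```

The JSON would come from something like nextflow inspect nf-core/rnaseq -profile apptainer > containers.json, run with the same profile and parameters users normally pass, since those can change which containers a pipeline resolves. Deduplicating matters because many processes in one pipeline typically share the same image.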


Thank you, @robsyme. This is great advice.
