Platform launch folders and concurrent Nextflow runs

Hello,
I’m trying to use the Seqera Platform to submit multiple instances of the same pipeline, with different inputs, to a SLURM cluster. So far I’ve been able to run individual instances successfully, but if I submit more than one at the same time, the SLURM jobs get stuck (they keep running, but indefinitely). I believe the issue is that the Nextflow run instances are executed from the same directory, which messes up the .nextflow folder contents and, hence, all the executions.

Now, I’ve tried several configurations for both the Pipeline and the Compute Environment; however, I’m a bit confused by the use of the Work and Launch directories. When I launch a pipeline, it gets cloned/downloaded into a “bucket” specified by the Pipeline’s “Work directory”, so I assumed Nextflow would be run from that directory and the .nextflow folder would not be affected by other concurrent runs. Instead, the run actually uses the Compute Environment’s “Work directory”, which is the same for all runs on that Compute Environment and cannot be changed from the pipeline submission form, hence the shared .nextflow folder and SLURM getting stuck.

Is my reasoning sound, or have I misunderstood the workflow structure and therefore misconfigured the environment? Is there a workaround for running multiple pipeline instances that does not involve duplicating the Compute Environment and giving each duplicate a unique “Work directory”?

For context, these are the directories the pipeline uses. One question I have is whether, and how, one can set the launchDir to the projectDir, which contains the cloned/downloaded pipeline.

launchDir       : /mnt/scratch/pipeline
workDir         : /mnt/scratch/seqera_buckets
projectDir      : /mnt/scratch/seqera_buckets/.nextflow/pipelines/9dccf661/pipeline

Thank you.

Best regards

Hi Federico,

To clarify the base concepts here:

  • The work directory is where Nextflow stores intermediate results, one subdirectory per task. These have random-looking names, so you’ll see directories like /a5/0dd25a419a0d2cd58d2f80656eb4e9. It should not matter if you share the top-level work directory across runs, or even across workflows; people choose how to do that based on their cleanup strategies. You should actually be able to set the work directory per workflow run (via the ‘launch settings’) if you have sufficient permissions, but you probably don’t need to; see the sketch after this list.
  • The launch directory is where Nextflow is ‘sitting’ when it launches the workflow, and where the log files for a run are placed. You will see randomly named files in that directory that allow multiple runs to coexist there concurrently, like ‘nf-3pKOpV0D0nhxsb.launcher.sh’.
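
For the first point, here is a minimal plain-Nextflow sketch of how the work directory is usually set. The paths are placeholders, and since you launch via the Platform this is only meant to illustrate what the Compute Environment is configuring on your behalf:

// nextflow.config (hypothetical paths, adjust to your cluster)
workDir = '/mnt/scratch/work'    // top-level work directory; each task writes to a hash-named subdirectory
process.executor = 'slurm'       // submit each task as a SLURM job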

Nextflow will clone the workflow into a local directory (later accessible via projectDir) to run it, as you have seen.
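
If it helps to confirm where each run ends up, a throwaway script like the one below (a minimal sketch, not part of your pipeline) prints the three directories at runtime:

// main.nf: print where the run was launched, where intermediates go, and where the code was cloned
workflow {
    println "launchDir  : ${workflow.launchDir}"
    println "workDir    : ${workflow.workDir}"
    println "projectDir : ${workflow.projectDir}"
}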

It’s better to separate the work and launch directories so that you can easily clean up the intermediate files from a run without impacting your logs etc., but it’s not absolutely required.
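
From the command line that separation might look like the sketch below (the pipeline name, run name and paths are all placeholders):

# launch from one directory, keep intermediates in a per-run work directory
cd /mnt/scratch/launch
nextflow run <pipeline> -w /mnt/scratch/work/run_A

# once a run has completed, drop its intermediates without touching the logs in the launch directory
nextflow clean -f <run_name>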

I don’t see anything inherently problematic in the way you describe your setup, and I’ve just double-checked that we can run multiple workflows concurrently via our Slurm setup (we can).

Could you tell us a little more about your setup, and the problems you’re having?

  • If you submit multiple runs of the workflow, how long do you leave them before concluding they are stuck?
  • Can you provide the Nextflow logs (.nextflow.log) from ‘stalled’ runs?
  • What does the compute infrastructure look like? e.g. is this local HPC, or cloud deployed Slurm? What sort of file system is in use?

Hopefully with some more information we can figure out what’s going on here.