S3 work folder

Hi, when trying to use an S3 folder as a work dir when launching a pipeline from Seqera, I get this error:

Pipeline work directory must start with slash character - offending value: s3://scratch/work 

I tried defining workDir in the nextflow.config, but it is still using the one defined in the launch form (where I had to put a non-S3 one).
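
For reference, this is roughly what I have in nextflow.config (the bucket and path are the ones from the error above):

```groovy
// nextflow.config -- the value I'm trying to use as the work directory
workDir = 's3://scratch/work'
```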

Hi Miguel

I’m not sure I totally understand your situation. Some quick questions:

  1. These are runs launched from Seqera Platform, correct?
  2. What is the work directory defined in the Compute Environment (CE) in which you are executing the run?
  3. What was the value you were trying to set the work directory to?
  4. It sounds like you’d like to override that work directory for a particular pipeline or a particular run. Is that correct?

In general, if you want to override the CE’s work directory for a particular run, it would be sensible to use the “Work directory” text field as shown below. Is that an option for your situation?

Hi robsyme,

  1. Yes.
  2. It is a general work folder on the NetApp.
  3. The value is something similar to your screenshot: s3://scratch/work/
  4. Yes
    That’s exactly my situation. When I try to override it with an S3 folder, I get the error “Pipeline work directory must start with slash character - offending value: s3://scratch/work” and can’t submit the run. I also tried setting the work folder not in the “Run parameters” tab but in the nextflow.config under Advanced settings, but that doesn’t work either.

Thanks for the info, Miguel.

Are you launching the run using a compute environment configured with an HPC executor (SLURM/PBS/SGE, etc)?

Using an S3 path as the work directory only makes sense when using the AWS Batch executor. It would be more efficient to use a networked filesystem on your HPC rather than reading from and writing to a (potentially) remote object store like S3.
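
As a rough sketch (the bucket names and paths below are placeholders, not values from your setup), the two configurations would look something like this:

```groovy
// With the AWS Batch executor, an S3 work directory is the expected setup:
// process.executor = 'awsbatch'
// workDir = 's3://my-bucket/work'     // placeholder bucket

// With an HPC executor such as SLURM, workDir should point at a path that
// is visible to both the head node and the compute nodes:
process.executor = 'slurm'
workDir = '/shared/scratch/work'       // placeholder shared-filesystem path
```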

Hi robsyme,

  • Yes, we are using SLURM.
  • We are using MinIO configured as a local S3. Since it is local, the performance should be enough, right? (Rough sketch of our setup below.)
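
For context, our nextflow.config points Nextflow at MinIO roughly like this (the endpoint URL below is a placeholder, not our real address):

```groovy
// Rough sketch of our MinIO setup -- the endpoint URL is a placeholder
aws {
    client {
        endpoint = 'http://minio.local:9000'   // local MinIO endpoint (placeholder)
        s3PathStyleAccess = true               // MinIO typically needs path-style access
    }
}
```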

Just to clarify, I’m using SLURM with the Tower Agent.

Hi @robsyme. Any update on this?

Apologies, Miguel.

When using a non-awsbatch executor, the work directory must be a path on a shared filesystem rather than an S3 bucket. It is difficult for Platform to verify that both the head and compute nodes have the required permissions to read from and write to that remote location, and using one is likely to cause errors, even if the performance is sufficient.

Is there a shared filesystem that you can use on this cluster?

I have also created a feature request on your behalf at: Demote message about workdir prefix from error to warning when using s3 buckets as workdir on SLURM | Voters | Seqera Feedback Forum

You can upvote and/or follow this feature for updates.

Thanks, Rob! Already upvoted :slight_smile:
Yes, we have a shared filesystem for the work folder (which we are using as the default one), but our input data is on a local S3 (MinIO). Some runs involve a huge amount of data, so we were interested in using Fusion to avoid the staging step and speed up pipeline execution in general.
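
Something along these lines is what we had in mind (an untested sketch on our side, assuming Fusion can be enabled for a SLURM compute environment):

```groovy
// Untested sketch of the Fusion setup we were hoping for.
// Fusion requires Wave to be enabled as well.
fusion {
    enabled = true
}
wave {
    enabled = true
}
workDir = 's3://scratch/work'   // the MinIO bucket holding our data
```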

I’ve never used S3 or the Tower environment, but did you try the obvious? Did you try converting the relative path you are trying to use to an absolute one (one that starts with a slash)?

I am not using relative paths, @Poshi. These are S3 storage paths.

Looking at the first message you posted, I saw:

Pipeline work directory must start with slash character - offending value: s3://scratch/work

That is a relative path. Or at least, it is a relative path according to the standard. Maybe S3 works differently and doesn’t follow the standard, but I don’t think so.

Please check how an S3 path looks, e.g. File paths in Amazon S3 - Media2Cloud on AWS

Oh! OK, I assumed that “scratch” was part of the path, not the bucket name. Then everything seems fine with the naming. Forget my comment :frowning:
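
i.e. the URI breaks down as:

```
s3://scratch/work
     │       └── "work"    = key prefix (the "folder" inside the bucket)
     └────────── "scratch" = bucket name
```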