Forwarding AWS credentials to local container?

I want to handle the scenario where process execution is dispatched to local or cluster resources but the data live in S3 Intelligent-Tiering. To accommodate this, my first process checks whether the inputs are archived and restores them if needed; the implementation details are not relevant to this post. My problem is that the script is unable to locate S3 credentials, specifically when running in a container.

sample main.nf:

process CheckS3Glacier {
  conda 'awscli=2.17.46'
  container 'amazon/aws-cli:2.17.46'
  input:
    val objects  // Making this a path tries to stage.
                 // Attempting to stage objects from
                 // Glacier is an error.
  output:
    stdout

  script:
  """
  aws sts get-caller-identity
  """
}

workflow {
  Channel.of(
    file(params.INPUT)
      .listFiles()
      .collect{ fl -> fl.toUriString() }
  )
  | CheckS3Glacier
  | view
}

nextflow.config:

aws {
  profile = 'my_credentials_profile'
}

profiles {
  conda {
    conda.enabled = true
  }
  singularity {
    singularity.enabled = true
    process.containerOptions = '-B /home'
  }
  docker {
    docker.enabled = true
    process.containerOptions = '--net host'
  }
}

Observed output with -profile conda: a pretty-printed JSON with keys “UserId”, “Account”, and “Arn”, and all tasks completed successfully.

Observed output with -profile singularity:

ERROR ~ Error executing process > 'CheckS3Glacier (1)'

Caused by:
  Process `CheckS3Glacier (1)` terminated with an error exit status (253)


Command executed:

  aws sts get-caller-identity

Command exit status:
  253

Command output:
  (empty)

Command error:
  
  Unable to locate credentials. You can configure credentials by running "aws configure".

Work dir:
  <redacted>

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

I’ve got a temporary workaround using secrets, but this puts the onus on the user to define those secrets themselves and, when the credentials rotate, to manually update them in every place they have Nextflow installed.
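For reference, the secrets workaround looks roughly like this; the secret names here are illustrative:

```groovy
// Each user registers their credentials once per Nextflow installation:
//   nextflow secrets set AWS_ACCESS_KEY_ID     <key id>
//   nextflow secrets set AWS_SECRET_ACCESS_KEY <secret key>

process CheckS3Glacier {
  container 'amazon/aws-cli:2.17.46'
  // The secret directive exposes each named secret to the task
  // as an environment variable, which the AWS CLI picks up.
  secret 'AWS_ACCESS_KEY_ID'
  secret 'AWS_SECRET_ACCESS_KEY'
  input:
    val objects
  output:
    stdout

  script:
  """
  aws sts get-caller-identity
  """
}
```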

Hi Scott!

I was going to suggest the nextflow secrets approach but it sounds like you’ve already found it. You’re right that it definitely puts a lot of the burden on the users to manage those credentials.

Another option that you might explore on the open source side is using the AWS Batch executor. You would start by creating an AWS Batch compute environment in your AWS account, and then you simply reference the AWS Batch queue in your pipeline config. In this scenario, Nextflow will use the permissions of the role assumed by the instances in your AWS Batch compute environment, so there’s no need to make secret credentials available.

For the AWS Batch option, your users would just need the appropriate permissions for submitting jobs to AWS Batch, and all of the permissions of the head node and worker nodes will be handled by Nextflow.
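A minimal sketch of that setup; the queue name and region below are placeholders you’d swap for your own:

```groovy
// nextflow.config — AWS Batch executor sketch
process {
  executor = 'awsbatch'
  queue    = 'my-batch-queue'   // your AWS Batch job queue (placeholder)
}

aws {
  region = 'us-east-1'          // region of the compute environment (placeholder)
}
```

With this in place, tasks run on the Batch compute environment and inherit its instance role, so no credentials need to be distributed to users.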

As an aside, has your team explored Seqera Platform as an option for your needs? One of the things where it really excels is central management of AWS credentials, so there’s no need for users to deploy their own for use with Nextflow secrets or AWS Batch.

Hi Ken.

With awsbatch, I still have to explicitly forward the worker instance’s network to the job container (process.containerOptions = '--net host'). As far as I can tell, there is no equivalent option with Singularity. I’m not sure if I was clear, but part of my requirements is to support on-premises batch executors in addition to awsbatch.

I’m not at liberty to publicly discuss my company’s evaluation of Tower and Seqera Platform.