Shared memory bug?

We are getting the error "Error executing request, Exception : Invalid shared memory size: 0.", which is appearing for code that was working fine just a few days ago. Our code sets the shared memory for the Docker container using the process directive:

process {
    withName: 'TRAIN' {
        container = "blahblah"
        containerOptions = '--shm-size 32768'
    }
}

Was some new code pushed that resulted in an issue with how shared memory is parsed? This is a major blocker on all of our jobs.

Welcome to the community forum, @Eric_Kofman.

There was this PR two weeks ago fixing a bug. It’s the only recent change I can remember.

Where are you running this pipeline? AWS Batch?

Yes: Service: AWSBatch; Status Code: 400; Error Code: ClientException; Request ID: 25a12649-b96c-4dc3-92e0-b61bc5a42bf5; Proxy: null.

Hi Eric

This is unexpected. Can you share the Nextflow versions of failed and successful runs?

You can also specify containerOptions as a map. Can you try this:

process {
    withName: 'TRAIN' {
        container = "blahblah"
        containerOptions = ['shm-size': 32768]
    }
}

For the run that worked it was version 24.04.3, and for the failed job it was version 24.04.4.

How can we specify which version of Nextflow we want our Tower-launched jobs to use?

Figured it out: we can set the version in the pre-run script, and when we use the old version it works again.

This looks a bit worrying. @robsyme are you able to replicate a shared memory size of 0 with the OP's config?

@Eric_Kofman could you help us put together a minimal reproducible example of this?

Phil

I think the issue is that the shm-size is interpreted differently on AWS batch and on Docker.

When running with Docker, the --shm-size can take a unit suffix, but by default is measured in bytes.
On AWS Batch, the shm-size is interpreted as MiB (docs).
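A quick back-of-the-envelope comparison makes the mismatch concrete (plain Python arithmetic, just for illustration):

```python
# The same bare number means very different sizes on each backend.
value = 32768

docker_bytes = value    # Docker: a unit-less --shm-size is bytes
aws_mib = value         # AWS Batch: sharedMemorySize is MiB

docker_in_mib = docker_bytes / 1024**2   # 0.03125 MiB on Docker
aws_in_bytes = aws_mib * 1024**2         # 34359738368 bytes (32 GiB) on AWS Batch
```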

In 24.04.3, we just passed the argument directly through to the executor, which had two problems:

  1. The number would be interpreted differently on each executor, giving inconsistent behaviours between executors and breaking portability of the workflow.
  2. Docker can accept unit suffixes, but AWS cannot. If a unit suffix was applied, it would work on Docker but break on AWS Batch.

To resolve these issues, 24.04.4 interprets a unit-less integer as a number of bytes (as per the Docker standard). If you specify a unit suffix, Nextflow now does the conversion for you, turning the value into MiB for AWS Batch.
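The new behaviour can be sketched roughly like this (illustrative Python only; shm_size_to_mib is a made-up name, not Nextflow's actual code):

```python
import re

# Rough sketch of the 24.04.4 behaviour described above (not Nextflow's real code).
UNIT_FACTORS = {'b': 1, 'k': 1024, 'm': 1024**2, 'g': 1024**3}

def shm_size_to_mib(value: str) -> int:
    """Interpret an --shm-size value as bytes when unit-less (Docker's
    default unit) and convert it to whole MiB, as AWS Batch requires."""
    match = re.fullmatch(r'(\d+)\s*([bkmg]?)', value.strip().lower())
    if match is None:
        raise ValueError(f'Invalid shm-size: {value!r}')
    number, unit = int(match.group(1)), match.group(2) or 'b'
    size_bytes = number * UNIT_FACTORS[unit]
    return size_bytes // 1024**2   # AWS Batch only accepts whole MiB

shm_size_to_mib('32768')    # unit-less -> 32768 bytes -> 0 MiB (the reported error)
shm_size_to_mib('32768M')   # 32768 MiB -> 32768
```

This also explains the "Invalid shared memory size: 0" message: 32768 bytes is less than one MiB, so the integer conversion yields 0.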

I would recommend adding a unit suffix; Nextflow will turn this into the relevant number of MiB for AWS and pass the string directly to Docker (which does the unit conversion natively):

process {
    withName: 'TRAIN' {
        container = "blahblah"
        containerOptions = '--shm-size 32768M'
    }
}

Makes sense :+1:

Any idea why the error message returns 0 as an invalid memory value, instead of 32768 bytes? I guess they could just be rounding the float number; 0.032768 MB is pretty small…

@Eric_Kofman let us know if the above solves your problem. If so then perhaps we can add a note about this to the Nextflow docs.

The AWS API only accepts integer values in MiB. A request for 0.03 MiB would be invalid.
