Resume failing due to change in wave container path

Hi there,

We’ve been having some trouble with the resume functionality when launching pipelines on platform. We don’t experience the same issue locally.

When resuming a pipeline on an updated version of the branch with a new commit, the caches fail and the pipeline restarts from the top. We would expect the caches of all processes upstream of the change introduced by the commit to be unchanged, and the resume to start from the first changed process. When resuming a pipeline without incorporating any new commits (i.e. on an identical version of the branch), the resume functionality works as expected.

I have been investigating this issue via the cache hashes, and have narrowed the difference down to the container fingerprint hash, which differs in the resumed run. We have not made any changes to our containers between runs. When I look at the individual processes on platform, under the ‘resources requested’ header, I can see that the path to the container differs in the resumed run. The format of the path is
wave.seqera.io/wt/some_string/biocontainers/gtfparse:1.2.1--pyh864c0ab_0. It is the string in the middle of the path that is changing.
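To make the comparison concrete, here is a minimal shell sketch that splits a path of this shape into the middle string and the remaining image reference, so the two runs can be compared piece by piece. The layout `<host>/wt/<string>/<repository>:<tag>` is only inferred from the path shown above, and the two strings `aaaa1111` and `bbbb2222` are made-up placeholders, not real values from either run.

```shell
#!/usr/bin/env bash
# Assumed layout (inferred from the path above): <host>/wt/<token>/<repository>:<tag>

# Print the path segment that follows "/wt/" (the part that changes between runs)
wave_token() {
  local path="$1"
  local rest="${path#*/wt/}"   # drop everything up to and including "/wt/"
  printf '%s\n' "${rest%%/*}"  # keep only the first remaining segment
}

# Print the image reference that follows the token (repository:tag)
wave_image() {
  local path="$1"
  local rest="${path#*/wt/}"
  printf '%s\n' "${rest#*/}"   # drop the token segment, keep the rest
}

# Hypothetical placeholders standing in for the original and resumed runs
first="wave.seqera.io/wt/aaaa1111/biocontainers/gtfparse:1.2.1--pyh864c0ab_0"
second="wave.seqera.io/wt/bbbb2222/biocontainers/gtfparse:1.2.1--pyh864c0ab_0"

if [ "$(wave_image "$first")" = "$(wave_image "$second")" ] \
   && [ "$(wave_token "$first")" != "$(wave_token "$second")" ]; then
  echo "same image, different token"
fi
```

If the image part matches and only the middle segment differs, the underlying container is the same and the cache miss comes from the path string alone.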

Our containers are all in quay.io. I don’t know much about the wave/fusion system and how this path is constructed. Is the commit hash used to construct this path?

I’ve had a look through existing issues and couldn’t spot any similar posts.

Any help in diagnosing this issue is much appreciated, as we are losing a lot of time without resume working correctly when we make pipeline updates!


Hello, what version of Nextflow are you using?

Morning Paolo. I’m using v23.10.1. I’ll try updating to Nextflow 24.04.2, test the resume failure again, and get back to you.

Hi again Paolo. We set up a new compute environment using ‘Batch Forge’, which we expected would install the latest version of Nextflow. However, the version still appears as 23.10.1. Are there any additional config options I have overlooked that would force an install of v24?

Yes, this issue has been fixed in Nextflow 24.04.x; however, Platform is still using 23.10.x. You can bump the Nextflow version by adding the following environment variable to the pre-run script field of the launch form:

export NXF_VER=24.04.0
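As a rough sketch of what a pre-run script along these lines could look like, the snippet below sets the variable and then checks that the pinned value is not older than 24.04.0. The `version_at_least` helper is hypothetical (it is not part of Nextflow or Platform) and relies on `sort -V` being available on the head node.

```shell
#!/usr/bin/env bash
# Pin the Nextflow release the launcher should use (value from the reply above)
export NXF_VER=24.04.0

# Hypothetical helper: succeeds when version $1 is at least version $2.
# It sorts the two versions and checks that $2 comes first (or they are equal).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if version_at_least "$NXF_VER" "24.04.0"; then
  echo "NXF_VER=$NXF_VER is 24.04.0 or newer"
else
  echo "NXF_VER=$NXF_VER predates 24.04.0" >&2
fi
```

The check is optional; the `export` line alone is what the reply above asks for.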

We included the version environment variable in the pre-run script field like so:

However, the launch failed. I am unable to download the logs, so I have pasted the output from the GUI below.

Downloading nextflow dependencies. It may require a few seconds, please wait ..
CAPSULE: Downloading dependency ch.qos.logback:logback-core:jar:1.4.14
CAPSULE: Downloading dependency com.fasterxml.jackson.core:jackson-databind:jar:2.17.0
CAPSULE: Downloading dependency org.yaml:snakeyaml:jar:2.2
CAPSULE: Downloading dependency org.eclipse.jgit:org.eclipse.jgit:jar:6.6.1.202309021850-r
CAPSULE: Downloading dependency org.apache.ivy:ivy:jar:2.5.2
CAPSULE: Downloading dependency com.google.guava:guava:jar:33.0.0-jre
CAPSULE: Downloading dependency com.google.errorprone:error_prone_annotations:jar:2.23.0
CAPSULE: Downloading dependency org.apache.groovy:groovy-templates:jar:4.0.21
CAPSULE: Downloading dependency com.google.guava:failureaccess:jar:1.0.2
CAPSULE: Downloading dependency io.nextflow:nextflow:jar:24.04.0
CAPSULE: Downloading dependency com.googlecode.javaewah:JavaEWAH:jar:1.2.3
CAPSULE: Downloading dependency org.apache.groovy:groovy-xml:jar:4.0.21
CAPSULE: Downloading dependency ch.qos.logback:logback-classic:jar:1.4.14
CAPSULE: Downloading dependency com.fasterxml.jackson.core:jackson-annotations:jar:2.17.0
CAPSULE: Downloading dependency net.bytebuddy:byte-buddy:jar:1.14.9
CAPSULE: Downloading dependency org.apache.groovy:groovy-yaml:jar:4.0.21
CAPSULE: Downloading dependency org.pf4j:pf4j:jar:3.10.0
CAPSULE: Downloading dependency org.apache.groovy:groovy-json:jar:4.0.21
CAPSULE: Downloading dependency org.apache.groovy:groovy-nio:jar:4.0.21
CAPSULE: Downloading dependency org.checkerframework:checker-qual:jar:3.41.0
CAPSULE: Downloading dependency io.nextflow:nf-httpfs:jar:24.04.0
CAPSULE: Downloading dependency io.nextflow:nf-commons:jar:24.04.0
CAPSULE: Downloading dependency ch.artecat.grengine:grengine:jar:3.0.2
CAPSULE: Downloading dependency com.fasterxml.jackson.core:jackson-core:jar:2.17.0
CAPSULE: Downloading dependency com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:jar:2.17.0
2/7424 KB
Downloading plugin nf-amazon@2.5.1
WARN: Unable to start plugin 'nf-amazon' required by s3://csg-tower-bucket/scratch/36QWyueAawA304
ERROR ~ Missing plugin 'nf-amazon' required to read file: s3://csg-tower-bucket/scratch/36QWyueAawA304

Same issue here using Platform and the Wave system: the nf-amazon@2.5.1 plugin is reported as missing.

Same log for each task:

I think I have a solution for it. Just change the containers from x86_64 to ARM64; without this change we get this exec format error. Is Platform prepared to handle Graviton 3 or 4? Thanks, guys, for your outstanding work leading Seqera. I hope this helps.