Frequently failing runs using AWS Batch Spot

sameer_abraham · October 24, 2024, 1:30pm

Hi,

I’m relatively new to Nextflow/Seqera and I’m hoping to get some help using AWS Batch compute environments for running pipelines.

I am currently using the Spot provisioning model to run the nf-core/methylseq pipeline, and my runs keep failing because instances are being taken away from me (see screenshot). I understand that this is part of the deal with the Spot model, but it is now happening frequently enough that it’s affecting project timelines.

Reading around, it looks like Batch parameters such as the maximum number of retries and the Spot bid price could help a bit. However, I’m hoping that there may be some better solutions out there to reduce the frequency of such events.

Appreciate any help/insight that you can provide!

mribeirodantas · November 15, 2024, 12:13am

Hi @sameer_abraham,

Options will vary depending on your cloud provider.

In Nextflow, the standard approach to handle spot instance interruptions is to automatically retry the task, which means the task will restart from the beginning each time it’s interrupted. While effective, this can add extra time and resource usage if interruptions happen frequently.
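
As a rough illustration of the retry approach, a `nextflow.config` along these lines could be a starting point. This is a minimal sketch, not a tuned recommendation: the numeric values are assumptions chosen for illustration, and the default for `aws.batch.maxSpotAttempts` differs between Nextflow versions, so check the documentation for the version you run.

```groovy
// nextflow.config (sketch; the values below are illustrative assumptions)
process {
    // Resubmit a failed task instead of terminating the whole run.
    // Note: 'retry' applies to any task failure, not only Spot reclaims.
    errorStrategy = 'retry'

    // Maximum number of times a single task may be resubmitted.
    maxRetries = 3
}

aws {
    batch {
        // Maximum execution attempts for a job interrupted by a Spot reclaim.
        maxSpotAttempts = 3
    }
}
```

Every retry reruns the task from the beginning, so a higher limit trades a lower chance of a failed run for more repeated compute on long-running tasks.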

For more advanced handling, solutions like Fusion Snapshots in our enterprise offerings allow tasks to resume from their last checkpoint instead of starting over, significantly reducing the impact of interruptions.

Unfortunately, we can’t control how often spot instances are reclaimed; this depends on the policies of the specific cloud provider. You may want to check with them for any options to manage or minimize these interruptions.

Let us know if you have further questions!
