AWS Batch: Spot Instance - How does Nextflow log an interruption?

General question. I am trying to decide which processes in my Nextflow pipeline should use spot instances. As part of this, I would like to better understand how Nextflow logs spot instance interruptions. Is there specific information I can look for in the .command.log or trace file that would clue me in to these interruptions? Any Nextflow documentation link to back up your thoughts would be appreciated. Sincerely appreciate your time.

I am trying to make the decision about which processes in my nextflow pipeline should utilize spot instances.

In theory, you should pick fault-tolerant workloads that can handle interruptions gracefully, or that are inexpensive to rerun when they fail. If you have an extremely long task that costs a lot of money per sample, spot machines may not be the best approach. With Fusion Snapshots, however, you shouldn't need to worry about this: when the task is retried, Fusion makes it resume from where it stopped before the interruption.
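
As a rough sketch of how that choice can be expressed in configuration: on AWS Batch, whether a task lands on a spot or on-demand machine depends on the compute environment behind the queue it is submitted to, so one option is to route processes to different queues by label. The queue names and the `long_running` label below are hypothetical placeholders, and this assumes you have already created one queue backed by a Spot compute environment and one backed by an On-Demand compute environment.

```groovy
// nextflow.config: illustrative sketch only; queue names and label are hypothetical
process {
    executor = 'awsbatch'

    // Default: short, cheap-to-retry tasks go to a queue backed by a Spot compute environment
    queue = 'my-spot-queue'

    // Long, expensive tasks go to a queue backed by an On-Demand compute environment
    withLabel: 'long_running' {
        queue = 'my-ondemand-queue'
    }
}
```

A process then opts in to the on-demand queue by declaring `label 'long_running'` in its definition.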

how Nextflow logs spot instance interruptions? Is there specific information I can look for in the .command.log or trace file that would clue me in to these interruptions?

The cloud provider is the one killing the task, so we rely on it to give us enough information to identify that the task ended because the spot machine was reclaimed. For AWS Batch, there is a Nextflow configuration option, aws.batch.maxSpotAttempts, that sets the number of times Nextflow can retry a task that was running on a spot machine when it was reclaimed. You can read more about it here.
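
For illustration, the option mentioned above can be set in the Nextflow configuration file like this. The value 3 is an arbitrary example, not a recommendation; the default depends on your Nextflow version, so check the documentation for the release you are running.

```groovy
// nextflow.config: illustrative sketch; the value 3 is an arbitrary example
aws {
    batch {
        // Maximum number of retries for a job whose spot machine was reclaimed
        maxSpotAttempts = 3
    }
}
```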

Be aware that this is different from other cloud providers. For Google Cloud Batch, for example, you have to turn on spot machines in your Nextflow configuration and watch for a specific exit status to retry.
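
A minimal sketch of what that can look like for Google Cloud Batch follows. It assumes that exit status 50001 is the code Google Cloud Batch reports when a spot VM is preempted, which is my understanding of the Google documentation; please verify it for your setup. The retry count of 3 is an arbitrary example.

```groovy
// nextflow.config: illustrative sketch; verify the exit code and pick your own retry count
google {
    batch {
        // Request spot machines for tasks
        spot = true
    }
}

process {
    // Assumption: 50001 is the exit status reported for a spot preemption
    errorStrategy = { task.exitStatus == 50001 ? 'retry' : 'terminate' }
    maxRetries = 3
}
```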


Following up on Revant’s questions about spot instances & Nextflow, I have a few questions about how aws.batch.maxSpotAttempts works.

If I

  1. set aws.batch.maxSpotAttempts to 0,
  2. launch a Nextflow pipeline run with AWS Batch, and then
  3. manually initiate a Spot Instance interruption (AWS documentation link) while a Nextflow task is running on a Spot Instance worker,

what should I expect to see happen? Should the Nextflow task fail and require resubmission? Do those sorts of manual spot instance interruptions fall under “Spot Attempts” for Nextflow and therefore trigger use of aws.batch.maxSpotAttempts? What information would be captured in the Nextflow log files?

Thanks in advance! - Deepank

Please do not ask new questions in other questions. I posted your reply as a new question authored by you here. I will close this topic now.