How do Nextflow and AWS Batch work together at an architectural level?

I am new to both AWS Batch and Nextflow, and I have a question about how the two interact. Suppose you want to run the Nextflow head process locally and the individual processes on AWS. Nextflow will pull images from Docker, and AWS Batch will launch instances from my custom AMI to run those processes. Is that correct? If so, how do Nextflow and AWS Batch communicate? How does Batch know which Docker image to run on the EC2 instances allocated for the individual processes?

I understand that the head process also reads and writes files in the working directory, so there must also be some kind of communication between the processes on AWS and the local head process. It's this infrastructure side that I don't fully understand. Thank you all for your attention. Any suggestions are welcome!

With AWS Batch you create a job queue. When you set up your Nextflow config, you tell it which job queue to use (among other details). Then, when the workflow runs and needs to submit a task, it creates a job specification and submits it to the job queue. The job specification contains all the information Batch needs to run the task: the Docker image, user, etc., in addition to the actual bash script to be run.
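To make that a bit more concrete, here is a rough Python sketch of the two request shapes involved. The field names follow the AWS Batch RegisterJobDefinition and SubmitJob APIs, but the queue name, image, job names, resource values and the exact command are made-up placeholders, and this is not literally what Nextflow emits. In Batch, the Docker image lives in the job definition, and the submitted job points at that definition plus the command to run.

```python
def build_job_definition(name, image):
    """Shape of a RegisterJobDefinition request: this is where the Docker image is named."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,  # hypothetical image, e.g. taken from the process container setting
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},       # placeholder resources
                {"type": "MEMORY", "value": "1024"},  # MiB
            ],
        },
    }


def build_submit_request(task_name, queue, job_definition, task_dir):
    """Shape of a SubmitJob request: queue, job definition, and the command to run."""
    wrapper = f"{task_dir}/.command.run"  # task wrapper script staged in the work directory
    return {
        "jobName": task_name,
        "jobQueue": queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            # Illustrative command only: fetch the wrapper script from S3 and run it.
            "command": ["bash", "-c", f"aws s3 cp {wrapper} - | bash"],
        },
    }


definition = build_job_definition("nf-example-task", "quay.io/example/tool:1.0")
request = build_submit_request(
    "example_task_1", "my-job-queue", "nf-example-task", "s3://my-bucket/work/ab/cdef"
)
```

With boto3 you would pass these as keyword arguments to the Batch client's `register_job_definition` and `submit_job` calls. Nextflow does the equivalent for you, so you never write this yourself.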

You also specify an S3 bucket in your Nextflow config to serve as the work directory. The Nextflow process can then create task working directories there and read the various output files to keep track of progress.
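As an illustration of what ends up in that bucket, here is a small Python sketch of how a task's working directory and its control files are laid out. The bucket name and hash are invented. The layout (first two characters of the task hash, then the rest) and the dot-file names match what Nextflow writes into its work directories, but treat the helpers themselves as hypothetical.

```python
def task_dir(work_dir, task_hash):
    """Task directory under the work dir: <work>/<first 2 chars>/<rest of hash>."""
    return f"{work_dir.rstrip('/')}/{task_hash[:2]}/{task_hash[2:]}"


def control_files(work_dir, task_hash):
    """Files the head process and the remote task exchange through the bucket."""
    base = task_dir(work_dir, task_hash)
    names = [
        ".command.sh",   # the task's bash script
        ".command.run",  # wrapper that stages inputs and outputs and runs the script
        ".command.log",  # captured output of the task
        ".exitcode",     # exit status written when the task finishes
    ]
    return {name: f"{base}/{name}" for name in names}


files = control_files("s3://my-bucket/work/", "ab" + "c" * 30)
```

Because both the head process and the Batch jobs only talk to S3 and the Batch API, the head process can run on your laptop without any direct connection to the EC2 instances.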

Hope this helps!
