Adapting existing AWS Batch infrastructure to integrate the Fusion file system

Hi there, I would like to switch to the Fusion file system in my existing AWS Batch infrastructure, which includes specific IAM roles with the appropriate policies, custom AMIs with and without EBS autoscaling, compute environments and job queues that can use Spot or On-Demand EC2 instances, etc. However, I don’t fully understand what I need to change to make it compatible with Fusion. Are there specific instance types I should use? What should I do about EBS autoscaling? It is also not obvious to me how to put /tmp on the NVMe storage. Thank you for your help!

If you use Fusion and set process.scratch = false, you don’t need EBS autoscaling.
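A minimal sketch of the corresponding Nextflow configuration follows. fusion.enabled and wave.enabled are the standard Nextflow switches (Fusion requires Wave). The queue name and S3 bucket are hypothetical placeholders, so replace them with your own job queue and bucket.

```shell
#!/bin/bash
# Sketch: write a minimal nextflow.config for Fusion on AWS Batch.
# Queue name and bucket below are hypothetical placeholders.
CONFIG=${CONFIG:-nextflow.config}

cat > "$CONFIG" <<'EOF'
process.executor = 'awsbatch'
process.queue    = 'my-spot-queue'        // hypothetical job queue name
process.scratch  = false                  // Fusion handles task I/O, so no EBS autoscaling is needed
workDir          = 's3://my-bucket/work'  // hypothetical bucket; Fusion reads and writes S3 directly
fusion.enabled   = true
wave.enabled     = true                   // Fusion requires Wave to be enabled
EOF
```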

You can use Fusion with an EBS disk or with NVMe disks (the recommended option).

  • With EBS: you only need to provide a sufficiently fast EBS boot disk. What counts as sufficient depends on the load and the kind of pipelines you run, but I’d recommend testing with a ~100 GB gp3 EBS disk provisioned with ~325 MB/s of throughput.

  • With NVMe (highly recommended for cost efficiency, especially with big files), you need three things:

    1. Use EC2 families with NVMe instance storage (recommended: c6id, m6id, r6id).
    2. Format and mount all NVMe disks from the EC2 launch template at a host path (e.g. /mynvmedisks).
    3. Mount that host path inside the containers as /tmp (e.g. aws.batch.volumes = '/mynvmedisks:/tmp').
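For the EBS option, the boot disk can be sized in the launch template itself. The sketch below writes the LaunchTemplateData JSON for a 100 GB gp3 root volume with 325 MiB/s of throughput (gp3 throughput is specified in MiB/s, and 3000 IOPS is the gp3 baseline). It assumes the AMI's root device is /dev/xvda, which is the case for the Amazon Linux based ECS-optimized AMIs but should be checked for a custom AMI. The file and template names are hypothetical.

```shell
#!/bin/bash
# Sketch: LaunchTemplateData for a gp3 boot disk sized for Fusion on EBS.
# Assumption: the AMI root device is /dev/xvda (verify for your custom AMI).
OUT_FILE=${OUT_FILE:-lt-data-ebs.json}   # hypothetical file name
VOLUME_SIZE_GB=100                       # ~100 GB boot disk
THROUGHPUT_MIBS=325                      # gp3 throughput in MiB/s (baseline is 125)
IOPS=3000                                # gp3 baseline IOPS

cat > "$OUT_FILE" <<EOF
{
  "BlockDeviceMappings": [
    {
      "DeviceName": "/dev/xvda",
      "Ebs": {
        "VolumeSize": ${VOLUME_SIZE_GB},
        "VolumeType": "gp3",
        "Throughput": ${THROUGHPUT_MIBS},
        "Iops": ${IOPS},
        "DeleteOnTermination": true
      }
    }
  ]
}
EOF

# The file can then be passed to the AWS CLI, for example:
#   aws ec2 create-launch-template --launch-template-name fusion-ebs \
#     --launch-template-data "file://$OUT_FILE"
```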

Thank you very much for the reply, @jordeu, that makes it much clearer! Could I just ask what you mean in point 2 by “format all disks”? Do you mean running something like the following in the launch template?

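sudo file -s /dev/nvme1n1              # check whether the disk already has a filesystem
sudo mkfs -t xfs /dev/nvme1n1          # format it as XFS
sudo mkdir /mynvmedisks                # create the mount point
sudo mount /dev/nvme1n1 /mynvmedisks   # mount the disk there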

Thanks!

Here is a more complete example that can handle multiple NVMe disks:

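#!/bin/bash
# Format and mount every NVMe instance-store disk at /scratch/fusion
mkdir -p /scratch/fusion
# List instance-store disks only (EBS volumes report a different model name)
NVME_DISKS=($(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }'))
NUM_DISKS=${#NVME_DISKS[@]}
if (( NUM_DISKS > 0 )); then
  if (( NUM_DISKS == 1 )); then
    # Single disk: format it directly with XFS
    mkfs -t xfs "${NVME_DISKS[0]}"
    mount "${NVME_DISKS[0]}" /scratch/fusion
  else
    # Several disks: combine them into a single LVM logical volume
    pvcreate "${NVME_DISKS[@]}"
    vgcreate scratch_fusion "${NVME_DISKS[@]}"
    lvcreate -l 100%FREE -n volume scratch_fusion
    mkfs -t xfs /dev/mapper/scratch_fusion-volume
    mount /dev/mapper/scratch_fusion-volume /scratch/fusion
  fi
fi
# Make the mount point writable by any user, including the container's user
chmod a+w /scratch/fusion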
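AWS Batch only accepts launch template user data in MIME multi-part format, so a boot script like this one has to be wrapped before it goes into the launch template. Below is a minimal sketch of that wrapping step. The function name wrap_user_data and the boundary string are arbitrary choices for illustration, not part of any AWS tool.

```shell
#!/bin/bash
# Sketch: wrap a boot script in the MIME multi-part format that AWS Batch
# requires for launch template user data.
# Usage: wrap_user_data <script_file> <out_file>   (hypothetical helper)
wrap_user_data() {
  local script_file=$1
  local out_file=$2
  local boundary='==MYBOUNDARY=='
  {
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: multipart/mixed; boundary="%s"\n\n' "$boundary"
    printf -- '--%s\n' "$boundary"
    printf 'Content-Type: text/x-shellscript; charset="us-ascii"\n\n'
    cat "$script_file"                 # the script should start with #!/bin/bash
    printf '\n--%s--\n' "$boundary"    # closing boundary
  } > "$out_file"
}
```

For example, wrap_user_data nvme-setup.sh user-data.mime (hypothetical file names). When the result is passed through aws ec2 create-launch-template, the UserData field must be base64-encoded (e.g. base64 -w0 user-data.mime with GNU coreutils). Because this script mounts the disks at /scratch/fusion, the matching Nextflow setting would be aws.batch.volumes = '/scratch/fusion:/tmp'.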

Note that the nvme list step needs the nvme-cli package (yum install -y nvme-cli).
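If you would rather not install an extra package, lsblk (from util-linux, normally present on Amazon Linux) can list disks together with their model string. This is an alternative to the nvme list call, not something the script requires. It assumes lsblk reports the NVMe model as "Amazon EC2 NVMe Instance Storage", the same string nvme list shows, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: find instance-store NVMe disks with lsblk instead of nvme-cli.
# Assumes lsblk reports the same model string that `nvme list` shows.

# Keep only instance-store devices from "NAME MODEL" lines and print their paths.
filter_instance_store() {
  awk '/Amazon EC2 NVMe Instance Storage/ { print "/dev/" $1 }'
}

# Whole disks only (-d), no header line (-n), name and model columns.
list_instance_store_disks() {
  lsblk -d -n -o NAME,MODEL | filter_instance_store
}

# Possible drop-in for the NVME_DISKS line of the script:
#   mapfile -t NVME_DISKS < <(list_instance_store_disks)
```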