Nextflow config for switching to GPU in the nf-core/sarek pipeline

I’m testing the nf-core/sarek DeepVariant pipeline on AWS Batch and want it to choose GPUs for the “run deepvariant” step; however, the pipeline keeps choosing CPU instances (e.g. r6). I’ve tried a variety of things, but it always chooses CPU.

Here is my Nextflow config file. Could you please take a look and suggest a solution?

aws {
  batch {
    maxSpotAttempts = 2
  }
}

docker {
  enabled = true
  // Remove global runOptions, use process-specific instead
}

params {
  deepvariant_container = 'docker.io/google/deepvariant:1.6.1-gpu'
  deepvariant_options = "--use_gpu --num_shards 8"
}

process {
  executor = 'awsbatch'
  maxRetries = 2

  errorStrategy = {
    (!task.exitStatus || task.exitStatus == 143) ? 'retry' : 'terminate'
  }

  // Default CPU/general-purpose queue routing
  queue = {
    task.attempt <= 2
      ? 'TowerForge-xxx-work' // CPU spot queue
      : 'TowerForge-xxx' // CPU on-demand queue
  }

  // GPU-specific queue logic
  withLabel: 'gpu' {
    queue = {
      task.attempt <= 2
        ? 'TowerForge-xxx-work'    // GPU spot queue
        : 'TowerForge-xxx'         // GPU on-demand queue
    }
  }

  // Configuration for the DeepVariant GPU process
  withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_DEEPVARIANT:DEEPVARIANT_RUNDEEPVARIANT' {
    label = 'gpu'
    container = params.deepvariant_container
    time = '6h'

    beforeScript = '''
      echo "Checking GPU availability..."
      nvidia-smi || { echo "No GPU found!" >&2; exit 1; }
    '''

    containerOptions = '--gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all'
  }
}

Hi William,

Great question! The key issue here is that you need to use Nextflow’s accelerator directive to request GPU instances from AWS Batch.

When you add the accelerator directive, Nextflow will augment the AWS Batch SubmitJob API call with a GPU resource requirement, which tells AWS Batch to specifically select GPU-enabled instances. Without this directive, AWS Batch doesn’t know you need a GPU and defaults to CPU instances.

Here’s what you need to add to your config:

process {  
  withLabel: 'gpu' {
    accelerator = 1
  }
}

A few additional tips:

  • Queue sharing: You can actually use the same AWS Batch queues for both GPU and CPU jobs - just make sure your Batch Compute Environments have both GPU and CPU instance types available. AWS Batch will automatically select the appropriate instance type based on the job’s resource requirements (see the sketch after this list).

  • Container setup: If you’re using the GPU-optimized AMI, it includes the NVIDIA Container Toolkit which automatically handles GPU driver mounting and sets the NVIDIA_DRIVER_CAPABILITIES. This means you can likely remove the containerOptions configuration.

  • Debugging: Your beforeScript with nvidia-smi is a clever debugging approach - definitely keep that while testing.
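
To illustrate the queue-sharing point, here is a minimal sketch, assuming a single queue (placeholder name from your config) whose Compute Environment offers both CPU and GPU (e.g. g5) instance types; the accelerator directive is what steers GPU tasks onto GPU instances:

process {
  executor = 'awsbatch'
  queue = 'TowerForge-xxx-work'   // one shared queue for all processes

  withLabel: 'gpu' {
    accelerator = 1               // adds a GPU resource requirement to the SubmitJob call,
                                  // so Batch places the task on a GPU instance
  }
}

Tasks without the gpu label submit to the same queue with no GPU requirement, so Batch remains free to place them on the CPU instance types in that Compute Environment.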

So your simplified config for the DeepVariant process would look like:

withName: 'DEEPVARIANT_RUNDEEPVARIANT' {
  label = 'gpu'
  accelerator = 1
  container = params.deepvariant_container
  time = '6h'
  
  beforeScript = '''
    echo "Checking GPU availability..."
    nvidia-smi || { echo "No GPU found!" >&2; exit 1; }
  '''
}

Let me know if this resolves the issue!


I just wanted to add that it looks like the accelerator directive isn’t in the module itself or in any of the config, so we’ll need to add that.

I went ahead and opened a bug report on the Sarek repo!


Thanks @robsyme, with your configuration changes

+ an increase in our AWS quota for g5 series GPU instances

+ a change to the GPU compute environment (CE) to give it 200 GB on the boot disk

we were able to get it running.