Help to understand nf-core base.config

Hi,

I am trying to run the nf-core/mnaseseq pipeline on my local machine.

My PC has 32 CPUs and 32 GB of RAM. In the nextflow.config file I adjusted the following params:

max_memory = 30.GB
max_cpus = 28
max_time = 20.h

Fifteen paired-end FASTQ files are given as input. I also supplied all relevant references as local files, so there is no need to pull them from the Internet. Yet the pipeline is running quite slowly; I don't know why it does not use all the CPUs and memory.

I found a base.config file which contains the following code:

process {

  cpus = { check_max( 1 * task.attempt, 'cpus' ) }
  memory = { check_max( 7.GB * task.attempt, 'memory' ) }
  time = { check_max( 4.h * task.attempt, 'time' ) }

  errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
  maxRetries = 1
  maxErrors = '-1'

  // Process-specific resource requirements
  withLabel:process_low {
    cpus = { check_max( 2 * task.attempt, 'cpus' ) }
    memory = { check_max( 14.GB * task.attempt, 'memory' ) }
    time = { check_max( 6.h * task.attempt, 'time' ) }
  }
  withLabel:process_medium {
    cpus = { check_max( 6 * task.attempt, 'cpus' ) }
    memory = { check_max( 42.GB * task.attempt, 'memory' ) }
    time = { check_max( 8.h * task.attempt, 'time' ) }
  }
  withLabel:process_high {
    cpus = { check_max( 12 * task.attempt, 'cpus' ) }
    memory = { check_max( 84.GB * task.attempt, 'memory' ) }
    time = { check_max( 16.h * task.attempt, 'time' ) }
  }
  withLabel:process_long {
    time = { check_max( 20.h * task.attempt, 'time' ) }
  }
  withName:get_software_versions {
    cache = false
  }

}

I am a new Nextflow user. As I understand it, this configuration file controls how many resources a process will use, but I don't understand what exactly the check_max() function does.

Your operating system controls how many resources are available to software. Nextflow requests resources from the operating system based on the process directives, but it doesn't control how much is actually used.
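If the defaults in base.config are leaving your machine underused, you can override the per-label directives in a custom config and pass it with -c. This is only a hypothetical example (the file name and the numbers are mine, tune them for your workload):

```groovy
// custom.config -- hypothetical override, run with:
//   nextflow run nf-core/mnaseseq -c custom.config ...
process {
  withLabel:process_high {
    cpus   = 14     // let heavy steps use more of your 28 available CPUs
    memory = 28.GB  // stay under your 30 GB max_memory
  }
}
```

Settings given with -c take precedence over the pipeline's base.config, so only the labels you list are changed.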

If you're using container technology, it may be capping your available resources too, so you'll need to configure it not to do that. Docker Desktop's resource settings are one example.

It would also help if you shared what you mean by "quite slow", what speed you expected, and your .nextflow.log file.

Answering your last question: check_max makes sure that a process directive never requests more than the max_* limits you set. Say a process requests 2 GB of memory, scaled up on every new attempt after a failure (as in 7.GB * task.attempt above). With enough retries the request could grow far beyond the 32 GB your machine actually has, so check_max clamps any value above your configured maximum back down to it (in your case, max_memory = 30.GB).
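For reference, here is a simplified sketch of what check_max does; the real function, defined in the pipeline's nextflow.config, adds error handling for malformed max_* values:

```groovy
// Simplified sketch of nf-core's check_max: cap a requested
// value at the corresponding params.max_* limit.
def check_max(obj, type) {
  if (type == 'memory') {
    def max = params.max_memory as nextflow.util.MemoryUnit
    return obj.compareTo(max) == 1 ? max : obj          // cap at max_memory
  } else if (type == 'time') {
    def max = params.max_time as nextflow.util.Duration
    return obj.compareTo(max) == 1 ? max : obj          // cap at max_time
  } else if (type == 'cpus') {
    return Math.min(obj as int, params.max_cpus as int) // cap at max_cpus
  }
}
```

So with your settings, a process_high task requests min(12 * task.attempt, 28) CPUs, and since 84 GB already exceeds your limit on the first attempt, its memory request is clamped to 30 GB.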