Get the number of available CPUs in a process

Hi,

I would like to tell a task to use as many CPUs as available (using the local executor).

When setting a cpus directive higher than available, Nextflow errors out with:

ERROR ~ Error executing process > 'use_many_cpus'
Caused by:
  Process requirement exceeds available CPUs -- req: 164; avail: 112

So I am guessing a variable like available_cpus might be accessible somewhere, and then I could use it to set cpus:

process use_many_cpus {
    cpus task.available_cpus
    // ...
}

or with more control:

    cpus { [task.available_cpus, params.max_cpus].min() }

Can we do that?

Thanks a lot!

Normally Nextflow detects the number of available CPUs and memory quite well when running locally. However, in cases where it doesn't, you should specify them in a config:

executor {
    name             = 'local'
    cpus             = 8
    memory           = 30.GB
}

You can also set the resourceLimits directive to cap the maximum resources a process is allowed to request:

process {
    resourceLimits = [ cpus: 8, memory: 30.GB, time: 1.d ]
}

This means that even if a user config sets a task to use 10 CPUs, e.g.

process {
    withName: 'TASK' {
        cpus = 10
    }
}

the process will still not request more than 8 CPUs, because the resourceLimits directive caps it.
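
Put together, a minimal sketch of a single config with both pieces (values copied from the snippets above, just merged to show the clamping) would be:

process {
    resourceLimits = [ cpus: 8, memory: 30.GB, time: 1.d ]

    withName: 'TASK' {
        cpus = 10   // effectively capped at 8 by resourceLimits
    }
}
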
You should also ensure that your processes use the task.cpus variable to set the number of CPUs the tool actually uses, e.g.:

process TASK {
    input:
    path input

    output:
    path "somefile.ext", emit: ext

    script:
    """
    command --threads ${task.cpus} --in $input --out somefile.ext
    """
}

Thanks!

resourceLimits is useful in the sense that the requested process CPUs are automatically downscaled as needed.

One extra thing would be nice though: how do we automatically pick up the number of host CPUs?
An equivalent would be a configuration setting for the local executor asking it to downscale the number of requested CPUs instead of failing when the request is too high.

I’m not quite clear what you’re asking. The host CPUs are automatically detected, so there’s normally no need to set anything. Sometimes it does get it wrong though (I’m not sure which files/commands are checked to get the core count, e.g. lscpu), and that’s when you supply the executor config as described above.

Ideally you would put this all in a profile, e.g.

profiles {
    standard {
        // default settings
    }

    local {
        executor {
            name   = 'local'
            cpus   = 8
            memory = 30.GB
        }
        process {
            resourceLimits = [ cpus: 8, memory: 30.GB, time: 1.d ]
        }
    }
}

and then call your workflow with

nextflow run <pipeline> -profile local

The profile could live in the nextflow.config shipped with the pipeline, or you can have one just for your machine or in your launch directory, so only you have access to it.
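
For instance, a per-machine override can live in its own config file and be layered on with -c (the file name here is just an example):

nextflow run <pipeline> -profile local -c /path/to/my-machine.config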

Some context: sometimes I run the workflow on a machine with 64 cores, sometimes on one with 112. It would be nice to have the resourceLimits automatically adjusted.

For example, in the config file I can probably do something like this (untested):

process {
    resourceLimits = [
        // count online CPUs via lscpu; bash -c is needed for the pipes
        cpus:   ['bash', '-c', 'lscpu -p=CPU | grep -v "^#" | wc -l'].execute().text.trim() as Integer,
        memory: 30.GB,
        time:   1.d
    ]
}

The advantage is that a process’s task.cpus can then be set to the maximum available.

But it’s not so pretty, and I would prefer to let Nextflow call the proper cross-platform code for this.
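
A slightly more portable variant of the same idea, still plain Groovy in the config and equally untested, would be to ask the JVM for the host CPU count instead of shelling out to lscpu (note that availableProcessors() reports logical CPUs and may not match what Nextflow itself detects, e.g. under cgroup limits):

// count the host's logical CPUs without an external command
def hostCpus = Runtime.getRuntime().availableProcessors()

process {
    resourceLimits = [ cpus: hostCpus, memory: 30.GB, time: 1.d ]
}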

Why not use profiles for this?

profiles {
    local64 {
        ...
    }
    local112 {
       ...
    }
}

I also tend to use launch scripts to run Nextflow, which handle clean-up and the like, so you could use your bash code there to select the profile, as sketched below.
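
Something along these lines, for example (a rough sketch; nproc and the profile names from the snippet above are assumptions about your setup):

#!/usr/bin/env bash
# pick a profile based on how many cores the host reports
cores=$(nproc)
if [ "$cores" -ge 112 ]; then
    profile=local112
else
    profile=local64
fi
nextflow run <pipeline> -profile "$profile"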

Thanks for the suggestion.
It may be just a personal opinion here, but I don’t find this more convenient than defining a command-line option --host_cpus for my workflow. Also, if we want to generalize to include the host RAM, we have to define as many profiles as there are hosts… I am being fussy here (sorry!) because I expect some automation to be possible…

If it’s your own workflow you can include a parameter to set this. I think EPI2ME workflows do this, e.g. wf-artic/nextflow.config at 21a482fb480df508e85b08a747028aa313444da6 · epi2me-labs/wf-artic · GitHub
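
A minimal sketch of that pattern, assuming a max_cpus parameter (name borrowed from the original question; the default is arbitrary):

// users can override on the command line, e.g. nextflow run <pipeline> --max_cpus 64
params.max_cpus = 16

process {
    resourceLimits = [ cpus: params.max_cpus as int, memory: 30.GB, time: 1.d ]
}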

But you’re right: if you want to generalise, then yes, you need to specify just as many profiles. However, that, to me, is not a task for the workflow developer. The users should be defining profiles for their own environments, because they know them best and they can be so varied. That’s why nf-core gets users to contribute their own config profiles and only provides a few basic ones. Combined with the wonderful feature that Nextflow configs can be layered, this means a user can really control what they’re doing without relying on the developer to predict everyone’s usage.
