Using queue size to parallelise local executor

I’m running a workflow with a lot of very small steps. To avoid overloading the LSF scheduler, I’m submitting a single job with 16 CPUs and 32 GB of memory, which runs Nextflow with the local executor. Here’s the approximate config setup:

singularity {
    enabled = true
}

process {
    withName: '.*' {
        executor = "local"
        cpus = 1
        memory = "1GB"
    }
}

executor {
    name = "local"
    cpus = 14
    memory = "20 GB"
}

This was working OK up to a point, but then Nextflow started submitting only a single job at a time. After trying multiple options, I was able to re-enable parallelism by adding -qs 14 to the nextflow command. The documentation suggests that the local executor has no default queue size value, so I’m not sure how reliable this solution is.
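As far as I can tell, the -qs flag corresponds to the queueSize setting in the executor scope, so the same cap should be expressible in the config file rather than on the command line. A minimal sketch of what I mean, reusing the executor block above:

executor {
    name = "local"
    cpus = 14
    memory = "20 GB"
    // queueSize limits how many tasks the executor handles in parallel;
    // this should be equivalent to passing -qs 14 on the command line
    queueSize = 14
}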

My intuition is that something in our cluster setup interferes with Nextflow’s estimation of the available CPUs. However, a check similar to the one Nextflow uses, shown here,

public class CpuCheck {
    public static void main(String[] args) {
        int availableProcessors = Runtime.getRuntime().availableProcessors();
        System.out.println("Available processors (cores): " + availableProcessors);
    }
}

correctly reported 16 available cores for the setup I was using.

Here are some other ideas that did not work for me:

  • switching from Singularity to Conda
  • specifying the -ps (pool size) parameter on the nextflow run command - it seems the executor’s pool size gets re-estimated at some point (or maybe I misunderstood what it does in general)
  • playing around with the process cpus/memory settings through the config or CLI - the config above sets the limits correctly
  • switching Nextflow and workflow versions - these are linked: ampliseq 2.4.0 runs on Nextflow 22.10 and ampliseq 2.11.0 runs on Nextflow 24.04
  • setting maxForks = 14 in the process section of the config - not sure this had any effect, although no errors were raised (see the sketch after this list)
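
For reference, here is roughly the maxForks setting I tried. As far as I understand, maxForks is a per-process directive, so applied through a wildcard selector it only caps concurrent instances of each individual process rather than the total number of tasks the local executor runs, which might explain why it had no visible effect:

process {
    withName: '.*' {
        // caps parallel instances of each matching process at 14;
        // this does not limit the overall number of concurrent local tasks
        maxForks = 14
    }
}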

PS: after reading the community forum, it seems people use additional resource management systems like Flux in similar situations - I’m trying to avoid that for now for the sake of simplicity.

Hi @amakunin,

I’m not sure, but I think Nextflow’s recent support for job arrays could be useful here. You can read about it here.
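
Roughly, the idea would be to submit through LSF again but let Nextflow batch the many small tasks into job arrays, so the scheduler sees far fewer submissions. A minimal sketch, assuming Nextflow 24.04 or later and an arbitrary batch size of 100:

process {
    executor = "lsf"
    cpus = 1
    memory = "1GB"
    // submit tasks in LSF job arrays of up to 100 elements each,
    // reducing the number of individual scheduler submissions
    array = 100
}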

Let me know if that solves your problem.


@mribeirodantas, thank you for the job arrays hint - I did not know this existed. I agree that switching back to the job scheduler would be a more sustainable solution, but it would require me to take a closer look at resource allocation - I might try that later on.
