Unable to run parallel threads

Hi,

I am trying to run a Nextflow script on a samplesheet of 96 samples. I want to parallelize the pipeline across samples, but on Seqera the pipeline runs only a few (1-5) samples at a time, and the number of concurrent tasks seems random. I have assigned it a compute environment with 5000 on-demand CPUs. One of the processes in the pipeline is blastn, to which I have assigned 64 CPUs, 10 GB disk and 10 GB memory. With 5000 on-demand CPUs available, why am I not able to run multiple samples in parallel?

Nextflow is capable of submitting thousands of jobs, so there’s no reason it can’t do this. We need a few more details to find out what is wrong:

  • What does the pipeline look like? Is there a bottleneck in the pipeline preventing more than 5 jobs being submitted?
  • What other configuration items have you got set?
  • What executor are you using? Local? AWS Batch?
  • What are the resource requirements of each job? How many CPUs, memory, storage etc?

Hi Adam,

Thanks for responding. Let me try and explain better.

My pipeline has multiple processes. For all the processes we have set the following compute requirements in the config file:

process {
    cpus = 1
    memory = '4 GB'
}

One of the tasks has higher compute requirements and we have set those accordingly:

process BLASTN {
    tag "${sample_id}"
    container "${params.docker_frum}"
    // publishDir "${params.outdir}/${params.internal}/${order_id}/${sample_id}/${params.flow_cell_run_id}/FrumOutput/${params.run_id}/" , mode: 'copy'
    beforeScript 'chmod o+rw .'
    maxRetries 1
    errorStrategy  { task.attempt <= maxRetries  ? 'retry' : 'ignore' }
    cpus 64
    disk '10 GB'
    memory '10 GB'

There is no bottleneck in the pipeline. The run submits multiple BLASTN tasks but only runs one at a time.

There aren't any other config items set.

We are using AWS Batch.

In the compute environment we have set the config mode to “Batch Forge”, the provisioning model to on-demand and max CPUs to 5000. Those are the only settings we have applied to the compute environment. Is there anything else we need to set there to allow for parallel tasks?

There is nothing in your example that indicates why your process would only run one at a time. In my experience, it is likely to be one of the following:

  • Something in your AWS Batch setup is preventing more than 1 job running at a time, e.g. your AWS account may have a limit on the number of CPUs in use at one time.
  • The structure of your pipeline prevents more than 1 process being submitted at a time. You haven’t shared the details of your pipeline so I can’t tell from the example provided.
  • You have an additional configuration item such as process.maxForks which is preventing more than 1 process being submitted at a time.
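
For the last point, here is a rough sketch of the kind of settings worth searching your config files for. `process.maxForks` and `executor.queueSize` are standard Nextflow options, but the values below are made up for illustration and are not taken from your pipeline:

```groovy
// nextflow.config (illustrative values only, not from the pipeline in this thread)

process {
    // Caps how many instances of each process run in parallel.
    // A value of 1 here would force tasks to run one at a time.
    maxForks = 1

    withName: 'BLASTN' {
        // The same cap applied to a single process.
        maxForks = 1
    }
}

executor {
    // Caps how many tasks Nextflow keeps queued or running on the executor at once.
    queueSize = 5
}
```

If any of these turn up, removing them or raising the value lifts the cap on the Nextflow side. If none do, the limit is more likely outside Nextflow, for example in the AWS Batch compute environment or the account quotas.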

Unfortunately without further details I can’t see what the problem is.


It was in fact the AWS account. Changing the limit on the account solved it, thank you!
