nf-core/rnaseq v3.18.0 – minimum‑55‑CPU rule, account suspensions & debug‑queue limits

Hello everyone,

I am struggling to run nf‑core/rnaseq v3.18.0 (Nextflow v25.04.2) on our HPC system, specifically on the ORF queue, which requires every Slurm job to request either 55 or 110 CPUs. Light steps (FastQC, gunzip, Trim Galore, MultiQC) request far fewer CPUs, so my jobs are either rejected or my user account is temporarily suspended for violating utilisation/QoS rules.

Below I provide the full `nextflow.config` and the corresponding `params.json` so that you can see exactly what I am submitting. Any suggestions on batching patterns, queue settings, or general experience running Nextflow on similarly restrictive queues would be greatly appreciated.

1 • Cluster / Queue Environment

| Item | Value |
| --- | --- |
| Scheduler | Slurm |
| Queue (production) | ORF |
| ORF policy | exactly 55 or 110 CPUs (multiples of 55), ≤ 2 GB × CPU, `-C weka`, max 72 h |
| Debug/test queue | `test` → max 4 h walltime, queueSize hard‑capped at 8 |
| Container runtime | Apptainer |
| Nextflow | not available as a cluster module; I load my own binary |
| Dataset | 246 paired‑end samples → 492 FASTQ files, ≈ 594 GB |

Practical consequences

  • Jobs that request < 55 CPUs are either rejected instantly (“Invalid qos/partition”) or run but trigger a QoS action, after which the account is temporarily suspended.
  • The test queue (4 h walltime / queueSize 8) is too short for even a minimal run, and debugging on the head node is not allowed either.
  • Admins are aware but, so far, cannot relax the 55‑CPU floor.

2 • Full nextflow.config

// nextflow.config

// Scratch directories
def USER_NAME      = System.getenv('USER')   // "$USER" is not defined in config scope, so read it from the environment
def USER_SCRATCH   = System.getenv('PWD')  ?: "/arf/scratch/${USER_NAME}"
def HOME_DIR       = System.getenv('HOME') ?: "/arf/home/${USER_NAME}"
def LOCAL_SCRATCH  = System.getenv('LOCAL_SCRATCH') ?: "/tmp/${USER_NAME}"

cleanup = true

params {
    config_profile_description = 'The TRUBA cluster profile'
    config_profile_contact     = ''
    config_profile_url         = ''
}

env {
    APPTAINER_TMPDIR   = "${LOCAL_SCRATCH}/apptainer"
    APPTAINER_CACHEDIR = "${HOME_DIR}/.apptainer"
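    // keep OpenMP-based tools to a single thread so they do not grab every core on the node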
    OMP_NUM_THREADS    = 1
}

apptainer {
    cacheDir        = "${HOME_DIR}/.apptainer/NXF_CacheDir"
    libraryDir      = "${HOME_DIR}/.apptainer/NXF_LibDir"
    enabled         = true
    autoMounts      = true
}

executor {
    name                = 'slurm'
    queueSize           = 50   // production queue; debug queue hard-cap 8
    pollInterval        = '30 sec'        
    exitReadTimeout     = '10 min'
    jobName             = { "${task.process.split(':').last()}" }
    submitRateLimit     = '60/1min'
    queueGlobalStatus   = false
    perCpuMemAllocation = false
}

process {
    scratch        = "${LOCAL_SCRATCH}"
    executor       = 'slurm'
    stageInMode    = 'symlink'
    stageOutMode   = 'rsync'
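    // back off exponentially before resubmitting: 2^attempt * 200 ms, then retry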
    errorStrategy  = {
        sleep( (Math.pow(2, (task.attempt ?: 1)) * 200) as long )
        return 'retry'
    }
    maxRetries     = 4
    cache          = 'lenient'
    afterScript    = 'sleep 20'
}

profiles {
  standard {
    process.executor = 'local'
  }

  arf {
    process {
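      // hard ceiling for every per-task request, including the retry escalations below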
      resourceLimits = [cpus:110, memory:256.GB, time:24.h]
      cpus   = { task.attempt > 1 ? (task.previousTrace.cpus + 2) : 3 }
      memory = { task.attempt > 1 ? (task.previousTrace.memory * 1.5) : 6.GB }

      withLabel:process_single {
        cpus   = 3
        memory = { 6.GB * task.attempt }
        time   = { 4.h * task.attempt }
        queue  = 'ORF'
        // Slurm does not expand $USER in #SBATCH directives, so resolve it at config time
        clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
      }

      withLabel:process_low {
        cpus   = { 6 * task.attempt }
        memory = { 12.GB * task.attempt }
        time   = { 4.h * task.attempt }
        queue  = 'ORF'
        clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
      }

      withLabel:process_medium {
        cpus   = { 18 * task.attempt }
        memory = { 36.GB * task.attempt }
        time   = { 8.h * task.attempt }
        queue  = 'ORF'
        clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
      }

      withLabel:process_high {
        cpus   = { 36 * task.attempt }
        memory = { 72.GB * task.attempt }
        time   = { 16.h * task.attempt }
        queue  = 'ORF'
        clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
      }

      withLabel:process_long {
        time   = { 20.h * task.attempt }
        queue  = 'ORF'
        clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
      }
    }
  }
}

report {
    enabled = true
}

dag {
    enabled = true
}

3 • Full params.json

{
  "input": "samplesheet.csv",
  "outdir": "/NXF-StarRsem",
  "multiqc_title": "GSEXX",
  "fasta": "GRCh38.primary_assembly.genome.fa.gz",
  "gtf": "gencode.v48.primary_assembly.annotation.gtf.gz",
  "transcript_fasta": "gencode.v48.transcripts.fa.gz",
  "gencode": true,
  "gtf_extra_attributes": "gene_name,gene_id",
  "gtf_group_features": "gene_id,gene_name",
  "extra_trimgalore_args": " --quality 20 --length 30 --stringency 3",
  "min_trimmed_reads": 20000000,
  "remove_ribo_rna": true,
  "aligner": "star_rsem",
  "min_mapped_reads": 50,
  "save_merged_fastq": true,
  "save_non_ribo_reads": true,
  "save_reference": true,
  "save_trimmed": true,
  "save_align_intermeds": true,
  "save_unaligned": true,
  "save_kraken_assignments": true,
  "save_kraken_unassigned": true,
  "contaminant_screening": "kraken2_bracken",
  "kraken_db": "bb",
  "skip_umi_extract": true,
  "skip_linting": true,
  "skip_pseudo_alignment": true
}

4 • Key Problems

  1. Job rejections & suspensions for any light process requesting < 55 CPUs.
  2. The debug queue (`test`) caps walltime at 4 h and queueSize at 8, so even a reduced run cannot finish there.
  3. Head‑node debugging is disallowed, and an interactive `srun` needs ≥ 55 CPUs, which is not feasible for small tests.
  4. A Nextflow module is not provided by the HPC; admins have not added one despite requests.

5 • Questions to the Community

ORF queue recap: CPU requests must be exactly 55 or 110 (multiples of 55), RAM is capped at 2 GB × CPU (i.e. 110 GB at 55 CPUs), `-C weka` is required, and walltime is limited to 24 h.

  1. Full‑parallel single‑job strategy (without the local executor): Has anyone succeeded in running both the Nextflow head process and all tasks inside a single 55/110‑CPU Slurm allocation while still using the Slurm executor (no `executor = 'local'` fallback)? For example, by submitting one job whose `--ntasks`/`--cpus-per-task` map onto the internal process needs, setting `queueSize = 50`, and packaging steps via the DSL2 `groupKey` so that every core is utilised. I need full node parallelisation and cannot risk the reduced throughput of a local executor.
  2. Reliable Nextflow launch methods: Login/head nodes kill long‑running processes, and an interactive `srun` demands ≥ 55 CPUs. Which launch pattern actually works on ORF? Examples: `sbatch --wrap 'nextflow run …'`, a tiny “driver” task inside the same 55/110‑CPU allocation, or any custom nf‑launcher script used successfully under strict minimum‑CPU rules.
  3. Efficient debugging: How do you validate a subset (< 19 samples) when even the debug queue rejects small CPU requests and long‑running processes are killed on the head node?
  4. maxForks / array jobs: What is the best pattern for setting `maxForks`, `queueSize`, and the DSL2 `groupKey` so that each Slurm submission scales its resource request proportionally (55 or 110 CPUs) rather than as fixed tiny chunks, thereby avoiding utilisation warnings and quota downgrades? A rough batching sketch of what I mean follows this list.
  5. Last‑wave batching: How can I automatically adapt the samples‑per‑job count so the final wave still requests ≥ 55 CPUs?
  6. Real‑world success stories of running Nextflow on clusters with a strict minimum CPU rule—any shared modules, pilot job scripts, or custom clusterOptions tricks would help.
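To make questions 4 and 5 concrete, here is a rough, hypothetical sketch of the batching I have in mind (the file name, the FASTQC_BATCH process, the glob, and the batch size of 18 are all made up, and I use `buffer` rather than `groupKey`; I realise nf‑core/rnaseq itself cannot simply be re‑batched like this without modifying its modules):

// batching-sketch.nf (hypothetical, not part of nf-core/rnaseq)
nextflow.enable.dsl = 2

params.reads      = 'fastq/*_{1,2}.fastq.gz'   // assumed layout
params.batch_size = 18                          // ~18 samples per 55-CPU job

process FASTQC_BATCH {
    cpus  55                    // one full ORF allocation per batch
    queue 'ORF'
    clusterOptions '-C weka'

    input:
    tuple val(ids), path(reads)

    output:
    path '*_fastqc.*'

    script:
    """
    fastqc -t ${task.cpus} ${reads}
    """
}

workflow {
    Channel
        .fromFilePairs(params.reads)
        // pack batch_size samples per Slurm job; keep the smaller final wave too
        .buffer(size: params.batch_size, remainder: true)
        // one tuple per batch: sample ids + a flat list of their FASTQ files
        .map { batch -> tuple(batch.collect { it[0] }, batch.collect { it[1] }.flatten()) }
        | FASTQC_BATCH
}

With this kind of packing the last wave still requests the full 55 CPUs, just with fewer samples inside it, which at least avoids the QoS violation.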

Any guidance on these points will be invaluable—thank you!

In your case, I would go for the approach of requesting 55 CPUs on a single node and running Nextflow there with the local executor. Otherwise, you would need to create a new executor that, instead of launching jobs with sbatch, launches job steps with srun from inside the allocation, but I'm not aware of anything like that existing already.
In the worst case, always ask for 55 CPUs for every job even though it uses only one, but that will be a huge waste of resources.
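A minimal sketch of what that could look like, assuming the head job is submitted once with the full allocation (e.g. `sbatch -p ORF -C weka -N 1 -n 1 -c 55 --wrap 'nextflow run … -profile single_node'`) and a hypothetical `single_node` profile that keeps every task on those cores:

// hypothetical profile: run everything inside one 55-CPU allocation
profiles {
    single_node {
        executor {
            name   = 'local'
            cpus   = 55          // total CPUs Nextflow may hand out at any one time
            memory = 110.GB      // 2 GB x 55 CPUs, per the ORF policy
        }
        process {
            executor       = 'local'
            // per-task requests from the rnaseq labels are clipped to the allocation
            resourceLimits = [cpus: 55, memory: 110.GB, time: 24.h]
        }
    }
}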

In the end, the issue is that you are trying to use a tool that is not right for your task. You need a different cluster/QoS/partition better suited to your needs.

Thank you for your response, @Poshi.

Could it be sensible to set maxForks to the core count minus 2 (e.g., 53 on a 55-CPU allocation) so that, under the local executor, no cores remain idle?

I would check the resourceLimits directive rather than limiting the number of forks. But I’m not sure if that is the best option.
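To illustrate the difference (a sketch, assuming the 55‑CPU single‑node setup discussed above, with a couple of cores left free for the Nextflow head process):

process {
    // maxForks caps how many instances of each process run at once,
    // regardless of how many CPUs each instance asks for
    maxForks       = 53
    // resourceLimits only caps what a single task may request
    // (mainly useful when retry closures keep escalating cpus/memory)
    resourceLimits = [cpus: 53, memory: 106.GB]
}

Neither of these packs the node by itself; with the local executor it is the executor‑level `cpus` setting (as in the profile sketch above) that decides whether another task can start, since a task is launched only while enough free cores remain.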