Hello everyone,
I am struggling to run nf‑core/rnaseq v3.18.0 (Nextflow v25.04.2) on our HPC system, specifically its ORF queue, which requires every Slurm job to request exactly 55 or 110 CPUs. Light steps (FastQC, gunzip, Trim Galore, MultiQC) need far fewer CPUs, so those submissions are either rejected outright or they run and my user account is temporarily suspended for violating the utilisation/QoS rules.
Below I provide the full **nextflow.config** profile and the corresponding **params.json** so that you can see exactly what I am submitting. Any suggestions, whether batching patterns, queue settings, or general experience running Nextflow on similarly restrictive queues, would be greatly appreciated.
1 • Cluster / Queue Environment
| Item | Value |
| --- | --- |
| Scheduler | Slurm |
| Queue (production) | ORF |
| ORF policy | exactly 55 or 110 CPUs (multiples of 55), ≤ 2 GB × CPU, `-C weka`, max 72 h |
| Debug/test queue | `test` → max 4 h walltime, queueSize hard-capped at 8 |
| Container runtime | Apptainer |
| Nextflow module | not available as a cluster module; I load my own binary |
| Dataset | 246 paired-end samples → 492 FASTQ files, ≈ 594 GB |
Practical consequences
- Jobs that request < 55 CPU are rejected instantly (“Invalid qos/partition”), or they run but trigger a QoS action and my account is temporarily suspended (illustrated just below this list).
- The `test` queue (4 h walltime / queueSize 8) is too short to complete even a minimal run, and debugging on the head node is not allowed either.
- Admins are aware but, so far, cannot relax the 55-CPU floor.
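To make the floor concrete, this is roughly what a light step asks for under my current labels versus the smallest request ORF accepts (illustrative sbatch lines only; `task_wrapper.sh` stands in for the script Nextflow generates):

```bash
# A light step under my labels (3 CPUs / 6 GB) -> below the 55-CPU floor
sbatch -p ORF -C weka -N 1 -n 1 --cpus-per-task=3 --mem=6G task_wrapper.sh
#   => "Invalid qos/partition", or it runs and later triggers the QoS suspension

# The smallest request the ORF policy accepts (55 CPUs, 2 GB x CPU)
sbatch -p ORF -C weka -N 1 -n 1 --cpus-per-task=55 --mem=110G task_wrapper.sh
```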
2 • Full nextflow.config

```groovy
// nextflow.config
// Scratch directories
// ($USER is not defined inside the config script itself, so read it from the environment)
def USER_SCRATCH = System.getenv('PWD') ?: "/arf/scratch/${System.getenv('USER')}"
def HOME_DIR = System.getenv('HOME') ?: "/arf/home/${System.getenv('USER')}"
def LOCAL_SCRATCH = System.getenv('LOCAL_SCRATCH') ?: "/tmp/${System.getenv('USER')}"
cleanup = true
params {
config_profile_description = 'The TRUBA cluster profile'
config_profile_contact = ''
config_profile_url = ''
}
env {
APPTAINER_TMPDIR = "${LOCAL_SCRATCH}/apptainer"
APPTAINER_CACHEDIR = "${HOME_DIR}/.apptainer"
OMP_NUM_THREADS = 1
}
apptainer {
cacheDir = "${HOME_DIR}/.apptainer/NXF_CacheDir"
libraryDir = "${HOME_DIR}/.apptainer/NXF_LibDir"
enabled = true
autoMounts = true
}
executor {
name = 'slurm'
queueSize = 50 // production queue; debug queue hard-cap 8
pollInterval = '30 sec'
exitReadTimeout = '10 min'
jobName = { "${task.process.split(':').last()}" }
submitRateLimit = '60/1min'
queueGlobalStatus = false
perCpuMemAllocation = false
}
process {
scratch = "${LOCAL_SCRATCH}"
executor = 'slurm'
stageInMode = 'symlink'
stageOutMode = 'rsync'
errorStrategy = {
sleep( (Math.pow(2, (task.attempt ?: 1)) * 200) as long )
return 'retry'
}
maxRetries = 4
cache = 'lenient'
afterScript = 'sleep 20'
}
profiles {
standard {
process.executor = 'local'
}
arf {
process {
resourceLimits = [cpus:110, memory:256.GB, time:24.h]
cpus = { task.attempt > 1 ? (task.previousTrace.cpus + 2) : 3 }
memory = { task.attempt > 1 ? (task.previousTrace.memory * 1.5) : 6.GB }
withLabel:process_single {
cpus = 3
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
queue = 'ORF'
// Slurm does not expand $USER in #SBATCH directives, so resolve the account at config time
clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
}
withLabel:process_low {
cpus = { 6 * task.attempt }
memory = { 12.GB * task.attempt }
time = { 4.h * task.attempt }
queue = 'ORF'
clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
}
withLabel:process_medium {
cpus = { 18 * task.attempt }
memory = { 36.GB * task.attempt }
time = { 8.h * task.attempt }
queue = 'ORF'
clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
}
withLabel:process_high {
cpus = { 36 * task.attempt }
memory = { 72.GB * task.attempt }
time = { 16.h * task.attempt }
queue = 'ORF'
clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
}
withLabel:process_long {
time = { 20.h * task.attempt }
queue = 'ORF'
clusterOptions = ["-A ${System.getenv('USER')}", '-N 1', '-n 1', '-C weka']
}
}
}
}
report {
enabled = true
}
// dag is its own config scope; dag.enabled inside the report block is ignored
dag {
enabled = true
}
```
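The only in-config workaround I can think of is to pad every label up to the queue floor, which satisfies the policy but idles most of the allocation during light steps. A minimal sketch of that idea (not part of the profile above, and exactly what I would like to avoid):

```groovy
// Sketch only: force light/medium labels onto the 55-CPU / 110 GB floor so ORF accepts them.
// This wastes ~50 cores during FastQC, Trim Galore, MultiQC, etc.
process {
    withLabel: 'process_single|process_low|process_medium' {
        cpus           = 55
        memory         = 110.GB   // 2 GB x CPU cap
        queue          = 'ORF'
        clusterOptions = ['-N 1', '-n 1', '-C weka']
    }
}
```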
3 • Full params.json

```json
{
"input": "samplesheet.csv",
"outdir": "/NXF-StarRsem",
"multiqc_title": "GSEXX",
"fasta": "GRCh38.primary_assembly.genome.fa.gz",
"gtf": "gencode.v48.primary_assembly.annotation.gtf.gz",
"transcript_fasta": "gencode.v48.transcripts.fa.gz",
"gencode": true,
"gtf_extra_attributes": "gene_name,gene_id",
"gtf_group_features": "gene_id,gene_name",
"extra_trimgalore_args": " --quality 20 --length 30 --stringency 3",
"min_trimmed_reads": 20000000,
"remove_ribo_rna": true,
"aligner": "star_rsem",
"min_mapped_reads": 50,
"save_merged_fastq": true,
"save_non_ribo_reads": true,
"save_reference": true,
"save_trimmed": true,
"save_align_intermeds": true,
"save_unaligned": true,
"save_kraken_assignments": true,
"save_kraken_unassigned": true,
"contaminant_screening": "kraken2_bracken",
"kraken_db": "bb",
"skip_umi_extract": true,
"skip_linting": true,
"skip_pseudo_alignment": true
}
```
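For completeness, the launch boils down to something like this (sketch; the Apptainer module name and scratch path are placeholders):

```bash
module load apptainer      # placeholder module name
nextflow run nf-core/rnaseq -r 3.18.0 \
    -profile arf \
    -params-file params.json \
    -work-dir "/arf/scratch/$USER/work" \
    -resume
```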
4 • Key Problems
- Job rejections & suspensions for any light process requesting < 55 CPU.
- Debug queue (`test`) caps walltime at 4 h and queueSize at 8 → impossible to test the full pipeline.
- Head-node debugging is disallowed, and an interactive `srun` needs ≥ 55 CPUs, which is not feasible for small tests (see the example just below this list).
- The Nextflow module is not provided by the HPC; admins have not added it despite requests.
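For reference, the smallest interactive session the policy allows looks roughly like this, which is far more than a quick test needs:

```bash
# Minimum interactive allocation under the ORF policy (sketch; walltime arbitrary)
srun -p ORF -C weka -N 1 -n 1 --cpus-per-task=55 --time=01:00:00 --pty bash
```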
5 • Questions to the Community
ORF queue recap: CPU request must be exactly 55 or 110 (multiples of 55), 2 GB × CPU RAM cap, `-C weka`, 24 h walltime.
- Full-parallel single-job strategy (without the `local` executor): Has anyone succeeded in running both the Nextflow master process and all tasks inside a single 55/110-CPU Slurm allocation while still using the Slurm executor (no `executor = 'local'` fallback)? For example, by submitting one job with `--ntasks`/`--cpus-per-task` that maps to the internal process needs, setting `queueSize = 50`, and packaging steps via DSL2 `groupKey` so that every core is utilised. I need full node parallelisation and cannot risk the reduced throughput of a local executor.
- Reliable Nextflow launch methods: Login/head nodes kill long-running processes, and an interactive `srun` demands ≥ 55 CPUs. Which launch pattern actually works on ORF? Examples: `sbatch --wrap 'nextflow run …'`, a tiny "driver" task inside the same 55/110-CPU allocation (see the driver sketch after this list), or any custom nf-launcher script used successfully under strict minimum-CPU rules.
- Efficient debugging: How do you validate a subset (< 19 samples) when even the debug queue rejects small CPU requests and Nextflow's head-node timeouts kick in?
- maxForks / array jobs: What is the best pattern for setting `maxForks`, `queueSize`, and DSL2 `groupKey` so that each Slurm submission scales its resource request proportionally (55 or 110 CPUs) rather than submitting fixed tiny chunks (cf. the pad-to-the-floor sketch after the nextflow.config in section 2), thereby avoiding utilisation warnings and quota downgrades?
- Last-wave batching: How can I automatically adapt the samples-per-job count so the final wave still requests ≥ 55 CPUs?
- Real-world success stories of running Nextflow on clusters with a strict minimum-CPU rule: any shared modules, pilot-job scripts, or custom `clusterOptions` tricks would help.
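To make the launch-method question concrete, this is the kind of “driver” job I have in mind; it is only a sketch (module name and run directory are placeholders) and I do not know whether it stays within ORF's utilisation rules:

```bash
#!/bin/bash
#SBATCH -p ORF
#SBATCH -C weka
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --cpus-per-task=55      # queue floor, although the master process needs only a couple of cores
#SBATCH --mem=110G
#SBATCH --time=24:00:00
#SBATCH -J nf-driver

# Driver allocation: hosts the Nextflow master, which then submits the real tasks
# through the Slurm executor (each of those still has to satisfy the 55-CPU floor).
module load apptainer                    # placeholder module name
cd /arf/scratch/$USER/rnaseq             # placeholder run directory

nextflow run nf-core/rnaseq -r 3.18.0 \
    -profile arf \
    -params-file params.json \
    -resume
```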
Any guidance on these points will be invaluable—thank you!