Hi Everyone …
Issue with Capturing Exit Status in a Piped Command for Memory-Intensive Process
Environment:
Nextflow version: 24.10.0
Executor: local (running on an EC2 machine)
Docker: Enabled (process.container = 'fragment_env:latest')
Configuration:
I have the following main.nf script with nextflow.enable.dsl=2:
```groovy
process fragmentfilter {
    cpus { 4 * task.attempt }
    memory { 16.GB * task.attempt }
    errorStrategy { task.exitStatus in 137..141 ? 'retry' : 'terminate' }
    maxRetries 3
    debug true
    publishDir "${params.pubdir}/fragment"

    input:
    file fragBed
    file targetBed

    output:
    path "filtered_file.bed"

    script:
    """
    echo "Shell options: \$SHELLOPTS"
    sort -k1,1V -k2,2n ${fragBed} | intersectBed -sorted -wa -a stdin -b ${targetBed} -f 0.5 | \\
        awk -F"\t" -v MQ=30 '(\$2 >= 6) && (\$5 >= MQ)' | cut -f 1,2,3,6 > filtered_file.bed
    """
}

workflow {
    fragmentfilter(file(params.fragBed), file(params.targetBed))
}
```
And here is my nextflow.config:

```groovy
params {
    pubdir    = "placeholder for absolute path to results"
    fragBed   = "placeholder for absolute path to frag file"
    targetBed = "placeholder for absolute path to bedfile file"
}

docker {
    enabled = true
}

process.shell = ['/bin/bash', '-euo', 'pipefail']
process.container = 'fragment_env:latest'
```
Problem Description
In the fragmentfilter process, I have a command chain involving sort, intersectBed, awk, and cut. When I run this pipeline, it fails due to an out-of-memory (OOM) error in the first command, sort. Through debugging, I found that sort alone requires around 30 GB of memory to handle the input file (${fragBed}). The issue is that Nextflow doesn’t seem to capture the exit status 137 from the OOM failure within the piped command. As a result, the errorStrategy retry mechanism does not trigger, and Nextflow doesn’t retry the process with increased memory.
Following earlier advice from the community, I added `process.shell = ['/bin/bash', '-euo', 'pipefail']` to nextflow.config so that the pipeline's exit status reflects the first failed command rather than the last one. Despite this setting, Nextflow doesn't appear to recognize the 137 exit code, and the process does not automatically retry with increased memory.
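As a sanity check of the pipefail semantics themselves, here is a minimal sketch outside Nextflow (simulating the OOM kill with a SIGKILL, since the OOM killer also sends SIGKILL):

```shell
# Without pipefail, a pipeline's exit status is that of the LAST command,
# so "sort | intersectBed | awk | cut" can look successful even if sort dies.
bash -c 'false | true'; echo "without pipefail: $?"   # 0

# With pipefail, the pipeline returns the rightmost non-zero status,
# so a sort killed by the OOM killer (SIGKILL, 128 + 9 = 137) should surface.
bash -c 'set -o pipefail; false | true'; echo "with pipefail: $?"   # 1
bash -c 'set -o pipefail; bash -c "kill -KILL \$\$" | true'; echo "OOM-style kill: $?"   # 137
```

So in a plain bash session with pipefail set, the 137 does propagate to the pipeline's overall exit status, which is what makes the behaviour inside the Nextflow task puzzling.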
Observations and Debugging
When I run the sort command on its own, it fails with exit code 137, confirming an out-of-memory kill. Within the pipeline, however, that exit code is not properly captured.
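For reference, the 137..141 window in my errorStrategy comes from the shell convention that a process terminated by signal N exits with code 128 + N; a quick sketch, independent of Nextflow:

```shell
# Exit codes above 128 mean "terminated by signal (code - 128)".
# The OOM killer sends SIGKILL (9), hence 137; the range 137..141
# spans signals 9 (SIGKILL) through 13 (SIGPIPE).
bash -c 'kill -KILL $$'; echo "SIGKILL: $?"   # 137 = 128 + 9
bash -c 'kill -SEGV $$'; echo "SIGSEGV: $?"   # 139 = 128 + 11
```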
I have set the retry strategy as follows to handle OOM errors:
`errorStrategy { task.exitStatus in 137..141 ? 'retry' : 'terminate' }`
My expectation is that, with pipefail enabled, Nextflow should detect the 137 exit status and automatically retry the process with increased memory (16.GB * task.attempt, so 32 GB on the second attempt).
Question: Why does the pipeline not capture the 137 exit status within the piped command sequence, and what adjustments should I make to ensure that Nextflow retries the process with increased memory upon an OOM error?
Thank you in advance for any guidance!