One of our pipelines uses the metagenome assembler SPAdes. It is a memory-hungry tool, and the complexity of the sample is the main driver of memory usage (soil samples can require more than 1 TB).
It’s common for SPAdes jobs to fail due to memory problems, in which case the built-in checkpointing mechanism is very useful as it can save a lot of compute by restarting the assembler from where it failed.
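For context, here is a rough shell sketch of how the restart decision looks outside Nextflow. `choose_spades_args` is a hypothetical helper written only for illustration, and it assumes that a previous SPAdes run leaves a `params.txt` file in its output directory; the read file names in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: choose between a fresh SPAdes run and a checkpoint restart.
# choose_spades_args is a hypothetical helper, not part of SPAdes or our pipeline.
# Assumption: a previous run leaves params.txt in its output directory.
choose_spades_args() {
    local outdir="$1"
    shift
    if [ -f "$outdir/params.txt" ]; then
        # Earlier output found: resume from the last checkpoint, with no read flags.
        echo "--restart-from last -o $outdir"
    else
        # Nothing to resume from: start over with the full argument list.
        echo "$* -o $outdir"
    fi
}

# Usage (placeholder file names):
# spades.py $(choose_spades_args assembly --meta -1 reads_1.fastq.gz -2 reads_2.fastq.gz)
```

The point of the guard is that `--restart-from last` only makes sense when the output directory from the failed attempt is still there.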
The problem is that for the resume/restart mechanism to work, SPAdes needs the files it has already computed to be available. This doesn’t play well with Nextflow, as each retry gets a different working directory. The bit of the process that handles retries (assuming the working directory is kept) would look like this:
....
// Handle retries
def restart = ""
if (task.attempt > 1) {
    // Extra flag to restart the assembly from the last checkpoint
    restart = "--restart-from last"
    reads = "" // --restart-from doesn't allow the basic input flags to be passed again
}
"""
spades.py \\
    $args \\
    $metaspades_arg \\
    --threads $task.cpus \\
    --memory $maxmem \\
    $custom_hmms \\
    $reads \\
    $restart \\
    -o ./
...
Is there any way to keep the working directory between retries?