Re-use a task working directory on retries

One of our pipelines uses the metagenome assembler SPAdes. It can be very memory-hungry, and the main driver of memory usage is the complexity of the sample being assembled (soil samples require more than 1 TB).

It’s common for SPAdes jobs to fail due to memory problems, in which case the built-in checkpointing mechanism is very useful as it can save a lot of compute by restarting the assembler from where it failed.

The problem is that for the resume/restart mechanism to work, SPAdes needs the files computed by the previous run to be available. This doesn’t play well with Nextflow, as each retry gets a different working directory. The bit of the process that handles retries (assuming the working directory is kept) would look like this:

    // Handle retries
    def restart = ""
    if (task.attempt > 1) {
        // Extra flag to restart the assembly from the last checkpoint
        restart = "--restart-from last"
        reads = "" // --restart-from doesn't allow the basic input flags to be passed again
    }
    """
    spades.py \\
        $args \\
        $metaspades_arg \\
        --threads $task.cpus \\
        --memory $maxmem \\
        $custom_hmms \\
        $reads \\
        $restart \\
        -o ./
    """
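
To make the problem easy to reproduce without SPAdes, here is a minimal sketch of a process that fails on its first attempt and is retried. The process name `RETRY_DEMO`, the `2.GB` base memory and the simulated exit code are assumptions chosen for illustration and are not taken from our pipeline. The printed path should differ from the directory of the failed first attempt, which is why the checkpoint files are not found on retry.

```groovy
// Hypothetical demo process, not the real SPAdes module.
process RETRY_DEMO {
    errorStrategy 'retry'            // resubmit the task when it exits non-zero
    maxRetries 2
    memory { 2.GB * task.attempt }   // illustrative base value; grows with each attempt

    output:
    stdout

    script:
    // Convert the allocated memory to whole gigabytes, as a --memory style flag would need
    def maxmem = task.memory.toGiga()
    """
    echo "attempt ${task.attempt}: ${maxmem} GB, running in \$PWD"
    # Simulate an out-of-memory kill on the first attempt only
    if [ ${task.attempt} -eq 1 ]; then exit 137; fi
    """
}

workflow {
    RETRY_DEMO()
    RETRY_DEMO.out.view()
}
```

Saved as, say, `main.nf` and run with `nextflow run main.nf`, only the successful second attempt reaches `view()`; the directory of the failed attempt is still under `work/`, but the retried task does not run in it.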

Is there any way to keep the working directory between retries?