Prevent Nextflow from running a process if the output file exists

Imagine you’re working on a project where you need to download many files and store them in a specific directory structure. Before downloading the files, you want to ensure that the target directory exists. If it doesn’t, you create it. If it does, you simply copy the existing directory to the output location.

params.output_dir = "./results"
process makedir {
    publishDir "${params.output_dir}/", mode: 'copy', overwrite: false

    output:
    path "newdir", emit: reference_fasta

    script:
    // Resolve the absolute path of the output directory
    String absOutputDir = file(params.output_dir).toAbsolutePath().toString()
    String targetFile = "${absOutputDir}/newdir"

    if (!file(targetFile).exists()) {
        """
        mkdir newdir
        """
    } else {
        """
        echo "directory  already exists. Copying to output directory."
        echo "absolute path : $absOutputDir"
        echo "target path   : $targetFile"
        cp -r "${targetFile}" newdir
        """
    }
}
 
workflow {
    makedir()
}

Case: the directory does not exist yet.

$ nextflow run makedir.nf
Launching `makedir.nf` [pedantic_galileo] DSL2 - revision: 21e8d35aad

executor >  local (1)
[33/7d1734] process > makedir [100%] 1 of 1 ✔

$ tree 
.
├── makedir.nf
├── results
│   └── newdir
└── work
    └── 33
        └── 7d1734ec6d63226811a47161d58ff7
            └── newdir

Case: the directory already exists. If you run the script again, it detects the existing directory and copies it:

$ nextflow run makedir.nf
Launching `makedir.nf` [shrivelled_austin] DSL2 - revision: 21e8d35aad

executor >  local (1)
[b8/6b399a] process > makedir [100%] 1 of 1 ✔

$ cat work/b8/6b399a999c79e57e8208225833dbeb/.command.out 
directory  already exists. Copying to output directory.
absolute path : /home/firas/post/results
target path   : /home/firas/post/results/newdir

$ tree 
.
├── makedir.nf
├── results
│   └── newdir
└── work
    ├── 33
    │   └── 7d1734ec6d63226811a47161d58ff7
    │       └── newdir
    └── b8
        └── 6b399a999c79e57e8208225833dbeb
            └── newdir


Lovely write-up, Firas. I have a small recommendation, though:

This use-case is exactly what the storeDir process directive is for (documentation).
It is important to note that Nextflow only checks that the elements (file, files, or directory) specified in the output block exist in that directory before deciding whether to execute the task. This is the same check you perform by hand in your example. My recommendation would be to simplify the process significantly with this directive. I would try something like:

params.output_dir = "./results"
 
workflow {
    makedir()
}

process makedir {
    storeDir params.output_dir

    output:
    path "newdir", emit: reference_fasta

    script:
    """
    mkdir -p newdir
    echo "science goes here" > newdir/task.data
    """
}

Note that you don’t need to insert any Groovy variables or if/else conditionals into the script block when using the storeDir directive.

An example run:

$ # Clear the results directory
$ rm -rf results

$ # Run the workflow
$ nextflow run main.nf

 N E X T F L O W   ~  version 24.12.0-edge

Launching `main.nf` [happy_faggin] DSL2 - revision: 397e39ec23

executor >  local (1)
[d7/9d91fd] makedir [100%] 1 of 1 ✔

$ # The process runs and creates the directory:
$ tree results 
results
└── newdir
    └── task.data

2 directories, 1 file

$ # Re-run the workflow
$ #   note the "stored" notation (existing dir used):
$ nextflow run main.nf

 N E X T F L O W   ~  version 24.12.0-edge

Launching `main.nf` [magical_lagrange] DSL2 - revision: 397e39ec23

[skipped  ] makedir [100%] 1 of 1, stored: 1 ✔
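
Since the original use case was downloading many files, the same directive might be applied per file, so that each download is skipped individually once its output exists. This is only a rough sketch under assumptions: the process name `download`, the file names, and the example.com URLs are hypothetical placeholders that do not come from the post, and it assumes `curl` is available where the task runs.

```groovy
params.output_dir = "./results"

workflow {
    // Hypothetical (file name, URL) pairs; replace them with real inputs.
    download(
        Channel.of(
            ['a.txt', 'https://example.com/a.txt'],
            ['b.txt', 'https://example.com/b.txt']
        )
    )
}

process download {
    // A task is skipped when its declared output already exists in this directory.
    storeDir params.output_dir

    input:
    tuple val(name), val(url)

    output:
    // The output name depends on the input, so each task checks for its own file.
    path "${name}"

    script:
    """
    curl -fsSL -o '${name}' '${url}'
    """
}
```

Because the check is based only on whether the output exists, a file left in the results directory under the same name would be reused as-is, even if the remote copy has changed since.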
