Prevent nextflow from running a process if the output file exists

Lovely write-up Firas. I have a small recommendation though:

This use case is exactly what the storeDir process directive is for (documentation).
It is important to note that Nextflow only checks whether the elements (file, files, or directory) specified in the output block already exist in the storeDir before deciding whether to execute the task. This is the same situation you have in your example. My recommendation would be to significantly simplify the process with this directive. I would try something like:

params.output_dir = "./results"
 
workflow {
    makedir()
}

process makedir {
    storeDir params.output_dir

    output:
    path "newdir", emit: reference_fasta

    script:
    """
    mkdir -p newdir
    echo "science goes here" > newdir/task.data
    """
}

Note that you don’t need to insert any Groovy variables or if/else conditionals into the script block when using the storeDir directive.

An example run:

$ # Clear the results directory
$ rm -rf results

$ # Run the workflow
$ nextflow run main.nf

 N E X T F L O W   ~  version 24.12.0-edge

Launching `main.nf` [happy_faggin] DSL2 - revision: 397e39ec23

executor >  local (1)
[d7/9d91fd] makedir [100%] 1 of 1 ✔

$ # The process runs and creates the directory:
$ tree results 
results
└── newdir
    └── task.data

2 directories, 1 file

$ # Re-run the workflow
$ #   note the "stored" notation (existing dir used):
$ nextflow run main.nf

 N E X T F L O W   ~  version 24.12.0-edge

Launching `main.nf` [magical_lagrange] DSL2 - revision: 397e39ec23

[skipped  ] makedir [100%] 1 of 1, stored: 1 ✔
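
The same existence check applies per task when the output name depends on the input. Below is a minimal sketch of that case, not taken from the original post: the process name `count_lines`, the `params.inputs` glob, and the `.count` suffix are all hypothetical. Each task should be skipped only if its own declared output file is already present in the storeDir.

```nextflow
// Hypothetical parameters, for illustration only
params.output_dir = "./results"
params.inputs     = "data/*.txt"

process count_lines {
    // Declared outputs found in this directory cause the task to be skipped
    storeDir params.output_dir

    input:
    path sample

    output:
    // The output name is derived from the input, so the check is per input file
    path "${sample.baseName}.count"

    script:
    """
    wc -l < ${sample} > ${sample.baseName}.count
    """
}

workflow {
    count_lines(Channel.fromPath(params.inputs))
}
```

Under these assumptions, adding a new file to `data/` and re-running would execute only the task for that file, while the others are reported as stored.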
