Resuming a workflow based on files in publishDir (or another external directory)

Hello – I am considering using Nextflow to process files and save them into a large complex directory structure – our “curated datasets”. Is there a way to have Nextflow’s resume functionality determine whether to run a process based on the presence (and size/modified date) of files in an external directory (not work)? I’d like my input, intermediate, and final output files to all be stored in a file structure that I specify outside of work, but to still use the same resume logic as though those files were in the work directory managed by Nextflow.
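To make the request concrete, here is a minimal sketch of the kind of check being asked about. It uses a make-style rule: skip a step when its output already exists in the curated directory, is non-empty, and is no older than any of its inputs. The function name `is_up_to_date` and the "non-empty means complete" rule are hypothetical and only illustrate the idea. This is not how Nextflow's -resume works.

```python
from pathlib import Path
from typing import Iterable


def is_up_to_date(inputs: Iterable[Path], output: Path) -> bool:
    """Hypothetical make-style freshness check (NOT a Nextflow feature).

    Returns True if `output` exists, is non-empty (assumed to mean "complete"),
    and its modification time is at least as recent as every input's.
    """
    if not output.is_file() or output.stat().st_size == 0:
        return False  # missing or empty output: the step would have to run
    out_mtime = output.stat().st_mtime
    # Any input modified after the output was written makes the output stale.
    return all(p.stat().st_mtime <= out_mtime for p in inputs)
```

Because a check like this relies only on file metadata, it would work on any directory layout. Nextflow instead keys its cache on task hashes, as the reply below explains.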

I think I could use the publishDir directive to copy the work files out of work into a specified subfolder structure. However, my files are very large and numerous, and I can’t reserve the space for multiple copies. I can’t use the symlink mode of publishDir either, because I need the external data folder to contain the actual data (so that it will persist even if I clear the Nextflow cache in work to free space).

Are there any Nextflow settings or tips to accomplish what I’m after? Thanks in advance!

Thinking more about this, it seems that using the hard-link mode of publishDir (mode: 'link') may roughly achieve what I’m after, because the file in publishDir (and its contents) should persist even if the original file in work is deleted.
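The filesystem behaviour this relies on can be checked with a small sketch. It assumes a POSIX-like filesystem that supports hard links, with the "work" and "curated" directories on the same filesystem, which hard links require. The directory and file names are placeholders. `os.link` plays the role of publishing with a hard link: both names point at the same inode, so no extra space is used, and the data survives when the original name is removed.

```python
import os
import tempfile
from pathlib import Path


def hard_link_survives(tmp: Path) -> str:
    """Hard-link a file into a 'published' location, delete the original, return what remains."""
    work_file = tmp / "work" / "result.txt"     # placeholder for a task output in work/
    published = tmp / "curated" / "result.txt"  # placeholder for the publishDir target
    work_file.parent.mkdir(parents=True)
    published.parent.mkdir(parents=True)
    work_file.write_text("large data")

    os.link(work_file, published)               # create a second name for the same data
    # Same inode and a link count of 2: the data is stored only once.
    assert os.stat(work_file).st_ino == os.stat(published).st_ino
    assert os.stat(published).st_nlink == 2

    work_file.unlink()                          # e.g. cleaning up work/ to free space
    # The published name still holds the data; only the link count dropped.
    assert os.stat(published).st_nlink == 1
    return published.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(hard_link_survives(Path(d)))      # prints: large data
```

Disk space is only freed once the last remaining name is deleted. That is why the published copy keeps the data after work is cleaned up.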

I suspect that what I would give up, though, is the ability to resume: if the file in work is deleted, then even if the rest of the cache remains, the workflow would decide that the task needs to be run again to recreate the file.
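The reasoning above can be sketched with a deliberately simplified model of a resume decision. A task is reused only if its fingerprint is found in the cache metadata and its outputs are still present in that task's own work directory. The names `task_hash` and `can_reuse` and the hashing scheme are hypothetical. Nextflow's actual implementation differs in detail, but in this model the hard-linked copy in publishDir plays no part in the decision.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def task_hash(script: str, inputs: Iterable[str]) -> str:
    """Hypothetical task fingerprint built from the script text and input identifiers."""
    h = hashlib.sha256(script.encode())
    for item in sorted(inputs):
        h.update(item.encode())
    return h.hexdigest()


def can_reuse(cache: Dict[str, Path], key: str, outputs: Iterable[str]) -> bool:
    """Reuse a task only if it is cached AND its outputs still exist in its work dir."""
    work_dir = cache.get(key)
    if work_dir is None:
        return False  # no cache entry (e.g. metadata lost): the task must run
    # Only the task's own work directory is checked, never the published location.
    return all((work_dir / name).exists() for name in outputs)
```

In this model, deleting `result.txt` from the work directory makes `can_reuse` return False even though the cache entry still exists.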

Thus, I’m still interested to know if there is any way to have the canonical location of the output files (the files that are checked when determining how to resume) be in a custom directory structure outside of work.

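For the resume feature to work, you need both the work directory and the .nextflow directory. If either of them is altered or corrupted (for example, by deleting files from work), resume won’t work as expected. You can change where these directories are located (for the work directory, e.g. with the -w command-line option or the workDir setting in nextflow.config). But no, you can’t make resume work based on an arbitrary folder structure that you created yourself.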