Nextflow process directive 'scratch' cleans but does not remove scratch dir

Running Nextflow on SLURM, I use the scratch directive to create a scratch directory. Nextflow successfully creates and uses the scratch directory and cleans up its contents afterward, but doesn’t actually remove the scratch directory. Is there a way to make Nextflow remove the scratch directory when the process finishes? I could potentially use the afterScript directive, but the pipeline is containerized and I’m not sure what would happen if I tried to cd out of the scratch dir and remove it in afterScript while a Docker container was running.

Generally speaking, the HPC system is responsible for cleaning up scratch after jobs exit (which is why Nextflow doesn’t do this for you automatically). Is there a reason that you need to do this?

I’m sure that this question has come up before. I started digging for an answer and went down a bit of a rabbit hole: first to this Google Groups question from 2016, which took me to issue #230, and eventually to this chunk of code in current Nextflow:

I think that this is deleting files from the scratch drive if cleanup is specified (see docs). If I’m right then this is an undocumented and unknown feature that’s been lost in the sands of time (even @bentsherman and @emiller didn’t know about it). I’ll see if we can get it added to the docs :+1:

Now waiting for one of the people mentioned above to reply to my thread and say that I’m wrong about how this works :laughing: For example, it’s possible that it runs at the end of the workflow rather than when the process finishes, or there might be something else I’ve overlooked.

Note that another approach would be to set stageOutMode to move. I guess this would have a similar effect (though maybe not for intermediate files that aren’t outputs).

Phil

Can confirm that Phil was wrong about how this works :laughing:

But I did have to do some digging. When you enable scratch, the .command.run script will have some code to delete the scratch directory. It can be disabled through a hidden option that is unrelated to the cleanup config option.

So check your .command.run script for one of these snippets:

# no docker
rm -rf $NXF_SCRATCH || true

# docker
(sudo -n true && sudo rm -rf "$NXF_SCRATCH" || rm -rf "$NXF_SCRATCH")&>/dev/null || true

It may be that this command is silently failing. You could try running the script directly without the || true to see what error comes up.
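A quick way to find the cleanup line before editing anything is to list every mention of NXF_SCRATCH in the task's .command.run. This is only a rough sketch: show_scratch_lines is a made-up helper name, and the path in the usage comment is a placeholder for your own task work directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nextflow): print each line of a
# .command.run script that mentions NXF_SCRATCH, prefixed with its line number.
show_scratch_lines() {
    local run_script="$1"
    grep -n 'NXF_SCRATCH' "$run_script"
}

# Usage (placeholder path, substitute the real task work directory):
# show_scratch_lines <task-work-dir>/.command.run
```

Once you know which line does the removal, you can copy it into a shell on the compute node, drop the trailing || true (and the &>/dev/null redirect in the Docker variant), and see the actual error message.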


It appears that Nextflow doesn’t stage the files to the specified scratch directory, but instead creates a temporary subdirectory and stages the files there. So if I specify the process directive scratch /mnt/scratch/1234567, Nextflow might stage the files to /mnt/scratch/1234567/nxf.7oUjxLpgyV. It deletes the nxf.7oUjxLpgyV directory, but the /mnt/scratch/1234567 directory remains. Now that I reread the documentation, I can see that this behavior is accurately described, and I think I can solve the problem by just using a generic /mnt/scratch for all processes, which will actually simplify and improve portability of the pipeline.
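For anyone who finds this later, the behavior can be imitated in plain shell. This is an illustration of what I observed, not Nextflow's actual code: demo_scratch_cleanup is a hypothetical function, and I am assuming the temporary subdirectory is made with something like mktemp.

```shell
#!/usr/bin/env bash
# Illustration only: mimics the observed behaviour, not Nextflow's real code.
# demo_scratch_cleanup is a hypothetical helper.
demo_scratch_cleanup() {
    local base="$1"    # the path given to the scratch directive
    local task_dir

    # The base path itself is created if it does not exist yet.
    mkdir -p "$base"

    # A uniquely named task directory is created inside the base path.
    task_dir="$(mktemp -d "$base/nxf.XXXXXXXXXX")"

    # (The task would run inside $task_dir at this point.)

    # Cleanup removes only the task directory; the base path is left alone.
    rm -rf "$task_dir"

    # Report which directory was used so the caller can check it is gone.
    echo "$task_dir"
}
```

Calling demo_scratch_cleanup /mnt/scratch/1234567 would leave /mnt/scratch/1234567 in place while the nxf.* subdirectory is gone, which matches what I saw on the cluster.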


Great stuff! Thanks both, nice to get this cleared up and get a better understanding about how this works. I suspect that a small addition to the docs could be helpful to clarify a bit about directory creation and the fact that Nextflow is doing cleanup automatically (so that we don’t have to dig again next time this comes up).
