@Kanna_Dhasan were you able to get this working?
I have a similar goal, and this currently appears to be the only question in the Seqera forum about automatically cleaning the Nextflow work folder on AWS.
As described here, though, my use case is slightly different: I would like to automatically clean up the Nextflow work folders when runs finish successfully, while keeping all of the (typically small) Nextflow metadata files. The nextflow clean command appears to support this through its -keep-logs option, but I was wondering whether there's a way to do it with the cleanup = true config setting (or some other way) so that I don't have to run a separate Nextflow command outside of the pipeline.
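For anyone who wants to script this themselves, here's a minimal Python sketch of just the selection logic: given the object keys under a work folder, keep the per-task metadata files and return everything else as deletion candidates. It assumes (my assumption, not confirmed Nextflow behavior) that the metadata files are the `.command.*` files plus `.exitcode` in each task directory; `is_metadata` and `keys_to_delete` are hypothetical helper names, and actually listing and deleting the S3 objects is left out.

```python
from pathlib import PurePosixPath

# ASSUMPTION: Nextflow's per-task metadata files are the `.command.*` files
# (e.g. .command.sh, .command.run, .command.log, .command.err) plus `.exitcode`.
METADATA_PREFIX = ".command."
METADATA_NAMES = {".exitcode"}


def is_metadata(key: str) -> bool:
    """True if the object key's basename looks like a Nextflow task metadata file."""
    name = PurePosixPath(key).name
    return name.startswith(METADATA_PREFIX) or name in METADATA_NAMES


def keys_to_delete(keys: list[str]) -> list[str]:
    """Hypothetical helper: return the work-folder keys that are NOT metadata files."""
    return [key for key in keys if not is_metadata(key)]


if __name__ == "__main__":
    example_keys = [
        "work/ab/cdef12/.command.sh",
        "work/ab/cdef12/.command.log",
        "work/ab/cdef12/.exitcode",
        "work/ab/cdef12/sample.sorted.bam",
    ]
    print(keys_to_delete(example_keys))  # → ['work/ab/cdef12/sample.sorted.bam']
```

With boto3, the keys could come from a `list_objects_v2` paginator and be removed with `delete_objects` (at most 1,000 keys per request), but treat this as untested against a Fusion-backed work directory.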
Note that, as mentioned here and here, I'm aware that the Nextflow clean options currently don't work on S3 objects. Given that, I'm hoping either that support for S3 objects will be added soon or that you've found an alternative approach.
I’ll also mention another approach I came across that doesn’t really work for me (but might work for other people who read this).
For any S3 bucket, you can create lifecycle rules to automatically manage or delete objects. If you only use a single work folder for Nextflow processes, you can specify a hard-coded prefix so that the rule applies only to that work folder. You can also set how many days after object creation to wait before deleting an object. So, assuming you can fix any pipeline failures within a few days, you could create a lifecycle rule that automatically deletes the work files, say, 10 days after they are created. It's not the most efficient option, but S3 storage is generally fairly cheap.
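As an illustration, here's a minimal Python sketch that builds such a prefix-based lifecycle configuration and prints it as JSON. The `work/` prefix, the rule ID and the 10-day window are placeholders for a hypothetical bucket layout with one shared work folder.

```python
import json


def prefix_expiration_rule(prefix: str, days: int, rule_id: str = "expire-nextflow-work") -> dict:
    """Return one S3 lifecycle rule that expires objects under `prefix` `days` days after creation."""
    if days < 1:
        raise ValueError("S3 expiration days must be a positive integer")
    return {
        "ID": rule_id,
        # Only objects whose key starts with this prefix are affected.
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Placeholder layout: a single shared work folder at s3://<bucket>/work/
lifecycle_config = {"Rules": [prefix_expiration_rule("work/", 10)]}

if __name__ == "__main__":
    print(json.dumps(lifecycle_config, indent=2))
```

The printed JSON could be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json` (or the dict passed to boto3's `put_bucket_lifecycle_configuration`). Note that this call replaces the bucket's entire existing lifecycle configuration, so any rules you already have need to be included as well.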
Unfortunately, this solution doesn't work for me because I want a distinct work folder for each project. As described in this article, it's also possible to create lifecycle rules that delete objects carrying specific object tags, which would be perfect for my use case. Moreover, Nextflow already automatically tags the metadata files with nextflow.io/metadata = true.
Unfortunately (and this seems like an oversight), Nextflow doesn't tag the non-metadata files (i.e., the process input files) that it creates in the work folder. If this were changed, I'd recommend giving those files a different tag so that lifecycle rules could handle them separately from the metadata files (which is what I want to do).
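If Nextflow did tag the non-metadata files, a tag-based rule could look roughly like this minimal Python sketch. The `nextflow.io/temporary = true` tag is purely hypothetical (Nextflow doesn't set it today); only `nextflow.io/metadata = true` is a tag I've actually seen applied. Because the rule filters on a tag rather than a prefix, it would work across every per-project work folder in the bucket.

```python
import json

# Tag Nextflow already applies to metadata files; no rule below targets it, so these are kept.
METADATA_TAG = {"Key": "nextflow.io/metadata", "Value": "true"}
# HYPOTHETICAL tag for non-metadata work files; Nextflow does not set this today.
TEMPORARY_TAG = {"Key": "nextflow.io/temporary", "Value": "true"}


def tag_expiration_rule(rule_id: str, tag: dict, days: int) -> dict:
    """Return an S3 lifecycle rule that expires objects carrying `tag` after `days` days."""
    if days < 1:
        raise ValueError("S3 expiration days must be a positive integer")
    return {
        "ID": rule_id,
        # A single-tag filter matches objects with this exact key/value pair,
        # regardless of which project's work folder (prefix) they live under.
        "Filter": {"Tag": dict(tag)},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Only the hypothetical temporary tag is targeted, so metadata-tagged objects never expire.
lifecycle_config = {
    "Rules": [tag_expiration_rule("expire-nextflow-temporary", TEMPORARY_TAG, 10)]
}

if __name__ == "__main__":
    print(json.dumps(lifecycle_config, indent=2))
```

Objects without the hypothetical tag, including the metadata files, simply don't match the rule and are left in place.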
EDIT: I am currently using Nextflow v24.04.4 with Fusion v2.3.8-e3aab5d.
