-resume not effective with S3 input files?

Andrew_Crabb · December 6, 2024, 9:23pm

My workflow reads input files either from S3 or local disk. When the input files are local, -resume works as expected, and successful steps are not re-run. When the input files are in S3, repeating the workflow submission with -resume downloads the files from S3 again, and re-runs all steps including those that ran successfully on the prior run.

I’m thinking that the repeated S3 download results in local files that differ in timestamp from those in the previous run, so they are assumed to be different input files. Is there anything I can do here? Thanks!

mribeirodantas · December 8, 2024, 3:00am

Welcome to the community forum, @Andrew_Crabb

Inconsistent file timestamps can invalidate the cache. You can avoid this by using the 'lenient' caching mode, which ignores the last-modified timestamp and builds the cache key from the file path and size only.

process MY_PROCESS_NAME {
  cache 'lenient'
  ...
}
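
If several processes read the S3 inputs, it may be easier to set the same directive once in the configuration file instead of in each process. A minimal sketch, assuming you want lenient caching for every process in the pipeline and that your configuration lives in the usual nextflow.config:

```groovy
// nextflow.config
// Assumption: all processes should use lenient caching,
// so cache keys ignore the last-modified timestamp of input files.
process.cache = 'lenient'
```

After that, re-running with -resume should use the lenient mode everywhere. Note that the earlier run also has to have used lenient mode for its cached tasks to match.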
