Is it possible to recover cached processes without reading the gzipped files from the disk?

amakunin · October 17, 2024, 3:24pm

Hi all, I am running nf-core ampliseq v2.11.0 and seeing very slow recovery of cached processes. The .nextflow.log shows that for two of the processes, RENAME_RAW_DATA_FILES and CUTADAPT, a message like this appears: DEBUG nextflow.splitter.AbstractSplitter - Creating gzip splitter for: /path/to/some.fastq.gz. As I understand it, this indicates that the gzipped file is being read from disk. That makes sense when the process is actually being run, but I get the same messages when the processes are already cached and I run nextflow with the -resume flag. Unfortunately, this becomes a bottleneck when re-running later stages of the workflow, as our filesystem IO is sometimes quite slow.
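
To check how many input files trigger this on a resumed run, the messages can be tallied from the log. This is only a minimal sketch: it assumes the log lines have the shape quoted above, and the function name count_gzip_splitters and the sample paths are made up for illustration.

```python
import re
from collections import Counter

# Assumed message shape, taken from the log line quoted in the post.
SPLITTER_RE = re.compile(r"Creating gzip splitter for: (\S+)")


def count_gzip_splitters(log_lines):
    """Map each file path to the number of gzip-splitter messages seen for it."""
    counts = Counter()
    for line in log_lines:
        match = SPLITTER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, pass the open .nextflow.log file instead.
    sample = [
        "DEBUG nextflow.splitter.AbstractSplitter - Creating gzip splitter for: /data/a.fastq.gz",
        "DEBUG nextflow.Session - some unrelated message",
        "DEBUG nextflow.splitter.AbstractSplitter - Creating gzip splitter for: /data/b.fastq.gz",
    ]
    for path, n in count_gzip_splitters(sample).items():
        print(path, n)
```

Comparing the tally from a fresh run with the tally from a -resume run would show whether every input file is opened again on resume.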

So my question is: is it possible to recover cached processes without reading the gzipped files from the disk?
