Question about nf-core/nextflow reuse of data

Hi

Thank you for setting up the Seqera platform.
I have a question about nf-core/Nextflow behavior and how Seqera's Fusion deals with it.
nf-core seems to use one task per command, so there are separate samtools sort, samtools index, and samtools stats tasks.
Each of these tasks needs the BAM file as input.

Does this mean that Nextflow spins up multiple instances, and each instance downloads the 5 GB BAM file from S3 every time? Looking at the timeline, it seems so, but that seems inefficient.
The Nextflow docs (Working with files — Nextflow documentation) seem to indicate that Nextflow can download a file once and reuse it, but if each process runs on a different AWS EC2 instance, doesn't that force a re-download?

When I wrote pipelines in WDL, I usually had a single samtools task that ran all the commands, so the BAM file was downloaded only once. Does the Fusion file system avoid downloading the file more than once?

Am I missing something here?

Thank you,

Uri David

It depends on how you write the pipeline.

If you write your pipeline so that each task handles one sample, and every sample needs this 5 GB file stored in an S3 bucket, then each VM instance needs the file and will mount the S3 bucket to access it. Fusion helps here by being smart about the file transfer.
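For reference, Fusion is enabled through the Nextflow configuration. A minimal sketch (the bucket name is a placeholder):

```groovy
// nextflow.config — minimal sketch for enabling Fusion (placeholder values)
fusion.enabled = true                  // access object storage via the Fusion file system
wave.enabled   = true                  // Fusion is delivered through Wave-provisioned containers
workDir        = 's3://my-bucket/work' // placeholder: pipeline work directory on S3
```

With this in place, tasks read S3 objects through Fusion instead of each task staging files with separate download steps.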

One thing you can do, though, is have a single task manage multiple samples. We call that task batching, and you can see an example here.
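A minimal sketch of what batching could look like in Nextflow DSL2 (the process name, bucket path, and batch size are hypothetical, not taken from nf-core):

```groovy
// Hypothetical sketch: one task handles a batch of BAMs, so each
// VM stages its inputs once and runs sort/index/stats locally.
process SAMTOOLS_BATCH {
    input:
    path bams   // a batch of BAM files staged into this task's work dir

    output:
    path '*.stats'

    script:
    """
    for bam in ${bams}; do
        samtools sort -o \${bam%.bam}.sorted.bam \$bam
        samtools index \${bam%.bam}.sorted.bam
        samtools stats \${bam%.bam}.sorted.bam > \${bam%.bam}.stats
    done
    """
}

workflow {
    Channel
        .fromPath('s3://my-bucket/*.bam')   // placeholder bucket
        .buffer(size: 10, remainder: true)  // group samples into batches of 10
        | SAMTOOLS_BATCH
}
```

Because all three samtools commands run inside one task, each BAM is transferred to the instance only once per batch, at the cost of coarser-grained parallelism and caching.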