With Fusion based on (unlimited) object storage, how much space is needed on the host OS itself?

I’m curious about the use of /tmp storage in the context of Fusion, and I’m just thinking out loud here :thinking:

Since Fusion is effectively powered by (unlimited) object storage backends, what is the maximum storage needed on the host itself?

I understand you’re asking about the minimum storage required on the host system. Fusion does not need to store an entire file in /tmp, because it splits files into chunks of no more than 250 MB. This means your /tmp directory can be smaller than the largest file you intend to process.
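
To illustrate why chunking bounds local disk usage, here is a minimal sketch; this is not Fusion’s actual implementation, just the general streaming pattern, with `upload_chunk` as a hypothetical placeholder for whatever call pushes bytes to the object storage backend:

```python
CHUNK_SIZE = 250 * 1024 * 1024  # 250 MB, the chunk limit described above

def stage_to_object_storage(source, upload_chunk):
    """Stream a file-like `source` in fixed-size chunks.

    Only one chunk needs to be buffered locally at a time, so peak
    local usage stays near CHUNK_SIZE regardless of total file size.
    `upload_chunk` is hypothetical, e.g. an S3 multipart-upload part.
    """
    part = 1
    while True:
        chunk = source.read(CHUNK_SIZE)
        if not chunk:
            break
        upload_chunk(part, chunk)
        part += 1
```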

The recommended size for /tmp depends on the types of pipelines you are running. For instance, we have observed optimal results with a 100 GB /tmp on large EC2 instances handling approximately 10 concurrent tasks for the standard nf-core/rnaseq (full test profile) pipeline. Since all tasks share the same /tmp, this allocation amounts to about 10 GB per task.
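
To make the arithmetic concrete, here is a back-of-the-envelope sizing helper. The 10 GB-per-task figure comes from the 100 GB / 10-task observation above; treat anything else (and the default itself, for other pipelines) as an assumption to tune:

```python
def recommended_tmp_size_gb(concurrent_tasks: int, per_task_gb: float = 10.0) -> float:
    """Rough /tmp sizing: all tasks share /tmp, so total ~= tasks x per-task share."""
    return concurrent_tasks * per_task_gb

# e.g. an instance expected to run 16 concurrent tasks:
print(recommended_tmp_size_gb(16))  # -> 160.0 (GB)
```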

Nonetheless, it is important to note that the Fusion garbage collector is not yet fully efficient, and there is room for improvement. We plan to focus on this aspect in the coming year.

Thanks Jordi! This sheds more light on what I was curious about with Fusion.

One suggestion (as a nice-to-have feature) is to implement a `fusion benchmark` command that can test these metrics on a specific infrastructure; a rough sketch of what it might measure is below. Since the plan is to support multiple S3 API providers, that opens a pathway to using different cloud providers.

It’d be nice to know the capability of a node (+ internet + S3 backend) on non-AWS infrastructure.
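
As a sketch of what such a benchmark might measure, here is a minimal throughput probe using boto3 against any S3-compatible endpoint. This is not an existing `fusion` subcommand, and the endpoint, bucket, key, and object size are all placeholders for your own infrastructure:

```python
import time

import boto3


def s3_throughput_mb_s(endpoint_url: str, bucket: str,
                       key: str = "fusion-bench-test", size_mb: int = 256):
    """Measure sequential PUT/GET throughput against an S3-compatible backend.

    Works with non-AWS providers that expose an S3 API via `endpoint_url`.
    Note the test payload is held in memory, so keep `size_mb` modest.
    """
    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    payload = b"\0" * (size_mb * 1024 * 1024)

    # Upload throughput
    start = time.monotonic()
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    put_mb_s = size_mb / (time.monotonic() - start)

    # Download throughput
    start = time.monotonic()
    s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    get_mb_s = size_mb / (time.monotonic() - start)

    s3.delete_object(Bucket=bucket, Key=key)  # clean up the test object
    return put_mb_s, get_mb_s
```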

Eagerly looking forward to what Fusion can enable for cloud-agnostic infra! :star_struck:
