DeepSeek AI recently released its Fire-Flyer File System (3FS) as open source, presenting a distributed storage solution specifically engineered for the high-throughput, low-latency demands of artificial intelligence training and inference.
Released on GitHub under an MIT license, 3FS arrived as part of DeepSeek’s late February/early March 2025 “Open Source Week” initiative.
Designed explicitly to leverage modern SSDs and RDMA networks, 3FS aims to aggregate storage resources from potentially hundreds of nodes into a unified pool accessible via a standard file system interface. According to DeepSeek’s design notes, this allows applications running on compute nodes to interact with petabyte-scale storage in a locality-oblivious manner, simplifying development for large-scale distributed tasks while aiming for high aggregate performance and fault tolerance.
Inside 3FS: Architecture and Consistency Mechanisms
The 3FS architecture relies on four key components. Metadata operations (like file creation, lookups, and attribute management) are handled by Meta nodes. Crucially, these nodes are designed to be stateless, offloading durability and consistency to an external FoundationDB cluster – Apple’s open-source distributed transactional key-value store.
While powerful, FoundationDB has historically presented some operational complexities, particularly regarding Kubernetes integration, though dedicated operators now aim to ease deployment. A central Mgmtd node serves as the cluster’s brain, tracking the health and location of all Meta and Storage nodes via heartbeats and managing system configuration, including data replication layouts.
Actual file data is managed by the Storage nodes. These nodes utilize a custom, Rust-based `ChunkEngine` to handle data blocks on physical disks, using LevelDB by default for storing chunk metadata locally.
For disk I/O, they leverage Linux’s high-performance asynchronous interface, io_uring. To ensure data integrity across nodes, 3FS employs Chain Replication with Apportioned Queries (CRAQ).
The protocol arranges replicas in chains and ensures strong consistency by carefully managing write propagation and commit acknowledgments. Reads of committed (“clean”) data can be served by any replica, improving performance for read-heavy workloads common in AI, while reads of uncommitted (“dirty”) data are directed to the authoritative tail replica.
A known trade-off of chain-based protocols like CRAQ is that write latency can be bounded by the slowest node in the replication chain.
Performance Claims and Target Workloads
DeepSeek highlights several AI-centric use cases for 3FS: managing large datasets for data preparation, enabling direct random access for training dataloaders (potentially reducing the need for complex prefetching), high-throughput parallel model checkpointing, and serving inference KVCache from lower-cost, high-capacity SSDs.
Performance figures shared by DeepSeek, reportedly from internal use dating back to at least 2019 and tested on their large-scale “Fire-Flyer” AI-HPC infrastructure (specifically, a 180-storage-node cluster serving 10,000 GPUs), claim an aggregate read throughput reaching approximately 6.6 TiB/s during stress testing.
This figure compares favorably to benchmarks cited for other systems, such as Ceph, albeit on different hardware configurations. Additionally, using their open-source `smallpond` data-processing tool, the company reported sorting 110.5 TiB in just over 30 minutes on a 25-node storage cluster. For KVCache reads, peak client throughput was cited as up to 40 GiB/s. A custom FIO engine is provided for benchmarking.
Strategic Context: Efficiency and Openness
The release of 3FS fits into a pattern of recent activity showcasing DeepSeek’s focus on architectural efficiency. Their “Open Source Week” ultimately resulted in eight repositories being shared, including FlashMLA, an optimized attention kernel released on the initiative’s opening day.
This followed the March 24th open-weight release of the DeepSeek-V3-0324 model checkpoint and the April publication detailing their Self-Principled Critique Tuning (SPCT) research.
This strategy appears partly driven by necessity; tech giant Tencent, confirming its use of DeepSeek models in March 2025, noted the wider trend among Chinese firms adapting to hardware limitations.
Tencent stated on their recent earnings call: “Chinese companies are generally prioritizing efficiency and utilization — efficient utilization of the GPU servers… DeepSeek’s success really sort of symbolize and solidify — demonstrated that — that reality.”
This need for efficiency is amplified by ongoing U.S. export controls affecting access to cutting-edge GPUs.
Availability and Community Engagement
The 3FS source code, along with build instructions and documentation including a setup guide, is available on the project’s GitHub repository. Building the system requires specific versions of `libfuse`, FoundationDB, and the Rust toolchain, among other dependencies outlined for various Linux distributions.
The repository quickly gained traction after its release, accumulating over 8,700 stars and 860 forks, indicating strong interest from the developer community. Users encountering issues are directed to the repository’s issue tracker.