We used GlusterFS for the past decade or so in HPC but it seems to be abandoned now. Need to see whether I switch to Ceph or something else.
Gluster was OK. We never pushed it very hard but it mostly just worked. Performance wasn't great but we encouraged users to use scratch space that was local to the node where their job was running anyway.
> TernFS is designed for XTX data center needs of maxing out at around 10EB of logical file storage, around one trillion files and 100 billion directories with around one million clients. All running atop commodity hardware and Ethernet networking.
Good lord.
This feels less like a gift to the community and more like the world's most impressive job ad to attract top-tier kernel developers.
Previously:
TernFS – An exabyte scale, multi-region distributed filesystem, 247 points, 4 days ago, https://news.ycombinator.com/item?id=45290245
There was also an introductory blog post submitted 4 days ago. 245 points, 108 comments. https://www.xtxmarkets.com/tech/2025-ternfs/ https://news.ycombinator.com/item?id=45290245
Some notable constraints: files are immutable, write-once update never. Designed for files at least 2MB in size. Slow at directory creation/deletion. No permissions/access control.
(disclaimer: CTO of XTX)
These limits aren't quite as strict as they first seem.
Our median file size is 2MB, which means 50% of our files are <2MB. Realistically if you've got an exabyte of data with an average file size of a few kilobytes then this is the wrong tool for the job (you need something more like a database), but otherwise it should be just fine. We actually have a nice little optimisation where very small files are stored inline in the metadata.
It works out of the box with "normal" tools like rsync, Python, etc., despite the immutability. The reality is that most things don't actually modify files; even text editors tend to save a new version and rename over the top. We had to update relatively little of our massive code base when switching over to this. For us that was a big win: moving to an S3-like interface would have required updating a lot of code.
Directory creation/deletion is "slow", currently limited to about 10,000 operations per second. We don't currently need to create more than 10,000 directories per second, so we just haven't prioritised improving that. There is an issue open, #28, which would get this up to 100,000 per second. This is the sort of thing that, like access control, I would love to have had in an initial open source release, but we prioritised open sourcing what we have over getting it perfect.
> The reality is that most things don't actually modify files; even text editors tend to save a new version and rename over the top.
It is essentially copy-on-write exposed at the user level. The only issue is that this breaks hard links, so tools that rely on them are going to break. But yes, custom code should be easy to adapt.
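To make the "save a new version and rename over the top" point concrete, here is a minimal Python sketch of the pattern most tools already use (the helper name and paths are illustrative, not anything from TernFS). Because the target file is never modified in place, it works fine on a write-once store, and it also shows why existing hard links end up pointing at the old content:

    import os
    import tempfile

    def atomic_write(path: str, data: bytes) -> None:
        """Write a new file next to `path` and rename it into place."""
        directory = os.path.dirname(path) or "."
        # Create the temp file in the same directory so the rename stays
        # on the same filesystem and remains atomic.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())    # make the new version durable first
            os.replace(tmp_path, path)  # atomically swap the directory entry;
                                        # old hard links keep the old bytes
        except BaseException:
            os.unlink(tmp_path)
            raise

    # Usage: atomic_write("results/output.bin", payload)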
Thanks for open-sourcing this!
So it seems this competes more with S3/MinIO than with NFS?
> XTX developed TernFS for distributed storage after they outgrew their original NFS usage and other file-system alternatives.
So... call me old and crotchety, but i'm not sure I trust someone to write a DFS like this that once thought NFS a good idea. I'm sure its fine, I just have bad memories.
(disclaimer: CTO of XTX)
It was a long, long time ago that we were only using NFS; it ran on top of a Solaris machine running ZFS. It did its job at the very beginning, but you don't build up hundreds of petabytes of data on an NFS server.
We did try various solutions in between NFS and developing TernFS, both open source and proprietary. However, we didn't name them specifically in the blog post because there's little point in bad-mouthing what didn't work out for us.
NFS is cheap and simple. We have been using it in our business for over 15 years, serving tens of millions of daily users. I have yet to find a replacement.
What's wrong with NFS?
It.. depends.
Historically NFS has had many flaws on different OSes. Many of these issues appear to have been resolved over time, and I have not seen it referred to as "Nightmare File System" in decades.
However, depending on many factors, NFS may still be a bad choice. In our setup, for example, using a large SQLite database over NFS turns out to be up to 10 times as slow as using a "real" disk.
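For anyone who wants to measure that kind of gap themselves, here is a rough Python timing sketch (the /mnt/local and /mnt/nfs paths are placeholders, and the numbers will vary a lot with mount options and workload; this is not the commenter's actual benchmark). It times the worst-case SQLite pattern for remote storage: many tiny transactions, each forcing synchronous I/O and lock traffic over the wire:

    import sqlite3
    import time

    def time_inserts(db_path: str, rows: int = 2000) -> float:
        """Time many one-row transactions against a SQLite DB at db_path."""
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS t (k INTEGER PRIMARY KEY, v TEXT)")
        start = time.perf_counter()
        for i in range(rows):
            conn.execute("INSERT INTO t (v) VALUES (?)", (f"row-{i}",))
            conn.commit()  # one transaction per row: worst case over NFS
        elapsed = time.perf_counter() - start
        conn.close()
        return elapsed

    # Placeholder paths: point these at a local disk and an NFS mount.
    local = time_inserts("/mnt/local/bench.db")
    remote = time_inserts("/mnt/nfs/bench.db")
    print(f"local: {local:.2f}s  nfs: {remote:.2f}s  slowdown: {remote / local:.1f}x")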
The SQLite FAQs warn about bigger problems than slowness: https://www.sqlite.org/faq.html#q5
So there's nothing wrong with NFS: people just remember old, buggy implementations. Do you think TernFS will somehow be free of bugs like those old ones?
Eh, aren't all FSs essentially the same? Can't we just configure the limits during OS installation and be done with the gazillion FSs?
No, they aren't. Especially not distributed filesystems which really aren't yet a "solved problem", which in part explains why there are all these proprietary competing ones still around and companies everywhere using all different ones. NFS, BeeGFS, Weka, Ceph, Lustre, GPFS, GoogleFS, Coda/AFS, and more, each with their own flavor of crap.
For local filesystems, the average PC user shouldn't really care, though. Just use whatever your installer defaults to. But this story is about a distributed filesystem.
I don't have great hopes for one capable of such massive scale being good and usable (low overhead, low complexity, low admin cost) in very small configurations, but we can always hope.
There’s definitely a space for these highly-specialised filesystems. You wouldn’t want to use this as your /home FS, nor would you want to use ext4 or something similar for what they’re trying to do.