The sad part is that, despite years of development, Btrfs never reached parity with ZFS. And then yesterday's news: "Josef Bacik who is a long-time Btrfs developer and active co-maintainer alongside David Sterba is leaving Meta. Additionally, he's also stepping back from Linux kernel development as his primary job." See https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta
There is no 'modern' ZFS-like fs in Linux nowadays.
There's literally ZFS-on-Linux and it works great. And yes, I will once again say Linus is completely wrong about ZFS: from the multiple times he's spoken about it, it's abundantly clear he's never used it or bothered to spend any time researching its features and functionality.
ZFS deserves an absolutely legendary amount of respect for showing us all what a modern filesystem should be - the papers they wrote, alone, did the entire filesystem world such a massive service by demonstrating the possibilities of full data integrity and why we want it, and then they showed it could be done.
But there's a ton of room for improvement beyond what ZFS did. ZFS was a very conservative design in a lot of ways (rightly so! so many ambitious projects die because of second system syndrome); notably, it's block based and doesn't do extents - extents and snapshots are a painfully difficult combination.
Took me years to figure that one out.
My hope for bcachefs has always been to be a real successor to ZFS, with better and more flexible management, better performance, and even better robustness and reliability.
> But there's a ton of room for improvement beyond what ZFS did.
Say more? I can't say I've really thought that much about filesystems and I'm curious in what direction you think they could be taken if time and budget weren't an issue.
It's an entirely clean slate design, and I spent years taking my time on the core planning out the design; it's as close to perfect as I can make it.
The only things I can think of that I would change or add given unlimited time and budget:
- It should be written in Rust - better yet, Rust plus dependent types (which I suspect could be done with proc macros) for formal verification. And Cap'n Proto for on-disk data structures (which still needs Rust improvements to be as ergonomic as it should be) would also be a really nice improvement.
- More hardening; the only other thing we're lacking is comprehensive fault injection testing of on-disk errors. It's sufficiently battle hardened that it's not a major gap, but it really should happen at some point.
- There's more work to be done in bitrot prevention: data checksums really need to be plumbed all the way into the page cache.
I'm sure we'll keep discovering new small ways to harden, but nothing huge at this point.
Some highlights:
- It has more defense in depth than any filesystem I know of. It's as close to impossible to have unrecoverable data loss as I think can really be done in a practical production filesystem - short of going full immutable/append only.
- Closest realization of "filesystem as a database" that I know of
- IO path options (replication level, compression, etc.) can be set on a per-file or per-directory basis. I'm midway through a project extending this to do some really cool stuff - basically, data management becomes purely declarative.
- Erasure coding is much more performant than ZFS's
- Data layout is fully dynamic, meaning you can add/remove devices at will, it just does the right thing - meaning smoother device management than ZFS
- The way the repair code works, and the tracking of errors we've seen - fantastic for debuggability
- Debuggability and introspection are second to none: long bug hunts really aren't a thing in bcachefs development, because you can just see anything the system is doing
There's still lots of work to do before we're fully at parity with ZFS. Over the next year or two I should be finishing erasure coding, online fsck, failure domains, lots more management stuff... there will always be more cool projects just over the horizon
Thanks for bcachefs and all the hard work you've put into it. It's truly appreciated, and I hope you can continue to march on and not give up on the in-kernel code, even if it means bowing to Linus.
On a different note, have you heard about prolly trees and structural sharing? It’s a newer data structure that allows for very cheap structural sharing and I was wondering if it would be possible to build an FS on top of it to have a truly distributed fs that can sync across machines.
"Like its predecessor, OFS (Old Be File System, written by Benoit Schillings - formerly BFS), it includes support for extended file attributes (metadata), with indexing and querying characteristics to provide functionality similar to that of a relational database."
What BFS did is very cool, and I hope to add that to bcachefs someday.
But I'm talking more about the internals than external database functionality; the inner workings are much more fundamental.
bcachefs internally is structured more like a relational database than a traditional Unix filesystem, where everything hangs off the inode. In bcachefs, there's an extents btree (read: table), an inodes btree, a dirents btree, and a whole bunch of others - we're up to 20 (!).
There's transactions, where you can do arbitrary lookups, updates, and then commit, with all the database locking hidden from you; lookups within a transaction see uncommitted updates from that transaction. There's triggers, which are used heavily.
We don't have the full relational model - no SELECT or JOIN, no indices on arbitrary fields like with SQL (but you can do effectively the same thing with triggers, I do it all the time).
All the database/transactional primitives make the rest of the codebase much smaller and cleaner, and make feature development a lot easier than what you'd expect in other filesystems.
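The transactional pattern described above can be sketched with a toy model (this is illustrative Python, not the real bcachefs API - the `Tx` class, the btree names, and the trigger hook are invented for the sketch): multiple key-value "btrees", transactions whose lookups see their own uncommitted updates, and a trigger maintaining a derived index, the same trick as an SQL index maintained by triggers.

```python
# Toy model (NOT the real bcachefs API) of btrees + transactions + triggers.

class Tx:
    def __init__(self, fs):
        self.fs = fs
        self.updates = {}          # (btree, key) -> value, uncommitted

    def get(self, btree, key):
        # Lookups within a transaction see uncommitted updates first.
        if (btree, key) in self.updates:
            return self.updates[(btree, key)]
        return self.fs.btrees[btree].get(key)

    def set(self, btree, key, value):
        self.updates[(btree, key)] = value

    def commit(self):
        for (btree, key), value in self.updates.items():
            old = self.fs.btrees[btree].get(key)
            self.fs.btrees[btree][key] = value
            # Triggers run atomically with the update they observe.
            for trig in self.fs.triggers.get(btree, []):
                trig(self, btree, key, old, value)
        self.updates = {}

class FS:
    def __init__(self):
        self.btrees = {"inodes": {}, "dirents": {}, "sizes": {}}
        self.triggers = {}

fs = FS()
# Trigger keeping a derived "sizes" index in sync with the inodes btree.
fs.triggers["inodes"] = [
    lambda tx, bt, k, old, new: fs.btrees["sizes"].__setitem__(k, new["size"])
]

tx = Tx(fs)
tx.set("inodes", 1, {"size": 4096})
assert tx.get("inodes", 1)["size"] == 4096   # visible inside the tx
assert fs.btrees["inodes"].get(1) is None    # but not outside, pre-commit
tx.commit()
assert fs.btrees["sizes"][1] == 4096         # trigger maintained the index
```

The "no indices on arbitrary fields, but triggers get you the same thing" point above is exactly the `sizes` index here: nobody writes to it directly, it's maintained as a side effect of committing to `inodes`.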
I happen to work at a company that uses a ton of capnp internally and this is the first time I've seen it mentioned much outside of here. Would you mind describing what about it you think would make it a good fit for something like bcachefs?
Cap'n proto is basically a schema language that gets you a well defined in-memory representation that's just as good as if you were writing C structs by hand (laboriously avoiding silent padding, carefully using types with well defined sizes) - without all the silent pitfalls of doing it manually in C.
It's extremely well thought out, and minimalist in all the right ways; the features and optimizations it has are clearly borne of real experience - the kind of things you'd end up building yourself in any real-world system.
E.g. it gives you the ability to add new fields without breaking compatibility. That's the right way to approach forwards/backwards compatibility, and it's what I do in bcachefs; if we'd been able to just use Cap'n Proto, it would've saved a lot of fiddly manual work.
The only blocker to using it more widely in my own code is that it's not sufficiently ergonomic in Rust - Rust needs lenses, from Swift.
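The append-only compatibility scheme described above can be sketched like this (illustrative Python, not Cap'n Proto's actual wire format; the field names are invented): fields are only ever appended, so a new reader treats a short, old-format record as "new fields take their defaults", and an old reader simply ignores trailing bytes it doesn't know about.

```python
import struct

FIELDS_V1 = [("size", "<Q", 0)]                      # (name, fmt, default)
FIELDS_V2 = FIELDS_V1 + [("nr_replicas", "<I", 1)]   # v2 appended one field

def pack(fields, rec):
    return b"".join(struct.pack(f, rec.get(n, d)) for n, f, d in fields)

def unpack(fields, data):
    rec, off = {}, 0
    for name, fmt, default in fields:
        size = struct.calcsize(fmt)
        if off + size <= len(data):                  # field present on disk
            rec[name] = struct.unpack_from(fmt, data, off)[0]
        else:                                        # old record: use default
            rec[name] = default
        off += size
    return rec

old_record = pack(FIELDS_V1, {"size": 4096})
rec = unpack(FIELDS_V2, old_record)       # new code reading old data
assert rec == {"size": 4096, "nr_replicas": 1}

new_record = pack(FIELDS_V2, {"size": 4096, "nr_replicas": 3})
rec = unpack(FIELDS_V1, new_record)       # old code ignores trailing bytes
assert rec == {"size": 4096}
```

Doing this by hand in C means maintaining the offset arithmetic and defaults yourself on every format revision - which is the fiddly manual work a schema compiler takes off your plate.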
I'm saddened by this turn of events, but I hope this won't deter you from working on bcachefs on your own terms, and that we'll eventually see it reconciled into the kernel at some point.
> - Erasure coding is much more performant than ZFS's
any plans for much lower rates than typical raid?
Increasingly, modern high-density devices are having block-level failures at non-trivial rates, instead of or in addition to whole-device failures. A file might be 100,000 blocks long; adding 1,000 blocks of FEC would expand it 1% but add tremendous protection against block errors - and it can do so even on a single piece of media. It doesn't protect against device failures, sure. But without good block-level protection, device-level protection is dicey: hitting some block-level error while down to minimal devices seems inevitable, and adding more and more redundant devices is quite costly.
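The overhead arithmetic above can be sketched with the simplest possible scheme - one XOR parity block per group of k data blocks, giving 1% expansion at k=100. XOR parity can only rebuild one known-bad block per group (real tools like par2 use Reed-Solomon codes that correct many more), but the cost/protection tradeoff works the same way.

```python
import os
from functools import reduce

K = 100        # data blocks per parity block => 1% expansion
BLOCK = 4096

def xor_blocks(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def add_parity(blocks):
    # One parity block per group of K data blocks.
    return [xor_blocks(blocks[i:i + K]) for i in range(0, len(blocks), K)]

def rebuild(blocks, parity, bad):
    """Rebuild block `bad`, known lost (e.g. it failed its checksum)."""
    g = bad // K
    group = [b for i, b in enumerate(blocks[g * K:(g + 1) * K])
             if g * K + i != bad]
    return xor_blocks(group + [parity[g]])

data = [os.urandom(BLOCK) for _ in range(250)]     # 250 blocks, 3 groups
parity = add_parity(data)                          # 3 parity blocks
lost = data[137]
data[137] = b"\x00" * BLOCK                        # simulate a bad sector
assert rebuild(data, parity, 137) == lost
```

Note the erasure-coding caveat: this only works because the checksum *tells* you which block is bad; the parity itself can't locate errors.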
It's been talked about. I've seen some interesting work to use just a normal checksum to correct single bit errors.
If there's an optimized implementation we can use in the kernel, I'd love to add it. Even on modern hardware, we do see bit corruption in the wild, it would add real value.
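The checksum-as-corrector idea can be sketched by brute force (a real implementation would exploit the linearity of a CRC to locate the flipped bit directly rather than trying them all; this is just the naive version, and it only makes sense when you trust the stored checksum more than the data):

```python
import zlib

def correct_single_bit(block: bytearray, want_crc: int):
    if zlib.crc32(block) == want_crc:
        return bytes(block)                  # nothing to fix
    for i in range(len(block) * 8):
        block[i // 8] ^= 1 << (i % 8)        # flip one bit
        if zlib.crc32(block) == want_crc:
            return bytes(block)              # found the single-bit error
        block[i // 8] ^= 1 << (i % 8)        # flip it back
    return None                              # more than one bit is wrong

data = bytearray(b"hello, bcachefs" * 10)
crc = zlib.crc32(data)
data[7] ^= 0x20                              # inject a single-bit error
fixed = correct_single_bit(data, crc)
assert fixed == b"hello, bcachefs" * 10
```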
It's pretty straightforward to use a normal checksum to correct single or even multiple bit errors (depending on the block size, choice of checksum, etc.). Though I expect those bit errors are bus/RAM, and hopefully usually transient. If there is corruption on the media, the whole block is usually going to be lost, because any corruption means its internal block-level FEC has more errors than it can fix.
I was more thinking along the lines of adding dozens or hundreds of correction blocks to a whole file, along the lines of par (though there are much faster techniques now).
You'd think that, wouldn't you? But there are enough moving parts in the IO stack below the filesystem that we do see bit errors. I don't have enough data to do correlations and tell you likely causes, but they do happen.
I think SSDs are generally worse than spinning rust (especially enterprise-grade SCSI kit); the hard drive vendors have been at this a lot longer, and SSDs are massively more complicated. From the conversations I've had with SSD vendors, I don't think they've put the same level of effort into making things as bulletproof as possible yet.
One thing to keep in mind is that correction always comes at some expense to detection.
Generally a code that can always detect N errors can only always correct N/2 errors. So you detect an errored block, you correct up to N/2 errors. The block now passes but if the block actually had N errors, your correction will be incorrect and you now have silent corruption.
The solution to this is just to have an excess of error correction power and then don't use all of it. But that can be hard to do if you're trying to shoehorn it into an existing 32-bit crc.
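The detect-vs-correct tradeoff is easiest to see with the simplest code there is: repeat every bit three times (minimum distance 3, so it detects up to 2 flipped copies but can only correct 1). With 2 flips, a detection-only check would flag the block - but majority-vote correction silently returns the wrong bit.

```python
def encode(bit):    return [bit] * 3
def detect(word):   return len(set(word)) > 1               # any disagreement
def correct(word):  return max(set(word), key=word.count)   # majority vote

word = encode(1)
word[0] ^= 1                          # one error
assert detect(word) and correct(word) == 1   # detected and fixed

word = encode(1)
word[0] ^= 1; word[1] ^= 1            # two errors: still *detected*...
assert detect(word)
assert correct(word) == 0             # ...but "corrected" to the wrong value
```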
How big are the blocks that the CRC units cover in bcachefs?
bcachefs checksums (and compresses) at extent granularity, not block; encoded extents (checksummed/compressed) are limited to 128k by default.
This is a really good tradeoff in practice; the vast majority of applications are doing buffered IO, not small block O_DIRECT reads - that really only comes up in benchmarks :)
And it gets us better compression ratios and lower metadata overhead.
We also have quite a bit of flexibility to add something bigger to the extent for FEC, if we need to - we're not limited to a 32/64 bit checksum.
Can you explain your definition of "extent"? Because under every definition I dealt with in filesystems before, ZFS is extent based at the lower layer, and flat out object storage system (closer to S3) at upper layer.
I've recently started using OpenZFS after all these years, and after weighing all the pros and cons of Btrfs, mdadm, etc., ZFS is clearly on top for availability and resiliency.
Hopefully we can get to a point where Linux has a native, first-class, modern alternative to ZFS with bcachefs.
Sometimes I wonder how someone so talented could be so wrong about ZFS, and it makes me wonder if his negative responses to ZFS discussions could be a way of creating plausible deniability in case Oracle's lawyers ever learn how to spell ZFS.
As far as I know, the license incompatibility is on the GPL side of the equation. As in, shipping a kernel with the ZoL functionality is a violation of the GPL, not the CDDL. Thus, Oracle would not be able to sue Canonical (Edit: or, rather, have any reasonable expectation of winning this battle), as they have no standing. A copyright holder of some materially significant portion of the GPL code of the kernel would have to sue Canonical for breaching the GPL by including CDDL code.
How many years has it been since Ubuntu started shipping ZFS, purportedly in violation of whatever legal fears the kernel team has? Four years? Five years?
I obviously have nothing like inside knowledge, but I assume the reason there have not been lawsuits over this, is that whoever could bring one (would it be only Oracle?) expects there are even-odds that they would lose? Thus the risk of setting an adverse precedent isn't worth the damages they might be awarded from suing Canonical?
The legal issue between the Linux kernel and ZFS is that the kernel's license does not allow incorporating code under licenses with additional restrictions - including anything that adds protections against being sued over patented code contributed by the licensor.
I am aware of that. I did a bad job phrasing my post, and it came off sounding more confident than I actually intended. I have two questions: (1) What are the expected consequences of a violation? (2) Why haven't any consequences occurred yet?
My understanding is that Canonical is shipping ZFS with Ubuntu. Or do I misunderstand? Has Canonical not actually done the big, bad thing of distributing the Linux kernel with ZFS? Did they find some clever just-so workaround so as to technically not be violation of the Linux kernel's license terms?
Otherwise, if Canonical has actually done the big, bad thing, who has standing to bring suit? Would the Linux Foundation sue Canonical, or would Oracle?
I ask this in all humility, and I suspect there is a chance that my questions are nonsense and I don't know enough to know why.
It could be a long-term strategy by Oracle to be able to sue IBM and other big companies distributing Linux with ZFS built in. If Oracle wanted people to use ZFS, they could just relicense the code they hold copyright on.
Sure, but Oracle cannot retroactively relicense the code already published before then. The cat's already out of the bag, and as long as the code from before the fork is used according to the original license, it's legal.
But Sun ensured that they can only gnash their teeth.
The source of the "license incompatibility", btw, is the same as with using GPLv3 code in the kernel: CDDL adds an extra restriction in the form of patent protections (just like Apache 2).
To me, ZFS on Linux is extremely uninteresting except for the specific use case of a NAS with a bunch of drives. I don't want to deal with out-of-tree filesystems unless I absolutely have to. And even on a NAS, I would want the root partition to be ext4 or btrfs or something else that's in the kernel.
I will not use or recommend ZFS on _any_ OS until they solve the double page cache problem. A filesystem has no business running its own damned page cache that duplicates the OS one. I don't give a damn if ZFS has a fancy eviction algorithm. ARC's patent is expired - go port it to mainline Linux if it's that good. Just don't build an inner platform.
Suse Linux Enterprise still uses Btrfs as the Root-FS, so it can't be that bad, right? What is Chris Mason actually doing these days? I did some googling and only found out that he was working on a tool called "rsched".
btrfs is fine for single disks or mirrors. In my experience, the main advantages of zfs over btrfs is that ZFS has production ready raid5/6 like parity modes and has much better performance for small sync writes, which are common for databases and hosting VM images.
Context: I mostly dealt with RAID1 in a home NAS setup
A ZFS pool will remain available even in degraded mode. And correct me if I'm wrong, but with Btrfs you mount the array through one of the volumes that is part of it, not the array itself - so if that specific mounted volume happens to go down, the array becomes unmounted and unavailable until you remount it via another available member volume, which isn't great for availability.
I thought about mitigating that by making an mdadm RAID1 formatted with Btrfs and mounting the virtual volume instead, but then you lose the ability to prevent bit rot, since Btrfs loses that visibility if it doesn't manage the array natively.
Thanks for sharing! I just setup a fs benchmark system and I'll run your fio command so we can compare results. I have a question about your fio args though. I think "--ioengine=sync" and "--iodepth=16" are incompatible, in the sense that iodepth will only be 1.
"Note that increasing iodepth beyond 1 will not affect synchronous ioengines"[1]
Is there a reason you used that ioengine as opposed to, for example, "libaio" with a "--direct=1" flag?
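For comparison, an invocation where the queue depth actually takes effect might look something like this (the job name, file path, and sizes are placeholders, not from the original post):

```shell
# Async engine so --iodepth=16 is honored; --direct=1 bypasses the page
# cache so the queue depth actually reaches the device.
fio --name=randwrite-qd16 \
    --filename=/path/to/testfile \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --rw=randwrite --bs=4k --size=1G \
    --runtime=60 --time_based
```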
This might not be directly about btrfs, but bcachefs, zfs, and btrfs are the only filesystems for Linux that provide modern features like transparent compression, snapshots, and CoW.
zfs is out of tree, leaving it an unviable option for many people. This news means that bcachefs is going to be in a very weird state in-kernel, which leaves btrfs as the only other in-tree 'modern' filesystem.
This news about bcachefs has ramifications for the state of 'modern' FSes in Linux, and I'd say the news about the btrfs maintainer taking a step back is related to it.
Meh. This war was stale like nine years ago. At this point the originally-beaten horse has decomposed into soil. My general reply to this is:
1. The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even. Anyone who has software requirements in this space (as distinct from "wants to yell on the internet about it") is very well served.
2. Compression seems silly in the modern world. Virtually everything is already compressed. To first approximation, every byte in persistent storage anywhere in the world is in a lossy media format. And the ones that aren't are in some other cooked format. The only workloads where you see significant use of losslessly-compressible data are in situations (databases) where you have app-managed storage performance (and who see little value from filesystem choice) or ones (software building, data science, ML training) where there's lots of ephemeral intermediate files being produced. And again those are usages where fancy filesystems are poorly deployed, you're going to throw it all away within hours to days anyway.
Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.
For me bcachefs provides a feature no other filesystem on Linux has: automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.
A block level cache like bcache (not fs) and dm-cache handles it less ideally, and doesn't leave the SSD space as usable space. As a home user, 2TB of SSDs is 2TB of space I'd rather have. ZFS's ZIL is similar, not leaving it as usable space. Btrfs has some recent work in differentiating drives to store metadata on the faster drives (allocator hints), but that only does metadata as there is no handling of moving data to HDDs over time. Even Microsoft's ReFS does tiered storage I believe.
I just want to have 1 or 2 SSDs, with 1 or 2 HDDs in a single filesystem that gets the advantages of SSDs with recently used files and new writes, and moves all the LRU files to the HDDs. And probably keep all the metadata on the SSDs too.
> automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.
You were not alone. However, things changed: SSDs kept getting cheaper and grew in capacity. I'd think most active data these days is on SSDs (certainly in most desktops, most servers which aren't explicitly file or DB servers, and all mobile and embedded devices), the role of spinning rust being more and more archival (if it's found in a system at all).
> Compression seems silly in the modern world. Virtually everything is already compressed.
IIRC my laptop's zpool has a 1.2x compression ratio; it's worth doing. At a previous job, we had over a petabyte of postgres on ZFS and saved real money with compression. Hilariously, on some servers we also improved performance because ZFS could decompress reads faster than the disk could read.
> we also improved performance because ZFS could decompress reads faster than the disk could read
This is my favorite side effect of compression in the right scenarios. I remember getting a huge speed up in a proprietary in-memory data structure by using LZO (or one of those fast algorithms) which outperformed memcpy, and this was already in memory so no disk io involved! And used less than a third of the memory.
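A back-of-envelope model of why this happens - the numbers below are illustrative assumptions, not measurements from the posts above: with compression you read fewer physical bytes, and as long as decompression keeps up with the disk, the slower stage still finishes sooner.

```python
# Illustrative assumptions:
disk_MBps   = 200      # spinning-rust-class sequential read
decomp_MBps = 1500     # lz4-class decompression, single core
ratio       = 1.2      # logical size / compressed size

def read_time(logical_mb, compressed):
    if not compressed:
        return logical_mb / disk_MBps
    physical = logical_mb / ratio
    # IO and decompression pipeline; the slower stage dominates.
    return max(physical / disk_MBps, logical_mb / decomp_MBps)

t_raw  = read_time(1000, compressed=False)   # 5.0 s
t_comp = read_time(1000, compressed=True)    # still IO-bound, but less IO
assert t_comp < t_raw
```

Even a modest 1.2x ratio shaves the IO time by the same factor; the win grows with the ratio until decompression becomes the bottleneck.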
The performance gain from compression (replacing IO with compute) is not ironic; it was seen as a feature of the various NAS products that Sun (and after them, Oracle) developed around ZFS.
I know my own personal anecdote isn’t much, but I’ve noticed pretty good space savings on the order of like 100 GB from zstd compression and CoW on my personal disks with btrfs
As for the snapshots, things like LVM snapshots are pretty coarse, especially for someone like me where I run dm-crypt on top of LVM
I’d say zfs would be pretty well missed with its data integrity features. I’ve heard that btrfs is worse in that aspect, so given that btrfs saved my bacon with a dying ssd, I can only imagine what zfs does.
> And the ones that aren't are in some other cooked format.
Maybe, if you never create anything. I make a lot of game art source and much of that is in uncompressed formats. Like blend files, obj files, even DDS can compress, depending on the format and data, due to the mip maps inside them. Without FS compression it would be using GBs more space.
I'm not going to individually go through and micromanage file compression even with a tool. What a waste of time, let the FS do it.
> Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.
Yeah nah, have you tried processing terabytes of data every day and storing them? It gets better now with DDR5 but bit flips do actually happen.
And once more, you're positing the lack of a feature that is available and very robust (c.f. "yell on the internet" vs. "discuss solutions to a problem"). You don't need your filesystem to integrate checksumming when dm/lvm already do it for you.
i'm not one for internet arguments and really just want solutions. maybe you could point me at the details for a setup that worked for you?
based on my own testing, dm has a lot of footguns and, with some kernels, as little as 100 bytes of corruption to the underlying disk could render a dm-integrity volume completely unusable (requiring a full rebuild) https://github.com/khimaros/raid-explorations
Well, the intention of the integrity targets is to preserve integrity as an explicit choice, in particular for encrypted data. You definitely need a backup strategy.
Backups are great, but don't help much if you backup corrupted data.
You can certainly add verification above and below your filesystem, but the filesystem seems like a good layer to have verification. Capturing a checksum while writing and verifying it while reading seems appropriate; zfs scrub is a convenient way to check everything on a regular basis. Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
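The write/verify/scrub loop described above, as a minimal model (illustrative Python, not any real filesystem's API): capture a checksum at write time, verify it on every read, and let a scrub sweep the whole store to surface latent bitrot before you actually need the data.

```python
import hashlib

store = {}   # name -> [data, checksum]

def write(name, data):
    store[name] = [data, hashlib.sha256(data).hexdigest()]

def read(name):
    data, csum = store[name]
    if hashlib.sha256(data).hexdigest() != csum:
        raise IOError(f"checksum mismatch reading {name}")
    return data

def scrub():
    """Sweep everything, returning names that fail verification."""
    return [name for name, (data, csum) in store.items()
            if hashlib.sha256(data).hexdigest() != csum]
    # in a real fs: repair each from a good replica

write("a", b"important")
write("b", b"also important")
store["b"][0] = b"also importent"      # silent corruption below the fs
assert scrub() == ["b"]                # scrub catches it before a read does
```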
FWIW, framed the way you do, I'd say the block device layer would be an *even better* place for that validation, no?
> Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
OMG. Backups! You need backups! Worry about polishing your geek cred once your data is on physically separate storage. Seriously, this is not a technology choice problem. Go to Amazon and buy an exfat stick, whatever. By far the most important thing you're ever going to do for your data is Back. It. Up.
Filesystem choice is, and I repeat, very much a yell-on-the-internet kind of thing. It makes you feel smart on HN. Backups to junky Chinese flash sticks are what are going to save you from losing data.
I appreciate the argument. I do have backups. ZFS makes it easy to send snapshots, and so I do.
But I don't usually verify the backups, so there's that. And everything is in the same zip code for the most part, so one big disaster and I'll lose everything. C'est la vie.
Ok I think you're making a well-considered and interesting argument about devicemapper vs. feature-ful filesystems but you're also kind of personalizing this a bit. I want to read more technical stuff on this thread and less about geek cred and yelling. :)
I wouldn't comment but I feel like I'm naturally on your side of the argument and want to see it articulated well.
I didn't really think it was that bad? But sure, point taken.
My goal was actually the same though: to try to short-circuit the inevitable platform flame by calling it out explicitly and pointing out that the technical details are sort of a solved problem.
ZFS argumentation gets exhausting, and has ever since it was released. It ends up as a proxy for Sun vs. Linux, GNU vs. BSD, Apple vs. Google, hippy free software vs. corporate open source, pick your side. Everyone has an opinion, everyone thinks it's crucially important, and as a result of that hyperbole everyone ends up thinking that ZFS (dtrace gets a lot of the same treatment) is some kind of magically irreplaceable technology.
And... it's really not. Like I said above if it disappeared from the universe and everyone had to use dm/lvm for the actual problems they need to solve with storage management[1], no one would really care.
[1] Itself an increasingly vanishing problem area! I mean, at scale and at the performance limit, virtually everything lives behind a cloud-adjacent API barrier these days, and the backends there worry much more about driver and hardware complexity than they do about mere "filesystems". Dithering about individual files on individual systems in the professional world is mostly limited to optimizing boot and update time on client OSes. And outside the professional world it's a bunch of us nerds trying to optimize our movie collections on local networks; realistically we could be doing that on something as awful as NTFS if we had to.
On urging from tptacek I'll take that seriously and not as flame:
1. This is misunderstanding how device corruption works. It's not and can't ever be limited to "files". (Among other things: you can lose whole trees if a directory gets clobbered, you'd never even be able to enumerate the "corrupted files" at all!). All you know (all you can know) is that you got a success and that means the relevant data and metadata matched the checksums computed at write time. And that property is no different with dm. But if you want to know a subset of the damage just read the stderr from tar, or your kernel logs, etc...
2. Metadata robustness in the face of inconsistent updates (e.g. power loss!) is a feature provided by all modern filesystems, and ZFS is no more or less robust than ext4 et al. But all such filesystems (ZFS included) will "lose data" that hadn't been fully flushed. Applications that are sensitive to that sort of thing must (!) handle this by having some level of "transaction" checkpointing (i.e. an fsync call). ZFS does absolutely nothing to fix this for you. What is true is that an unsynchronized snapshot looks like "power loss" at the dm level where it doesn't in ZFS. But... that's not useful for anyone who actually cares about data integrity, because you still have to solve the power loss problem. And solving the power loss problem obviates the need for ZFS.
1 - you absolutely can and should walk reverse mappings in the filesystem so that from a corrupt block you can tell the user which file was corrupted.
In the future bcachefs will be rolling out auxiliary dirent indices for a variety of purposes, and one of those will be to give you a list of files that have had errors detected by e.g. scrub (we already generally tell you the affected filename in error messages)
2 - No, metadata robustness absolutely varies across filesystems.
From what I've seen, ext4 and bcachefs are the gold standard here; both can recover from basically arbitrary corruption and have no single points of failure.
Other filesystems do have single points of failure (notably btree roots), and btrfs and, I believe, ZFS are painfully vulnerable to devices with broken flush handling. You can (and should) blame the device and the shitty manufacturers, but from the perspective of a filesystem developer, we should be able to cope with that without losing the entire filesystem.
XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root, but it's still possible to lose the entire filesystem if you're very, very unlucky.
On a modern filesystem that uses b-trees, you really need a way of repairing from lost b-tree roots if you want your filesystem to be bulletproof. btrfs has 'dup' mode, but that doesn't mean much on SSDs given that you have no control over whether your replicas get written to the same erase unit.
Reiserfs actually had the right idea - btree node scan, and reconstruct your interior nodes if necessary. But they gave that approach a bad name; for a long time it was a crutch for a buggy b-tree implementation, and they didn't seed a filesystem specific UUID into the btree node magic number like bcachefs does, so it could famously merge a filesystem from a disk image with the host filesystem.
bcachefs got that part right, and also has per-device bitmaps in the superblock for 'this range of the device has btree nodes' so it's actually practical even if you've got a massive filesystem on spinning rust - and it was introduced long after the b-tree implementation was widely deployed and bulletproof.
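The UUID-seeded magic trick can be sketched like so (a toy model, not the actual bcachefs node format): every node starts with the magic XORed with a per-filesystem seed, so a raw scan of the device finds *this* filesystem's nodes - and can't accidentally adopt nodes from some other filesystem's image sitting on the same disk, which was the reiserfs failure mode.

```python
import os, struct, uuid

MAGIC = 0xC0DE

def node(fs_seed, keys):
    # Header: 64-bit (magic XOR per-fs seed), then the node payload.
    return struct.pack("<Q", MAGIC ^ fs_seed) + repr(keys).encode()

def scan(device_blocks, fs_seed):
    """Raw scan: keep only blocks whose header matches OUR seeded magic."""
    found = []
    for blk in device_blocks:
        if len(blk) >= 8 and struct.unpack_from("<Q", blk)[0] == MAGIC ^ fs_seed:
            found.append(blk[8:])
    return found

ours   = uuid.uuid4().int & (2**64 - 1)
theirs = uuid.uuid4().int & (2**64 - 1)   # some other fs's disk image

disk = [node(ours, [1, 2]), os.urandom(64), node(theirs, [9]), node(ours, [3])]
assert scan(disk, ours) == [b"[1, 2]", b"[3]"]   # foreign node ignored
```

With an unseeded magic number, the scan would have picked up the `theirs` node too - exactly the cross-filesystem merge described above.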
> XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root
As I understand it, ZFS also has a lot of redundant metadata (copies=3 on anything important), and also previous uberblocks[1].
In what way is XFS better? Genuine question, not really familiar with XFS.
I can't speak with any authority on ZFS, I know its structure the least out of all the major filesystems.
I do a ton of reading through forums gathering user input, and lots of people chime in with stories of lost filesystems. I've seen reports of lost filesystems with ZFS and I want to say I've seen them at around the same frequency of XFS; both are very rare.
My concern with ZFS is that they seem to have taken the same "no traditional fsck" approach as btrfs, favoring entirely online repair. That's obviously where we all want to be, but that's very hard to get right, and it's been my experience that if you prioritize that too much you miss the "disaster recovery" scenarios, and that seems to be what's happened with ZFS; I've read that if your ZFS filesystem is toast you need to send it to a data recovery service.
That's not something I would consider acceptable, fsck ought to be able to do anything a data recovery service would do, and for bcachefs it does.
I know the XFS folks have put a ton of outright paranoia into repair, including full on disaster recovery scenarios. It can't repair in scenarios where bcachefs can - but on the other hand, XFS has tricks that bcachefs doesn't, so I can't call bcachefs unequivocally better; we'd need to wait for more widespread usage and a lot more data.
The lack of a traditional 'fsck' is because its operation would be exactly the same as normal driver operation. The most extreme case involves a very obscure option that lets you explicitly rewind transactions to one you specify, which I've seen used to recover from a broken driver upgrade that led to filesystem corruption in ways that most fscks would just barf on, including XFS's.
For low-level meddling and recovery, there's a filesystem debugger that understands all parts of ZFS and can help for example identifying previous uberblock that is uncorrupted, or recovering specific data, etc.
> What happens on ZFS if you lose all your alloc info?
According to this[1] old issue, it hasn't happened frequently enough to prioritize implementing a rebuild option; however, one should be able to import the pool read-only and zfs send it to a different pool.
As far as I can tell that's status quo. I agree it is something that should be implemented at some point.
That said, certain other spacemap errors might be recoverable[2].
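The read-only-import-and-send escape hatch mentioned above looks something like this (pool and snapshot names are made up; it assumes at least one snapshot already exists, since new snapshots can't be created on a read-only pool):

```shell
# Import the damaged pool without replaying or writing anything:
zpool import -o readonly=on oldpool

# Replicate an existing snapshot tree onto a healthy pool:
zfs send -R oldpool/data@last-good | zfs receive -d newpool

# Datasets with no snapshots can still be copied off their
# (read-only) mounts with ordinary tools such as rsync.
```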
One feature I like about ZFS and have not seen elsewhere is that you can have each filesystem within the pool use its own encryption keys; more importantly, all of the pool's data-integrity and maintenance operations (scrubs, migrations, etc.) work with filesystems in their encrypted state. So you can boot up the full system and then unlock and access projects only as needed.
The dm stuff is one key for the entire partition and you can't check it for bitrot or repair it without the key.
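A rough sketch of that per-dataset workflow with standard OpenZFS commands (pool and dataset names are hypothetical):

```shell
# Each dataset gets its own key:
zfs create -o encryption=on -o keyformat=passphrase tank/projectA

# Scrubs verify checksums over the ciphertext, so they work
# pool-wide with no keys loaded:
zpool scrub tank

# Unlock and mount a single project only when you need it:
zfs load-key tank/projectA
zfs mount tank/projectA
```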
> The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even.
O_o
Apparently I've been living under a rock; can you please show us a link about this? I was just recently (casually) looking into bolting on ZFS/BTRFS-like partial snapshot features to simulate my own atomic distro, where I'm able to freely roll back if an update goes bad. Think Linux's Timeshift with a little something extra.
DM has targets that facilitate block-level snapshots, lazy cloning of filesystems, compression, &c. Most people interact with those features through LVM2. COW snapshots are basically the marquee feature of LVM2.
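For the rollback use case upthread, a minimal LVM2 sketch (volume group and LV names are made up; assumes root already lives on LVM2):

```shell
# Take a CoW snapshot before an update, with 10G reserved for
# tracking changed blocks:
lvcreate --snapshot --size 10G --name pre-update vg0/root

# Update went bad: schedule a merge back into the origin; for an
# in-use root volume the merge happens on the next activation
# (i.e. after a reboot):
lvconvert --merge vg0/pre-update

# Update went fine: just drop the snapshot:
lvremove vg0/pre-update
```

Note that a classic (non-thin) snapshot is invalidated if its change-tracking area fills up, so size it generously.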
Does btrfs still eat your data if you try to use its included RAID featureset? Does it still break in a major way if you're close to running out of disk space? What I'm seeing is that most major Linux distributions still default to non-btrfs options for their default install, generally ext4.
Anecdotal, but btrfs is the only filesystem I've lost data with (and it wasn't in a RAID configuration). That, combined with the btrfs tools being the most aggressively bad management utilities out there*, ensures that I'm staying with ext4/xfs/zfs for now.
*Coming from the extremely well thought out and documented zfs utilities to btrfs will have you wondering wtf fairly frequently while you learn your way around.
Since the existing bcachefs driver will not be removed, and the problem is the bcachefs developer not following the rules, I wonder if someone else could take on the role of pulling bcachefs changes into the mainline, while also following the merge window rules.
The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in bcachefs's entire upstream history - it was taking out entire subvolumes.
The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).
Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.
In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page and a half rant from Linus on how he doesn't trust my judgement, and a whole lot more.
There have been many repeated arguments like this over bugfixes.
The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.
So given that, I don't think having a go-between would solve anything.
It's so sad to see an excellent engineer such as yourself, building what seems like an excellent filesystem that has the potential to be better than everything else available for Linux for many use cases, completely fail to achieve your goals because you lack the people skills to navigate working as a part of a team under a technical leader. Every comment and e-mail I've seen from you has demonstrated an impressive lack of understanding with regard to why you're being treated as you are.
You don't have to agree with all other maintainers on everything, but if you're working on Linux (or any other major project that's owned, run and developed by other people), you need to have the people skills to at a minimum avoid pissing everyone else off. Or you need to delegate the communication work to someone with those skills. It's a shame you don't.
Pointing the finger at the skills I lack and my inability, while ignoring the wider picture: the kernel burning out maintainers and not doing well on filesystems.
You get a ton of comments like this because it's true. There are real problems in the kernel, I've seen how hostile it can be to people who are just trying to do the right thing and upstream their changes etc. But your case isn't that. Your behavior would get you in trouble at any job where you have to follow rules set by other people. Your refusal to treat your part of the kernel as anything other than your personal pet project has destroyed your project's potential.
If this was a month or two ago, I would've written something vaguely optimistic here about how you could still turn this around somehow, about what lessons you could learn and move forward with. But that ship has sailed. Your project is no longer the promising next generation filesystem which could replace ext4 as the default choice. Your role is now that of the developer of some small out-of-tree filesystem for a small group of especially interested users. Nobody wanted this for you, including myself. But you have refused to listen to anyone's advice, so now you're here.
Kent has gotten this same feedback across practically every single platform that has discussed his issues. He is unable to take critique and will instead just continue to argue and be combative, therefore proving yet again why he is in this situation in the first place.
That's because it is my project, and my responsibility.
I can't be bowing to the demands of any one person; I have to balance the wants and needs of everyone and prioritize shipping something that works above all else.
Repeatedly we've seen that those priorities are not shared, unfortunately.
Arguments are just as heated as they ever were, but now instead of arguing over the actual issues - does this work, are we doing this right - people jump to arguing over language and conduct and demanding apologies or calling for people to be expelled.
But my core mission is just shipping a reliable trustworthy filesystem, and that's what I'm going to stick to.
> I can't be bowing to the demands of any one person
This right here is the core of the issue. When you're working as a part of a larger organizational structure, you have to bow down to your boss. When your software is a part of the kernel, it's not your project anymore; it's just one part of Linus's project. You're a contributor, not a leader. Just like I would not control Bcachefs's development process even if I contributed some small but important part to it, you do not control Linux's development process even though you contributed some small but important part to it.
Your core mission is evidently not shipping a reliable trustworthy filesystem. You say that, but your actions speak louder than your words. You know just as well as I do that a filesystem being in-tree rather than out-of-tree makes it significantly more reliable and trustworthy, which is why you chose to get Bcachefs merged into the kernel in the first place. Instead of working within the well-defined boundaries that are necessary to keep Bcachefs in the kernel, you've repeatedly pushed against those boundaries, belittled fellow maintainers, and in general worked hard to make yourself a persona non grata within the kernel community. The predictable outcome is that continued development of Bcachefs will have to happen out-of-tree, and your users won't gain the major reliability and trustworthiness benefits of using an in-tree filesystem. People will warn against using Bcachefs as their root filesystem, since every kernel upgrade will now carry some risk that DKMS or whatever mechanism is used to install the out-of-tree Bcachefs kernel module doesn't work with the new kernel.
And, to be honest, it doesn't matter whether or not you're "right" or "wrong" here. Maybe you're completely correct about absolutely everything and Linus, Greg, Ted, Miguel, Sasha, Josef, and everyone else involved are stupid and don't understand what it takes to develop reliable software. So what? They're your colleagues, some of them are your bosses. Everyone on Hacker News could take your side here and think you've been mistreated, it doesn't help. You'd still be thrown out of the kernel. You'd still be failing your users by not maintaining a good enough relationship with your colleagues and bosses to stay in-tree. You could be completely right on every technical matter and it does not matter.
If you play your cards right, you could maybe end up in a situation where you run the Bcachefs project entirely out-of-tree, with yourself as the supreme leader who doesn't bow down to the demands of anyone, with your own development and release process; and then someone else takes responsibility for pushing your code into the upstream kernel, following Linus's rules. They would dissect your releases and backport bug fixes while leaving out important features, in accordance with Linus's rules. Time will tell if you can find anyone to do that. And time will tell if you possess the humility necessary to let someone else ultimately control the experience most of your users will have.
Which also, at times, means appeasing people even when you are confident that they are wrong because you need their cooperation in the future. In a large complicated system, being able to work together is often more important to the system's reliability, performance, etc. than being as right as possible.
Plus even when you're confident you are in the right you might still be in the wrong. After all, the people you are disagreeing with are also superbly competent and they believe they're in the right just as you do. There can be hills worth dying on, but they ought to be very rare.
> Which also, at times, means appeasing people even when you are confident that they are wrong because you need their cooperation in the future.
Being unwilling to follow basic QA processes in preparation for a release candidate, and then doubling down by attacking the release engineer with claims that the QA process doesn't apply to you because you know better, is something far more serious than lacking basic soft skills. It's a fireable offense in most companies.
In a company there are other employees who have your success as part of their job function. People to train you, to talk you down off a ledge, people to step in and guard you against misunderstanding or criticisms. People to advocate for you or send you home before a dispute crosses a point of no return. You're also paid to be there, to put up with the company's BS... the project isn't yours, and it's not usually your reputation that's hurt when the company wants to make a decision you don't agree with and it goes poorly.
The context is so different, I don't think it's really comparable.
> In a company there are other employees who have your success as part of their job function.
Yes, and they enforce basic release processes to ensure you don't break releases by skipping QA or introducing untested and unverified features in release candidates.
And you sure as hell don't have primadonna developers stay on the payroll for long if they start throwing tantrums and personal attacks towards fellow engineers when they are asked to follow the release process or are called out for trying to sneak untested changes into mission-critical components.
Exactly. An extremely important part of working in some hierarchical organizational structure, be that as a Linux kernel developer or as an employee at a company, is the ability to disagree with a superior's decision yet acquiesce and go along with it. Good organizations leave room for disagreement, but there always comes a point where someone in a leadership position has made a final decision and the time for debate is over.
1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.
2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on x file system does not work as a defense when the person opposite you is a x file system maintainer.
3. Rules are rules, and generally don't have to be "correct" to be enforced in an organization.
I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, just like the fact that SSDs wear out over time.
When rules and authority start to take precedence over making sure things work, things have gone off the rails and we're not doing engineering anymore.
> When rules and authority start to take precedence over making sure things work, (...)
Didn't Linus lambast you for "lack of testing and collaboration before submitting patches", to the point the patches you were trying to push weren't even building?
Linus has broken the build more recently than I have. (In the time since bcachefs went upstream, we've both done that once, that I've seen).
Linus doesn't seem to believe in automated testing. He just seems to think that there's no way I could QA code as quickly as I do, but that's because I've invested heavily in automated testing and building up a community of people doing very good testing and QA work; bcachefs's automated testing is the best of any upstream filesystem that I've seen (there's a whole cluster of machines dedicated to this), and I have people running my latest branch on a daily basis.
Nearly all of the collaboration just happens on IRC.
For big changes I wait for explicit acks from testers that they've run it and things look good; a lot of people read and review my code too, it's just typically less formal than the rest of the kernel.
> Linus has broken the build more recently than I have.
Even taking your claims at face value (which from this thread alone is a heck of a leap) I'm baffled by the way you believe this holds any relevance.
I mean, the kernel project has a quality assurance process in place designed to minimize the odds of introducing problems when preparing a release. You were caught purposely ignoring that process, trying to circumvent the whole quality assurance pipeline and sneak untested and unverified features into an RC.
There is a QA process, and you purposely decided to ignore it and plow ahead. And then your best argument for purposely ignoring any semblance of QA is that others may or may not have broken a build before?
Come on, man. You know better than this. How desperate are you to avoid any accountability to pull these gaslighting stunts?
Yeah but you don’t get to make the calls. Linus does and your “well kernel daddy does it too” and “actually I’m doing it better than my critics understand” don’t play well with the kernel daddy (or really any bdfl). Do you not see your comment as dismissive?
All your comments are dismissive of the criticisms so far and you’re shrugging your shoulders as to why.
It’s great you’re able to reason and defend yourself, but Linux as a whole is larger than you, and refusing to submit to its ways will make technology move nowhere.
Collaborative projects don't work on pure engineering. There are significant resource management components that basically amount to therapy, psychiatry, and side show entertainment because the most critical resources are human minds.
Excellent engineering management largely isolates engineers from having to deal with this non-engineering stuff (except for the subset that is specifically for their own personal benefit)-- but open source tends to radically flatten organizations that produce software, such that every contributor must also be their own manager to a great degree.
In a well run project you don't necessarily have to be good at or even interested in all the more socially oriented components of the project organization. But if you're not you must be willing to let someone else handle that stuff and go along with their judgements even if they seem suboptimal from the narrower perspective you've adopted. If you can't then from a "collaborative development as a system" view you're a faulty component that doesn't provide the right interface for the system's requirements (and are gonna get removed!). :)
Another way to look at it is that it would be ideal if every technical element were optimal at all times. In small systems with well understood requirements this can be possible or at least close to possible. But in big complex and poorly scoped systems it's just not possible: We have imperfect information, there are conflicting requirements, we have finite time, and so on. The system as a whole will always be far from perfect. If anyone tried to make it all perfect it would just fail to make progress, deadlock, or otherwise. The management of the project is always trying to balance the imperfections. They know that their decisions are often making things worse for a local concern, but they do so with belief that over time the decisions result in a better system overall. Linux has a good reputation in large part due to a long history of making good decisions about the flaws to accept or even introduce, which issues to gloss over vs debate to death.
There are important differences between small-scale (individual or a few people) engineering and larger-scale engineering.
For many humans to work together over time on something very complex is hard. Structure and process are required. And sometimes they come at the expense of what some might call “pure” engineering. But they are the right trade offs to optimize for the actual goal.
I think this attitude is exactly why this happened. I would have done the same thing.
Do you argue with your school teachers that your book report shouldn't be due on Friday because it's not perfect yet?
I read several of your response threads across different websites. The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.
Instead you keep arguing about things you can't change, and that leads people to getting frustrated and walking away from you.
It really doesn't matter how "right" you may be... not your circus, not your monkeys.
> All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.
I think it's less subtle than that. The straw that broke the camel's back was quite literally abuse towards other kernel developers.
> You might want to read the full story on that one.
I read the full story. Everyone else can do the same. Somehow it seems you opt to skip it and prefer to be deeply invested in creating an alternative reality.
Your analogy fails to account for the fact that after "Friday" bug fixes are still allowed. A file system losing your files sounds like a bug to me.
Edit since you expanded your post:
>The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
To me the comment was patronizing, implying it was purely due to bad communication on Kent's end, and it shows how immature the people running these operating systems are: putting priority on processes over the end user.
>respecting their decisions and moving on.
When this causes real pain for end users, it validates that the decision was wrong.
> really doesn't matter how "right" you may be... not your circus
It does, because it causes reputational damage for bcachefs. Even beyond reputational damage, delivering a good product to end users should be a priority. In my opinion, projects as big as Debian causing harm to users should be called out instead of ignored; otherwise practices like replacing dependencies out from underneath programs become standard.
You still seem to be arguing that shipping the change was the "right" thing to do. But that's not what's in dispute. Rather, it is that if what you think is right and what the person who makes the rules thinks is right are in disagreement, the adult thing to do is not to simply disregard the rules (and certainly not repeatedly, after being warned not to).
This is the difference between being smart and being wise. If the goal of all this grandstanding was that it's so incredibly and vitally important for these patches to get into the kernel, well, guess what: now, due to all this drama, this part of the kernel is going to go unmaintained entirely. Is that good for the users? Did that help our stated goal in any way? No.
>the adult thing to do is not to simply disregard the rules
The adult thing is to do best by the users. Critical file system bugs are worth blocking the release of any serious operating system in the real world as there is serious user impact.
>Is that good for the users?
I think it's complicated. It could allow for a faster release schedule for bug fixes which can allow for addressing file system issues faster.
What's best for users in the long term is predictable processes. "RC = pure bug fixes" is a battle-tested, dependable rule, the absence of which causes chaos.
> Critical file system bugs are worth blocking the release
The "experimental" label exists EXACTLY to prevent this stuff from blocking a release. Do you not know that bcachefs is experimental? This is an example of another rule which helps predictability.
This was a bug fix. My point is that there will always be bugs in the kernel so not all bugs are worth blocking a release, but losing data is worth blocking the release for.
>"Experimental" label EXACTLY to prevent this stuff from blocking release
In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.
> In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.
alternative perspective: those users have knowingly and willingly put experimental software into production. it was their choice, they were informed of the risk, and so the consequences and responsibility are theirs.
it’s like signing up to take some experimental medicine, and then complaining no-one told me about the side-effect of persistent headaches.
that doesn’t stop anyone from being user-centric in their approach, e.g. call me if you notice any symptoms and i’ll come round your house to examine you.
… as long as everyone is clear about the fact it is experimental and the boundaries/limitations that apply, e.g. there will be certain persistent headache medicines that cannot be prescribed to you, or it might take longer for them to work because you’re on an experimental medicine.
Again: the elephant in the room is that a lot of bcachefs users are using it explicitly because they have lost a lot of data on btrfs, and they've found it to be more trustworthy.
This puts us all in a shitty situation. I want the experimental label to come off at the right time - when every critical bug is fixed and it's as trustworthy as I can reasonably make it, when I know according to the data I have that everyone is going to have a good experience - but I have real users who need this thing and need to be supported.
There is _no reason_ to interpret the experimental label in the way that you're saying; you're advocating that reliability for the end user be deprioritized relative to every other filesystem.
But deprioritizing reliability is what got us into this mess.
>users are using it explicitly because they have lost a lot of data on btrfs
PLEASE, honestly, EDUCATE THESE USERS. This is still marked experimental for numerous reasons, regardless of the 'planned work for 6.18'. Users who can't suffer any data loss and are repeating their mistake of using btrfs shouldn't be using a non-default/standard/hardened filesystem, period.
No, really. People aren't losing data on bcachefs. We still have minor hiccups that do affect usability, and I put a lot of effort into educating users about where we're at and what to expect.
In the past I've often told people who wanted to migrate off of btrfs "check back in six months", but I'm not now because 6.16 is looking amazingly solid; all the data I have says that your data really is safer on bcachefs than btrfs.
I'm not advocating for people to jump from ext4/xfs/zfs, that needs more time.
I'm not sure exactly what you are talking about, and I'm not sure you do either. The discussion that preceded bcachefs being dropped from the Linux kernel mainline involved an attempt to sneak new features into an RC, sidestepping testing and QA work, which was followed up by yet more egregious behavior from the maintainer.
To solve a bug with the filesystem that people in the wild were hitting. Linus has said in the past that there is a blurry line between security fixes and bug fixes; likewise, there is a blurry line between filesystem bugs and recovery features.
If you read the email, it is clear that the full feature needs more work and this is more of a basic implementation to address bugs that people hit in the wild.
> To solve a bug with the filesystem that people in the wild were hitting.
So you acknowledge that this last episode involved trying to push new features into an RC.
As it was made abundantly clear, not only is the point of RC branches to only get tiny bugfixes after testing, the feature work that was presented was also untested and risked introducing major regressions.
All these red flags were repeatedly raised on the mailing list by multiple kernel maintainers. Somehow you're ignoring all the feedback, warnings, and complaints raised by Linux kernel maintainers, and instead you've opted to try to gaslight the thread.
bcachefs has a ton of QA, both automated testing and a lot of testers that run my latest and I work with on a daily basis. The patch was well tested; it was for codepaths that we have good regression tests for, it was algorithmically simple, and it worked perfectly to recover a filesystem from the original bug report, and it performed flawlessly again not long after.
I've explained my testing and QA on the lists multiple times.
You, like the other kernel maintainers in that thread, are making wild assertions despite having no involvement with the project.
I repeat: it sounds an awful lot like you are trying to gaslight this thread. Not cool.
When this fact was again explicitly pointed out to you by Linus himself, you even tried to bullshit Linus and move the goalposts with absurd claims about how it was somehow OK to force untested and unreviewed features into an RC because you know better what users want or need, as if that were some kind of justification for skipping testing and proper release processes.
You need to set aside some time for introspection because you sound like you are your own worst enemy. And those you interact with seem to be fed up and had enough of these stunts.
> Being correct comes second to being agreeable in human-human interactions
Prioritizing agreeableness above correctness is the reason the space shuttle Challenger blew up.
The bcachefs fracas is interesting and important because it's like a stain making some damn germ's organelles visible: it highlights a psychological division in tech and humanity in general between people who prioritize
1) deferring to authority, reading the room, knowing your place
and people who prioritize
2) insisting on your concept of excellence, standing up against a crowd, and speaking truth to power.
I am disturbed to see the weight position #1 has accumulated over the past decade or two. These people argue that Linus could be arbitrarily wrong and Overstreet arbitrarily right and it still wouldn't matter because being nice is critical to the success of a large scale project or something.
They get angry because they feel comfort in understanding their place in a social hierarchy. Attempts to upend that hierarchy in the name of what's right creates cognitive dissonance. The rule-followers feel a tension they can relieve only by ganging up and asserting "rules are rules and you need to follow them!" --- whether or not, at the object level, a) there are rules, b) the rules are beneficial, and c) the rules are applied consistently. a, b, and c are exactly those object-level does-the-o-ring-actually-work-when-cold considerations that the rule-following, rule-enforcing kind of person rejects in favor of a reality built out of words and feelings, not works and facts.
They know it, too. They need Overstreet and other upstarts to fail: the failure legitimizes their own timid acquiescence to rules that make no sense. If other people are able to challenge rules and win, the #1 kind of person would have to ask himself serious and uncomfortable questions about what he's doing with his life.
It's easier and psychologically safer to just tear down anyone trying to do something new or different.
The thing is, all technological progress depends on the #2 people winning in the end. As Feynman talked about when diagnosing this exact phenomenon as the root cause of the Challenger disaster, mother nature (who appears to have taken up corrupting filesystems as a personal hobby of hers) does not care one bit about these word games or how nice someone is. The only thing that matters when solving a problem of technology is whether something works.
I think a lot of people in tech have entirely lost sight of this reality. I can't emphasize enough how absurd it is to state "[b]eing correct comes second to being agreeable in human-human interactions" and how dangerously anti-technology, anti-science, anti-civilization, and anti-human this poison mindset is.
1. I laid down what I perceived as the state of things: the generalizations I drew from observing the system that is Linux development. Nowhere have I prescribed that Kent "follow" my ideas, simply that he can use these to try to understand the unfairness he feels.
2. Your anarcho-individualistic development ideas sound good in theory, but if they ever worked in practice we might have seen it be more widespread than it is today in team sizes > 3.
You should also note that if the o-ring is labelled experimental and there's an expectation of failure, its development and testing will not stop the launch. The shuttle leaves when it leaves; it won't wait for the experimental o-ring to be done to your liking.
> Simply that he can use these to try to understand the unfairness he feels.
You're suggesting he deal with unfairness by internalizing it as virtue? That's how to make people who cheer at other people's failures.
> Your anarcho-individualistic development ideas sound good in theory
Thanks for illustrating my point. No project, >3 or <= 3, has ever made any new technology by adopting as a tenet that social agreement inside the project is more important than correctly modeling the world outside it, and you're suggesting I'm using insufficiently agreeable-sounding words to express it.
Linux is not correct. Linux has never been correct. Linux will never be correct. An incorrect belief that it is correct can only make it less correct.
You must know this when it comes to your own work. Why isn't bcachefs written in augmented Rust with dependent types and formal correctness proofs for every line of code? How could there ever be a data-losing bug if you had a formal proof that the filesystem could never lose data? Wouldn't that be more correct?
Turns out when some strong/broad notion of correctness isn't (practically) possible it is, in fact, very optional.
Good project management is all about managing resources and balancing tradeoffs. Sometimes this means making or allowing some things to be worse for the benefit of something else or in adherence to a process with a proven track record. Almost every choice makes something less correct than it could be-- with a goal of slowly inching towards a more perfect state overall in the long run.
It's also beneficial to rock the boat a bit at times, people can be wrong, processes can need improvement-- but there is a correct level, timing, and approach to achieve the best benefit. I expect that the kind of absolute approach you seem to have adopted in comments is unlikely to be successful at effective beneficial change.
You're staking out quite the postmodernist position there. All models are wrong, so who's to say that Alice's data corruption is worse than Bob's man page typo? The important thing is we stick to process with a proven track record, right?
I don't buy it. Object level considerations do matter. Alice's bug really is worse than Bob's. That "proven track record" shouldn't apply to Alice, and insisting that it does for the sake of process, in a way indifferent to the facts of the situation, is just a pretext for doing primate social hierarchy deference rituals in a situation in which they're producing a worse outcome and everyone knows it.
They do. And Kent expressed them, and the Linux kernel maintainers are amply qualified to hear them out and make a call. I don't see a reason to think they were indifferent to the facts; they just weren't convinced by them. If they were, they could have just said, "okay, we think this does qualify as a bugfix".
My understanding is the change in dispute wasn't over fixing the corruption-introducing bug, but rather adding automated repair for cases where the corruption had already happened. I could easily see taking a position of "sad for people whose data is already corrupt; they can get their workaround out of tree for now" (or heck, even forever, depending on the scale of the impact).
Anyone who has been around for a while has seen their share of 'ate the horse to catch the spider to catch the fly to...' dance, of course the patch author is convinced that their repair is correct. They're almost always convinced of that or they don't submit it, so that carries little information. Because of this there is a strong preference for obviously minimal code in any kind of fix. Minimizing user suffering is important, but we also know every line of code comes with risk. The fact that the risk is not measurable on a case by case basis doesn't make it any less real.
> I don't see a reason to think they were indifferent to the facts
I don't think the Linux people thought of themselves as indifferent to facts. Nor do I think they were, not at first. Most people imagine themselves as fair-minded truth-seekers. When stakes are low, they usually act like it. It's only under pressure that people reveal whether they're more committed to PR or progress.
The shitty thing about this situation is that as the dispute escalated, the technical merits of the change faded from relevance. (Linus even pulled the corruption repair work in the end!) The argument transformed into a dispute over power, pride, and personalities. Linus's commitment to technical excellence was tested. It failed. Consequently, Linux will lack a cutting-edge filesystem.
I don't even object to Linus being BDFL of Linux. Somebody has to make decisions. I think Linus was wrong to reject the corruption fix patch, but he could plausibly have been right. He had an opportunity to explain his patch rejection in such a way that Overstreet would have understood it as final but also felt heard and valued. Overstreet would have been upset, and justifiably so, but by the next merge window both sides would have cooled down and progress would have resumed.
It's when Linus banned Overstreet and bcachefs from the project that he departed irrecoverably from defensibility. Linus might think he's punishing Overstreet for his intransigence by blocking his work, but Linus is actually taking his frustration out on every Linux user instead. Overstreet's ban is rooted in primate power psychology, not technical trade-offs, and it makes everyone lose.
Technical leaders who ostracize brilliant but difficult people forever cap the amount of progress we can make in the fight against the limits of nature. They're neglecting their responsibilities as leaders to harness difficult people. It's not an easy job, but being a leader shouldn't be.
Linus took the easy way out and banned the brilliant troublemaker. He should be ashamed.
> the risk is not measurable on a case by case basis
It often is. That's why when I'm on the Linus side of a case like this, I try to avoid saying "no" and instead say "yes, if". Sometimes my counterparty pulls out an "if" that convinces me.
You might end up with the best filesystem in the world that no one will use. You sacrificed long-term sustainability for a short-term win.
Even if it were shipped in a similar way to ZFS, no one will use it for anything more important than a homelab.
Why? With this attitude you cannot be taken seriously, and that implies many risks for whatever you might come up with in the future.
Another risk is that you are the sole developer of this filesystem; that also makes it hard to consider using bcachefs seriously.
My advice would be: consider expanding the team to a few developers who are able to contribute. Learn to control your pride for the good of the whole project. Working with (and coordinating) other developers could help you better understand the upstream kernel community, and with that in place you could delegate someone with better diplomatic skills to deal with upstream in a way that would be more beneficial for the whole project in the long term.
It is not good when politics get in the way of good engineering.
Regardless of differing points of view on the situation, I think everyone can agree that bcachefs being actively updated on Linus tree is a good thing, right?
If you were able to work at your own pace, and someone else took the responsibility of pulling your changes at a pace that satisfies Linus, wouldn't that solve the problem of Linux having a good modern/CoW filesystem?
> Regardless of differing points of view on the situation, I think everyone can agree that bcachefs being actively updated on Linus tree is a good thing, right?
I think bcachefs is not the problem. The problem seems to be the sole maintainer who is notoriously abusive and apparently unable to work with other kernel developers.
I'm sure if another maintainer came along, one that wasn't barred for being abusive towards other maintainers, there would be no problem getting the project back in.
We were never able to get any sane and consistent policy on bugfixes, and I don't have high hopes that anyone else will have better luck. The XFS folks have had their own issues with interference, leading to burnout - they're on their third maintainer, and it's really not good for a project to be cycling through maintainers and burning people out, losing consistency of leadership and institutional knowledge.
And I'm still seeing Linus lashing out at people on practically a weekly basis. I could never ask anyone else to have to deal with that.
I think the kernel community has some things they need to figure out before bcachefs can go back in.
Keep in mind that bcachefs’s adoption and eventual mainstream acceptance are not contingent on Linus accepting your contributions or on you “removing the experimental label.” What matters is eliminating the barriers that prevent users from trying it, and that is far easier when bcachefs is an upstream filesystem—something that allows more distributions to offer it as an installer option.
> And I'm still seeing Linus lashing out at people on practically a weekly basis. I could never ask anyone else to have to deal with that.
This is a bit off‑topic, but I wouldn’t be so quick to judge how well Linus is doing his job; no one else in the world has his responsibilities.
At this point, any new kernel contributor should be familiar with Linus and have come to accept, or at least tolerate, his ways.
> I think the kernel community has some things they need to figure out before bcachefs can go back in.
Fair enough. It may be better to let things cool off while giving bcachefs more time to reach a stable state before attempting to reintegrate it into Linux development. I hope you won’t give up, because Linux needs this.
Since bcachefs is your project and you seem to enjoy working on it, it wouldn’t be a stretch to say that you need this too, right? Don’t let ego get in the way of achieving your goals.
> We were never able to get any sane and consistent policy on bugfixes, and I don't have high hopes that anyone else will have better luck.
This reads an awful lot like blatant gaslighting.
It's quite public that you were kicked out not only because of abusive behavior towards other kernel developers but also because you kept ignoring any and all testing and QA guardrails, to the point that you tried to push patches that failed to build.
Given the very public discussion, you should sit out any discussion on bugfixes and testing because, while you are voicing strong opinions about high quality bars, the evidence suggests you were following none.
This sort of misrepresentation of your public behavior will only trash your reputation further. I encourage anyone who reads this to actually look up the mailing list threads. It’s very illuminating.
That does seem to be one of the big disconnects, yes.
In the past I've argued that I do need a relatively free hand and to be able to move quickly, and explained my reasoning: we've been at the stage of stabilization where the userbase is fairly big, and when someone reports a bug we really need to work with them and fix it in a timely manner in order to keep them testing and reporting bugs. When someone learns the system well enough to report a bug, that's an investment of time and effort on their part, and we don't want to lose that by having them get frustrated and leave.
IOW: we need to prioritize working with the user community, not just the developer community.
All that's been ignored though, and the other kernel maintainers seem to just want to ratchet down harder and harder and harder on strictness.
At this point, we're past the bulk of stabilization, and I've seen (to my surprise) that I've actually been stricter with what I consider a critical fix than other subsystems.
So this isn't even about needing special rules for experimental; this is just about having sane and consistent rules, at all.
The problem was that you weren't following the rules.
The rules were clear about the right time to merge things so they get in the next version, and if you don't, they will have to get in the version after that. I don't know the specific time since I'm not a kernel developer, but there was one.
Linus is trying to run the release cycle on a strict schedule, like a train station. You are trying to delay the train so that you can load more luggage on, instead of just waiting for the next train. You are not doing this once or twice in an emergency, but you are trying to delay every single train. Every single train, *you* have some "emergency" which requires the train to wait just for you. And the station master has gotten fed up and kicked you out of the station.
How can it be an emergency if it happens every single time? You need to plan better, so you will be ready before the train arrives. No, the train won't wait for you just because you forgot your hairbrush, and it won't even wait for you to go home and turn your oven off, even though that's really important. You have to get on the next train instead, but you don't understand that other people have their own schedules instead of running according to yours.
If it happened once, okay - shit happens. But it happens every time. Why is that? They aren't mad at you because of this specific feature. They are mad at you because it happens every time. It seems like bcachefs is not stable. Perhaps it really was an emergency just the one time you're talking about, but that means it either was an emergency all the other times and your filesystem is too broken to be in the kernel, or it wasn't an emergency all the other times and you chose to become the boy who cried wolf. In either case Linus's reaction is valid.
It's a bugfix, and bugfixes are allowed at any time - weighing regression risk against where we're at in the cycle. It was a very high severity bug, low regression risk for the fix, and we were at rc3.
If you are not acting in bad faith, I suggest you read Wittgenstein.
He did a lot of work around the philosophy of language, which basically boils down to the fact that words have no intrinsic meaning: the meaning of a word is the meaning that a given population gives to that word.
So in your case, you may be right about the meaning of the word "bugfix" in some population, but you must translate it and use the meaning that word has in the "kernel" population.
Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.
The article says that bcachefs is not being removed from the mainline kernel. This looks like mostly a workaround for Linus and other kernel devs to not have to deal with Kent directly.
* Another kernel dev takes over management and they treat it as a fork (highly unlikely according to their estimate)
* Kent hires someone to upstream the changes for him and Kent stops complaining wrt when it's getting merged
* Bcachefs gets no maintenance and will likely be removed in the next major release
I do not know him personally, but most of his interactions I've read online sounded grounded and not particularly offensive, so I'm abstaining from making any kind of judgement on it.
But while I have no stake in this, drama really does seem to follow Kent around for one reason or another. And it's never his fault if you go by his public statements - which, I want to repeat, sound very grounded and not offensive to me whatsoever.
If you look at all the places where Kent has had drama, the common element is him and environments that have pretty rigid workflows. The common thread seems to be him not respecting workflows and processes that those places have, that inconvenience his goals. So, he ignores the workflows and processes of those places, and creates a constant state of friction and papercuts for those who he needs to accomplish his goals. They eventually get fed up, and either say no, not working with you anymore, or no, you’re not welcome to contribute here anymore.
He’s not super offensive, but he will tell a Debian package maintainer that their process sucks, that they should change it, and that they are being stupid by following that process. Overall, he seems a bit entitled, and unwilling to compromise with others. It’s not just Kent though; the areas that seem to be the most problematic for him are when it’s an unstoppable force (Kent) meeting an immovable wall (Linux / Debian).
Working in the Linux kernel is well known for its frustrations and the personal conflict that it creates, to the point that there are almost no linux kernel devs/maintainers that aren’t paid to do the work. You can see a similar set of events happen with Rust4Linux people, Asahi linux project and their R4L drivers, etc.
The first one is by Linus? And his replies (at least the ones I read) are, to me, less aggressive than the rest of the mails in that chain.
The second has one offensive remark:
> Get your head examined. And get the fuck out of here with this shit.
which I thought he admitted was out of line and said sorry for. Or do I misremember? I admit, once again, I'm still completely uninvolved and merely saw it play out on the internet.
If you read his replies downthread from that, Kent seems to be going through a lot of effort to not apologize, in any form, and prefers talking about how other people were mean to him.
It's complicated, no one really knows what "externally maintained" entails at the moment. Linus is not exactly poised to pull directly from Kent, and there is no solution lined-up at the moment.
Both Linus and Kent drive a hard bargain, and it's not as simple as finding someone else to blindly forward bcachefs patches. At the first sign of conflict, the poor person in the middle would have no power, no way to make anyone back down, and we'd be back to square one.
It's in limbo, and there is still time, but if left to bitrot it will be removed eventually.
Unfortunately, there's also nothing they can do if Kent says no. Say there's a disagreement on a patch that touches something outside fs/bcachefs, that person can't exactly write their own patches incorporating the feedback. They're not going to fork and maintain their own patches. They'd be stuck between a rock and a hard place, and that gets us back to a deadlock.
The issue is that I have never seen Kent back down a single time. Kent will explain in details why the rules are bullshit and don't apply in this particular case, every single time, without any room for compromise.
If the only problem was when to send patches, that would be one thing. But disagreements over patches aren't just a timing problem that can be routed around.
The key thing here is I've never challenged Linus's authority on patches outside fs/bcachefs/; I've quietly respun pull requests for that, on more than one occasion.
The point of contention here was a patch within fs/bcachefs/, which was repair code to make sure users didn't lose data.
If we can't have clear boundaries and delineations of responsibility, there really is no future for bcachefs in the kernel; my core mission is a rock solid commitment to reliability and robustness, including being responsive to issues users hit, and we've seen repeatedly that the kernel process does not share those priorities.
You may be right, but I think looking at it from a lens of who has authority and can impose their decision is still illustrating the point I'm trying to make.
To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.
I don't think the right answer is to blindly follow whatever Linus or other people say. I don't mean you should automatically back down without technical reasons, just because authority says so. But I notice I can't remember an email where concessions were made, or attempts to find a middle ground by understanding the other side. Maybe someone can find counterexamples.
But this idea of using ownership to decide who has more authority and can impose their vision, that can't be the only way to collaborate. It really is uncompromising.
> To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.
Agreed 100%. In an ideal world, we'd be sitting down together, figuring out what our shared priorities are, and working from there.
Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities are, except that they definitely aren't a bulletproof filesystem and safeguarding user data; his response to journal_rewind demonstrated that quite definitively.
So that's where we're at, and given the history with other local filesystems I think I have good reason not to concede. I don't want to see bcachefs run off the rails, but given all the times I've talked about process and the way I'm doing things I think that's exactly what would happen if I started conceding on these points. It's my life's work, after all.
You'd think bcachefs's track record (e.g. bug tracker, syzbot) and the response it gets from users would be enough, but apparently not, sadly. But given the way the kernel burns people out and outright ejects them, not too surprising.
> Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities except that they definitely aren't a bulletproof filesystem and safeguarding user data
Remarks like this come across as extremely patronizing, as you completely ignore what the other party says and instead project your own conclusions about the other persons motives and beliefs.
> his response to journal_rewind demonstrated that quite definitively
No, no it did not in any way, shape, or form do that. You had multiple other perfectly valid options to help the affected users besides getting that code shipped in the kernel there and then. Getting it shipped in the kernel was merely a convenience.
If bcachefs were established and stable, it would be a different matter. But it's an experimental filesystem. By definition, data loss is to be expected, even if recovery is preferable.
No, bcachefs-tools wasn't an option because the right way to do this kind of repair is to first do a dry run test repair and mount, so you can verify with your eyes that everything is back as it should be.
If we had the fuse driver done, that would have worked, though. Still not completely ideal, because we're at the mercy of distros to make sure they're getting -tools updates out in a timely manner (they're not always as consistent with that as the kernel; most are good, though).
Just making it available in a git repo was not an option because lots of bcachefs users are getting it from their distro kernel and have never built a kernel before (yes, I've had to help users with building kernels for the first time; it's slow and we always look for other options), and even if you know how, if your primary machine is offline the last thing you want to have to do is build a custom rescue image with a custom kernel.
And there was really nothing more special about this than any other bugfix, besides needing to use a new option (which is also something that occasionally happens with hotfixes).
Bugs are just a fact of life, every filesystem has bugs and occasionally has to get hotfixes out quickly. It's just not remotely feasible or sane to be coming up with our own parallel release process for hotfixes.
That you or the user dislike some of the downsides does not invalidate an option.
I will absolutely agree with you that merging that repair code would be vastly preferable to you and the users. And again, if bcachefs was mature and stable, I absolutely think users should get a way to repair ASAP.
But bcachefs is currently experimental and thus one can reasonably expect users to be prepared to deal with the consequences of that. And hence the kernel team, with Linus at the top, should be able to assume this when making decisions.
If you have users who are not prepared for this, you have a different problem and should seek how to fix that ASAP. Best would probably be to figure out how to dissuade them from installing. In any case, not doing something to prevent that scenario would be a disservice to those users.
bcachefs has had active users, with real data that they want to protect, since before it was merged.
A lot of the bcachefs users are using it explicitly because they've been burned by btrfs and need something more reliable.
I am being much, much more conservative with removing the experimental label than past practice, but I have been very explicit that while it may not be perfect yet and users should expect some hiccups, I support it like any other stable production filesystem.
That's been key to getting it stabilized: setting high expectations. Users know that if they find a critical bug it's going to be top priority.
Given the bug fixes and changes, the experimental flag seems quite appropriate to me. That's not a bad thing.
However, it was put in the kernel as experimental. That carries with it implications.
As such, while it's very commendable that you wish to support the experimental bcachefs as-if it was production ready, you cannot reasonably impose that wish upon the rest of the kernel.
That said, I think you and your small team are doing a commendable job, and I strongly wish you success in making bcachefs feature complete and production ready. And I say that as someone who really, really likes ZFS and runs it on my Linux boxes.
Fair enough. As someone who has lost filesystems to bugs and files to corrupted blocks, I definitely appreciate the work you've done on repair and reliability.
I think there's room to have your cake and eat it too, but I certainly can't blame you for caring about quality, that much is sure.
Linus T is responsible for everything in Linux; it is his project and he is the maintainer. He can do whatever he wants in his branch and people just have to accept it. If you want to be the one responsible, you have to fork Linux.
> Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.
Your distro could very easily include bcachefs if it wishes? Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild, that very particular problem doesn't exist re: bcachefs?
The problem with bcachefs is the problem with btrfs. It mostly still doesn't work to solve the problems ZFS already solves.
> Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild,
I think the Linux Kernel just doesn't want to be potentially in violation of Oracle's copyrights. That really doesn't seem that unreasonable to me, even if it feels pointless to you.
Who would use a file system which essentially seems to be developed by a single person? A bus-factor of one seems unacceptable for a FS. But maybe I am wrong and there are other developers, then why do they not take over upstreaming if the main developer is unable to collaborate with the kernel community.
I did, for my laptop and a Raspberry Pi which I didn't care much about. It was great being able to interact with Kent over IRC to sort out problems, and when he is actually available he's really helpful, but it made me realise that bcachefs has a long way to go, and I have come to the realisation that a bus factor of 1 is not something I'd want long term.
I'm in that boat. I'm looking over at that Synology unit sitting in the corner of my living room, knowing it'll be the last of its kind to live here, and wondering what its replacement will look like. FreeBSD's been good to me and it might be time to reintroduce myself to it.
Fwiw, I'm running a NAS on btrfs (on top of mdadm raid as I don't fully trust the btrfs raid, and the recovery tools seem worse). It seems to be working well so far.
Being able to do snapshots easily is really nice. I have a script that makes hourly snapshots and keeps the N latest, which protects me against a bunch of pebkac errors
I do periodic backups to non-btrfs storage though. I need backups anyway so it seemed like an easy way to de-risk
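A rotation like the one described can be sketched as follows. This is an illustrative sketch, not the commenter's actual script; the paths, snapshot naming scheme, and retention count are all assumptions:

```python
"""Hourly btrfs snapshot rotation -- illustrative sketch only.

Assumes a subvolume mounted at SOURCE and read-only snapshots stored
under SNAP_DIR with lexically sortable timestamped names. rotate()
would typically be invoked from an hourly cron job or systemd timer.
"""
import subprocess
from datetime import datetime, timezone
from pathlib import Path

SOURCE = Path("/data")           # subvolume to snapshot (assumed path)
SNAP_DIR = Path("/data/.snaps")  # where snapshots live (assumed path)
KEEP = 24                        # retain the 24 most recent snapshots


def snapshots_to_delete(names, keep):
    """Given sortable snapshot names, return the oldest ones beyond `keep`."""
    return sorted(names)[:-keep] if len(names) > keep else []


def rotate():
    name = "data-" + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    # Read-only snapshot (-r) so it can also serve as a send/receive source.
    subprocess.run(["btrfs", "subvolume", "snapshot", "-r",
                    str(SOURCE), str(SNAP_DIR / name)], check=True)
    existing = [p.name for p in SNAP_DIR.iterdir()]
    for old in snapshots_to_delete(existing, KEEP):
        subprocess.run(["btrfs", "subvolume", "delete",
                        str(SNAP_DIR / old)], check=True)
```

Because snapshot names sort lexically by timestamp, pruning is just a sort-and-slice; the actual snapshot/delete calls are the standard `btrfs subvolume` subcommands.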
Go for it. I made the switch ~10 years ago and didn't regret it at all. First-class, rock solid ZFS integration. Saved my data on more than one occasion.
People understand they're different, but if bcachefs is out, then that leaves btrfs as the only modern in-tree filesystem, and apparently you can't trust it with important data either.
I've been using btrfs on my NAS for years and have not had any problems. I suspect there are a hell of a lot of people like me you will not hear about because people don't generally get as vocal when things just work.
The venn diagram of "people who want a modern copy-on-write filesystem with snapshots to manage large quantities of data" and "people who want a massive pool of fault-tolerant storage" (e.g. building a NAS) has some pretty significant overlap.
The latter is where BTRFS is still hobbled: while the RAID-0, RAID-1, & RAID-10 modes work absolutely fine, the RAID-5 & RAID-6 modes are still broken, with an explicit warning at mkfs time (and in the manpages) that the feature is still experimental and should not be used to hold data that you care about retaining. This has bitten, and continues to bite, people, with terabytes of data loss (backups are important, people!). That then sours them on every other aspect of ever using BTRFS again.
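Given that profile split, the common hedge is to request the well-tested raid1 profile for both data and metadata at creation time (device names below are placeholders; adjust for your hardware):

```shell
# Create a two-device btrfs volume using the well-tested raid1 profile
# for both data (-d) and metadata (-m). /dev/sdb and /dev/sdc are
# placeholder device names -- this is destructive, so double-check them.
mkfs.btrfs -L nas -d raid1 -m raid1 /dev/sdb /dev/sdc

# Requesting -d raid5 or -d raid6 instead is what triggers the
# experimental warning mentioned above.
```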
Is it just me or does Kent seem self-destructively glued to his own idea of how kernel development should work?
I don’t doubt that people on all sides have made mis-steps, but from the outside it mostly just seems like Kent doesn’t want to play by the rules (despite having been given years of patience).
It's not just kernel development. In the lwn thread, he mentioned and then demonstrated difficulty working with Debian developers as well.
IMHO, what his communications show is an unwillingness to acknowledge that other projects that include his work have focus, priorities, and policies that are not the same as that of his project. Also, expecting exceptions to be made for his case, since exceptions have been made in other cases.
Again IMHO, I think he would be better off developing apart with an announcement mailing list. When urgent changes are made, send to the announcement list. Let other interested parties sort out the process of getting those changes into the kernel and distributions.
If people come with bug reports from old versions distributed by others, let them know how to get the most up to date version from his repository, and maybe gently poke the distributors.
Yes, that means users will have older versions and not get fixes immediately. But what he's doing isn't working to get fixes to users immediately either.
Being an outsider to this whole scene, the whole thread reads very differently to me.
Kent seems very patient in explaining his position (and the frustrations arising from other people introducing bugs into his code), and the kernel & Debian folks are running a smear campaign instead of replying to what I see as genuine problems in the process. As an example, the quotes referenced by user paravoid are, imho, taken out of context (judging by reading the provided links).
There probably is a lot more history to it, but judging from that thread it's not Kent who looks like a bad guy.
This is one of the problems: Kent is frequently unable to accept that things don't go his way. He will keep bringing it up again and again and he just grinds people down with it. If you see just one bit of it then it may seem somewhat reasonable, but it's really not because this is the umpteenth time this exact discussion is happening and it's groundhog day once again.
This is a major reason why people burn out on Kent. You can't just have a disagreement/conflict and resolve it. Everything is a discussion with Kent. He can't just shrug and say "well, I think that's a bit silly, but okay, I can work with it, I guess". The options are 1) Kent gets his way, or 2) he will keep pushing it (not infrequently ignoring previous compromises, restarting the discussion from square one). Here too, the Debian people have this entire discussion (again) forced upon them by Kent's comments in a way that's just completely unnecessary and does nothing to resolve anything.
Even as an interested onlooker who is otherwise uninvolved and generally more willing to accept difficult behaviour than most people, I've rather soured on Kent over time.
You do realize that data integrity issues are not "live and let live" type things, right?
And there's a real connection to the issue that sparked all this drama in the kernel and the Debian drama: critical system components (the kernel, the filesystem, and others) absolutely need to be able to get bugfixes in a timely manner. That's not optional.
With Debian, we had a package maintainer who decided that unbundling Rust dependencies was more important than getting out updates, and then we couldn't get a bugfix out for mount option handling. This was a non-issue for every other distro with working processes because the bug was fixed in a few days, but a lot of Debian users weren't able to mount in degraded mode and lost access to their filesystems.
In the kernel drama, Linus threw a fit over repair code meant to recover from a serious bug and make sure users didn't lose data, and he's repeatedly picked fights over bugfixes (he's even called pushing to get bugfixes out "whining" in the past).
There are a lot of issues that there can be give and take on, but getting fixes out in a timely manner is just part of the baseline set of expectations for any serious project.
Look, I get where you're coming from. It's not unreasonable. I've said this before.
But there are also reasons why things are the way they are, and that is also not unreasonable. And at the end of the day: Linus is the boss. It really does come down to that. He has dozens of other subsystem maintainers to deal with and this is the process that works for him.
Similar stuff applies to Debian. Personally, I deeply dislike Debian's inflexible and outmoded policy and lack of pragmatism. But you know, the policy is the policy, and at some point you just need to accept that and work with it the best you can.
It's okay to make all the arguments you've made. It's okay to make them forcefully (within some limits of reason). It's not okay to keep repeating them again and again until everyone gets tired of it, while seemingly failing to listen to what people are saying. This is where you are being unreasonable.
I mean, you *can* do that, I guess, but look at where things are now. No one is happy with this – certainly not you. And it's really not a surprise, I already said this in November last year: "I wouldn't be surprised to see bcachefs removed from the kernel at some point".[1] To be clear: I didn't want that to happen – I think you've done great work with bcachefs and I really want it to succeed every which way. But everyone could see this coming from miles.
> But there are also reasons why things are the way they are, and that is also not unreasonable.
It is unreasonable if it leads to users losing data. At this point, the only reasonable thing is to either completely remove support for bcachefs or give timely fixes for critical bugs, there's no middle position that won't willfully lead to users losing their data.
This used to be the default for distributions like Debian some time ago. You only supported foundational software if you were willing to also distribute critical fixes in a timely manner. If not, why bother?
For all other issues, I guess we can accept that things are the way they are.
> It is unreasonable if it leads to users losing data.
Changing the kernel development process to allow adding new features willy-nilly late in the RC cycle will lead to much worse things than a few people using an experimental file system losing their data in the long term.
The process exists for a reason, and the kernel is a massive project that includes more than just one file system, no matter how special its developers and users believe it is.
There's no need for the kernel development process to change. New features go in during RCs all the time; it's always just a risk vs. reward calculation, and I'm more conservative with what I send outside the merge window than a lot of subsystems.
Not too familiar with the kernel process for this, but for Linux distros there are ways to respond to critical issues including data corruption and data loss. It's just that you have to follow their processes to do this, such as producing a minimal patch that fixes the problem which is backported into the older code base (and there's a reason for that too: end users don't want churn on their installed systems, they want an install to be stable and predictable). Since distros are how you ultimately get your code into users' hands, it's really their way or the highway. Telling the distros they are wrong isn't going to go well.
For the Debian thing, I'm not sure of the specifics for bcachefs-progs (I'm going by what the author has reported and some blog posts), but I think the problem with Debian is that they willfully ignore when upstream says "this is only compatible with library version 2.1.x" and will downgrade or upgrade the library to unsupported versions, to match the versions used by other programs already packaged. This kind of thing can introduce subtle, hard-to-debug bugs, and the resulting problems usually get reported to upstream; it's a recurrent problem for Rust programs packaged in Debian. Rust absolutely isn't a language where if it compiles, it works, no matter how much people think otherwise.
And this is happening even though it's common for Debian to package the same C library multiple times, like, libfuse2 and libfuse3. This could be done for Rust libraries if they wanted to.
But that's exactly the point here. In the context of a whole distribution, you don't want to update some package to a new version (on a stable branch), because that would affect lots of other packages that depended on that one. It may even be that other packages cannot work with the new updated dependency. Even if they can, end users don't want versions to change greatly (again, along a stable branch). Upstreams should accept this reality and ensure they support the older libraries as far as possible. Or they can deny reality and then we get into this situation.
And carrying multiple versions is problematic too as it causes increased burdens for the downstream maintainers.
I'd argue that libfuse is a bit of a special case since the API between 2 & 3 changed substantially, and not all dependencies have moved to version 3 (or can move, since if you move the v3 then you break on other platforms like BSD and macOS that still only support the v2 API).
Rust and especially Golang are both a massive pile of instability because the developers don't seem to understand that long term stable APIs are a benefit. You have to put in a bit of care and attention rather than always chasing the new thing and bundling everything.
XFS has burned through maintainers, citing "upstream burnout". It's not just bcachefs that things are broken for.
And it was burning me out, too. We need a functioning release process, and we haven't had that; instead I've been getting a ton of drama that's boiled over into the bcachefs community, oftentimes completely drowning out all the calmer, more technical conversations that we want.
It's not great. It would have been much better if this could have been worked out. But at this point, cutting ties with the kernel community and shipping as a DKMS module is really the only path forwards.
It's not the end of the world. Same with Debian; we haven't had those issues in any other distros, so eventually we'll get a better package maintainer who can work the process or they'll figure out that their Rust policy actually isn't as smart as they think it is as Rust adoption goes up.
I'm just going to push for doing things right, and if one route or option fails there's always others.
Yeah, that's in place. If nothing else, the decades of successful releases indicate that the process, at worst, functions. Whether that process fits your process is irrelevant.
> You have to consider the bigger picture.
Right back at you. Buddy, you need to learn how to lose.
It may seem like that on the surface, but you should recognise that these sorts of situations seem to follow Kent around.
So either Kent is on a righteous crusade against unreasonable processes within the Kernel, Debian, and every other large software project he interacts with. Or there's something about the way Kent interacts with these projects that causes friction.
I like Bcachefs, I think Kent is a very talented developer, but I'm not going to pretend that he is innocent in all this.
OTOH, the only named projects I've seen are Linux and Debian, which are 2 of the most toxic projects I'm aware of (I'm pretty sure the C++ standards committee beats the two of them combined).
But the problem with comparisons is that even if you're better than nuclear waste being dumped into the aquifer, you might still be toxic enough to light a river on fire.
I've been involved in C++ standardization. In my country's national body, it is nothing like what goes on in Linux kernel development, even when there are strong disagreements amongst members.
I actually like the idea of the maintainer going out of his way to make sure that my filesystem is safe to use. Even if it goes against the established rules. And I'm saying that as someone who actually likes both Linux and Debian.
It's a strawman to imagine that Debian doesn't have a way to ensure filesystems are safe and to respond to critical bugs that might cause data corruption. It's just that you have to follow their rules to do it. (And broadly the same rules apply to the other big distros as well).
I think he's too exposed to user reports, because anybody who shows up is in a potential data-loss situation. So he's very focused on making everything as bug-free as possible, and gets frustrated that people with a different focus aren't propagating the fixes as fast as possible.
Almost makes me think the distros light-forking it to just change the name (IceWeasel style) so the support requests don't get to him will help… probably not, though, because people will still go there because they want to recover their data.
I think Kent is in the wrong here, but it really doesn't help that the kernel people from Linus on down are seemingly unable to explain the problem, and instead resort to playground insults. Apart from being unprofessional and making for a hostile work environment, it doesn't really communicate why Kent's actions are problematic, so I've some sympathy for his not believing that they are.
People have explained things, at great length, many times. Many of these explanations have been posted to HN before, either as submissions or comments.
Kent just does not listen. Every time the discussion starts from the top. Even if you do agree on some compromise, in a month or two he'll just do the same thing again and all the same arguments start again.
You can't expect people to detail about four or five years of context in every single engagement for the benefit of interested 3rd parties like you or me.
> it doesn't really communicate why Kent's actions are problematic
I agree that the kernel community can be a hostile environment.
Though I’d argue that people _have_ tried to explain things to Kent, multiple times. At least a few have been calm, respectful attempts.
Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior). Or deflects by going on the attack. And generally refuses to apologise.
Definitely not saying that the problems are all on one side here. Agreed that going on the attack was bad (as well as dumb).
I just think that while, yes, the kernel folks have tried to explain, they didn't explain well. The "why" of it is a people thing. Linus needs to be able to trust that people he's delegated some authority will respect its limits. The maintainers need to be able to trust that each other maintainer will respect the area that they have been delegated authority over. I think that Kent genuinely doesn't get this.
> Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior).
Behaviour sounds like the least important part of code contributions. I smell overpowered, should've-been-a-kindergarten-teacher code of conduct person overreach.
No, Kent has generally had a nice tone. The issue is that he has repeatedly violated the rules about code contributions. For example by including new features together with several bug-fixes during rc. That is not a CoC issue, it is not respecting the rules of patch submission and not respecting the time of the kernel maintainers.
No. As someone who likes bcachefs and even literally donates to Kent's patreon, the way he has gone about engaging with the kernel community is not productive. Unfortunately.
CoC isn't even the issue, he constantly breaks kernel development rules relating to the actual code, then starts arguments with everyone up to and including Linus when he gets called out, and aggressively misses the point every time. Then starts the same argument all over again 6 weeks later.
And, like, if you don't like some rules, then you can have that discussion, but submitting patches you know will be rejected and then re-litigating your dislike of the rules is a waste of everyone's time.
I've seen plenty of times where the problems have been explained to Kent. But he just doesn't give a shit about the problems of anyone who isn't himself or doesn't use his filesystem.
It seems very clear to me that it's almost always a "you can't argue canon law with the Pope" situation - the rules say no new features, and it doesn't matter what the definition of "feature" is if the definition AND the rule come from the same person, Linus.
You can't win a rules-lawyer argument with the rulemaker.
It's sort of frustrating that this constantly comes up. It's true that btrfs does have issues with RAID-5 and RAID-6 configurations, but this is frequently used (not necessarily by you) as some kind of gotcha as to why you shouldn't use it at all. That's insane. I promise that disk spanning issues won't affect your use of it on your tiny ThinkPad SSD.
But RAID-6 is the closest approximation to raid-z2 from ZFS! And raid-z2 has been stable for a decade+. Indeed, btrfs works just fine on my laptop. My point is that Linux lacks a ZFS-like fs for large multi-disk setups.
BTRFS does have stable, usable multi-disk support. The RAID 0, 1, and 10 modes are fine. I've been using BTRFS RAID1 for over a decade and across numerous disk failures. It's by far the best solution for building a durable array on my home server stuffed full of a random assortment of disks—ZFS will never have the flexibility to be useful with mismatched capacities like this. It's only the parity RAID modes that BTRFS lacks, and that's a real disadvantage but is hardly the whole story.
I thought the usual recommendation was to use mdadm to build the disk pool and then use btrfs on top of that - but that might be out of date. I haven't used it in a while
This is very much a big compromise where you decide for yourself that storage capacity and maybe throughput are more important than anything else.
The md metadata is not adequately protected. Btrfs checksums can tell you when a file has gone bad but not self-heal. And I'm sure there are going to be caching/perf benefits left on the table not having btrfs manage all the block storage itself.
I thought most distros have basically disabled the footgun modes at this point; that is, using the configuration that would lose data means you'd need to work hard to get there (at which point you should have been able to see all the warnings about data loss).
Maybe this specific feature should be marked as unstable and default to disabled on most kernel builds unless you add something like btrfs.experimental=1 to the kernel command line.
Parity support in multi-disk arrays is older than I am, it's a fairly standard feature. btrfs doesn't support this without data loss risks after 17 years of development.
If you're not interested in a multi-disk storage system that doesn't have (stable, non-experimental) parity modes, that's a valid personal preference but not at all a justification for the position that the rest of the features cannot be stable and that the project as a whole cannot be taken seriously by anyone.
As it turns out, RAID 5 and 6 being broken is kind of a big deal for people. It's also far from ideal that the filesystem has random landmines you can accidentally step on if you don't happen to read Hacker News every day.
If you don't trust btrfs RAID, it's perfectly possible to run btrfs on top of LVM or mdadm RAID. Then you have btrfs in its pretty happy single-device mode. Also, the recovery tooling is better known and tested.
I’ve been running btrfs on a little home Debian NAS for over a year now. I have no complaints - it’s been working smoothly, doing exactly what I want. I have a heterogeneous set of probably 6 discs, >20TB total, no problems.
*caveat: I’m using RAID 10, not a parity RAID. It could have problems with parity RAID. So? If you really really want RAID 5, then just use md to make your RAID 5 device and put btrfs on top.
So do I and BTRFS is extremely good these days. It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale. In contrast, ZFS scales horribly as the number of filesystems increases, where btrfs seems to be O(1). btrfs's quota functionality is also much better than it used to be (and very flexible), after all the work Meta put into it. Finally, having the option of easy writable snapshots is nice. BTRFS is fantastic!
> It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale.
Now you've piqued my curiosity; what uses that many filesystems/subvolumes? (Not an attack; I believe you, I'm just trying to figure out where it comes up)
It can be useful to create a file server with one filesystem/subvolume per user, because each user has their own isolated snapshots, backups via send/recv are user-specific, quotas are easier, etc. If you only have a few hundred users, ZFS is fine. But what if you have 100,000 users? Then just doing "zpool import" would take hours, whereas mounting a btrfs filesystem with 100,000 subvolumes takes seconds. This complexity difference was a show-stopper for me when architecting a certain solution on top of ZFS, despite my personally loving ZFS and having used it for a long time. The btrfs commands and UX are really awkward (for me) compared to ZFS, but btrfs is extremely efficient at some things where ZFS just falls down.
The main criticism in this thread about btrfs involves multidisk setups, which aren't relevant for me, since I'm working on cloud systems and disk storage is abstracted away as a single block device.
Incidentally, the application I'm reworking to use btrfs is cocalc.com. One of our main use cases is distributed assignments to students in classes, as part of the course management functionality. Imagine a class with 1500 students all getting an exact copy of a 50 MB folder, which they'll edit a little bit, and then it will be collected. The copy-on-write functionality of btrfs is fantastic for this use case (both in speed and disk usage).
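For anyone curious what that looks like at the syscall level, this is roughly what `cp --reflink` does on btrfs: a single FICLONE ioctl that shares the source file's extents instead of copying bytes. A minimal, hedged Python sketch (the function name is mine, and error handling is simplified to a plain-copy fallback):

```python
import fcntl
import shutil

# FICLONE ioctl number from <linux/fs.h>: ask the filesystem to share the
# source file's extents with the destination (a reflink / CoW clone).
FICLONE = 0x40049409

def reflink_copy(src_path, dst_path):
    """Clone src to dst copy-on-write if the filesystem supports it,
    otherwise fall back to an ordinary byte-for-byte copy."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return "reflink"   # instant; blocks are shared until modified
        except OSError:
            pass               # e.g. EOPNOTSUPP on ext4/tmpfs
    shutil.copyfile(src_path, dst_path)
    return "copy"
```

On btrfs, cloning that 50 MB folder per student costs almost nothing in time or space until files are actually edited.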
Also, the out-of-band deduplication for btrfs using https://github.com/Zygo/bees is very impressive and flexible, in a way that ZFS just doesn't match.
I seem to recall some discussion in one of the OpenZFS leadership meetings about slow pool imports when you have many datasets. Sadly I can't recall the details, but at least it seems to be on their radar.
As far as I understand, a core use case at Meta was build system workers starting with prepopulated state and being able to quickly discard the working tree at the end of the build. CoW is pretty sweet for that.
I lost my BTRFS RAID-1 array a year or two ago when one of my drives went offline. Just poof, data gone and I had to rebuild. I am not saying that it happens all the time, but I wouldn't say it's completely bulletproof either.
All the anecdotes I see tend to be “my drive didn’t mount, and I tried nothing before giving up because everyone knows BTRFS sux lol”. My professional experience meanwhile is that I’ve never once been able to not (very easily!) recover a BTRFS drive someone else has given up for dead… just by running its standard recovery tools.
There's been much more recent friction between various parties, so I don't think this most recent news is a direct result of that decision. See for instance https://news.ycombinator.com/item?id=44464396
The whole situation is moronic at best. Linux needs a decent modern filesystem in tree. ZFS would easily be it, but unfortunately Sun decided back in the '00s to fuck Linux because they wanted to push Solaris instead. Little did they know ZFS would end up being FreeBSD's top feature for years.
Btrfs is constantly eating people's data; it's a bad joke nowadays. Right now on Linux you're basically forced to constantly deal with out-of-tree ZFS, or accept that thinly provisioned XFS over LVM2 will inevitably cause you to lose data.
Btrfs is NOT constantly eating people's data. You have nothing to back this statement.
It's widely used and the default filesystem of several distributions.
Most of the problems are, as with any other filesystem, caused by the hardware.
I've been using it for more than 10 years without any problem and enjoy the experience. And like for any filesystem, I backup my data frequently (with btrbk, thanks for asking).
Tell it to my data then. I was 100% invested in Btrfs before 2017, the year I lost a whole filesystem to some random metadata corruption. I then started to move all of my storage to ZFS, which has never lost me a single byte of data, despite being out of tree and all. My last Btrfs filesystem died randomly a few days ago (a disk in cold storage; once again random metadata corruption, and the disk is 100% healthy). I do not trust Btrfs in any shape or form nowadays. I also vastly prefer ZFS tooling, but that's irrelevant to the argument here. The point is that I've had nothing but pain from btrfs over more than a decade.
> Btrfs is NOT constantly eating people's data. You have nothing to back this statement.
Constantly may be a strong word, but there is a long line of people sharing tales of woe. It's good that it works for you, but that's not a universal experience.
> It's widely used and the default filesystem of several distributions.
As a former user, that's horrifying.
> Most of the problems are, as with any other filesystem, caused by the hardware.
The whole point of btrfs over (say) ext4 is that it's supposed to hold up when things don't work.
btrfs has eaten my data, which was probably my bad for trying out a newly stable filesystem around 15 years ago. there are plenty of bug reports of btrfs eating other people's data in the years since.
It's probably mostly stable now, but it's silly to act like it's a paragon of stability in the kernel.
> but it's silly to act like it's a paragon of stability in the kernel.
And it's dishonest to act like bugs from 15 years ago justify present-tense claims that it is constantly eating people's data and is a bad joke. Nobody's arguing that btrfs doesn't have a past history of data loss, more than a decade ago; that's not what's being questioned here.
There's no need to call someone pointing out instability of a filesystem dishonest. That's bad faith.
I don't get why folks feel the need to come out and cheer for a tool like this, do you have skin in the game on whether or not btrfs is considered stable? Are you a contributor?
I don't get it.
But since you asked - let me find some recent bugs.
ext4 has "recent" correctness and corruption bugfixes. Just search through the 6.x and 5.x changelogs for "ext4:" to find them. It turns out that nontrivial filesystems are complex things that are hard to get right, even after decades of development by some of the most safety-and-correctness-obsessed people.
I've been using btrfs as the primary filesystem on my daily-driver PCs since 2009, 2010 or so. The only time I've had trouble with it was in the first couple of years I started using it. I've also used it as the primary FS on production systems at $DAYJOB. It works fine.
I would love to use xfs on my NAS setup but no checksums is a deal breaker. Checksums have saved me multiple times where I've been able to either repair files with parity or restore from backups.
Without checksums I would have overwritten my backup data and lost a ton of files because the drives were reported that everything was OK for months while writing corrupt files.
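The failure mode described here is silent: the drive reports success while returning wrong bytes, so only an end-to-end checksum catches it. The same principle can be applied in userspace to guard a backup set; a minimal sketch using SHA-256 (function names and the 1 MiB chunk size are arbitrary choices of mine):

```python
import hashlib

def file_sha256(path):
    """End-to-end checksum of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, recorded_digest):
    """Before trusting (or overwriting) a backup, check it still matches
    the digest recorded when it was written."""
    return file_sha256(path) == recorded_digest
```

Record the digest at backup time; any later bit flip makes `verify` fail instead of letting corrupt data silently overwrite your good copies.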
Some collaboration failing there, no technical reasons for it. I hope they'll sort this nonsense out and it will go back to normal upstream maintenance.
The sad part is that, despite years of development, Btrfs never reached parity with ZFS. And then yesterday's news: "Josef Bacik who is a long-time Btrfs developer and active co-maintainer alongside David Sterba is leaving Meta. Additionally, he's also stepping back from Linux kernel development as his primary job." See https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta
There is no 'modern' ZFS-like fs in Linux nowadays.
There's literally ZFS-on-Linux and it works great. And yes, I will once again say Linus is completely wrong about ZFS; from the multiple times he's spoken about it, it's abundantly clear he's never used it or bothered to spend any time researching its features and functionality.
https://zfsonlinux.org/
ZFS deserves an absolutely legendary amount of respect for showing us all what a modern filesystem should be - the papers they wrote, alone, did the entire filesystem world such a massive service by demonstrating the possibilities of full data integrity and why we want it, and then they showed it could be done.
But there's a ton of room for improvement beyond what ZFS did. ZFS was a very conservative design in a lot of ways (rightly so! so many ambitious projects die because of second system syndrome); notably, it's block based and doesn't do extents - extents and snapshots are a painfully difficult combination.
Took me years to figure that one out.
My hope for bcachefs has always been to be a real successor to ZFS, with better and more flexible management, better performance, and even better robustness and reliability.
Long road, but the work continues.
> But there's a ton of room for improvement beyond what ZFS did.
Say more? I can't say I've really thought that much about filesystems and I'm curious in what direction you think they could be taken if time and budget weren't an issue.
that would be bcachefs :)
It's an entirely clean slate design, and I spent years taking my time on the core planning out the design; it's as close to perfect as I can make it.
The only things I can think of that I would change or add given unlimited time and budget:

- It should be written in Rust, and even better Rust + dependent types (which I suspect could be done with proc macros) for formal verification. And Cap'n Proto for on-disk data structures (which still needs Rust improvements to be as ergonomic as it should be) would also be a really nice improvement.
- More hardening; the only other thing we're lacking is comprehensive fault injection testing of on disk errors. It's sufficiently battle hardened that it's not a major gap, but it really should happen at some point.
- There's more work to be done in bitrot prevention: data checksums really need to be plumbed all the way into the pagecache
I'm sure we'll keep discovering new small ways to harden, but nothing huge at this point.
Some highlights:

- It has more defense in depth than any filesystem I know of. It's as close to impossible to have unrecoverable data loss as I think can really be done in a practical production filesystem, short of going full immutable/append-only.
- Closest realization of "filesystem as a database" that I know of
- IO path options (replication level, compression, etc.) can be set on a per file or directory basis: I'm midway through a project extending this to do some really cool stuff, basically data management is purely declarative.
- Erasure coding is much more performant than ZFS's
- Data layout is fully dynamic, meaning you can add/remove devices at will, it just does the right thing - meaning smoother device management than ZFS
- The way the repair code works, and tracking of errors we've seen - fantastic for debugability
- Debugability and introspection are second to none: long bug hunts really aren't a thing in bcachefs development because you can just see anything the system is doing
There's still lots of work to do before we're fully at parity with ZFS. Over the next year or two I should be finishing erasure coding, online fsck, failure domains, lots more management stuff... there will always be more cool projects just over the horizon
Thanks for bcachefs and all the hard work you’ve put into it. It’s truly appreciated, and I hope you can continue to march on and not give up on the in-kernel code, even if it means bowing to Linus.
On a different note, have you heard about prolly trees and structural sharing? It’s a newer data structure that allows for very cheap structural sharing and I was wondering if it would be possible to build an FS on top of it to have a truly distributed fs that can sync across machines.
I have not seen those...
> Closest realization of "filesystem as a database" that I know of
More so than BFS?
https://en.m.wikipedia.org/wiki/Be_File_System
"Like its predecessor, OFS (Old Be File System, written by Benoit Schillings - formerly BFS), it includes support for extended file attributes (metadata), with indexing and querying characteristics to provide functionality similar to that of a relational database."
What BFS did is very cool, and I hope to add that to bcachefs someday.
But I'm talking more about the internals than external database functionality; the inner workings are much more fundamental.
bcachefs internally is structured more like a relational database than a traditional Unix filesystem, where everything hangs off the inode. In bcachefs, there's an extents btree (read: table), an inodes btree, a dirents btree, and a whole bunch of others - we're up to 20 (!).
There's transactions, where you can do arbitrary lookups, updates, and then commit, with all the database locking hidden from you; lookups within a transaction see uncommitted updates from that transaction. There's triggers, which are used heavily.
We don't have the full relational model - no SELECT or JOIN, no indices on arbitrary fields like with SQL (but you can do effectively the same thing with triggers, I do it all the time).
All the database/transactional primitives make the rest of the codebase much smaller and cleaner, and make feature development a lot easier than what you'd expect in other filesystems.
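The shape of that design can be sketched in a few lines. This is a toy model in Python, not bcachefs's actual API: separate key/value btrees (tables) instead of everything hanging off the inode, transactions whose lookups see their own uncommitted updates, and triggers that fire at commit time:

```python
class ToyFS:
    """Toy 'filesystem as a database' (NOT real bcachefs code): a set of
    separate key/value btrees, plus per-btree triggers run on commit."""
    def __init__(self):
        self.btrees = {"inodes": {}, "dirents": {}, "extents": {}}
        self.triggers = {}  # btree name -> [callback(key, old, new)]

class Transaction:
    def __init__(self, fs):
        self.fs = fs
        self.updates = {}   # (btree, key) -> value staged in this txn

    def lookup(self, btree, key):
        # Uncommitted updates from this transaction shadow committed state.
        if (btree, key) in self.updates:
            return self.updates[(btree, key)]
        return self.fs.btrees[btree].get(key)

    def update(self, btree, key, value):
        self.updates[(btree, key)] = value

    def commit(self):
        # Run triggers (e.g. to maintain a secondary index), then apply.
        for (btree, key), value in self.updates.items():
            old = self.fs.btrees[btree].get(key)
            for trig in self.fs.triggers.get(btree, []):
                trig(key, old, value)
            self.fs.btrees[btree][key] = value
        self.updates.clear()
```

A trigger here is just a callback seeing (key, old value, new value) on every commit to a btree, which is how you'd maintain the ad-hoc indices mentioned above.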
I happen to work at a company that uses a ton of capnp internally and this is the first time I've seen it mentioned much outside of here. Would you mind describing what about it you think would make it a good fit for something like bcachefs?
Cap'n Proto is basically a schema language that gets you a well-defined in-memory representation that's just as good as if you were writing C structs by hand (laboriously avoiding silent padding, carefully using types with well-defined sizes) - without all the silent pitfalls of doing it manually in C.
It's extremely well thought out and minimalist in all the right ways; the features and optimizations it has are clearly borne out of real experience - the kinds of things you'd end up building yourself in any real-world system.
E.g. it gives you the ability to add new fields without breaking compatibility. That's the right way to approach forwards/backwards compatibility; it's what I do in bcachefs, and if we'd been able to just use Cap'n Proto it would've saved a lot of fiddly manual work.
The only blocker to using it more widely in my own code is that it's not sufficiently ergonomic in Rust - Rust needs lenses, from Swift.
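The append-only field-evolution idea can be illustrated with a hand-rolled sketch. These formats are hypothetical, not Cap'n Proto's actual wire layout - the point is just the principle: new fields go at the end, old readers ignore the tail, new readers default missing fields.

```python
import struct

# Hypothetical on-disk inode formats (made up for illustration):
# v2 appends a field, so old data stays readable and new readers
# supply a default when the field is absent.

V1_FMT = "<QQ"        # v1 inode: size, mtime
V2_FMT = "<QQQ"       # v2 appends: flags

def encode_v2(size, mtime, flags=0):
    return struct.pack(V2_FMT, size, mtime, flags)

def decode_any(buf):
    # Always read the v1 prefix; take the appended field only if present.
    size, mtime = struct.unpack_from(V1_FMT, buf)
    flags = 0                                    # default for old data
    if len(buf) >= struct.calcsize(V2_FMT):
        (flags,) = struct.unpack_from("<Q", buf, struct.calcsize(V1_FMT))
    return {"size": size, "mtime": mtime, "flags": flags}

old = struct.pack(V1_FMT, 4096, 1700000000)      # written by a v1 writer
assert decode_any(old) == {"size": 4096, "mtime": 1700000000, "flags": 0}
new = encode_v2(4096, 1700000000, flags=1)
assert decode_any(new)["flags"] == 1
```

A schema compiler generates exactly this kind of accessor code for you, with the padding and sizing rules enforced mechanically rather than by hand.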
I'm saddened by this turn of events, but I hope this won't deter you from working on bcachefs on your own terms, and that we'll eventually see it reconciled into the kernel.
Thank you for your hard work.
> - Erasure coding is much more performant than ZFS's
any plans for much lower rates than typical raid?
Increasingly, modern high-density devices see block-level failures at non-trivial rates, instead of or in addition to whole-device failures. A file might be 100,000 blocks long; adding 1,000 blocks of FEC would expand it 1% but add tremendous protection against block errors - and can do so even on a single piece of media. It doesn't protect against device failures, sure, but without good block-level protection, device-level protection is dicey: hitting some block-level error while down to minimal devices seems inevitable, and adding more and more redundant devices is quite costly.
It's been talked about. I've seen some interesting work to use just a normal checksum to correct single bit errors.
If there's an optimized implementation we can use in the kernel, I'd love to add it. Even on modern hardware, we do see bit corruption in the wild, it would add real value.
It's pretty straightforward to use a normal checksum to correct single or even multiple bit errors (depending on the block size, choice of checksum, etc). Though I expect those bit errors are bus/RAM, and hopefully usually transient. If there is corruption on the media, the whole block is usually going to be lost, because any corruption means its internal block-level FEC has more errors than it can fix.
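The brute-force version of single-bit correction from an ordinary checksum can be sketched like this (illustrative only - an optimized implementation would exploit CRC linearity to compute the error position directly instead of scanning):

```python
import zlib

# Given the CRC of the original block, a single flipped bit can be found
# by flipping each bit back and re-checking. O(bits) CRC recomputations
# per block - fine for a recovery path, too slow for the hot path.

def correct_single_bit(block: bytes, expected_crc: int):
    if zlib.crc32(block) == expected_crc:
        return block                      # nothing to fix
    buf = bytearray(block)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)       # try flipping bit i
        if zlib.crc32(bytes(buf)) == expected_crc:
            return bytes(buf)             # found the corrupted bit
        buf[i // 8] ^= 1 << (i % 8)       # undo and keep scanning
    return None                           # more than a single-bit error

data = b"some block of filesystem data"
crc = zlib.crc32(data)
corrupt = bytearray(data)
corrupt[5] ^= 0x10                        # flip one bit
assert correct_single_bit(bytes(corrupt), crc) == data
```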
I was more thinking along the lines of adding dozens or hundreds of correction blocks to a whole file, along the lines of par (though there are much faster techniques now).
You'd think that, wouldn't you? But there are enough moving parts in the IO stack below the filesystem that we do see bit errors. I don't have enough data to do correlations and tell you likely causes, but they do happen.
I think SSDs are generally worse than spinning rust (especially enterprise-grade SCSI kit); the hard drive vendors have been at this a lot longer, and SSDs are massively more complicated. From the conversations I've had with SSD vendors, I don't think they've put the same level of effort into making things as bulletproof as possible yet.
One thing to keep in mind is that correction always comes at some expense to detection.
Generally a code that can always detect N errors can only always correct N/2 errors. So you detect an errored block, you correct up to N/2 errors. The block now passes but if the block actually had N errors, your correction will be incorrect and you now have silent corruption.
The solution to this is just to have an excess of error correction power and then don't use all of it. But that can be hard to do if you're trying to shoehorn it into an existing 32-bit crc.
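The detect-N/correct-N/2 tradeoff shows up even in the simplest possible code. A toy 3x repetition code (minimum distance 3: detects up to 2 bit errors, corrects only 1) demonstrates the silent-miscorrection risk directly:

```python
# Toy repetition code, purely to illustrate the tradeoff above.
# Correcting a 2-error word "succeeds" - to the wrong value.

def encode(bit):            # 0 -> [0,0,0], 1 -> [1,1,1]
    return [bit] * 3

def detect(word):           # any disagreement means errors happened
    return len(set(word)) > 1

def correct(word):          # majority vote: fixes at most 1 flip
    return 1 if sum(word) >= 2 else 0

one_err = [1, 1, 0]         # 111 with one flip
assert detect(one_err) and correct(one_err) == 1   # fixed correctly

two_err = [1, 0, 0]         # 111 with two flips
assert detect(two_err)                             # detectable...
assert correct(two_err) == 0                       # ...but miscorrected
```

If you only ever detect, both cases are caught; the moment you correct, the 2-error case silently turns into corruption - hence wanting excess correction power you deliberately don't use.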
How big are the blocks that the CRC units cover in bcachefs?
bcachefs checksums (and compresses) at extent granularity, not block; encoded extents (checksummed/compressed) are limited to 128k by default.
This is a really good tradeoff in practice; the vast majority of applications are doing buffered IO, not small block O_DIRECT reads - that really only comes up in benchmarks :)
And it gets us better compression ratios and lower metadata overhead.
We also have quite a bit of flexibility to add something bigger to the extent for FEC, if we need to - we're not limited to a 32/64 bit checksum.
You're replying to the bcachefs author, I expect his response will be fairly obvious. ;)
Can you explain your definition of "extent"? Because under every definition I've dealt with in filesystems before, ZFS is extent-based at the lower layer, and a flat object storage system (closer to S3) at the upper layer.
I've recently started using OpenZFS after all these years, and after weighting all the pros and cons of BTRFS, mdadm, etc, ZFS is clearly on top for availability and resiliency.
Hopefully we can get to a point where Linux has a native, first-class modern alternative to ZFS in bcachefs.
Sometimes I wonder how someone so talented could be so wrong about ZFS, and it makes me wonder if his negative responses to ZFS discussions could be a way of creating plausible deniability in case Oracle's lawyers ever learn how to spell ZFS.
As far as I know, the license incompatibility is on the GPL side of the equation. As in, shipping a kernel with the ZoL functionality is a violation of the GPL, not the CDDL. Thus, Oracle would not be able to sue Canonical (Edit: or, rather, have any reasonable expectation of winning this battle), as they have no standing. A copyright holder of some materially significant portion of the GPL code of the kernel would have to sue Canonical for breaching the GPL by including CDDL code.
I am not a lawyer.
How many years has it been since Ubuntu started shipping ZFS, purportedly in violation of whatever legal fears the kernel team has? Four years? Five years?
I obviously have nothing like inside knowledge, but I assume the reason there have not been lawsuits over this is that whoever could bring one (would it be only Oracle?) expects even odds that they would lose? Thus the risk of setting an adverse precedent isn't worth the damages they might be awarded from suing Canonical?
The legal issue between the Linux kernel and ZFS is that the Linux license does not allow incorporating code under licenses with more restrictions - including anything that adds protection against being sued over patented code contributed by the license giver.
I am aware of that. I did a bad job phrasing my post, and it came off sounding more confident than I actually intended. I have two questions: (1) What are the expected consequences of a violation? (2) Why haven't any consequences occurred yet?
My understanding is that Canonical is shipping ZFS with Ubuntu. Or do I misunderstand? Has Canonical not actually done the big, bad thing of distributing the Linux kernel with ZFS? Did they find some clever just-so workaround so as to technically not be violation of the Linux kernel's license terms?
Otherwise, if Canonical has actually done the big, bad thing, who has standing to bring suit? Would the Linux Foundation sue Canonical, or would Oracle?
I ask this in all humility, and I suspect there is a chance that my questions are nonsense and I don't know enough to know why.
It could be a long-term strategy by Oracle to be able to sue IBM and other big companies distributing Linux with ZFS built in. If Oracle wanted people to use ZFS, they could just relicense the code they hold copyright on.
Oracle does not have copyright on OpenZFS code - only on the version in Solaris.
The code in OpenZFS and Solaris has diverged after Oracle closed OpenSolaris.
> The code in OpenZFS and Solaris has diverged after Oracle closed OpenSolaris.
Diverged. Not rewritten entirely.
Sure, but Oracle cannot retroactively relicense the code already published before then. The cat's already out of the bag, and as long as the code from before the fork is used according to the original license, it's legal.
If Linus has never touched ZFS that's not plausible deniability. That's actual deniability.
The most plausible deniability!
Oracle lawyers know how to spell ZFS.
But Sun ensured that they can only gnash their teeth.
The source of the "license incompatibility", btw, is the same as with using GPLv3 code in the kernel - the CDDL adds an extra restriction in the form of patent protections (just like Apache 2).
To me, ZFS on Linux is extremely uninteresting except for the specific use case of a NAS with a bunch of drives. I don't want to deal with out-of-tree filesystems unless I absolutely have to. And even on a NAS, I would want the root partition to be ext4 or btrfs or something else that's in the kernel.
> the specific use case of a NAS with a bunch of drives
Aka a way bigger part of the industry than it should probably still be ;)
> works great.
I will not use or recommend ZFS on _any_ OS until they solve the double page cache problem. A filesystem has no business running its own damned page cache that duplicates the OS one. I don't give a damn if ZFS has a fancy eviction algorithm. ARC's patent is expired - go port it to mainline Linux if it's that good. Just don't build an inner platform.
That’s such a weird hill to die on. It’s like refusing to drive a car because it uses head bolts instead of head studs in an engine.
SUSE Linux Enterprise still uses Btrfs as the root FS, so it can't be that bad, right? What is Chris Mason actually doing these days? I did some googling and only found out that he was working on a tool called "rsched".
I used btrfs a few years ago, on OpenSUSE, because I also thought that would work, and it was on a single disk. It lost my root filesystem twice.
btrfs is fine for single disks or mirrors. In my experience, the main advantages of zfs over btrfs is that ZFS has production ready raid5/6 like parity modes and has much better performance for small sync writes, which are common for databases and hosting VM images.
Context: I mostly dealt with RAID1 in a home NAS setup
A ZFS pool will remain available even in degraded mode. Correct me if I'm wrong, but with BTRFS you mount the array through one of the volumes that is part of the array, not the array itself - so if that specific mounted volume happens to go down, the array becomes unavailable/unmounted until you remount via another available member volume, which isn't great for availability.
I thought about mitigating that by making an mdadm RAID1 formatted with BTRFS and mounting the virtual volume instead, but then you lose the ability to prevent bit rot, since BTRFS loses that visibility if it doesn't manage the array natively.
> has much better performance for small sync writes
I spent some time researching this topic, and in all benchmarks I've seen and my personal tests btrfs is faster or much faster: https://www.reddit.com/r/zfs/comments/1i3yjpt/very_poor_perf...
Thanks for sharing! I just set up a fs benchmark system and I'll run your fio command so we can compare results. I have a question about your fio args though: I think "--ioengine=sync" and "--iodepth=16" are incompatible, in the sense that iodepth will effectively be 1.
"Note that increasing iodepth beyond 1 will not affect synchronous ioengines"[1]
Is there a reason you used that ioengine as opposed to, for example, "libaio" with a "--direct=1" flag?
[1] https://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-...
Intuition is that the majority of software uses the standard sync FS API.
We use OpenSuSE and I always switch the installs to ext4. No fancy features, but always works, doesn't lose my root fs.
This isn’t BTRFS
This might not be directly about btrfs, but bcachefs, zfs, and btrfs are the only filesystems for Linux that provide modern features like transparent compression, snapshots, and CoW.
zfs is out of tree, leaving it an unviable option for many people. This news means that bcachefs is going to be in a very weird state in-kernel, which leaves btrfs as the only other in-tree 'modern' filesystem.
This news about bcachefs has ramifications about the state of ‘modern’ FSes in Linux, and I’d say this news about the btrfs maintainer taking a step back is related to this.
Meh. This war was stale like nine years ago. At this point the originally-beaten horse has decomposed into soil. My general reply to this is:
1. The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even. Anyone who has software requirements in this space (as distinct from "wants to yell on the internet about it") is very well served.
2. Compression seems silly in the modern world. Virtually everything is already compressed. To first approximation, every byte in persistent storage anywhere in the world is in a lossy media format. And the ones that aren't are in some other cooked format. The only workloads where you see significant use of losslessly-compressible data are in situations (databases) where you have app-managed storage performance (and who see little value from filesystem choice) or ones (software building, data science, ML training) where there's lots of ephemeral intermediate files being produced. And again those are usages where fancy filesystems are poorly deployed, you're going to throw it all away within hours to days anyway.
Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.
For me bcachefs provides a feature no other filesystem on Linux has: automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.
A block level cache like bcache (not fs) and dm-cache handles it less ideally, and doesn't leave the SSD space as usable space. As a home user, 2TB of SSDs is 2TB of space I'd rather have. ZFS's ZIL is similar, not leaving it as usable space. Btrfs has some recent work in differentiating drives to store metadata on the faster drives (allocator hints), but that only does metadata as there is no handling of moving data to HDDs over time. Even Microsoft's ReFS does tiered storage I believe.
I just want to have 1 or 2 SSDs, with 1 or 2 HDDs in a single filesystem that gets the advantages of SSDs with recently used files and new writes, and moves all the LRU files to the HDDs. And probably keep all the metadata on the SSDs too.
> automated tiered storage. I've wanted this ever since I got an SSD more than 10 years ago, but filesystems move slow.
You were not alone. However, things changed - namely, SSDs continued to get cheaper and grew in capacity. I'd think most active data these days is on SSDs (certainly in most desktops, most servers which aren't explicitly file or DB servers, and all mobile and embedded devices), the role of spinning rust being more and more archival (if it's found in a system at all).
> Compression seems silly in the modern world. Virtually everything is already compressed.
IIRC my laptop's zpool has a 1.2x compression ratio; it's worth doing. At a previous job, we had over a petabyte of postgres on ZFS and saved real money with compression. Hilariously, on some servers we also improved performance because ZFS could decompress reads faster than the disk could read.
> we also improved performance because ZFS could decompress reads faster than the disk could read
This is my favorite side effect of compression in the right scenarios. I remember getting a huge speed up in a proprietary in-memory data structure by using LZO (or one of those fast algorithms) which outperformed memcpy, and this was already in memory so no disk io involved! And used less than a third of the memory.
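The arithmetic behind that effect is easy to see in a toy demo (a ratio illustration on redundant text, not a benchmark): if data compresses 3x, the disk only has to deliver a third of the bytes, and a fast decompressor easily keeps ahead of that.

```python
import zlib

# Highly redundant data - e.g. the kind of row-oriented text a database
# workload produces - compresses far better than "everything is already
# compressed" intuition suggests.

text = (b"INSERT INTO events VALUES (42, 'click', '2024-01-01');\n") * 10000
packed = zlib.compress(text, 3)          # fast-ish compression level
ratio = len(text) / len(packed)

assert ratio > 3                         # disk reads shrink accordingly
assert zlib.decompress(packed) == text   # round-trips losslessly
```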
The performance gain from compression (replacing IO with compute) is not ironic; it was seen as a feature for the various NAS products that Sun (and after them Oracle) developed around ZFS.
How do you get a PostgreSQL database to grow to one petabyte? The maximum table size is 32 TB o_O
Cumulative; dozens of machines with a combined database size over a PB even though each box only had like 20 TB.
Probably by using partitioning.
I know my own personal anecdote isn’t much, but I’ve noticed pretty good space savings on the order of like 100 GB from zstd compression and CoW on my personal disks with btrfs
As for the snapshots, things like LVM snapshots are pretty coarse, especially for someone like me where I run dm-crypt on top of LVM
I’d say zfs would be pretty well missed with its data integrity features. I’ve heard that btrfs is worse in that aspect, so given that btrfs saved my bacon with a dying ssd, I can only imagine what zfs does.
> And the ones that aren't are in some other cooked format.
Maybe, if you never create anything. I make a lot of game art source and much of that is in uncompressed formats. Like blend files, obj files, even DDS can compress, depending on the format and data, due to the mip maps inside them. Without FS compression it would be using GBs more space.
I'm not going to individually go through and micromanage file compression even with a tool. What a waste of time, let the FS do it.
> Filesystems are a solved problem. If ZFS disappeared from the world today... really who would even care? Only those of us still around trying to shout on the internet.
Yeah nah, have you tried processing terabytes of data every day and storing them? It gets better now with DDR5 but bit flips do actually happen.
And once more, you're positing the lack of a feature that is available and very robust (c.f. "yell on the internet" vs. "discuss solutions to a problem"). You don't need your filesystem to integrate checksumming when dm/lvm already do it for you.
> You don't need your filesystem to integrate checksumming when dm/lvm already do it for you.
https://wiki.archlinux.org/title/Dm-integrity
> It uses journaling for guaranteeing write atomicity by default, which effectively halves the write speed
I'd really rather not do that, thanks.
i'm not one for internet arguments and really just want solutions. maybe you could point me at the details for a setup that worked for you?
based on my own testing, dm has a lot of footguns and, with some kernels, as little as 100 bytes of corruption to the underlying disk could render a dm-integrity volume completely unusable (requiring a full rebuild) https://github.com/khimaros/raid-explorations
Well the intention of the integrity things is to preserve integrity that is an explicit choice, in particular for encrypted data. You definitely need a backup strategy.
Bit flips can happen, and if it’s a problem you should have additional verification above the filesystem layer, even if using ZFS.
And maybe below it.
And backups.
Backups make a lot of this minor.
Backups are great, but don't help much if you backup corrupted data.
You can certainly add verification above and below your filesystem, but the filesystem seems like a good layer to have verification. Capturing a checksum while writing and verifying it while reading seems appropriate; zfs scrub is a convenient way to check everything on a regular basis. Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
FWIW, framed the way you do, I'd say the block device layer would be an *even better* place for that validation, no?
> Personally, my data feels important enough to make that level of effort, but not important enough to do anything else.
OMG. Backups! You need backups! Worry about polishing your geek cred once your data is on physically separate storage. Seriously, this is not a technology choice problem. Go to Amazon and buy an exfat stick, whatever. By far the most important thing you're ever going to do for your data is Back. It. Up.
Filesystem choice is, and I repeat, very much a yell-on-the-internet kind of thing. It makes you feel smart on HN. Backups to junky Chinese flash sticks are what are going to save you from losing data.
I appreciate the argument. I do have backups - ZFS makes it easy to send snapshots, and so I do.
But I don't usually verify the backups, so there's that. And everything is in the same zip code for the most part, so one big disaster and I'll lose everything. C'est la vie.
What good is a backup if you can't restore it?
Well, I expect that I can restore it, and that expectation has been good enough thus far. :p
Ok I think you're making a well-considered and interesting argument about devicemapper vs. feature-ful filesystems but you're also kind of personalizing this a bit. I want to read more technical stuff on this thread and less about geek cred and yelling. :)
I wouldn't comment but I feel like I'm naturally on your side of the argument and want to see it articulated well.
I didn't really think it was that bad? But sure, point taken.
My goal was actually the same though: to try to short-circuit the inevitable platform flame by calling it out explicitly and pointing out that the technical details are sort of a solved problem.
ZFS argumentation gets exhausting, and has ever since it was released. It ends up as a proxy for Sun vs. Linux, GNU vs. BSD, Apple vs. Google, hippy free software vs. corporate open source, pick your side. Everyone has an opinion, everyone thinks it's crucially important, and as a result of that hyperbole everyone ends up thinking that ZFS (dtrace gets a lot of the same treatment) is some kind of magically irreplaceable technology.
And... it's really not. Like I said above if it disappeared from the universe and everyone had to use dm/lvm for the actual problems they need to solve with storage management[1], no one would really care.
[1] Itself an increasingly vanishing problem area! I mean, at scale and at the performance limit, virtually everything lives behind a cloud-adjacent API barrier these days, and the backends there worry much more about driver and hardware complexity than they do about mere "filesystems". Dithering about individual files on individual systems in the professional world is mostly limited to optimizing boot and update time on client OSes. And outside the professional world it's a bunch of us nerds trying to optimize our movie collections on local networks; realistically we could be doing that on something as awful as NTFS if we had to.
How can I, with dm/lvm:
* For some detected corruption, be told directly which files are affected?
* Get filesystem level snapshots that are guaranteed to be consistent in the way ZFS and CephFS snapshots guarantee?
On urging from tptacek I'll take that seriously and not as flame:
1. This is misunderstanding how device corruption works. It's not and can't ever be limited to "files". (Among other things: you can lose whole trees if a directory gets clobbered, you'd never even be able to enumerate the "corrupted files" at all!). All you know (all you can know) is that you got a success and that means the relevant data and metadata matched the checksums computed at write time. And that property is no different with dm. But if you want to know a subset of the damage just read the stderr from tar, or your kernel logs, etc...
2. Metadata robustness in the face of inconsistent updates (e.g. power loss!) is a feature provided by all modern filesystems, and ZFS is no more or less robust than ext4 et al. But all such filesystems (ZFS included) will "lose data" that hadn't been fully flushed. Applications that are sensitive to that sort of thing must (!) handle this by having some level of "transaction" checkpointing (i.e. an fsync call). ZFS does absolutely nothing to fix this for you. What is true is that an unsynchronized snapshot looks like "power loss" at the dm level where it doesn't in ZFS. But... that's not useful for anyone who actually cares about data integrity, because you still have to solve the power loss problem. And solving the power loss problem obviates the need for ZFS.
1 - you absolutely can and should walk reverse mappings in the filesystem so that from a corrupt block you can tell the user which file was corrupted.
In the future bcachefs will be rolling out auxiliary dirent indices for a variety of purposes, and one of those will be to give you a list of files that have had errors detected by e.g. scrub (we already generally tell you the affected filename in error messages)
2 - No, metadata robustness absolutely varies across filesystems.
From what I've seen, ext4 and bcachefs are the gold standard here; both can recover from basically arbitrary corruption and have no single points of failure.
Other filesystems do have single points of failure (notably btree roots), and btrfs and I believe ZFS are painfully vulnerable to devices with broken flush handling. You can (and should) blame the device and the shitty manufacturers, but from the perspective of a filesystem developer, we should be able to cope with that without losing the entire filesystem.
XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root, but it's still possible to lose the entire filesystem if you're very, very unlucky.
On a modern filesystem that uses b-trees, you really need a way of repairing from lost b-tree roots if you want your filesystem to be bulletproof. btrfs has 'dup' mode, but that doesn't mean much on SSDs given that you have no control over whether your replicas get written to the same erase unit.
Reiserfs actually had the right idea - btree node scan, and reconstruct your interior nodes if necessary. But they gave that approach a bad name; for a long time it was a crutch for a buggy b-tree implementation, and they didn't seed a filesystem specific UUID into the btree node magic number like bcachefs does, so it could famously merge a filesystem from a disk image with the host filesystem.
bcachefs got that part right, and also has per-device bitmaps in the superblock for 'this range of the device has btree nodes' so it's actually practical even if you've got a massive filesystem on spinning rust - and it was introduced long after the b-tree implementation was widely deployed and bulletproof.
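The UUID-seeded magic number trick is simple enough to sketch (a toy model, not bcachefs's actual on-disk format - the constant and layout here are invented): nodes carry a magic derived from the filesystem's own UUID, so a raw scan finds only *this* filesystem's nodes and ignores nodes from any stray disk image.

```python
import struct
import uuid

# Toy node-scan sketch. Seeding the per-fs UUID into the node magic is
# what prevents the reiserfs failure mode of merging in a filesystem
# found inside a disk image on the host.

BLOCK = 512

def node_magic(fs_uuid):
    # Hypothetical: mix a fixed constant with the per-fs UUID.
    base = 0xB7EE5CAFE
    return struct.pack("<Q", base ^ int.from_bytes(fs_uuid.bytes[:8], "little"))

def write_node(image, offset, fs_uuid, payload):
    image[offset:offset + 8] = node_magic(fs_uuid)
    image[offset + 8:offset + 8 + len(payload)] = payload

def scan_for_nodes(image, fs_uuid):
    # Walk the device block by block, keeping only nodes tagged with
    # this filesystem's magic.
    magic = node_magic(fs_uuid)
    return [off for off in range(0, len(image), BLOCK)
            if image[off:off + 8] == magic]

ours, theirs = uuid.uuid4(), uuid.uuid4()
image = bytearray(8 * BLOCK)
write_node(image, 0 * BLOCK, ours, b"interior node")
write_node(image, 3 * BLOCK, theirs, b"foreign fs image")   # must be ignored
write_node(image, 5 * BLOCK, ours, b"leaf node")
assert scan_for_nodes(image, ours) == [0, 5 * BLOCK]
```

The superblock bitmaps mentioned above then narrow `range(0, len(image), BLOCK)` down to only the regions known to contain btree nodes, which is what makes the scan practical on large spinning-rust filesystems.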
> XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root
As I understand it, ZFS also has a lot of redundant metadata (copies=3 on anything important), and also previous uberblocks[1].
In what way is XFS better? Genuine question, not really familiar with XFS.
[1]: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSMetadata...
I can't speak with any authority on ZFS, I know its structure the least out of all the major filesystems.
I do a ton of reading through forums gathering user input, and lots of people chime in with stories of lost filesystems. I've seen reports of lost filesystems with ZFS and I want to say I've seen them at around the same frequency of XFS; both are very rare.
My concern with ZFS is that they seem to have taken the same "no traditional fsck" approach as btrfs, favoring entirely online repair. That's obviously where we all want to be, but that's very hard to get right, and it's been my experience that if you prioritize that too much you miss the "disaster recovery" scenarios, and that seems to be what's happened with ZFS; I've read that if your ZFS filesystem is toast you need to send it to a data recovery service.
That's not something I would consider acceptable, fsck ought to be able to do anything a data recovery service would do, and for bcachefs it does.
I know the XFS folks have put a ton of outright paranoia into repair, including full on disaster recovery scenarios. It can't repair in scenarios where bcachefs can - but on the other hand, XFS has tricks that bcachefs doesn't, so I can't call bcachefs unequivocally better; we'd need to wait for more widespread usage and a lot more data.
The lack of a traditional 'fsck' is because its operation would be the exact same as normal driver operation. The most extreme case involves a very obscure option that lets you explicitly rewind transactions to one you specify, which I've seen used to recover from a broken driver upgrade that led to filesystem corruption in ways that most fscks just barf on, including XFS's.
For low-level meddling and recovery, there's a filesystem debugger that understands all parts of ZFS and can help for example identifying previous uberblock that is uncorrupted, or recovering specific data, etc.
Rewinding transactions is cool. Bcachefs has that too :)
What happens on ZFS if you lose all your alloc info? Or are there other single points of failure besides the ublock in the on disk format?
> What happens on ZFS if you lose all your alloc info?
According to this[1] old issue, it hasn't happened frequently enough to prioritize implementing a rebuild option, however one should be able to import the pool read-only and zfs send it to a different pool.
As far as I can tell that's status quo. I agree it is something that should be implemented at some point.
That said, certain other spacemap errors might be recoverable[2].
[1]: https://github.com/openzfs/zfs/issues/3210
[2]: https://github.com/openzfs/zfs/issues/13483#issuecomment-120...
One feature I like about ZFS and have not seen elsewhere is that you can have each filesystem within the pool use its own encryption keys but more importantly all of the pool's data integrity and maintenance protection (scrubs, migrations, etc) work with filesystems in their encrypted state. So you can boot up the full system and then unlock and access projects only as needed.
The dm stuff is one key for the entire partition and you can't check it for bitrot or repair it without the key.
The other thing dm/lvm gives you is dogshit performance
> The dm layer gives you cow/snapshots for any filesystem you want already and has for more than a decade. Some implementations actually use it for clever trickery like updates, even.
O_o
Apparently I've been living under a rock - can you please show us a link about this? I was just recently (casually) looking into bolting on ZFS/BTRFS-like partial snapshot features to simulate my own atomic distro, where I'm able to freely roll back if an update goes bad. Think Linux's Timeshift with a little something extra.
There are downsides to adding features in layers, as opposed to integrating them with the FS, but dm can do quite a lot:
https://docs.kernel.org/admin-guide/device-mapper/snapshot.h...
DM has targets that facilitate block-level snapshots, lazy cloning of filesystems, compression, &c. Most people interact with those features through LVM2. COW snapshots are basically the marquee feature of LVM2.
Btrfs is the closest in-tree bcachefs alternative.
Does btrfs still eat your data if you try to use its included RAID featureset? Does it still break in a major way if you're close to running out of disk space? What I'm seeing is that most major Linux distributions still default to non-btrfs options for their default install, generally ext4.
Anecdotal but btrfs is the only filesystem I've lost data with (and it wasn't in a RAID configuration). That combined with the btrfs tools being the most aggressively bad management utilities out there* ensure that I'm staying with ext4/xfs/zfs for now.
*Coming from the extremely well thought out and documented zfs utilities to btrfs will have you wondering wtf fairly frequently while you learn your way around.
I just use ZFS. Canonical ships it and that's good enough for me on my personal machines.
Since the existing bcachefs driver will not be removed, and the problem is the bcachefs developer not following the rules, I wonder if someone else could take on the role of pulling bcachefs changes into the mainline, while also following the merge window rules.
No, the problem wasn't following the rules.
The patch that kicked off the current conflict was the 'journal_rewind' patch; we recently (6.15) had the worst bug in our entire upstream history - it was taking out entire subvolumes.
The third report got me a metadata dump with everything I needed to debug the issue, thank god, and now we have a great deal of hardening to ensure a bug like this can never happen again. Subsequently, I wrote new repair code, which fully restored the filesystem of the 3rd user hit by the bug (first two had backups).
Linus then flipped out because it was listed as a 'feature' in the pull request; it was only listed that way to make sure that users would know about it if they were affected by the original bug and needed it. Failure to maintain your data is always a bug for a filesystem, and repair code is a bugfix.
In the private maintainer thread, and even in public, things went completely off the rails, with Linus and Ted basically asserting that they knew better than I do which bcachefs patches are regression risks (seriously), and a page and a half rant from Linus on how he doesn't trust my judgement, and a whole lot more.
There have been many repeated arguments like this over bugfixes.
The thing is, since then I started perusing pull requests from other subsystems, and it looks like I've actually been more conservative with what I consider a critical bugfix (and send outside the merge window) than other subsystems. The _only_ thing that's been out of the ordinary with bcachefs has been the volume of bugfixes - but that's exactly what you'd expect to see from a new filesystem that's stabilizing rapidly and closing out user bug reports - high volume of pure bugfixing is exactly what you want to see.
So given that, I don't think having a go-between would solve anything.
It's so sad to see an excellent engineer such as yourself, building what seems like an excellent filesystem that has the potential to be better than everything else available for Linux for many use cases, completely fail to achieve your goals because you lack the people skills to navigate working as a part of a team under a technical leader. Every comment and e-mail I've seen from you has demonstrated an impressive lack of understanding with regard to why you're being treated as you are.
You don't have to agree with all other maintainers on everything, but if you're working on Linux (or any other major project that's owned, run and developed by other people), you need to have the people skills to at a minimum avoid pissing everyone else off. Or you need to delegate the communication work to someone with those skills. It's a shame you don't.
I get a ton of comments like this.
Pointing the finger at the skills I lack and my inability, while ignoring the wider picture: the kernel burning out maintainers and not doing well on filesystems.
It's wearying.
You get a ton of comments like this because it's true. There are real problems in the kernel, I've seen how hostile it can be to people who are just trying to do the right thing and upstream their changes etc. But your case isn't that. Your behavior would get you in trouble at any job where you have to follow rules set by other people. Your refusal to treat your part of the kernel as anything other than your personal pet project has destroyed your project's potential.
If this was a month or two ago, I would've written something vaguely optimistic here about how you could still turn this around somehow, about what lessons you could learn and move forward with. But that ship has sailed. Your project is no longer the promising next generation filesystem which could replace ext4 as the default choice. Your role is now that of the developer of some small out-of-tree filesystem for a small group of especially interested users. Nobody wanted this for you, including myself. But you have refused to listen to anyone's advice, so now you're here.
Kent has gotten this same feedback across practically every single platform that has discussed his issues. He is unable to take critique and will instead just continue to argue and be combative, therefore proving yet again why he is in this situation in the first place
That's because it is my project, and my responsibility.
I can't be bowing to the demands of any one person; I have to balance the wants and needs of everyone and prioritize shipping something that works above all else.
Repeatedly we've seen that those priorities are not shared, unfortunately.
Arguments are just as heated as they ever were, but now instead of arguing over the actual issues - does this work, are we doing this right - people jump to arguing over language and conduct and demanding apologies or calling for people to be expelled.
But my core mission is just shipping a reliable trustworthy filesystem, and that's what I'm going to stick to.
> I can't be bowing to the demands of any one person
This right here is the core of the issue. When you're working as a part of a larger organizational structure, you have to bow down to your boss. When your software is a part of the kernel, it's not your project anymore; it's just one part of Linus's project. You're a contributor, not a leader. Just like I would not control Bcachefs's development process even if I contributed some small but important part to it, you do not control Linux's development process even though you contributed some small but important part to it.
Your core mission is evidently not shipping a reliable trustworthy filesystem. You say that, but your actions speak louder than your words. You know just as well as I do that a filesystem being in-tree rather than out-of-tree makes it significantly more reliable and trustworthy, which is why you chose to get Bcachefs merged into the kernel in the first place. Instead of working within the well-defined boundaries that are necessary to keep Bcachefs in the kernel, you've repeatedly pushed against those boundaries, belittled fellow maintainers, and in general worked hard to make yourself a persona non grata within the kernel community. The predictable outcome is that continued development of Bcachefs will have to happen out-of-tree, and your users won't gain the major reliability and trustworthiness benefits of using an in-tree filesystem. People will warn against using Bcachefs as their root filesystem, since every kernel upgrade will now carry some risk that DKMS or whatever mechanism is used to install the out-of-tree Bcachefs kernel module doesn't work with the new kernel.
And, to be honest, it doesn't matter whether or not you're "right" or "wrong" here. Maybe you're completely correct about absolutely everything and Linus, Greg, Ted, Miguel, Sasha, Josef, and everyone else involved are stupid and don't understand what it takes to develop reliable software. So what? They're your colleagues, some of them are your bosses. Everyone on Hacker News could take your side here and think you've been mistreated, it doesn't help. You'd still be thrown out of the kernel. You'd still be failing your users by not maintaining a good enough relationship with your colleagues and bosses to stay in-tree. You could be completely right on every technical matter and it does not matter.
If you play your cards right, you could maybe end up in a situation where you run the Bcachefs project entirely out-of-tree, with yourself as the supreme leader who doesn't bow down to the demands of anyone, with your own development and release process; and then someone else takes responsibility for pushing your code into the upstream kernel, following Linus's rules. They would dissect your releases and backport bug fixes while leaving out important features, in accordance with Linus's rules. Time will tell if you can find anyone to do that. And time will tell if you possess the humility necessary to let someone else ultimately control the experience most of your users will have.
Linus isn't my boss, though.
Am I paid by him or the Linux foundation? No.
Has he ever contributed to bcachefs in any way, is he in any way responsible for making sure that it works properly? No.
The only sense in which he has authority is that he can decide whether or not to pull it into his tree, but that's a two way relationship.
I have said my piece. It's not me you have to convince.
How about acknowledging you’ve been too sharp with words, apologizing, and attempting to move forward ?
I know other people in the kernel make the same mistake you frequently do on mailing lists. But two wrongs do not make a right.
When everyone else is the problem, you're the problem.
Consider that when working in teams.
> minimum avoid pissing everyone else off
Which also, at times, means appeasing people even when you are confident that they are wrong because you need their cooperation in the future. In a large complicated system, being able to work together is often more important to the system's reliability, performance, etc. than being as right as possible.
Plus even when you're confident you are in the right you might still be in the wrong. After all, the people you are disagreeing with are also superbly competent and they believe they're in the right just as you do. There can be hills worth dying on, but they ought to be very rare.
> Which also, at times, means appeasing people even when you are confident that they are wrong because you need their cooperation in the future.
Being unwilling to follow basic QA processes in preparation for a release candidate, and then doubling down by attacking the release engineer with claims that the QA process doesn't apply to you because you know better, is something far more serious than lacking basic soft skills. It's a fireable offense in most companies.
> It's a fireable offense in most companies.
In a company there are other employees who have your success as part of their job function. People to train you, to talk you down off a ledge, people to step in and guard you against misunderstanding or criticisms. People to advocate for you or send you home before a dispute crosses a point of no return. You're also paid to be there, to put up with the company's BS, .. the project isn't yours, it's not usually your reputation that's hurt when the company wants to make a decision you don't agree with and it goes poorly.
The context is so different, I don't think it's really comparable.
> In a company there are other employees who have your success as part of their job function.
Yes, and they enforce basic release processes to ensure you don't break releases by skipping QA processes or introducing untested and unverified features in release candidates.
And you sure as hell don't have primadonna developers stay on the payroll for long if they start throwing tantrums and personal attacks towards fellow engineers when they are asked to follow the release process or are called out for trying to sneak untested changes into mission-critical components.
Exactly. An extremely important part of working in some hierarchical organizational structure, be that as a Linux kernel developer or as an employee at a company, is the ability to disagree with a superior's decision yet acquiesce and go along with it. Good organizations leave room for disagreement, but there always comes a point where someone in a leadership position has made a final decision and the time for debate is over.
To lay out the current state of things:
1. Regardless of whether correct or not, it's Linus that decides what's a feature and what's not in Linux. Like he has for the last however many decades. Repair code is a feature if Linus says it is a feature.
2. Being correct comes second to being agreeable in human-human interactions. For example, dunking on x file system does not work as a defense when the person opposite you is a x file system maintainer.
3. rules are rules, and generally don't have to be "correct" to be enforced in an organization
I think your perceived "unfairness" might make sense if you just thought of these things as un-workaroundable constraints, just like the fact that SSDs wear out over time.
When rules and authority start to take precedence over making sure things work, things have gone off the rails and we're not doing engineering anymore.
> When rules and authority start to take precedence over making sure things work, (...)
Didn't Linus lambast you for "lack of testing and collaboration before submitting patches", to the point the patches you were trying to push weren't even building?
https://ostechnix.com/linus-torvalds-expresses-frustration-w...
Linus has broken the build more recently than I have. (In the time since bcachefs went upstream, we've both done that once, that I've seen).
Linus doesn't seem to believe in automated testing. He just seems to think that there's no way I could QA code as quickly as I do, but that's because I've invested heavily in automated testing and building up a community of people doing very good testing and QA work; bcachefs's automated testing is the best of any upstream filesystem that I've seen (there's a whole cluster of machines dedicated to this), and I have people running my latest branch on a daily basis.
Nearly all of the collaboration just happens on IRC.
For big changes I wait for explicit acks from testers that they've run it and things look good; a lot of people read and review my code too, it's just typically less formal than the rest of the kernel.
> Linus has broken the build more recently than I have.
Even taking your claims at face value (which from this thread alone is a heck of a leap) I'm baffled by the way you believe this holds any relevance.
I mean, the kernel project has in place a quality assurance process designed to minimize the odds of introducing problems when preparing a release. You were caught purposely ignoring the QA process in place, trying to circumvent the whole quality assurance process and sneak into an RC features that were untested and unverified.
There is a QA process, and you purposely decided to ignore it and plow away. And then your best argument for purposely ignoring any semblance of QA is that others may or may not have broken a build before?
Come on, man. You know better than this. How desperate are you to avoid any accountability to pull these gaslighting stunts?
Please, tell us about these wonderful QA processes the kernel has.
Yeah but you don’t get to make the calls. Linus does and your “well kernel daddy does it too” and “actually I’m doing it better than my critics understand” don’t play well with the kernel daddy (or really any bdfl). Do you not see your comment as dismissive?
All your comments are dismissive of the criticisms so far and you’re shrugging your shoulders as to why.
It’s great you’re able to reason and defend yourself, but Linux as a whole is larger than you, and refusing to submit to their ways will make technology move nowhere.
Collaborative projects don't work on pure engineering. There are significant resource management components that basically amount to therapy, psychiatry, and side show entertainment because the most critical resources are human minds.
Excellent engineering management largely isolates engineers from having to deal with this non-engineering stuff (except for the subset that is specifically for their own personal benefit)-- but open source tends to radically flatten organizations that produce software, such that every contributor must also be their own manager to a great degree.
In a well run project you don't necessarily have to be good at or even interested in all the more socially oriented components of the project organization. But if you're not you must be willing to let someone else handle that stuff and go along with their judgements even if they seem suboptimal from the narrower perspective you've adopted. If you can't then from a "collaborative development as a system" view you're a faulty component that doesn't provide the right interface for the system's requirements (and are gonna get removed!). :)
Another way to look at it is that it would be ideal if every technical element were optimal at all times. In small systems with well understood requirements this can be possible or at least close to possible. But in big complex and poorly scoped systems it's just not possible: We have imperfect information, there are conflicting requirements, we have finite time, and so on. The system as a whole will always be far from perfect. If anyone tried to make it all perfect it would just fail to make progress, deadlock, or otherwise. The management of the project is always trying to balance the imperfections. They know that their decisions are often making things worse for a local concern, but they do so with belief that over time the decisions result in a better system overall. Linux has a good reputation in large part due to a long history of making good decisions about the flaws to accept or even introduce, which issues to gloss over vs debate to death.
There are important differences between small scale (individual or a few people) engineering and larger scale engineering.
For many humans to work together over time on something very complex is hard. Structure and process are required. And sometimes they come at the expense of what some might call “pure” engineering. But they are the right trade offs to optimize for the actual goal.
If you can’t accept that, stick to solo projects.
I think this attitude is exactly why this happened. I would have done the same thing.
Do you argue with your school teachers that your book report shouldn't be due on Friday because it's not perfect yet?
I read several of your response threads across different websites. The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.
Instead you keep arguing about things you can't change, and that leads people to getting frustrated and walking away from you.
It really doesn't matter how "right" you may be... not your circus, not your monkeys.
> All the discussions seem to show the same issue: You disagree with policies held by people higher up than you, and you struggle with respecting their decisions and moving on.
I think it's less subtle than that. The straw that broke the camel's back was quite literally abuse towards other kernel developers.
https://lwn.net/Articles/999197/
You might want to read the full story on that one.
> You might want to read the full story on that one.
I read the full story. Everyone else can do the same. Somehow it seems you opt to skip it and prefer to be deeply invested in creating an alternative reality.
I think that abuse falls under "struggling" to respect their decisions, but yes I agree that was a big part of it.
Your analogy fails to account for the fact that after "Friday" bug fixes are still allowed. A file system losing your files sounds like a bug to me.
Edit since you expanded your post:
>The most interesting to me was LWN, about the debian tools, where an actual psychologist got involved.
To me the comment was patronizing, implying it was purely due to bad communication from Kent's end, and it shows how immature the people running these operating systems are. Putting priority on processes over the end user.
>respecting their decisions and moving on.
When this causes real pain for end users, it's evidence that the decision was wrong.
> really doesn't matter how "right" you may be... not your circus
It does because it causes reputational damage for bcachefs. Even beyond reputational damage, delivering a good product to end users should be a priority. In my opinion projects as big as Debian causing harm to users should be called out instead of ignored. Else it can lead to practices like replacing dependencies out from underneath programs to become standard practice.
You still seem to be arguing that, shipping the change was the "right" thing to do. But that's not what's in dispute. Rather it is that, if what you think is right and what the person who makes the rules thinks is right are in disagreement, the adult thing to do is not to simply disregard the rules (and certainly not repeatedly, after being warned not to).
This is the difference between being smart and being wise. If the goal of all this grandstanding was that, it's so incredibly and vitally important for these patches to get into the kernel, well guess what, now due to all this drama this part of the kernel is going to go unmaintained entirely. Is that good for the users? Did that help our stated goal in any way? No.
>the adult thing to do is not to simply disregard the rules
The adult thing is to do best by the users. Critical file system bugs are worth blocking the release of any serious operating system in the real world as there is serious user impact.
>Is that good for the users?
I think it's complicated. It could allow for a faster release schedule for bug fixes which can allow for addressing file system issues faster.
> The adult thing is to do best by the users
Best by users in the long term is predictable processes. "RC = pure bug fixes" is a battle tested, dependable rule, absence of which causes chaos.
> Critical file system bugs are worth blocking the release
"Experimental" label EXACTLY to prevent this stuff from blocking release. Do you not know that bcachefs is experimental? This is an example of another rule which helps predictability.
This was a bug fix. My point is that there will always be bugs in the kernel so not all bugs are worth blocking a release, but losing data is worth blocking the release for.
>"Experimental" label EXACTLY to prevent this stuff from blocking release
In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.
> In practice bcachefs is used in production with real users. If the experimental label prevents critical bug fixes from making it into the kernel then it would be better to just remove that label.
alternative perspective: those users have knowingly and willingly put experimental software into production. it was their choice, they were informed of the risk and so the consequences and responsibility are theirs.
it’s like signing up to take some experimental medicine, and then complaining that no one told you about the side effect of persistent headaches.
that doesn’t stop anyone from being user-centric in their approach, e.g. call me if you notice any symptoms and i’ll come round your house to examine you.
… as long as everyone is clear about the fact it is experimental and the boundaries/limitations that apply, e.g. there will be certain persistent headache medicines that cannot be prescribed to you, or it might take longer for them to work because you’re on an experimental medicine.
Again: the elephant in the room is that a lot of bcachefs users are using it explicitly because they have lost a lot of data on btrfs, and they've found it to be more trustworthy.
This puts us all in a shitty situation. I want the experimental label to come off at the right time - when every critical bug is fixed and it's as trustworthy as I can reasonably make it, when I know according to the data I have that everyone is going to have a good experience - but I have real users who need this thing and need to be supported.
There is _no reason_ to interpret the experimental label in the way that you're saying; you're advocating that reliability for the end user be deprioritized versus every other filesystem.
But deprioritizing reliability is what got us into this mess.
>users are using it explicitly because they have lost a lot of data on btrfs
PLEASE, honestly, EDUCATE THESE USERS. This is still marked experimental for numerous reasons regardless of the 'planned work for 6.18'. Users who can't suffer any data loss and are repeating their mistake of using btrfs shouldn't be using a non-default/standard/hardened filesystem, period.
No, really. People aren't losing data on bcachefs. We still have minor hiccups that do affect usability, and I put a lot of effort into educating users about where we're at and what to expect.
In the past I've often told people who wanted to migrate off of btrfs "check back in six months", but I'm not now because 6.16 is looking amazingly solid; all the data I have says that your data really is safer on bcachefs than btrfs.
I'm not advocating for people to jump from ext4/xfs/zfs, that needs more time.
> This was a bug fix.
I'm not sure exactly what you are talking about, and I'm not sure you do either. The discussion that preceded bcachefs being dropped from the Linux kernel mainline involved an attempt to sneak a new feature into an RC, sidestepping testing and QA work, which was followed up by yet more egregious behavior from the maintainer.
https://www.phoronix.com/news/Linux-616-Bcachefs-Late-Featur...
>sneak a new features in RC
To solve a bug with the filesystem that people in the wild were hitting. As Linus has said in the past, there is a blurry line between security fixes and bug fixes. There is a similarly blurry line between filesystem bugs and recovery features.
If you read the email it is clear that the full feature has more work needed and this is more of a basic implementation to address bugs that people hit in the wild.
> Too solve a bug with the filesystem that people in the wild were hitting.
So you acknowledge that this last episode involved trying to push new features into an RC.
As it was made abundantly clear, not only is the point of RC branches to only get tiny bugfixes after testing, the feature work that was presented was also untested and risked introducing major regressions.
All these red flags were repeatedly raised in the mailing list by multiple kernel maintainers. Somehow you're ignoring all the feedback and warnings and complaints raised by Linux kernel maintainers, and instead you've opted to try to gaslight the thread.
No, I'm sorry but you're simply wrong.
bcachefs has a ton of QA, both automated testing and a lot of testers that run my latest and I work with on a daily basis. The patch was well tested; it was for codepaths that we have good regression tests for, it was algorithmically simple, and it worked perfectly to recover a filesystem from the original bug report, and it performed flawlessly again not long after.
I've explained my testing and QA on the lists multiple times.
You, like the other kernel maintainers in that thread, are making wild assertions despite having no involvement with the project.
> No, I'm sorry but you're simply wrong.
It sounds like you have a hard time coping with reality.
https://www.phoronix.com/news/Linux-616-Bcachefs-Late-Featur...
I repeat: it sounds an awful lot like you are trying to gaslight this thread. Not cool.
When this fact was again explicitly pointed out to you by Linus himself, you even tried to bullshit Linus and to move the goalposts with absurd claims about how somehow it was ok to force untested and unreviewed features into an RC because somehow you know better about what users want or need, as if that were some kind of justification for you to skip testing and proper release processes.
You need to set aside some time for introspection because you sound like you are your own worst enemy. And those you interact with seem to be fed up and had enough of these stunts.
The changes weren't untested or unreviewed, and they've performed flawlessly on quite a few occasions since then.
Sorry, the only person gaslighting here is you.
I don't think getting the FS kicked out of the kernel is best by the users.
Good engineering requires long term thinking.
There's more than bcachefs in the kernel. If dealing with bcachefs takes an inordinate amount of time and effort, dropping it is the right move.
I don't know the situation well enough to review where they drew the line, but there definitely should be a line somewhere.
That was my point exactly.
I'm convinced this account is an alt of koverstreet, possibly just to get around the posting delays.
You seem careful not to refer to any of his decisions as your own, but the writing style and inability to respect authority is still there.
You think I have an alt with higher karma than my actual account? :)
> Being correct comes second to being agreeable in human-human interactions
Prioritizing agreeableness above correctness is the reason the space shuttle Challenger blew up.
The bcachefs fracas is interesting and important because it's like a stain making some damn germ's organelles visible: it highlights a psychological division in tech and humanity in general between people who prioritize
1) deferring to authority, reading the room, knowing your place
and people who prioritize
2) insisting on your concept of excellence, standing up against a crowd, and speaking truth to power.
I am disturbed to see the weight position #1 has accumulated over the past decade or two. These people argue that Linus could be arbitrarily wrong and Overstreet arbitrarily right and it still wouldn't matter because being nice is critical to the success of a large scale project or something.
They get angry because they feel comfort in understanding their place in a social hierarchy. Attempts to upend that hierarchy in the name of what's right create cognitive dissonance. The rule-followers feel a tension they can relieve only by ganging up and asserting "rules are rules and you need to follow them!" --- whether or not, at the object level, a) there are rules, b) the rules are beneficial, and c) the rules are applied consistently. a, b, and c are exactly those object-level does-the-o-ring-actually-work-when-cold considerations that the rule-following, rule-enforcing kind of person rejects in favor of a reality built out of words and feelings, not works and facts.
They know it, too. They need Overstreet and other upstarts to fail: the failure legitimizes their own timid acquiescence to rules that make no sense. If other people are able to challenge rules and win, the #1 kind of person would have to ask himself serious and uncomfortable questions about what he's doing with his life.
It's easier and psychologically safer to just tear down anyone trying to do something new or different.
The thing is, all technological progress depends on the #2 people winning in the end. As Feynman talked about when diagnosing this exact phenomenon as the root cause of the Challenger disaster, mother nature (who appears to have taken on corrupting filesystems as a personal hobby of hers) does not care one bit about these word games or how nice someone is. The only thing that matters when solving a problem of technology is whether something works.
I think a lot of people in tech have entirely lost sight of this reality. I can't emphasize enough how absurd it is to state "[b]eing correct comes second to being agreeable in human-human interactions" and how dangerously anti-technology, anti-science, anti-civilization, and anti-human this poison mindset is.
Ugh, this is a lot of words for nothing.
1. I laid down what I perceived as the state of things. The generalizations I drew from observing the system that is Linux development. Nowhere have I prescribed that kent "follow" my ideas. Simply that he can use these to try to understand the unfairness he feels.
2. Your anarcho-individualistic development ideas sound good in theory, but if they ever worked in practice we might have seen them be more widespread than they are today in team sizes > 3.
You should also note that if the o-ring is labelled experimental and there's an expectation of failure, its development and testing will not stop the launch. The shuttle leaves when it leaves; it won't wait for the experimental o-ring to be done to your liking.
> Simply that he can use these to try to understand the unfairness he feels.
You're suggesting he deal with unfairness by internalizing it as virtue? That's how to make people who cheer at other people's failures.
> Your anarcho-individualistic development ideas sound good in theory
Thanks for illustrating my point. No project, >3 or <= 3, has ever made any new technology by adopting as a tenet that social agreement inside the project is more important than correctly modeling the world outside it, and you're suggesting I'm using insufficiently agreeable-sounding words to express it.
Thanks, I've been struggling to put this into words.
When you're working on the core technology we all depend on, correctness is not optional.
Linux is not correct. Linux has never been correct. Linux will never be correct. An incorrect belief that it is correct can only make it less correct.
You must know this when it comes to your own work. Why isn't bcachefs written in augmented Rust with dependent types and formal correctness proofs for every line of code? How could there ever be a data-losing bug if you had a formal proof that the file system could never lose data? Wouldn't that be more correct?
Turns out when some strong/broad notion of correctness isn't (practically) possible it is, in fact, very optional.
Good project management is all about managing resources and balancing tradeoffs. Sometimes this means making or allowing some things to be worse for the benefit of something else or in adherence to a process with a proven track record. Almost every choice makes something less correct than it could be-- with a goal of slowly inching towards a more perfect state overall in the long run.
It's also beneficial to rock the boat a bit at times; people can be wrong, processes can need improvement. But there is a correct level, timing, and approach to achieve the best benefit. I expect that the kind of absolutist approach you seem to have adopted in these comments is unlikely to be successful at effecting beneficial change.
You're staking out quite the postmodernist position there. All models are wrong, so who's to say that Alice's data corruption is worse than Bob's man page typo? The important thing is we stick to process with a proven track record, right?
I don't buy it. Object level considerations do matter. Alice's bug really is worse than Bob's. That "proven track record" shouldn't apply to Alice, and insisting that it does for the sake of process, in a way indifferent to the facts of the situation, is just a pretext for doing primate social hierarchy deference rituals in a situation in which they're producing a worse outcome and everyone knows it.
> Object level considerations do matter.
They do. And Kent expressed them and the linux kernel maintainers are amply qualified to hear out and make a call. I don't see a reason to think they were indifferent to the facts, they just weren't convinced by them. If they were they could have just said, "okay we think that this does qualify as a bugfix".
My understanding is the change in dispute wasn't over fixing the corruption-introducing bug, but rather adding automated repair for cases where the corruption had already happened. I could easily see taking a position of "sad for people whose data is already corrupt, they can get their workaround out of tree for now" (or heck, even forever depending on the scale of the impact).
Anyone who has been around for a while has seen their share of 'ate the horse to catch the spider to catch the fly to...' dance, of course the patch author is convinced that their repair is correct. They're almost always convinced of that or they don't submit it, so that carries little information. Because of this there is a strong preference for obviously minimal code in any kind of fix. Minimizing user suffering is important, but we also know every line of code comes with risk. The fact that the risk is not measurable on a case by case basis doesn't make it any less real.
Thanks for the thoughtful reply.
> I don't see a reason to think they were indifferent to the facts
I don't think the Linux people thought of themselves as indifferent to facts. Nor do I think they were, not at first. Most people imagine themselves as fair-minded truth-seekers. When stakes are low, they usually act like it. It's only under pressure that people reveal whether they're more committed to PR or progress.
The shitty thing about this situation is that as the dispute escalated, the technical merits of the change faded from relevance. (Linus even pulled the corruption repair work in the end!) The argument transformed into a dispute over power, pride, and personalities. Linus's commitment to technical excellence was tested. It failed. Consequently, Linux will lack a cutting-edge filesystem.
I don't even object to Linus being BDFL of Linux. Somebody has to make decisions. I think Linus was wrong to reject the corruption fix patch, but he could plausibly have been right. He had an opportunity to explain his patch rejection in such a way that Overstreet would have understood it as final but also felt heard and valued. Overstreet would have been upset, and justifiably so, but by the next merge window both sides would have cooled down and progress would have resumed.
It's when Linus banned Overstreet and bcachefs from the project that he departed irrecoverably from defensibility. Linus might think he's punishing Overstreet for his intransigence by blocking his work, but Linus is actually taking his frustration out on every Linux user instead. Overstreet's ban is rooted in primate power psychology, not technical trade-offs, and it makes everyone lose.
Technical leaders who ostracize brilliant but difficult people forever cap the amount of progress we can make in the fight against the limits of nature. They're neglecting their responsibilities as leaders to harness difficult people. It's not an easy job, but being a leader shouldn't be.
Linus took the easy way out and banned the brilliant troublemaker. He should be ashamed.
> the risk is not measurable on a case by case basis
It often is. That's why when I'm on the Linus side of a case like this, I try to avoid saying "no" and instead say "yes, if". Sometimes my counterparty pulls out an "if" that convinces me.
> Prioritizing agreeableness above correctness is the reason the space shuttle Challenger blew up.
Oh dear lord no. That is not even what _any_ of the actual investigations suggested.
Woke agreeableness is bad, but it wasn't getting along at the water-cooler that led to Challenger.
Citation needed.
No.
You might end up with the best filesystem in the world that no one will use. You sacrificed long-term sustainability for a short-term win.
Even if it were shipped in a similar way to ZFS, no one will use it for anything more important than a homelab.
Why? With this attitude you cannot be taken seriously, and that implies many risks for whatever you might come up with in the future. Another risk is that you are the sole developer of this filesystem; that also makes it hard to seriously consider using bcachefs.
My advice would be: consider expanding the team to have a few developers who are able to contribute. Learn to control your pride for the good of the whole project. Working with (and coordinating) other developers could help you better understand the upstream kernel community. And given that chance, you could delegate someone else with better diplomatic skills to deal with upstream in a way that would be more beneficial for the whole project in the long term.
It is not good when politics get in the way of good engineering.
Regardless of differing points of view on the situation, I think everyone can agree that bcachefs being actively updated on Linus tree is a good thing, right?
If you were able to work at your own pace, and someone else took the responsibility of pulling your changes at a pace that satisfies Linus, wouldn't that solve the problem of Linux having a good modern/CoW filesystem?
> Regardless of differing points of view on the situation, I think everyone can agree that bcachefs being actively updated on Linus tree is a good thing, right?
I think bcachefs is not the problem. The problem seems to be the sole maintainer who is notoriously abusive and apparently unable to work with other kernel developers.
I'm sure if another maintainer came along, one that wasn't barred for being abusive towards other maintainers, there would be no problem getting the project back in.
At this time, I don't think so.
We were never able to get any sane and consistent policy on bugfixes, and I don't have high hopes that anyone else will have better luck. The XFS folks have had their own issues with interference, leading to burnout - they're on their third maintainer, and it's really not good for a project to be cycling through maintainers and burning people out, losing consistency of leadership and institutional knowledge.
And I'm still seeing Linus lashing out at people on practically a weekly basis. I could never ask anyone else to have to deal with that.
I think the kernel community has some things they need to figure out before bcachefs can go back in.
Keep in mind that bcachefs’s adoption and eventual mainstream acceptance are not contingent on Linus accepting your contributions or on you “removing the experimental label.” What matters is eliminating the barriers that prevent users from trying it, and that is far easier when bcachefs is an upstream filesystem—something that allows more distributions to offer it as an installer option.
> And I'm still seeing Linus lashing out at people on practically a weekly basis. I could never ask anyone else to have to deal with that.
This is a bit off‑topic, but I wouldn’t be so quick to judge how well Linus is doing his job; no one else in the world has his responsibilities.
At this point, any new kernel contributor should be familiar with Linus and have come to accept, or at least tolerate, his ways.
> I think the kernel community has some things they need to figure out before bcachefs can go back in.
Fair enough. It may be better to let things cool off while giving bcachefs more time to reach a stable state before attempting to reintegrate it into Linux development. I hope you won’t give up, because Linux needs this.
Since bcachefs is your project and you seem to enjoy working on it, it wouldn’t be a stretch to say that you need this too, right? Don’t let ego get in the way of achieving your goals.
> We were never able to get any sane and consistent policy on bugfixes, and I don't have high hopes that anyone else will have better luck.
This reads an awful lot like blatant gaslighting.
It's quite public that you were kicked out not only because of abusive behavior towards other kernel developers but also because you kept ignoring any and all testing and QA guardrails, to the point that you tried to push patches that failed to build.
From the very public discussion, you should sit out any discussion on bugfixes and testing because, while you are voicing strong opinions about high quality bars, the evidence suggests you were following none of them.
This sort of misrepresentation of your public behavior will only trash your reputation further. I encourage anyone who reads this to actually look up the mailing list threads. It’s very illuminating.
Was there any attempt at making rules for experimental features looser than other filesystems? That seems to be the biggest bottleneck here.
That does seem to be one of the big disconnects, yes.
In the past I've argued that I do need a relatively free hand and to be able to move quickly, and explained my reasoning: we've been at the stage of stabilization where the userbase is fairly big, and when someone reports a bug we really need to work with them and fix it in a timely manner in order to keep them testing and reporting bugs. When someone learns the system well enough to report a bug, that's an investment of time and effort on their part, and we don't want to lose that by having them get frustrated and leave.
IOW: we need to prioritize working with the user community, not just the developer community.
All that's been ignored though, and the other kernel maintainers seem to just want to ratchet down harder and harder and harder on strictness.
At this point, we're past the bulk of stabilization, and I've seen (to my surprise) that I've actually been stricter with what I consider a critical fix than other subsystems.
So this isn't even about needing special rules for experimental; this is just about having sane and consistent rules, at all.
The problem was that you weren't following the rules.
The rules were clear about the right time to merge things so they get in the next version, and if you don't, they will have to get in the version after that. I don't know the specific time since I'm not a kernel developer, but there was one.
Linus is trying to run the release cycle on a strict schedule, like a train station. You are trying to delay the train so that you can load more luggage on, instead of just waiting for the next train. You are not doing this once or twice in an emergency, but you are trying to delay every single train. Every single train, *you* have some "emergency" which requires the train to wait just for you. And the station master has gotten fed up and kicked you out of the station.
How can it be an emergency if it happens every single time? You need to plan better, so you will be ready before the train arrives. No, the train won't wait for you just because you forgot your hairbrush, and it won't even wait for you to go home and turn your oven off, even though that's really important. You have to get on the next train instead, but you don't understand that other people have their own schedules instead of running according to yours.
If it happened once, okay - shit happens. But it happens every time. Why is that? They aren't mad at you because of this specific feature. They are mad at you because it happens every time. It seems like bcachefs is not stable. Perhaps it really was an emergency just the one time you're talking about, but that means it either was an emergency all the other times and your filesystem is too broken to be in the kernel, or it wasn't an emergency all the other times and you chose to become the boy who cried wolf. In either case Linus's reaction is valid.
It's a bugfix, and bugfixes are allowed at any time - weighing regression risk against where we're at in the cycle. It was a very high severity bug, low regression risk for the fix, and we were at rc3.
> It's a bugfix, and bugfixes are allowed at any time (...)
I'm afraid you sound like you're trying to gaslight everyone in the thread.
https://news.itsfoss.com/linux-kernel-bcachefs-drop/
Reading https://lore.kernel.org/all/4xkggoquxqprvphz2hwnir7nnuygeybf...
It is not a bugfix, and you know it :(
If you are not acting in bad faith, I suggest you read Wittgensen
He did a lot of work around the idea of language, which basically boils down to the fact that words have no intrinsic meaning: the meaning of a word is the meaning that a given population gives to that word
So in your case, you may be right about the meaning of the word "bugfix" in some population, but you must translate and use the meaning of that word in the "kernel" population
The dictionary is a lie .. :)
> - New option: journal_rewind [...]
> - Some new btree iterator tracepoints, for tracking down some livelock-ish behaviour we've been seeing in the main data write path.
Yeah, how are these two things bug-fixes? Especially the first one should not be merged late.
I think you mean Wittgenstein, though I wouldn’t recommend Philosophical Investigations as an entrypoint.
Do you really want to slip from being difficult to work with to being a liar? Be careful.
Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.
The article says that bcachefs is not being removed from the mainline kernel. This looks like mostly a workaround for Linus and other kernel devs to not have to deal with Kent directly.
The three listed options in the OP thread were
* Another kernel dev takes over management and they treat it as a fork (highly unlikely according to their estimate)
* Kent hires someone to upstream the changes for him and Kent stops complaining wrt when it's getting merged
* Bcachefs gets no maintenance and will likely be removed in the next major release
I do not know him personally, but most interactions I've read online by him sounded grounded and not particularly offensive, so I'm abstaining from making any kind of judgement on it.
But while I have no stake in this, drama really does seem to follow Kent around for one reason or another. And it's never his fault if you take him at his public statements - which I want to repeat: he sounds very grounded and not offensive to me whatsoever.
If you look at all the places where Kent has had drama, the common element is him and environments with pretty rigid workflows. He doesn't respect the workflows and processes those places have when they inconvenience his goals. So he ignores them, and creates a constant state of friction and papercuts for the people whose help he needs to accomplish his goals. They eventually get fed up, and either say no, not working with you anymore, or no, you're not welcome to contribute here anymore.
He’s not super offensive, but he will tell a Debian package maintainer that their process sucks, that they should change it, and that they are being stupid by following it. Overall, he seems a bit entitled, and unwilling to compromise with others. It’s not just Kent though; the areas that seem to be the most problematic for him are when it’s an unstoppable force (Kent) and an immovable wall (Linux / Debian).
Working in the Linux kernel is well known for its frustrations and the personal conflict that it creates, to the point that there are almost no linux kernel devs/maintainers that aren’t paid to do the work. You can see a similar set of events happen with Rust4Linux people, Asahi linux project and their R4L drivers, etc.
Grounded? Not offensive?
https://lore.kernel.org/lkml/CAHk-=wiLE9BkSiq8F-mFW5NOtPzYrt...
https://lore.kernel.org/all/citv2v6f33hoidq75xd2spaqxf7nl5wb...
The first one is by Linus? And his replies (at least the ones I read) are - to me - less aggressive than the rest of the mails in that chain.
The second has one offensive remark:
> Get your head examined. And get the fuck out of here with this shit.
which I thought he admitted was out of line and said sorry for. Or do I misremember? I admit once again, I'm still completely uninvolved and merely saw it play out on the internet.
If you read his replies downthread from that, Kent seems to be going through a lot of effort to not apologize, in any form, and prefers talking about how other people were mean to him.
I had high hopes for bcachefs. sigh
It's complicated, no one really knows what "externally maintained" entails at the moment. Linus is not exactly poised to pull directly from Kent, and there is no solution lined-up at the moment.
Both Linus and Kent drive a hard bargain, and it's not as simple as finding someone else to blindly forward bcachefs patches. At the first sign of conflict, the poor person in the middle would have no power, no way to make anyone back down, and we'd be back to square one.
It's in limbo, and there is still time, but if left to bitrot it will be removed eventually.
That person would be accountable to Linus, but not to Kent.
Unfortunately, there's also nothing they can do if Kent says no. Say there's a disagreement on a patch that touches something outside fs/bcachefs, that person can't exactly write their own patches incorporating the feedback. They're not going to fork and maintain their own patches. They'd be stuck between a rock and a hard place, and that gets us back to a deadlock.
The issue is that I have never seen Kent back down a single time. Kent will explain in detail why the rules are bullshit and don't apply in this particular case, every single time, without any room for compromise.
If the only problem was when to send patches, that would be one thing. But disagreements over patches aren't just a timing problem that can be routed around.
The key thing here is I've never challenged Linus's authority on patches outside fs/bcachefs/; I've quietly respun pull requests for that, on more than one occasion.
The point of contention here was a patch within fs/bcachefs/, which was repair code to make sure users didn't lose data.
If we can't have clear boundaries and delineations of responsibility, there really is no future for bcachefs in the kernel; my core mission is a rock solid commitment to reliability and robustness, including being responsive to issues users hit, and we've seen repeatedly that the kernel process does not share those priorities.
You may be right, but I think looking at it from a lens of who has authority and can impose their decision is still illustrating the point I'm trying to make.
To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.
I don't think the right answer is to blindly follow whatever Linus or other people say. I don't mean you should automatically back down without technical reasons, just because authority says so. But I notice I can't remember an email where concessions were made, or attempts to find a middle ground by understanding the other side. Maybe someone can find counterexamples.
But this idea of using ownership to decide who has more authority and can impose their vision, that can't be the only way to collaborate. It really is uncompromising.
> To some extent drawing clear boundaries is good as a last resort when people cannot agree, but it can't be the main way to resolve disagreements. Thinking in terms of who owns what and has the final say is not the same as trying to understand the requirements from the other side to find a solution that works for everyone.
Agreed 100%. In an ideal world, we'd be sitting down together, figuring out what our shared priorities are, and working from there.
Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities are, except that they definitely aren't a bulletproof filesystem and safeguarding user data; his response to journal_rewind demonstrated that quite definitively.
So that's where we're at, and given the history with other local filesystems I think I have good reason not to concede. I don't want to see bcachefs run off the rails, but given all the times I've talked about process and the way I'm doing things I think that's exactly what would happen if I started conceding on these points. It's my life's work, after all.
You'd think bcachefs's track record (e.g. bug tracker, syzbot) and the response it gets from users would be enough, but apparently not, sadly. But given the way the kernel burns people out and outright ejects them, not too surprising.
> Unfortunately, that hasn't been possible, and I have no idea what Linus's priorities except that they definitely aren't a bulletproof filesystem and safeguarding user data
Remarks like this come across as extremely patronizing, as you completely ignore what the other party says and instead project your own conclusions about the other persons motives and beliefs.
> his response to journal_rewind demonstrated that quite definitively
No, no it did not in any way, shape, or form do that. You had multiple other perfectly valid options to help the affected users besides getting that code shipped in the kernel there and then. Getting it shipped in the kernel was merely a convenience.
If bcachefs were established and stable it would be a different matter. But it's an experimental filesystem. By definition, data loss is to be expected, even if recovery is preferable.
No, bcachefs-tools wasn't an option because the right way to do this kind of repair is to first do a dry run test repair and mount, so you can verify with your eyes that everything is back as it should be.
If we had the fuse driver done, that would have worked, though. Still not completely ideal, because we're at the mercy of distros to make sure they're getting -tools updates out in a timely manner, and they're not always as consistent with that as the kernel (most are good, though).
Just making it available in a git repo was not an option because lots of bcachefs users are getting it from their distro kernel and have never built a kernel before (yes, I've had to help users with building kernels for the first time; it's slow and we always look for other options), and even if you know how, if your primary machine is offline the last thing you want to have to do is build a custom rescue image with a custom kernel.
And there was really nothing special about this compared to any other bugfix, besides needing to use a new option (which is also something that occasionally happens with hotfixes).
Bugs are just a fact of life, every filesystem has bugs and occasionally has to get hotfixes out quickly. It's just not remotely feasible or sane to be coming up with our own parallel release process for hotfixes.
That you or the user dislike some of the downsides does not invalidate an option.
I will absolutely agree with you that merging that repair code would be vastly preferable to you and the users. And again, if bcachefs was mature and stable, I absolutely think users should get a way to repair ASAP.
But bcachefs is currently experimental and thus one can reasonably expect users to be prepared to deal with the consequences of that. And hence the kernel team, with Linus at the top, should be able to assume this when making decisions.
If you have users who are not prepared for this, you have a different problem and should seek how to fix that ASAP. Best would probably be to figure out how to dissuade them from installing. In any case, not doing something to prevent that scenario would be a disservice to those users.
bcachefs has had active users, with real data that they want to protect, since before it was merged.
A lot of the bcachefs users are using it explicitly because they've been burned by btrfs and need something more reliable.
I am being much, much more conservative with removing the experimental label than past practice, but I have been very explicit that while it may not be perfect yet and users should expect some hiccups, I support it like any other stable production filesystem.
That's been key to getting it stabilized: setting high expectations. Users know that if they find a critical bug it's going to be top priority.
Given the bug fixes and changes, the experimental flag seems quite appropriate to me. That's not a bad thing.
However, it was put in the kernel as experimental. That carries with it implications.
As such, while it's very commendable that you wish to support the experimental bcachefs as-if it was production ready, you cannot reasonably impose that wish upon the rest of the kernel.
That said I think you and your small team is doing a commendable job, and I strongly wish you succeed in making bcachefs feature complete and production ready. And I say that as someone who really, really likes ZFS and run it on my Linux boxes.
Fair enough. As someone who has lost filesystems to bugs and files to corrupted blocks, I definitely appreciate the work you've done on repair and reliability.
I think there's room to have your cake and eat it too, but I certainly can't blame you for caring about quality, that much is sure.
Linus T is responsible for everything in Linux; it is his project and he is the maintainer. He can do anything he wants in his branch and people just have to accept it. If you want to be the one responsible, you have to fork Linux.
Let's examine this, shall we?
Has he ever even been involved with a bcachefs bug? No, aside from arguing against shipping bugfixes.
Has he contributed in any way, besides merging code? No...
Has he set rules or guidelines that benefited bcachefs reliability? No, but he has shouted down talk about automated testing.
I think you're confusing power with responsibility.
FWIW DKMS is not the only way to distribute out of tree modules.
https://github.com/chimera-linux/ckms
> Damn. I was enjoying not having to deal with the fun of ZFS and DKMS, but it seems like now bcachefs will be in the same boat, either dealing with DKMS and occasional breakage or sticking with the kernel version that slowly gets more and more out of date.
Your distro could very easily include bcachefs if it wishes? Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild, that very particular problem doesn't exist re: bcachefs?
The problem with bcachefs is the problem with btrfs. It still mostly fails to solve the problems ZFS already solves.
> Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild
I can think of non-religious reasons to want to avoid legal fights with Oracle.
> Although I think the ZFS + Linux situation is mostly Linux religiosity gone wild,
I think the Linux Kernel just doesn't want to be potentially in violation of Oracle's copyrights. That really doesn't seem that unreasonable to me, even if it feels pointless to you.
Who would use a file system which essentially seems to be developed by a single person? A bus-factor of one seems unacceptable for a FS. But maybe I am wrong and there are other developers, then why do they not take over upstreaming if the main developer is unable to collaborate with the kernel community.
I did for my laptop and Raspberry Pi, which I didn't care much about. It was great being able to interact with Kent over IRC to sort out problems, and when he's actually available he's really helpful, but it made me realise that bcachefs has a long way to go, and I have come to the realisation that bus factor 1 is not something I'd want long term.
It's that good.
FreeBSD is giving me a sultry look as I ponder my NAS build.
I'm in that boat. I'm looking over at that Synology unit sitting in the corner of my living room, knowing it'll be the last of its kind to live here, and wondering what its replacement will look like. FreeBSD's been good to me and it might be time to reintroduce myself to it.
Doesn't TrueNAS (Linux version) come with ZFS?
TrueNAS Scale, which is the Linux variant, does indeed come with ZFS.
The company behind it, iXsystems, pays ZFS developers as well.
Fwiw, I'm running a NAS on btrfs (on top of mdadm raid as I don't fully trust the btrfs raid, and the recovery tools seem worse). It seems to be working well so far.
Being able to do snapshots easily is really nice. I have a script that makes hourly snapshots and keeps the N latest, which protects me against a bunch of pebkac errors
I do periodic backups to non-btrfs storage though. I need backups anyway so it seemed like an easy way to de-risk
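For anyone curious what such a rotation looks like, here's a minimal sketch of the approach described above. It is not the commenter's actual script; the paths, retention count, and read-only flag are my own assumptions:

```shell
#!/bin/sh
# Hourly btrfs snapshot rotation: take a read-only snapshot, then keep
# only the N newest. All paths and the retention count are illustrative.

snapshot_and_prune() {
    subvol=$1     # btrfs subvolume to snapshot, e.g. /data
    snapdir=$2    # directory holding snapshots, e.g. /data/.snapshots
    keep=$3       # how many of the newest snapshots to retain

    # Timestamped names sort chronologically, so pruning is just
    # "everything except the last $keep entries".
    btrfs subvolume snapshot -r "$subvol" "$snapdir/$(date +%Y%m%d-%H%M%S)"

    # GNU head with a negative count drops the trailing $keep lines,
    # leaving only the older snapshots to delete.
    ls -1 "$snapdir" | sort | head -n "-$keep" | while read -r old; do
        btrfs subvolume delete "$snapdir/$old"
    done
}

# Typical use from cron, once an hour:
#   0 * * * * snapshot_and_prune /data /data/.snapshots 24
```

As noted above, snapshots share the fate of the pool they live on, so this complements rather than replaces the periodic backups to separate storage.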
There are some Linux distros with ZFS as a near first class citizen. NixOS is one, I believe Alpine is another.
I looked long and hard at FreeBSD and eventually went with Gentoo on ZFS, as I was familiar with Gentoo and ZFS wasn't hard to add.
Proxmox is Debian + ZFS. Works great.
Ubuntu too.
FreeBSD is extremely simple to keep up to date, including third party packages.
Unless you have some very specific reason to use Linux, I would go with FreeBSD.
Go for it. I made the switch ~10 years ago and didn't regret it at all. First-class, rock solid ZFS integration. Saved my data on more than one occasion.
It's amazing how many of the "experts" here don't get that bcachefs != btrfs
People understand they're different, but if bcachefs is out, that leaves btrfs as the only modern in-tree filesystem, and apparently you can't trust it with important data either.
I've been using btrfs on my NAS for years and have not had any problems. I suspect there are a hell of a lot of people like me you will not hear about because people don't generally get as vocal when things just work.
The venn diagram of "people who want a modern copy-on-write filesystem with snapshots to manage large quantities of data" and "people who want a massive pool of fault-tolerant storage" (e.g. building a NAS) has some pretty significant overlap.
The latter is where BTRFS is still hobbled: while the RAID-0, RAID-1, and RAID-10 modes work absolutely fine, the RAID-5 and RAID-6 modes are still broken, with an explicit warning at mkfs time (and in the manpages) that the feature is experimental and should not be used to hold data that you care about retaining. This has bitten, and continues to bite, people, with terabytes of data loss (backups are important, people!). That then sours them on ever using any part of BTRFS again.
Is it just me or does Kent seem self-destructively glued to his own idea of how kernel development should work?
I don’t doubt that people on all sides have made mis-steps, but from the outside it mostly just seems like Kent doesn’t want to play by the rules (despite having been given years of patience).
It's not just kernel development. In the lwn thread, he mentioned and then demonstrated difficulty working with Debian developers as well.
IMHO, what his communications show is an unwillingness to acknowledge that other projects that include his work have focus, priorities, and policies that are not the same as that of his project. Also, expecting exceptions to be made for his case, since exceptions have been made in other cases.
Again IMHO, I think he would be better off developing apart with an announcement mailing list. When urgent changes are made, send to the announcement list. Let other interested parties sort out the process of getting those changes into the kernel and distributions.
If people come with bug reports from old versions distributed by others, let them know how to get the most up to date version from his repository, and maybe gently poke the distributors.
Yes, that means users will have older versions and not get fixes immediately. But what he's doing isn't working to get fixes to users immediately either.
Being an outsider to this whole scene, the whole thread reads very differently to me.
Kent seems very patient in explaining his position (and frustrations arising from other people introducing bugs to his code) and the kernel & debian folks are performing a smearing campaign instead of replying to what I see are genuine problems in the process. As an example, the quotes that are referenced by user paravoid are, imho, taken out of context (judging by reading the provided links).
There probably is a lot more history to it, but judging from that thread it's not Kent who looks like a bad guy.
Kent brings up Debian himself, unprompted.
This is one of the problems: Kent is frequently unable to accept that things don't go his way. He will keep bringing it up again and again and he just grinds people down with it. If you see just one bit of it then it may seem somewhat reasonable, but it's really not because this is the umpteenth time this exact discussion is happening and it's groundhog day once again.
This is a major reason why people burn out on Kent. You can't just have a disagreement/conflict and resolve it. Everything is a discussion with Kent. He can't just shrug and say "well, I think that's a bit silly, but okay, I can work with it, I guess". The options are 1) Kent gets his way, or 2) he will keep pushing it (not infrequently ignoring previous compromises, restarting the discussion from square one). Here too, the Debian people have this entire discussion (again) forced upon them by Kent's comments in a way that's just completely unnecessary and does nothing to resolve anything.
Even as an interested onlooker who is otherwise uninvolved and generally more willing to accept difficult behaviour than most people, I've rather soured on Kent over time.
You do realize that data integrity issues are not "live and let live" type things, right?
And there's a real connection to the issue that sparked all this drama in the kernel and the Debian drama: critical system components (the kernel, the filesystem, and others) absolutely need to be able to get bugfixes in a timely manner. That's not optional.
With Debian, we had a package maintainer who decided that unbundling Rust dependencies was more important than getting out updates, and then we couldn't get a bugfix out for mount option handling. This was a non-issue for every other distro with working processes because the bug was fixed in a few days, but a lot of Debian users weren't able to mount in degraded mode and lost access to their filesystems.
In the kernel drama, Linus threw a fit over repair code meant to recover from a serious bug and make sure users didn't lose data, and he's repeatedly picked fights over bugfixes (and even called pushing for getting bugfixes out "whining" in the past).
There are a lot of issues that there can be give and take on, but getting fixes out in a timely manner is just part of the baseline set of expectations for any serious project.
Look, I get where you're coming from. It's not unreasonable. I've said this before.
But there are also reasons why things are the way they are, and that is also not unreasonable. And at the end of the day: Linus is the boss. It really does come down to that. He has dozens of other subsystem maintainers to deal with and this is the process that works for him.
Similar stuff applies to Debian. Personally, I deeply dislike Debian's inflexible and outmoded policy and lack of pragmatism. But you know, the policy is the policy, and at some point you just need to accept that and work with it the best you can.
It's okay to make all the arguments you've made. It's okay to make them forcefully (within some limits of reason). It's not okay to keep repeating them again and again until everyone gets tired of it, and to seemingly just completely fail to listen to what people are saying. This is where you are being unreasonable.
I mean, you *can* do that, I guess, but look at where things are now. No one is happy with this – certainly not you. And it's really not a surprise, I already said this in November last year: "I wouldn't be surprised to see bcachefs removed from the kernel at some point".[1] To be clear: I didn't want that to happen – I think you've done great work with bcachefs and I really want it to succeed every which way. But everyone could see this coming from miles.
[1]: https://news.ycombinator.com/item?id=42225345
> But there are also reasons why things are the way they are, and that is also not unreasonable.
It is unreasonable if it leads to users losing data. At this point, the only reasonable thing is to either completely remove support for bcachefs or give timely fixes for critical bugs, there's no middle position that won't willfully lead to users losing their data.
This used to be the default for distributions like Debian some time ago. You only supported foundational software if you were willing to also distribute critical fixes in a timely manner. If not, why bother?
For all other issues, I guess we can accept that things are the way they are.
> It is unreasonable if it leads to users losing data.
Changing the kernel development process to allow adding new features willy-nilly late in the RC cycle will lead to much worse things than a few people using an experimental file system losing their data in the long term.
The process exists for a reason, and the kernel is a massive project that includes more than just one file system, no matter how special its developers and users believe it is.
There's no need for the kernel development process to change. New features go in during RCs all the time; it's always just a risk vs. reward calculation, and I'm more conservative with what I send outside the merge window than a lot of subsystems.
This blowup was entirely unnecessary.
Not too familiar with the kernel process for this, but for Linux distros there are ways to respond to critical issues including data corruption and data loss. It's just that you have to follow their processes to do this, such as producing a minimal patch that fixes the problem which is backported into the older code base (and there's a reason for that too: end users don't want churn on their installed systems, they want an install to be stable and predictable). Since distros are how you ultimately get your code into users' hands, it's really their way or the highway. Telling the distros they are wrong isn't going to go well.
For the Debian thing, I'm not sure on the specifics for bcachefs-progs (I'm going by what the author is reporting and some blog posts) but I think the problem with Debian is that they willfully ignore when upstream says "this is only compatible with this library version 2.1.x" and will downgrade or upgrade the library into not supported versions, to match the versions used in other programs already packaged. This kind of thing can introduce subtle, hard to debug bugs. It's a mess and problems are usually reported to upstream, that's a recurrent problem for Rust programs packaged in Debian. Rust absolutely isn't this language where if it compiles, it works, no matter how much people think otherwise.
And this is happening even though it's common for Debian to package the same C library multiple times, like, libfuse2 and libfuse3. This could be done for Rust libraries if they wanted to.
Anyway see the discussion and the relevant article here https://news.ycombinator.com/item?id=41407768 and https://jonathancarter.org/2024/08/29/orphaning-bcachefs-too...
But that's exactly the point here. In the context of a whole distribution, you don't want to update some package to a new version (on a stable branch), because that would affect lots of other packages that depended on that one. It may even be that other packages cannot work with the new updated dependency. Even if they can, end users don't want versions to change greatly (again, along a stable branch). Upstreams should accept this reality and ensure they support the older libraries as far as possible. Or they can deny reality and then we get into this situation.
And carrying multiple versions is problematic too as it causes increased burdens for the downstream maintainers.
I'd argue that libfuse is a bit of a special case since the API between 2 & 3 changed substantially, and not all dependencies have moved to version 3 (or can move, since if you move the v3 then you break on other platforms like BSD and macOS that still only support the v2 API).
Rust and especially Golang are both a massive pile of instability because the developers don't seem to understand that long term stable APIs are a benefit. You have to put in a bit of care and attention rather than always chasing the new thing and bundling everything.
BTW here's where I ported nbdfuse from v2 to v3 so you can see the kinds of changes: https://gitlab.com/nbdkit/libnbd/-/commit/c74c7d7f01975e708b...
You have to consider the bigger picture.
XFS has burned through maintainers, citing "upstream burnout". It's not just bcachefs that things are broken for.
And it was burning me out, too. We need a functioning release process, and we haven't had that; instead I've been getting a ton of drama that's boiled over into the bcachefs community, oftentimes completely drowning out all the calmer, more technical conversations that we want.
It's not great. It would have been much better if this could have been worked out. But at this point, cutting ties with the kernel community and shipping as a DKMS module is really the only path forwards.
It's not the end of the world. Same with Debian; we haven't had those issues in any other distros, so eventually we'll get a better package maintainer who can work the process or they'll figure out that their Rust policy actually isn't as smart as they think it is as Rust adoption goes up.
I'm just going to push for doing things right, and if one route or option fails there's always others.
> We need a functioning release process...
Yeah, that's in place. If nothing else, the decades of successful releases indicate that the process -at worst- functions. Whether that process fits your process is irrelevant.
> You have to consider the bigger picture.
Right back at you. Buddy, you need to learn how to lose.
It may seem like that on the surface, but you should recognise that these sorts of situations seem to follow Kent around.
So either Kent is on a righteous crusade against unreasonable processes within the Kernel, Debian, and every other large software project he interacts with. Or there's something about the way Kent interacts with these projects that causes friction.
I like Bcachefs, I think Kent is a very talented developer, but I'm not going to pretend that he is innocent in all this.
OTOH, the only named projects I've seen are Linux and Debian, which are 2 of the most toxic projects I'm aware of (I'm pretty sure the C++ standards committee beats the two of them combined).
But the problem with comparisons is that even if you're better than nuclear waste being dumped into the aquifer, you still might be enough to light a river on fire.
I've been involved in C++ standardization. In my country's national body, it is nothing like what goes on in Linux kernel development, even when there are strong disagreements amongst members.
it's waaay simpler than that. Some projects have established rules, and kent doesn't want to follow them. It doesn't matter how nice (or not) he is.
I actually like the idea of the maintainer going out of his way to make sure that my filesystem is safe to use. Even if it goes against the established rules. And I'm saying that as someone who actually likes both Linux and Debian.
It's a strawman to imagine that Debian doesn't have a way to ensure filesystems are safe and to respond to critical bugs that might cause data corruption. It's just that you have to follow their rules to do it. (And broadly the same rules apply to the other big distros as well).
I think he's too exposed to users reports, because anybody that shows up is in a potential data loss situation. So he's very focused on making everything as bug free as possible, and getting frustrated that people with different focus are not propagating the fixes as fast as possible.
Almost makes me think the distros light-forking it to just change the name (IceWeasel style) so the support requests don't get to him will help… probably not, though, because people will still go there because they want to recover their data.
autism is a hell of a social disability sometimes.
Don’t smear all of us with the bad behavior one.
I think Kent is in the wrong here, but it really doesn't help that the kernel people from Linus on down are seemingly unable to explain the problem, and instead resort to playground insults. Apart from being unprofessional and making for a hostile work environment, it doesn't really communicate why Kent's actions are problematic, so I've some sympathy for his not believing that they are.
People have explained things, at great length, many times. Many of these explanations have been posted to HN before, either as submissions or comments.
Kent just does not listen. Every time the discussion starts from the top. Even if you do agree on some compromise, in a month or two he'll just do the same thing again and all the same arguments start again.
You can't expect people to detail four or five years of context in every single engagement for the benefit of interested 3rd parties like you or me.
> it doesn't really communicate why Kent's actions are problematic
I agree that the kernel community can be a hostile environment.
Though I’d argue that people _have_ tried to explain things to Kent, multiple times. At least a few have been calm, respectful attempts.
Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior). Or deflects by going on the attack. And generally refuses to apologise.
Definitely not saying that the problems are all on one side here. Agreed that going on the attack was bad (as well as dumb).
I just think that while, yes, the kernel folks have tried to explain, they didn't explain well. The "why" of it is a people thing. Linus needs to be able to trust that people he's delegated some authority will respect its limits. The maintainers need to be able to trust that each other maintainer will respect the area that they have been delegated authority over. I think that Kent genuinely doesn't get this.
> Sadly, Kent responds to everything in an email except the key part that is being pointed out to him (usually his behavior).
Behaviour sounds like the least important part of code contributions. I smell overpowered, should've-been-a-kindergarten-teacher code of conduct person overreach.
No, Kent has generally had a nice tone. The issue is that he has repeatedly violated the rules about code contributions. For example by including new features together with several bug-fixes during rc. That is not a CoC issue, it is not respecting the rules of patch submission and not respecting the time of the kernel maintainers.
No. As someone who likes bcachefs and even literally donates to Kent's patreon, the way he has gone about engaging with the kernel community is not productive. Unfortunately.
CoC isn't even the issue, he constantly breaks kernel development rules relating to the actual code, then starts arguments with everyone up to and including Linus when he gets called out, and aggressively misses the point every time. Then starts the same argument all over again 6 weeks later.
And, like, if you don't like some rules, then you can have that discussion, but submitting patches you know will be rejected and then re-litigating your dislike of the rules is a waste of everyone's time.
I've seen plenty of times where the problems have been explained to Kent. But he just doesn't give a shit about the problems of anyone who isn't himself or a user of his filesystem.
It seems very clear to me that it's almost always a "you can't argue canon law with the Pope" situation - the rules say no new features, and it doesn't matter what the definition of "feature" is if the definition AND the rule come from the same person, Linus.
You can't win a rules-lawyer argument with the rulemaker.
> unable to explain the problem
unfortunately that's either a lack of investigation on your part or a bit dishonest.
It's orphaned in Debian as well, but I'm not sure what significant advantages it has over btrfs, which is very stable these days.
btrfs was unusable in multi-disk setups for kernels 6.1 and older. Haven't tried since then. How stable is btrfs today in such setups?
Also see https://www.phoronix.com/news/Josef-Bacik-Leaves-Meta
It's sort of frustrating that this constantly comes up. It's true that btrfs does have issues with RAID-5 and RAID-6 configurations, but this is frequently used (not necessarily by you) as some kind of gotcha as to why you shouldn't use it at all. That's insane. I promise that disk spanning issues won't affect your use of it on your tiny ThinkPad SSD.
It's important to note that striping and mirroring works just fine. It's only the 5/6 modes that are unstable: https://btrfs.readthedocs.io/en/stable/Status.html#block-gro...
But RAID-6 is the closest approximation to raid-z2 from ZFS! And raid-z2 is stable for a decade+. Indeed btrfs works just fine on my laptop. My point is that Linux lacks ZFS-like fs for large multi disc setups.
Seriously, for the people who take filesystems seriously and have strong preferences... multi-disk might be important.
BTRFS does have stable, usable multi-disk support. The RAID 0, 1, and 10 modes are fine. I've been using BTRFS RAID1 for over a decade and across numerous disk failures. It's by far the best solution for building a durable array on my home server stuffed full of a random assortment of disks—ZFS will never have the flexibility to be useful with mismatched capacities like this. It's only the parity RAID modes that BTRFS lacks, and that's a real disadvantage but is hardly the whole story.
That’s nice and all, but I have five disks in my server. I want the 6 mode.
In practice RAIDZ2 works great.
How can I know what configurations of btrfs lose my data?
I also have had to deal with thousands of nodes kernel-panicking due to a btrfs bug in Linux kernel 6.8 (a stable Ubuntu release).
I thought the usual recommendation was to use mdadm to build the disk pool and then use btrfs on top of that - but that might be out of date. I haven't used it in a while
This is very much a big compromise where you decide for yourself that storage capacity and maybe throughput are more important than anything else.
The md metadata is not adequately protected. Btrfs checksums can tell you when a file has gone bad, but can't self-heal it. And I'm sure there are caching/perf benefits left on the table by not having btrfs manage all the block storage itself.
I thought most distros have basically disabled the footgun modes at this point; that is, using the configuration that would lose data means you'd need to work hard to get there (at which point you should have been able to see all the warnings about data loss).
Respectfully to the maintainers:
How can this be a stable filesystem if parity is unstable and risks data loss?
How has this been allowed to happen?
It just seems so profoundly unserious to me.
Does the whole filesystem need to be marked as unstable if it has a single experimental feature? Is any other filesystem held to that standard?
Maybe this specific feature should be marked as unstable and default to disabled on most kernel builds unless you add something like btrfs.experimental=1 to the kernel line or something
Parity support in multi-disk arrays is older than I am; it's a fairly standard feature. btrfs doesn't support it without data-loss risk after 17 years of development.
If you're not interested in a multi-disk storage system that doesn't have (stable, non-experimental) parity modes, that's a valid personal preference but not at all a justification for the position that the rest of the features cannot be stable and that the project as a whole cannot be taken seriously by anyone.
Is that what I said?
> on your tiny ThinkPad SSD
Ad hominem. My thinkpad ssd is massive.
Good news, it will work just fine on that too.
as it turns out raid 5 and 6 being broken is kind of a big deal for people. it's also far from ideal that the filesystem has random landmines that you can accidentally step on if you don't happen to read hacker news every day.
FWIW: RAID 5 and 6 having problems is not a random hole you'll accidentally stumble into.
The man page for mkfs.btrfs says:
> Warning: RAID5/6 has known problems and should not be used in production.
When you actually tell it to use raid5 or raid6, mkfs.btrfs will also print a large warning:
> WARNING: RAID5/6 support has known problems is strongly discouraged to be used besides testing or evaluation.
If you don't trust btrfs raid, it's perfectly possible to run btrfs on top of lvm or mdadm raid. Then you have btrfs in a pretty happy-case single-device mode. Also the recovery tooling is better known and tested.
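A sketch of that layering, with made-up device names and paths; the `run` helper only prints each command rather than executing anything:

```shell
#!/bin/sh
# md provides the RAID5 parity layer; btrfs sits on top as a single
# device, keeping checksums and snapshots while sidestepping btrfs's
# own parity code. Device names are assumptions; run() only prints.
run() { echo "+ $*"; }

run mdadm --create /dev/md0 --level=5 --raid-devices=4 \
    /dev/sdb /dev/sdc /dev/sdd /dev/sde
run mkfs.btrfs -L nas /dev/md0    # single-device btrfs on the md array
run mount /dev/md0 /srv/nas
```

The trade-off, as noted elsewhere in the thread, is that btrfs can then detect corruption via checksums but has no redundant copy of its own to self-heal from.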
I’ve been running btrfs on a little home Debian NAS for over a year now. I have no complaints - it’s been working smoothly, doing exactly what I want. I have a heterogeneous set of probably 6 discs, >20TB total, no problems.
*caveat: I’m using RAID 10, not a parity RAID. It could have problems with parity RAID. So? If you really really want RAID 5, then just use md to make your RAID 5 device and put btrfs on top.
i run btrfs on servers and desktops. it's usable.
So do I and BTRFS is extremely good these days. It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale. In contrast, ZFS scales horribly as the number of filesystems increases, where btrfs seems to be O(1). btrfs's quota functionality is also much better than it used to be (and very flexible), after all the work Meta put into it. Finally, having the option of easy writable snapshots is nice. BTRFS is fantastic!
> It's also much faster than ZFS at mounting a disk with a large number of filesystems (=subvolumes), which is critical for building certain types of fileservers at scale.
Now you've piqued my curiosity; what uses that many filesystems/subvolumes? (Not an attack; I believe you, I'm just trying to figure out where it comes up)
It can be useful to create a file server with one filesystem/subvolume per user, because each user has their own isolated snapshots, backups via send/recv are user-specific, quotas are easier, etc. If you only have a few hundred users, ZFS is fine. But what if you have 100,000 users? Then just doing "zpool import" would take hours, whereas mounting a btrfs filesystem with 100,000 subvolumes takes seconds. This complexity difference was a showstopper for me to architect a certain solution on top of ZFS, despite me personally loving ZFS and having used it for a long time. The btrfs commands and UX are really awkward (for me) compared to ZFS, but btrfs is extremely efficient at some things where ZFS just falls down.
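For illustration, per-user provisioning might look like the sketch below. The /srv/home path and the quota value are made up, and the function only prints the commands it would run:

```shell
#!/bin/sh
# One subvolume per user: isolated snapshots, per-user send/recv, and
# simple quotas. provision() only prints the commands (dry-run sketch).
provision() {
    echo "btrfs subvolume create /srv/home/$1"
    echo "btrfs qgroup limit 10G /srv/home/$1"  # optional per-user quota
}

for user in alice bob carol; do
    provision "$user"
done
```

Because a subvolume is cheap metadata rather than a separate pool-level dataset, this loop scales to very large user counts without slowing the mount path.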
The main criticism in this thread about btrfs involves multidisk setups, which aren't relevant for me, since I'm working on cloud systems and disk storage is abstracted away as a single block device.
Incidentally, the application I'm reworking to use btrfs is cocalc.com. One of our main use cases is distributing assignments to students in classes, as part of the course management functionality. Imagine a class with 1500 students all getting an exact copy of a 50 MB folder, which they'll edit a little bit, and then it will be collected. The copy-on-write functionality of btrfs is fantastic for this use case (both in speed and disk usage).
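The distribution step can lean on reflinks: with `--reflink=auto`, GNU cp shares extents on btrfs (near-instant, almost no extra space) and quietly falls back to a byte-for-byte copy on filesystems without reflink support. A toy sketch with made-up paths:

```shell
#!/bin/sh
# CoW-copy a 'master' assignment folder to a per-student directory.
mkdir -p master student1
echo "problem set 1" > master/notes.txt

# On btrfs this shares extents (copy-on-write); elsewhere it degrades
# to a normal copy, so the script is safe on any filesystem.
cp -r --reflink=auto master/. student1/

# The student's copy is independent: edits don't touch the master.
echo "my answer" >> student1/notes.txt
```

Each student's 50 MB folder then costs only the blocks they actually change.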
Also, the out-of-band deduplication for btrfs using https://github.com/Zygo/bees is very impressive and flexible, in a way that ZFS just doesn't match.
I seem to recall some discussion in one of the OpenZFS leadership meetings about slow pool imports when you have many datasets. Sadly I can't recall the details, but at least it seems to be on their radar.
As far as I understand, a core use case at Meta was build system workers starting with prepopulated state and being able to quickly discard the working tree at the end of the build. CoW is pretty sweet for that.
Absurd to claim it’s unusable without any qualification whatsoever.
Single, dup, raid0, raid1, raid10 have been usable and stable for a decade or more.
I lost my BTRFS RAID-1 array a year or two ago when one of my drives went offline. Just poof, data gone and I had to rebuild. I am not saying that it happens all the time, but I wouldn't say it's completely bulletproof either.
What did you try before giving up?
All the anecdotes I see tend to be “my drive didn’t mount, and I tried nothing before giving up because everyone knows BTRFS sux lol”. My professional experience meanwhile is that I’ve never once been able to not (very easily!) recover a BTRFS drive someone else has given up for dead… just by running its standard recovery tools.
Related: Linux CoC Announces Decision Wrt Kent Overstreet (Bcachefs) (kernel.org)
https://news.ycombinator.com/item?id=42221564 - 2024-11-23, 103 comments
There's been much more recent friction between various parties, so I don't think this most recent news is a direct result of that decision. See for instance https://news.ycombinator.com/item?id=44464396
Surprised it took this long.
The whole situation is moronic at best. Linux needs a decent modern filesystem in tree. ZFS would easily be it, but unfortunately Sun decided back in the '00s to fuck Linux because they wanted to push Solaris instead. Little did they know ZFS would end up being FreeBSD's top feature for years.
Btrfs is constantly eating people's data; it's a bad joke nowadays. Right now on Linux you're basically forced to either constantly deal with out-of-tree ZFS or accept that thinly provisioned XFS over LVM2 will inevitably cause you to lose data.
Btrfs is NOT constantly eating people's data. You have nothing to back this statement.
It's widely used and the default filesystem of several distributions. Most of the problems are like for the other filesystem: caused by the hardware.
I've been using it for more than 10 years without any problem and enjoy the experience. And like for any filesystem, I backup my data frequently (with btrbk, thanks for asking).
> Btrfs is NOT constantly eating people's data
Tell it to my data then. I was 100% invested in Btrfs before 2017, the year I lost a whole filesystem to some random metadata corruption. I then started to move all of my storage to ZFS, which has never lost me a single byte of data, despite being out of tree and all. My last Btrfs filesystem died randomly a few days ago (it was a disk in cold storage, once again random metadata corruption, disk 100% healthy). I do not trust Btrfs in any shape or form nowadays. I also vastly prefer the ZFS tooling, but that's irrelevant to the argument here. The point is that I've had nothing but pain from btrfs in more than a decade.
2017 was 8 years ago...
8 years ago was the first time that person encountered a problem with btrfs. But that wasn’t the last apparently:
> My last Btrfs filesystem died randomly a few days ago
8 years is not a long time.
> Btrfs is NOT constantly eating people's data. You have nothing to back this statement.
Constantly may be a strong word, but there is a long line of people sharing tales of woe. It's good that it works for you, but that's not a universal experience.
> It's widely used and the default filesystem of several distributions.
As a former user, that's horrifying.
> Most of the problems are like for the other filesystem: caused by the hardware.
The whole point of btrfs over (say) ext4 is that it's supposed to hold up when things don't work.
btrfs has eaten my data, which was probably my bad for trying out a newly stable filesystem around 15 years ago. there are plenty of bug reports of btrfs eating other people's data in the years since.
It's probably mostly stable now, but it's silly to act like it's a paragon of stability in the kernel.
> but it's silly to act like it's a paragon of stability in the kernel.
And it's dishonest to act like bugs from 15 years ago justify present-tense claims that it is constantly eating people's data and is a bad joke. Nobody's arguing that btrfs doesn't have a past history of data loss, more than a decade ago; that's not what's being questioned here.
There's no need to call someone pointing out instability of a filesystem dishonest. That's bad faith.
I don't get why folks feel the need to come out and cheer for a tool like this, do you have skin in the game on whether or not btrfs is considered stable? Are you a contributor?
I don't get it.
But since you asked - let me find some recent bugs.
5.15.37 - fixes data corruption in database reads using btrfs https://www.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15....
5.15.65 - fixes double allocation and cache corruption https://www.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.15....
6.1.105 - fixes O_APPEND with direct I/O writing corrupted files https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.1...
6.1.110 - fixes fsync race and corruption https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.1...
6.2.16 - fixes truncation of files causing data corruption https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.2.1...
btrfs-progs 6.2 fixes corruption on zstd extent read https://btrfs.readthedocs.io/en/latest/CHANGES.html
6.15.3, 4: possible data corruption, seems to be reparable: https://www.phoronix.com/news/Btrfs-Log-Tree-Corruption-Fix
Are people that encountered these also dishonest?
ext4 has "recent" correctness and corruption bugfixes. Just search through the 6.x and 5.x changelogs for "ext4:" to find them. It turns out that nontrivial filesystems are complex things that are hard to get right, even after decades of development by some of the most safety-and-correctness-obsessed people.
I've been using btrfs as the primary filesystem on my daily-driver PCs since 2009, 2010 or so. The only time I've had trouble with it was in the first couple of years I started using it. I've also used it as the primary FS on production systems at $DAYJOB. It works fine.
I know it's not the same, but close enough for me: lvm + xfs works wonders and it's rock solid.
Thin provisioning with lvm is afaik problematic, because the filesystems aren't aware of it, and if you run out of space you're screwed.
I would love to use xfs on my NAS setup but no checksums is a deal breaker. Checksums have saved me multiple times where I've been able to either repair files with parity or restore from backups.
Without checksums I would have overwritten my backup data and lost a ton of files, because the drives reported that everything was OK for months while writing corrupt files.
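That failure mode, silent corruption quietly propagating into backups, is exactly what per-block checksums catch. On a filesystem without them, an application-level manifest is a crude analog; a sketch with made-up paths:

```shell
#!/bin/sh
# Crude userspace analog of filesystem checksumming: record hashes of
# known-good files, then verify them before trusting a backup pass.
mkdir -p data
echo "payload" > data/file1

sha256sum data/file1 > manifest.sha256   # record the known-good hash

# Later, before overwriting backups: fails loudly if the file rotted.
sha256sum -c --quiet manifest.sha256 && echo "data verified"
```

It only covers file contents at the moments you run it, unlike btrfs/ZFS checksums which verify every read, but it at least turns silent rot into a loud failure before the backup rotation destroys the last good copy.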
Some collaboration failing there, no technical reasons for it. I hope they'll sort this nonsense out and it will go back to normal upstream maintenance.
It won’t. Kent is in this very post (as well as TFA) showing that he’s not learned anything
Shame, at the very least I was hoping this would lead to approaching someone to work as a go between.
(I find it difficult to believe nobody tried to step up to that position either)
Who are the bca chefs?