Please add a file attribute or mount option for case insensitivity... #44

Open
piedrok opened this issue Jan 11, 2023 · 17 comments
@piedrok

piedrok commented Jan 11, 2023

Hi!

With ext4 there is the file attribute "+F" which adds case insensitivity to a folder and the containing files. BTRFS doesn't have any option like this...

For cross platform tasks there are many use cases for which case folding is desirable or in some cases even necessary.

My suggestion would be to have this as an attribute like on ext4 or as a mount option for a subvolume - or even both?

As it is now, I for example have to use a loop device formatted as ext4, prepared with "chattr +F", as a collection folder that gets input from different file sets sent to me by people working on different platforms...

Well, I don't think I have to convince anybody that a case insensitive option would be a useful addition - so I hope this can be added to the todo list!

thanks for reading,
pk

@MarkRose

There are patches and some discussion about this on the mailing list: https://lore.kernel.org/linux-bcachefs/[email protected]/T/#u

@kakra

kakra commented Oct 13, 2023

> There are patches and some discussion about this on the mailing list

But that's for bcachefs.

@devZer0

devZer0 commented Feb 22, 2024

> For cross platform tasks there are many use cases for which case folding is desirable or in some cases even necessary.

just for my curiosity - where is that needed or for what ?

@Atemu

Atemu commented Feb 22, 2024

People run Windows applications through WINE which sometimes assume case insensitivity.

The SteamDeck uses ext4 with case folding by default for this reason.

@piedrok
Author

piedrok commented Feb 22, 2024

@Atemu: Exactly!

Also Steam Workshop Mods are mainly created with Windows in mind.

So mods - even for native linux games! - subscribed to through the Steam workshop will often fail to work.

Though using a case insensitive folder or file system for the same game's mod installation folder often makes the mods and mod update subscription work flawlessly!

Also this improves compatibility for shared folders or partitions between Linux and Windows users. (E.g. via WinBtrfs which claims to have a working "Windows10 case sensitivity flag"...)

It's useful to have the option "case folding" for folders or partitions!

@adrianinsaval

I also have a need for this and it's not related to gaming. I got my hands on a very good maintenance manual for my car, but unfortunately it's in the form of a webpage that was designed to be hosted on Windows. Whoever was the idiot that made it did not pay attention to case, so when following links it constantly tries to go to a nonexistent file because the case is different.

@piedrok
Author

piedrok commented Mar 16, 2024

Importing web projects is a nightmare without case insensitivity!

The same is true for backups that have been saved and moved around on different systems. Recovering those file structures gets much easier if case folding can be enabled on some folders - and be it temporarily...

@ChaosRifle

+1 for case folding.
Game modding and software running in Wine often require case insensitivity, and web projects kinda suck without it, as they are often designed without case sensitivity in mind.

Without casefolding we are forced to use ext4 when, not if, we need the feature. With it I could ditch ext4 entirely though.

@ikcikoR

ikcikoR commented Nov 14, 2024

A video game called "Phantasy Star Online 2: New Genesis", which can be run through Proton, stores all assets as tiny separate files. From what I understand, the lack of case folding is the reason it runs into EXTREME stutters when loading assets, even on an NVMe drive, so support for this feature would be more than welcome.

@piedrok
Author

piedrok commented Dec 12, 2024

Does any dev here even take notice?

@kdave
Member

kdave commented Jan 2, 2025

Casefolding is on the short term todo-list.

@kdave
Member

kdave commented Jan 2, 2025

https://lore.kernel.org/all/CAHk-=wice8YV5N1jjyz42uNi-eZTvG-G2M46qaN7T9VsSaCP_Q@mail.gmail.com/ Linus does not like the feature so we may need some "but users want it" eventually.

@Atemu

Atemu commented Jan 2, 2025

I mean, he's not wrong; it's stupid. I'd prefer if we didn't need casefolding either but many Windows applications rely on it sadly and this sort of thing cannot be efficiently emulated in userspace without the filesystem making guarantees.

IIRC what Linus wrote was in reply to a regression in f2fs case folding that couldn't be solved in any good way because the on-disk format of f2fs depended on an implementation detail that was changed. Perhaps btrfs' casefolding support could be implemented in a way where it's simply never possible to run into this issue.

That might appease Linus and would be prudent in any case.

I'd also like to use this opportunity to kindly request that this be implemented such that it's a per-directory property first and foremost, rather than a filesystem-global mount option. For the use-cases that are most pressing today, we only need specific directories (and their children) to be case-folding. The entire rest of the filesystem should (and in many cases must) remain case-sensitive.
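
To illustrate why userspace emulation is costly, here is a minimal Python sketch of case-insensitive path resolution on top of a case-sensitive filesystem. The function name `resolve_casefolded` is hypothetical, and `str.casefold` is only a stand-in for whatever folding rules an application expects.

```python
import os
from typing import Optional


def resolve_casefolded(root: str, relative_path: str) -> Optional[str]:
    """Resolve relative_path under root while ignoring case (hypothetical helper)."""
    current = root
    for component in relative_path.split("/"):
        if not component:
            continue  # tolerate leading or doubled slashes
        if not os.path.isdir(current):
            return None
        wanted = component.casefold()
        # Each path component needs a full directory listing: O(entries) per level.
        matches = sorted(e for e in os.listdir(current) if e.casefold() == wanted)
        if not matches:
            return None
        # Several entries may match, because nothing stops "a" and "A" from
        # coexisting here; the sketch just picks the first one in sorted order.
        current = os.path.join(current, matches[0])
    return current
```

Besides the per-component scan, nothing prevents another process from creating a conflicting name between the listing and the later open, which is the kind of guarantee only the filesystem can give.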

@kdave
Member

kdave commented Jan 2, 2025

Linus suggests using a slow lookup instead of hashing the name and then looking it up by that. I'm not sure if this is feasible, so we may end up with the same logic as ext4 and f2fs (avoiding the known bugs).

In those filesystems, case folding is currently implemented as a per-directory flag (which can be set only on an empty directory), so I'm expecting that btrfs will keep that behaviour for parity.
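
A toy Python model of that per-directory behaviour, as a sketch only: `ToyDir` and its methods are hypothetical names, `str.casefold` stands in for the real folding rules, and inheritance of the flag by new subdirectories is an assumption based on the "directories (and their children)" wording above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDir:
    casefold: bool = False
    # Maps the original name to a child ToyDir, or to None for a regular file.
    entries: dict = field(default_factory=dict)

    def set_casefold(self, enabled: bool) -> None:
        # Existing entries could collide under the new comparison rule,
        # so the flag may only change while the directory is empty.
        if self.entries:
            raise OSError("directory not empty")
        self.casefold = enabled

    def _key(self, name: str) -> str:
        return name.casefold() if self.casefold else name

    def lookup(self, name: str):
        key = self._key(name)
        for stored in self.entries:
            if self._key(stored) == key:
                return stored  # the name as originally created
        return None

    def create(self, name: str, is_dir: bool = False) -> None:
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        # Assumption: new subdirectories inherit the parent's flag.
        self.entries[name] = ToyDir(casefold=self.casefold) if is_dir else None
```

Restricting the flag to empty directories means the sketch never has to decide what to do with two existing names that suddenly compare equal.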

@Atemu

Atemu commented Jan 2, 2025

I'm likely missing practical details here but couldn't we just always store the original file name as-is in the on-disk format but for all purposes of comparison we bring everything into the "folded space"?

If it needs to be determined whether a given file name exists in a dnode for instance, you'd do any required comparison with both names folded:

foldeq(a, b) = fold(a) == fold(b)
is_in_dnode_folded(dnode, name) = any(foldeq(name), dnode.names) # or an equivalent hashed version for perf
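
The pseudocode above could look roughly like this in runnable Python. This is a sketch under assumptions: `fold` uses NFD normalization plus `str.casefold` as a stand-in for a kernel folding table, and `FoldedIndex` is a hypothetical in-memory version of the "hashed" variant.

```python
import unicodedata


def fold(name: str) -> str:
    # Assumption: NFD + str.casefold approximates the folding a kernel would do.
    return unicodedata.normalize("NFD", name).casefold()


def foldeq(a: str, b: str) -> bool:
    return fold(a) == fold(b)


def is_in_dnode_folded(names, name: str) -> bool:
    # Linear scan: compare the folded query against every stored original name.
    return any(foldeq(name, stored) for stored in names)


class FoldedIndex:
    """Hashed variant: folded name -> original name, kept only in memory."""

    def __init__(self, names=()):
        self._by_folded = {}
        for name in names:
            self.add(name)

    def add(self, name: str) -> None:
        key = fold(name)
        if key in self._by_folded:
            raise FileExistsError(name)
        self._by_folded[key] = name  # the original spelling is preserved

    def lookup(self, name: str):
        return self._by_folded.get(fold(name))
```

In both variants only the original name is treated as data; the folded form is derived at comparison time.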

@kdave
Member

kdave commented Jan 3, 2025

I think it is implemented like that: the first lookup when creating the file stores the name in the dentry cache and then the filesystem uses it in its metadata. Any later lookups are done in the case insensitive manner, as you outlined. The file name hashing can be done in the dentry and/or in the filesystem itself. Here btrfs uses crc32c as the name hash for the key.offset when looking up the DIR_ITEM.
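
As a rough sketch of hash-keyed lookup with folding, the Python below hashes the folded name with a plain bitwise CRC-32C and keeps the original name as the stored value. It uses the standard CRC-32C parameters, which are not claimed to match the seed or byte handling btrfs uses for key.offset; `name_hash` and `ToyDirIndex` are hypothetical names.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), standard init and final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def name_hash(name: str) -> int:
    # Assumption: the hash is taken over the folded name, so all case
    # variants of one name land on the same key.
    return crc32c(name.casefold().encode("utf-8"))


class ToyDirIndex:
    def __init__(self):
        self.items = {}  # hash -> list of original names (collisions share a key)

    def insert(self, name: str) -> None:
        self.items.setdefault(name_hash(name), []).append(name)

    def lookup(self, name: str):
        wanted = name.casefold()
        for stored in self.items.get(name_hash(name), []):
            if stored.casefold() == wanted:
                return stored
        return None
```

Note that the key here is derived from the folded name, so the folding definition becomes part of how entries are located.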

@Atemu

Atemu commented Jan 3, 2025

> The file name hashing can be done in the dentry and/or in the filesystem itself. Here btrfs uses crc32c as the name hash for the key.offset when looking up the DIR_ITEM.

The important question here is though whether the folded version of the name (or a hash thereof) is ever stored on-disk because the definition of how to fold a given file name can change over time. When that happens, you can get into a situation where a newer kernel can no longer correctly tell whether a given file name is present in the dnode based on on-disk data because when it folds the name, that might not match the folded name on-disk that would have matched before.
Worse yet, modifying a dnode with a different folding definition and then downgrading the kernel has the same issue: you could easily get yourself into a situation where neither the new nor the old kernel can correctly query all dnodes, which is then essentially a corrupted on-disk state.

AFAIUI, this is what happened (or was possible) with f2fs.
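
A small Python sketch of that failure mode, with made-up folding definitions: `fold_v1` (plain `lower`) plays the old kernel and `fold_v2` (`casefold`) the new one. Neither is meant to reflect what f2fs or any kernel actually does.

```python
def fold_v1(name: str) -> str:
    return name.lower()      # hypothetical "old kernel" definition


def fold_v2(name: str) -> str:
    return name.casefold()   # hypothetical "new kernel" definition


def build_index(names, fold):
    # Models an on-disk index keyed by the folded name.
    return {fold(n): n for n in names}


def indexed_lookup(index, name, fold):
    return index.get(fold(name))


def scan_lookup(names, name, fold):
    # Models a lookup that only ever sees the original names.
    wanted = fold(name)
    return next((n for n in names if fold(n) == wanted), None)


names = ["Straße"]
on_disk = build_index(names, fold_v1)  # key written by the old kernel: "straße"

old_result = indexed_lookup(on_disk, "Straße", fold_v1)  # found
new_result = indexed_lookup(on_disk, "Straße", fold_v2)  # key "strasse" is absent
scan_result = scan_lookup(names, "Straße", fold_v2)      # still found by scanning
```

The file exists the whole time; only the stored key stops matching once the definition changes.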

If you only ever store the original name on-disk (never a folded version or derivative thereof), you could still get into an inconsistent state where two dentries that were considered separate before would now be considered the same, which is bad of course, but that's a much more graceful failure mode because it'd be recoverable with both the old and the new kernel:
On the new kernel, you'd have to unlink/rename the "same" dentry twice because it'd first resolve to one and then to the other dentry that both have a name that is now considered the "same" but both are in theory accessible.
On the old kernel, it'd work just like before because they'd still be considered separate.

If the definition of folding were to be changed to be more exclusionary, it'd be the same situation but with old/new kernel positions reversed.
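
The more graceful failure mode can be sketched in Python as well. Again `fold_old` and `fold_new` are made-up definitions, and the directory is just a list of original names.

```python
def fold_old(name: str) -> str:
    return name.lower()      # hypothetical older definition


def fold_new(name: str) -> str:
    return name.casefold()   # hypothetical newer, more inclusive definition


def lookup(names, name, fold):
    wanted = fold(name)
    return next((n for n in names if fold(n) == wanted), None)


def unlink(names, name, fold):
    """Remove and return the first entry matching name, or None."""
    found = lookup(names, name, fold)
    if found is not None:
        names.remove(found)
    return found


# Two entries that the old definition treats as distinct names.
directory = ["straße", "strasse"]

old_view = (lookup(directory, "straße", fold_old),
            lookup(directory, "strasse", fold_old))

# Under the new definition both fold to "strasse", so one query name
# resolves to the first entry and, once that is gone, to the second.
first_removed = unlink(directory, "strasse", fold_new)
second_removed = unlink(directory, "strasse", fold_new)
```

Because only original names are stored, both entries stay reachable and removable, which is what makes the state recoverable.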

If you did have to encode the folded names into the on-disk state, you'd need to explicitly version the folding definition IMV; perhaps as an incompat flag. That wouldn't necessarily mean the kernel would need to keep all folding implementations around forever, but you could at least warn or even error out when an incompatible folding definition is used, to prevent users from getting into inconsistent states.

That might perhaps be a good idea anyways?
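
One way to picture such versioning is the Python sketch below. Everything in it is hypothetical: the `fold_version` field, the version numbers, and the choice to refuse rather than warn are illustration only, not a description of any btrfs format.

```python
from dataclasses import dataclass

# Hypothetical table of folding definitions this "kernel" still ships.
SUPPORTED_FOLD_VERSIONS = {1: str.lower, 2: str.casefold}


class IncompatibleFoldVersion(Exception):
    pass


@dataclass(frozen=True)
class ToySuperblock:
    fold_version: int  # recorded when case folding was first enabled


def open_fs(superblock: ToySuperblock, supported=SUPPORTED_FOLD_VERSIONS):
    """Return the folding function for this filesystem, or refuse to open it."""
    try:
        return supported[superblock.fold_version]
    except KeyError:
        raise IncompatibleFoldVersion(
            f"folding definition v{superblock.fold_version} is not supported"
        ) from None
```

A kernel that dropped an old definition would then fail loudly at open time instead of silently missing entries.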
