number of files/file descriptors #2
Comments
This might suggest that we should consider a container format that is essentially an embedded filesystem. But it's probably better if we don't have to.
RocksDB has this problem too; FWIW, they have a setting for the maximum number of open files (and presumably resort to closing/reopening files once they go beyond this limit). I'll add it to the doc.
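For reference, that knob is exposed in the `rocksdb` Rust crate as `Options::set_max_open_files`. A minimal sketch of what capping fds would look like if we took the same approach (the value 512 is arbitrary):

```rust
use rocksdb::{Options, DB};

fn open_with_fd_cap(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Cap how many SST files RocksDB keeps open at once; past this limit
    // its table cache closes and reopens files on demand (-1 = unlimited).
    opts.set_max_open_files(512);
    DB::open(&opts, path)
}
```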
FWIW, older Microsoft Office files used the MS "compound file binary format" that contained an embedded FAT file system.
Another problem with multiple files is that sequential access across multiple files != sequential access on disk.
The assumption for writes is that the OS block allocator is good enough to allocate consecutive blocks (when possible) for writing the batches. This is a nice thing about SPDK, which gives you more fine-grained control over this in its blobfs library. FWIW, RocksDB has a similar design where the LSM tree has to read from potentially multiple files.
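To make the "good enough allocator" assumption a bit more concrete: on Linux we can at least hint at contiguous layout by reserving a batch's full extent before writing it. A minimal sketch, assuming Linux and the `libc` crate (`preallocate_batch` is a hypothetical helper, not anything in the codebase):

```rust
use std::fs::{File, OpenOptions};
use std::os::unix::io::AsRawFd;

// Reserve the batch's full length up front so the filesystem's block
// allocator can pick consecutive blocks in one shot, instead of growing
// the file piecemeal as writes trickle in.
fn preallocate_batch(path: &str, len: i64) -> std::io::Result<File> {
    let file = OpenOptions::new().create(true).write(true).open(path)?;
    let ret = unsafe { libc::posix_fallocate(file.as_raw_fd(), 0, len) };
    if ret != 0 {
        // posix_fallocate returns the error code directly (it does not set errno).
        return Err(std::io::Error::from_raw_os_error(ret));
    }
    Ok(file)
}
```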
I forgot to mention: one plus of the many-files approach is that once a file is written it is immutable and no longer changes. I have heard twice (e.g., in a talk on Paimon and a talk from Rockset) that this is a nice property for distributed storage: you can move some of these files to cold storage (or send them around, etc.) whenever necessary, and you don't have to worry about them being modified concurrently by the pipeline.
Since we're dealing with immutable data, we might find some value in the ability to refer to an object by its hash, e.g., naming a batch by a hash of its data. I don't have a clear idea of where this is valuable yet.
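As a strawman, content addressing could be as simple as using the digest of the serialized batch as its file name. A sketch assuming the `sha2` crate (`write_batch` and the `.batch` suffix are made up for illustration):

```rust
use sha2::{Digest, Sha256};
use std::fs;
use std::path::{Path, PathBuf};

// Write an immutable batch under a name derived from its bytes. Identical
// content always maps to the same name, so duplicate writes are no-ops and
// replicas can refer to a batch by hash alone.
fn write_batch(dir: &Path, bytes: &[u8]) -> std::io::Result<PathBuf> {
    let digest = Sha256::digest(bytes);
    let name: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    let path = dir.join(format!("{name}.batch"));
    if !path.exists() {
        fs::write(&path, bytes)?;
    }
    Ok(path)
}
```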
At least a naive reading of Data Format suggests that, in this design, there will be one or more files per OrderedLayer/ColumnLayer. Since a Spine can contain an arbitrary number of these, we need to be careful about letting them proliferate, unless we want to be in the business of not just buffer caching but fd caching as well. I believe that Linux has a hard limit of 65536 fds per process (the per-process limit is governed by RLIMIT_NOFILE).
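If we do end up fd caching, the core of it is just an LRU of open handles bounded well below the process limit. A minimal single-threaded sketch using only std (all names hypothetical; a real version would need thread safety and coordination with the buffer cache):

```rust
use std::collections::{HashMap, VecDeque};
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Keeps at most `cap` files open, closing the least recently used handle
/// when the cap is reached. `cap` should sit well below RLIMIT_NOFILE to
/// leave headroom for everything else the process opens.
struct FdCache {
    cap: usize,
    open: HashMap<PathBuf, File>,
    lru: VecDeque<PathBuf>, // front = least recently used
}

impl FdCache {
    fn new(cap: usize) -> Self {
        Self { cap, open: HashMap::new(), lru: VecDeque::new() }
    }

    fn get(&mut self, path: &Path) -> io::Result<&File> {
        if self.open.contains_key(path) {
            // Refresh recency: move this path to the back of the queue.
            self.lru.retain(|p| p.as_path() != path);
        } else {
            if self.open.len() >= self.cap {
                // Evict the coldest handle; dropping the File closes its fd.
                if let Some(victim) = self.lru.pop_front() {
                    self.open.remove(&victim);
                }
            }
            self.open.insert(path.to_path_buf(), File::open(path)?);
        }
        self.lru.push_back(path.to_path_buf());
        Ok(self.open.get(path).unwrap())
    }
}
```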