-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: Consider supporting size management for OCI Store #472
Comments
cc: @Wwwsylvia |
Duplicate of #454 |
Is there a plan to implement higher level function beyond Delete? Such as a way to add a time last accessed annotation in the index.json and an accompanying lastused descriptor method? Basically an API to make it easier to implement an LRU eviction strategy. |
That seems an auto purge functionality and is currently out of the scope since |
While considering similar concerns in our code base, we considered using containerd's content store package, which offers garbage collection controlled by labels. Having support for some similar functionality in oras would be great, but at the moment I'm more interested in getting #470 so we can delete at all. Once it supports delete and updating the predecessor/successor graph, then we can at least identify unreferenced/orphaned resources. |
Removing the |
Let's have some discussions on the following general questions, which choices work better with actual needs?
You are welcome to take a look at the draft PRs #652, #653 and give feedbacks on the design. These two PRs collectively implement automatic recursive garbage collection upon |
Hi!
Automatic would be welcomed, but having a dedicated function as a first step is OK.
The garbage collection should be recursive, otherwise we will end up with wasted node (C and D in the first case).
I took a look at your PRs and they look pretty fine. Best regards. |
I think we can start with providing a public GC function. And then we can consider having a store option for enabling auto-GC. It could be somewhat similar to
Yes, the idea of GC is to clean up all garbage (dangling nodes). If we delete only the direct successors ( |
Then should the default value of |
#656 will enable However, this issue requests a higher level of garbage collection. That is predicate-based garbage collection where the predicate is based on access metadata (e.g. LRU mentioned in the issue description). In summary, it requires the following sub features to fulfill the request:
For 1, the metadata database can be in memory if it is within a single process, using a file if it is accessed by a single process at a time, or using a real DB if multi-process access is required. Here is an example implementation: type MetadataStore struct {
*oci.Store
LastAccessTime sync.Map // map[digest.Digest]time.Time
}
func (s *MetadataStore) Push(ctx context.Context, expected ocispec.Descriptor, reader io.Reader) error {
if err := s.Store.Push(ctx, expected, reader); err != nil {
return err
}
if isManifest(expected) {
s.LastAccessTime.Store(expected.Digest, time.Now())
}
return nil
}
func (s *MetadataStore) Fetch(ctx context.Context, target ocispec.Descriptor) (io.ReadCloser, error)
if rc, err := s.Store.Fetch(ctx, target); err != nil {
return nil, err
}
if isManifest(target) {
s.LastAccessTime.Store(target.Digest, time.Now())
}
return rc, nil
}
func (s *MetadataStore) Delete(ctx context.Context, target ocispec.Descriptor) error {
if err := s.Store.Delete(ctx, target); err != nil {
return err
}
s.LastAccessTime.Delete(target.Digest)
return nil
} For 2, a GC scheduler can be implemented using a go-routine with a predicate. For example, ms := GetMetadataStore()
ctx := context.Todo()
ticker := time.NewTicker(time.Minute * 10) // 10 mins
for range ticker.C {
func() {
ctx, cancel := context.WithTimeout(ctx, time.Second * 10) // stop the world for 10s for GC
defer cancel()
// remove those images which are not accessed in 1 day
ms.LastAccessTime.Range(func(key, value any) {
dgst := key.(digest.Digest)
accessTime := value.(time.Time)
if accessTime.Add(time.Hour * 24).Before(time.Now()) {
ms.Delete(ctx, ocispec.Descriptor{Digest: dgst})
}
})
ms.GC(ctx) // optional as ms.AutoGC is enabled by default
}()
} @akashsinghal We need your inputs / comments on this feature request as we are not sure if this feature request is still sync with your demand so that we can continue working on it or close when #656 is merged. |
Part of #472, this PR implements recursive GC for `oci.Store`. A field `AutoGarbageCollection` of `oci.Store` is added, default value is true. Signed-off-by: Xiaoxuan Wang <[email protected]>
@shizhMSFT Thanks for laying out the functionality required here. We are on the same page. The LRU eviction with scheduled GC is exactly what the Ratify project is looking for. In our specific use case, it's multi process so the metadata store implementation would need to handle concurrent read/writes. We already have requirements of Adding GC support is a great start for us in #656. However we will need the metadata store and scheduling support for us to be able to effectively use GC. If oras-go can support this that would be great, however I'm not sure what level of functionality ORAS folks would like to expose versus require consumers (Ratify) to build and manage on top of ORAS on their own. Do you have any thoughts on this boundary? If the metadata store was built in to oras-go and exposed, would it still be the consumer's responsibility to provide a metadata store implementation that fits their needs (single process vs multi concurrent processes DB)? |
Since it is non-trivial to define a generic metadata store, it would be better if Ratify can implement its metadata store for its specific scenarios. Multi-process support is also another non-trivial work and it is not on the roadmap of |
Part of #472 Signed-off-by: Xiaoxuan Wang <[email protected]>
As #653 and #656 are merged, the basic delete features have been done. Thank @wangxiaoxuan273 @eiffel-fl @carabasdaniel for your contributions to this effort. Since this issue request more features, which are out of the scope of |
Currently, there isn't built in functionality for deleting and managing OCI Store size. It would be nice if there was a garbage collection-like functionality to evict LRU blobs from the index and delete the corresponding blobs from the OCI store
The text was updated successfully, but these errors were encountered: