When I had similar issues in lorax-composer I settled on using the cache for depsolving blueprints and packages, but forcing a refresh of the metadata on every build. This was because Anaconda would also be fetching its own copy, so the NEVRAs had to match what Anaconda would encounter. I also had problems getting dnf to expire things the way I expected it to, so I ended up using my own timer and a lock, forcing a metadata refresh when it timed out (roughly as sketched below). IIRC that was due to dnf being used inside a long-running process, so dnf-json won't have the same problems, but a systemd service may. If you move depsolving to the workers you will likely hit problems with the mirrors (in Fedora at least): you don't always get the same one, and they don't all have the same metadata, so that's something to watch out for.
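A minimal sketch of that timer-plus-lock pattern, assuming a shared stamp file guarded by flock; the paths, interval, and function name are illustrative, not lorax-composer's actual code:

```python
import fcntl
import os
import time

CACHE_DIR = "/var/cache/dnf-json"                 # assumed location
STAMP = os.path.join(CACHE_DIR, ".last-refresh")
LOCK = os.path.join(CACHE_DIR, ".refresh-lock")
REFRESH_INTERVAL = 5 * 60                         # seconds; pick to taste


def with_fresh_metadata(base):
    """Force a metadata refresh if our own timer has run out.

    `base` is a dnf.Base with its repositories already configured.
    """
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(LOCK, "w") as lock:
        # Serialize refreshes across concurrent invocations.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            last = os.path.getmtime(STAMP)
        except FileNotFoundError:
            last = 0
        if time.time() - last > REFRESH_INTERVAL:
            for repo in base.repos.iter_enabled():
                repo.metadata_expire = 0          # treat cached metadata as stale
            base.update_cache()
            with open(STAMP, "w"):
                pass                              # touch the stamp file
        base.fill_sack(load_system_repo=False)
    return base
```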
## Problems

At the moment `dnf-json` suffers from several problems.

### Cache Management
There are two main problems.
#### Lack of Cleanup
Our caches are not properly cleaned up, leading to regular problems for the internal service. The most likely reason for this is that repository caches are indexed by their URL, and old ones are only cleaned up when replaced by new ones. For `brew`, every compose has its own repository URL, so we only ever add new ones; one possible mitigation is sketched below.
#### Cache Invalidation

Verifying that caches are up to date is moderately expensive (~200ms), so always doing it might add up to be too expensive. Never doing it would also cause problems, needless to say, as we would miss new packages. This would be particularly pertinent when people are iterating on a yum repository, creating new images with their RPMs; ideally there should not be any races, so that `createrepo_c; weldr-cli compose` gives an image with the most up-to-date content.

We could probably get away with `depsolve` being a bit slower, but package enumeration and search should be faster. So some logic that always verifies the cache for `depsolve`, but does so more rarely for `search`, might work; a sketch follows.
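One way to encode that policy, with a hypothetical TTL and helper (neither exists in `dnf-json` today):

```python
import time

SEARCH_TTL = 60  # seconds a search may rely on unverified metadata


def needs_revalidation(operation, last_verified):
    """Decide whether cached metadata must be verified before `operation`.

    `last_verified` is the epoch timestamp of the last successful check.
    """
    if operation == "depsolve":
        # Always pay the ~200ms here: the resolved NEVRAs must be current.
        return True
    return time.time() - last_verified > SEARCH_TTL
```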
### Performance

We need both package search and depsolving to be performant.
#### depsolve
Some delay is acceptable for `depsolve`, but it should be on the order of 1-2 seconds, not 10, as it currently can be. This is an operation that is ideally done once per image build, but it is one we expect to be able to fail (incompatible package selections), so it should not be so slow that iterating is frustrating. Most of the time (~80%) is spent on loading caches from disk and setting up the dnf `Base` object. If we did not spawn a new process for every `depsolve`, we might be able to improve this greatly; see the sketch below.
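A sketch of what reusing a prepared `Base` across requests in a long-running process could look like; the caching scheme and function names are assumptions, not the current `dnf-json`:

```python
import dnf

_bases = {}  # repo-set key -> prepared dnf.Base


def get_base(repos):
    """repos: iterable of (repo_id, baseurl) tuples."""
    key = tuple(sorted(repo_id for repo_id, _ in repos))
    base = _bases.get(key)
    if base is None:
        base = dnf.Base()
        for repo_id, baseurl in repos:
            base.repos.add_new_repo(repo_id, base.conf, baseurls=[baseurl])
        base.fill_sack(load_system_repo=False)  # the expensive step, done once
        _bases[key] = base
    else:
        base.reset(goal=True)  # drop the previous transaction, keep the sack
    return base


def depsolve(repos, package_specs):
    base = get_base(repos)
    for spec in package_specs:
        base.install(spec)
    base.resolve()
    return [str(pkg) for pkg in base.transaction.install_set]
```

A real version would still need the cache-invalidation logic above to decide when a cached `Base` must be dropped or refreshed.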
#### search

Both in `cockpit-composer` and in `console.redhat.com` this needs to be snappy. Currently we don't use `dnf-json` in CRC, as we use static package lists, but as we gain support for 3rd-party repositories we will need to make this dynamic too.

Considering `cockpit-composer`, as that's what currently uses `dnf-json` for search, what we see is the following: several calls to enumerate packages are made when you view a `blueprint`. Each of these takes several seconds, even when the caches are hot, leading to a sluggish experience. In my brief experiment I noticed that all packages are fetched from `dnf-json` before they are filtered in `weldr/api.go`; by moving the filtering into `dnf-json` (sketched below), package search/enumeration went from taking 600ms to 20ms. Added up, that made a very noticeable difference (when a `blueprint` is opened, three searches and one depsolve are performed).
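For illustration, filtering inside `dnf-json` amounts to running a sack query with the pattern instead of materializing every package; a minimal sketch, assuming `base` already has a filled sack (the function name and return shape are made up):

```python
def search_packages(base, pattern):
    """Return basic metadata for available packages matching a name glob.

    `base` is a dnf.Base whose sack has already been filled.
    """
    query = base.sack.query().available().filter(name__glob=pattern)
    return [
        {"name": p.name, "version": p.version, "summary": p.summary}
        for p in query.latest()
    ]
```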
### Hard to reuse

In addition to being called from `composer`, we would like to call `dnf-json` from the workers to move depsolving there, and we would like to have the search functionality as a service in CRC. Moreover, `osbuild-mpp` could benefit from reusing `dnf-json` too.
## Potential Solution

- Make `dnf-json` its own package, possibly as a sub-package of `osbuild`.
- … `osbuild-api` to be.
- Talk to the `dnf` team to see if they have some input/interest in this.