
High memory allocation in prometheus.MakeLabelPairs #1702

Open
jkroepke opened this issue Dec 25, 2024 · 2 comments

Comments

@jkroepke (Contributor)

While investigating a memory leak in windows_exporter, I found that there is a considerable memory allocation rate in prometheus.MakeLabelPairs.

windows_exporter (like node_exporter) generates a large number of const metrics on each scrape. The process collector alone exposes around 6200 metrics per call, resulting in 6200 MustNewConstMetric calls on every scrape.
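To illustrate the pattern (this is a made-up sketch, not windows_exporter's actual code; the collector and metric names are invented): every MustNewConstMetric call in Collect goes through MakeLabelPairs and allocates fresh label pairs.

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// illustrativeCollector mimics the const-metric pattern described above.
type illustrativeCollector struct {
	desc *prometheus.Desc
}

func (c *illustrativeCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *illustrativeCollector) Collect(ch chan<- prometheus.Metric) {
	// In windows_exporter a loop like this runs a few thousand times per scrape;
	// each MustNewConstMetric call builds a new []*dto.LabelPair internally.
	for i := 0; i < 6200; i++ {
		ch <- prometheus.MustNewConstMetric(
			c.desc, prometheus.GaugeValue, float64(i),
			"some_process.exe", "running", // label values turned into new label pairs every time
		)
	}
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&illustrativeCollector{
		desc: prometheus.NewDesc(
			"demo_process_info",
			"Illustrative per-process metric.",
			[]string{"process", "state"}, nil,
		),
	})
	// Each Gather (i.e. each scrape) repeats all of the allocations above.
	_, _ = reg.Gather()
}
```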

I am asking myself whether this can be optimized somehow. The heap dump was generated after one day of runtime with a 5 second scrape interval (to simulate a long-running scenario).

The dto.LabelPair struct contains other structs related to protobuf. I am not sure whether they are needed; Prometheus 2.0 does not support protobuf metrics scraping, so it feels like an expensive leftover.
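For context, dto.LabelPair comes from the generated protobuf code in prometheus/client_model, so each instance carries the protobuf runtime's bookkeeping fields on top of the Name/Value pointers. A quick way to check the per-object footprint (the exact sizes depend on the client_model and protobuf-go versions in use):

```go
package main

import (
	"fmt"
	"unsafe"

	dto "github.com/prometheus/client_model/go"
)

func main() {
	// The generated structs carry protobuf runtime bookkeeping (message state,
	// size cache, unknown fields) in addition to the exported Name/Value fields.
	var lp dto.LabelPair
	var m dto.Metric
	fmt.Println("sizeof dto.LabelPair:", unsafe.Sizeof(lp), "bytes")
	fmt.Println("sizeof dto.Metric:   ", unsafe.Sizeof(m), "bytes")
}
```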


Flame Graph: (screenshot attached in the original issue)

Source View: (screenshot attached in the original issue)

Heap Dump via pprof: memory_profile_windows.pb.gz

dosubot bot commented Dec 25, 2024

Hey @jkroepke! I'm here to help you with bugs, questions, and becoming a contributor. Let's tackle this issue together!

The MakeLabelPairs function in the Prometheus client_golang library is indeed a point of interest when it comes to memory allocation, especially in scenarios like the one you described with the windows_exporter. This function constructs label pairs for metrics, which can lead to significant memory usage when a large number of metrics are generated, such as the 6200 MustNewConstMetric calls per scrape in your case.

The function preallocates a slice for the label pairs to minimize allocations during appends, but the creation of dto.LabelPair instances, which carry protobuf-related fields, can still contribute significantly to memory usage [1]. Since Prometheus 2.0 does not support protobuf metric scraping, these structures might seem like unnecessary overhead.
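To make the cost concrete, here is a simplified, self-contained sketch of the kind of work done per const metric (illustrative code only, not the library's actual MakeLabelPairs implementation): one fresh slice plus one dto.LabelPair and two *string values per variable label, repeated for every NewConstMetric call.

```go
package main

import (
	"sort"

	dto "github.com/prometheus/client_model/go"
	"google.golang.org/protobuf/proto"
)

// buildLabelPairs is an illustrative stand-in for the per-metric work: a fresh
// slice plus a fresh *dto.LabelPair (and two *string values) per label.
func buildLabelPairs(names, values []string) []*dto.LabelPair {
	pairs := make([]*dto.LabelPair, 0, len(names)) // one slice allocation per metric
	for i, n := range names {
		pairs = append(pairs, &dto.LabelPair{ // one heap object per label
			Name:  proto.String(n),
			Value: proto.String(values[i]),
		})
	}
	sort.Slice(pairs, func(i, j int) bool { return pairs[i].GetName() < pairs[j].GetName() })
	return pairs
}

func main() {
	// With ~6200 metrics per scrape, this pattern repeats thousands of times each scrape.
	_ = buildLabelPairs([]string{"process", "state"}, []string{"explorer.exe", "running"})
}
```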

To optimize memory usage, consider the following approaches:

  1. Reduce Metric Volume: If possible, reduce the number of metrics being generated or exposed. This might involve aggregating metrics or selectively disabling certain metrics that are not critical.

  2. Optimize Label Usage: Minimize the number of labels and their cardinality. High cardinality can lead to increased memory usage due to the large number of unique label combinations.

  3. Custom Implementation: If the protobuf structures are indeed unnecessary for your use case, consider implementing a custom version of MakeLabelPairs that avoids using these structures, though this would require maintaining a fork or custom build of the library.

  4. Review Scrape Interval: Adjust the scrape interval to balance between data granularity and resource usage.

These strategies can help mitigate the memory allocation issues you're experiencing.


@bwplotka (Member)

Yup, that's a known problem for Const* metric users. A great showcase is cadvisor, which is also impacted by this. kube-state-metrics got rid of client_golang for its scrape endpoints for this reason.

At some point I tried to optimize the cadvisor code to essentially cache metrics and update them in place, which is not trivial for a bigger codebase: google/cadvisor#2974

There's definitely room to either build a mutable caching layer for the efficiency of bigger exporters, or to make LabelPairs more efficient in the current NewConst* API. For the latter, we could also try a global (or local) set of sync.Pool pools for both LabelPair and LabelPair slices (which are typically small), or something longer term. It feels like collaborating on an efficient mutable dto.Metric, or something similar for exporter use, would have a bigger effect.
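As a rough sketch of the sync.Pool idea for LabelPair objects (everything below is hypothetical and only illustrates the shape of such a change; it is not an API client_golang exposes today, and pooling the slices would look similar):

```go
package main

import (
	"sync"

	dto "github.com/prometheus/client_model/go"
)

// labelPairPool is a hypothetical pool for reusing dto.LabelPair objects
// across scrapes instead of allocating them fresh for every const metric.
var labelPairPool = sync.Pool{
	New: func() any { return new(dto.LabelPair) },
}

func getLabelPair(name, value string) *dto.LabelPair {
	lp := labelPairPool.Get().(*dto.LabelPair)
	lp.Name, lp.Value = &name, &value
	return lp
}

func putLabelPair(lp *dto.LabelPair) {
	// Only safe once nothing (e.g. a cached dto.Metric) still references lp.
	lp.Reset() // Reset is generated for every protobuf message
	labelPairPool.Put(lp)
}

func main() {
	lp := getLabelPair("process", "explorer.exe")
	_ = lp.GetName()
	putLabelPair(lp)
}
```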

The dto.LabelPair struct contains other structs related to protobuf. I am not sure whether they are needed; Prometheus 2.0 does not support protobuf metrics scraping, so it feels like an expensive leftover.

That's true, the LabelPair struct itself could be smaller. However, we DO support and intend to keep some form of protobuf metric scraping, so that part has to keep working (marshalling/unmarshalling). Maybe we could get rid of unknown field support for this type.
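For illustration of why the protobuf machinery cannot simply be dropped: the dto types are real protobuf messages and still have to marshal for the protobuf exposition format. A minimal example (label values are made up):

```go
package main

import (
	"fmt"

	dto "github.com/prometheus/client_model/go"
	"google.golang.org/protobuf/proto"
)

func main() {
	name, value := "instance", "localhost:9182"
	lp := &dto.LabelPair{Name: &name, Value: &value}

	// dto.LabelPair is a protobuf message, so marshalling (and unmarshalling)
	// must keep working even if the struct is slimmed down.
	b, err := proto.Marshal(lp)
	if err != nil {
		panic(err)
	}
	fmt.Printf("label pair encodes to %d bytes on the wire\n", len(b))
}
```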
