-
Notifications
You must be signed in to change notification settings - Fork 7
feat: add metrics collection using Prometheus #29
feat: add metrics collection using Prometheus #29
Conversation
Need to rebase experimental with main |
ca34ca2
to
7d78a28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jedel1043 can you rebase this branch with the latest changes from the experimental
branch? You might need to resolve some conflicts.
I'm going to start working on putting together some test bundles that we can use within our documentation!
I'll do that after the CI is in a good state (probably after #31 gets merged). |
c17d519
to
671912a
Compare
Moment of truth... |
76f525f
to
031c349
Compare
031c349
to
f3aa587
Compare
@jedel1043 could you update this section of the PR message to indicate how you verified that COS was working in a test deployment? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I'm surprised how painless the COS integration itself was into the charm. Most of the fun came from just getting the exporter supported 😅
Just two small nits I'd like you to address, and after that I'm good to ship 🚢
@@ -112,6 +120,8 @@ def _on_install(self, event: InstallEvent) -> None: | |||
self._legacy_manager.write_jwt_rsa(jwt_rsa) | |||
|
|||
self._slurmctld.enable() | |||
self._slurmctld.munge.enable() | |||
self._slurmctld.exporter.enable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[foresight]: we'll want to make the port the prometheus exporter uses configurable in the future as we'll potentially have multiple exporters running per-node. We should likely address this after we have a better idea of what exporters we'll need to have installed on each of the nodes.
f3aa587
to
562bbbf
Compare
Enables integrating the charm with Grafana Agent, which can proxy the observability data to an external K8s cloud. Implements some default alerts and adds a prototype Grafana dashboard.
562bbbf
to
f3e7557
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
Description
This change allows integrating the charm with the Canonical Observability Stack. Using Rivos Inc's Prometheus exporter, we can collect metrics from the controller, which can then be sent to a subordinate grafana-agent that can forward the metrics to the COS on Kubernetes using a cross-model, cross-cloud relation.
Still WIP:
How was the code tested?
Tested locally (Ubuntu Noble) in a hybrid-cloud environment on Juju (Microk8s + LXD).
Checklist