manager: sync rdma resource to node #2249
Branch updated from 2fa36c7 to b1c9e60.
Codecov Report
Attention: Patch coverage is
Coverage Diff
              main     #2249      +/-
Coverage    66.07%    66.10%   +0.02%
Files          454       456       +2
Lines        53451     53632     +181
Hits         35318     35451     +133
Misses       15593     15632      +39
Partials      2540      2549       +9
Branch updated from b1c9e60 to 8cafd84.
/lgtm
pkg/slo-controller/noderesource/plugins/gpudeviceresource/plugin.go
pkg/slo-controller/noderesource/plugins/rdmadevicereource/plugin.go
Branch updated from 8cafd84 to 528e095.
Signed-off-by: wangjianyu.wjy <[email protected]>
Branch updated from 528e095 to 30a4b7a.
/lgtm
/approve
[koord-manager-rdma]
Ⅰ. Describe what this PR does
To complete the end-to-end scheme for RDMA devices, this PR adds an RDMA resource controller that keeps node status up to date. It mainly updates the koordinator.sh/rdma resource in each node's Capacity and Allocatable.
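The update described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ResourceList` and `NodeStatus` are simplified stand-ins for the Kubernetes node status types, `SyncRDMA` is a hypothetical helper name, and removing the resource when the reported total is zero is an assumption about the desired behavior. How the total is derived from the devices reported by koordlet is also left out.

```go
package main

import "fmt"

// RDMAResource is the extended resource name mentioned in the PR description.
const RDMAResource = "koordinator.sh/rdma"

// ResourceList and NodeStatus are simplified stand-ins (assumption) for the
// Kubernetes node status types; quantities are plain integers here.
type ResourceList map[string]int64

type NodeStatus struct {
	Capacity    ResourceList
	Allocatable ResourceList
}

// SyncRDMA (hypothetical helper) writes the reported RDMA total into both
// Capacity and Allocatable. A total of zero or less removes the resource,
// which is an assumption. It returns true when the status was changed, so
// the caller can skip a node update when nothing differs.
func SyncRDMA(status *NodeStatus, total int64) bool {
	changed := false
	for _, list := range []*ResourceList{&status.Capacity, &status.Allocatable} {
		if total <= 0 {
			if _, ok := (*list)[RDMAResource]; ok {
				delete(*list, RDMAResource)
				changed = true
			}
			continue
		}
		if *list == nil {
			*list = ResourceList{}
		}
		if cur, ok := (*list)[RDMAResource]; !ok || cur != total {
			(*list)[RDMAResource] = total
			changed = true
		}
	}
	return changed
}

func main() {
	status := &NodeStatus{}
	fmt.Println("first sync changed:", SyncRDMA(status, 2))
	fmt.Println("second sync changed:", SyncRDMA(status, 2))
	fmt.Println("allocatable:", status.Allocatable[RDMAResource])
}
```

Returning a changed flag keeps repeated syncs idempotent: a second sync with the same total reports no change, so no write to the node would be needed.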
Ⅱ. Does this pull request fix one issue?
Fixes the missing functionality for registering RDMA resources on nodes and keeping them updated.
Ⅲ. Describe how to verify it
In a k8s cluster, prepare one or more servers with RDMA-capable network adapters as cluster nodes. Install the new version of the koordlet component on each node and the new version of the koord-manager component. Check that the number of RDMA resources on each node is updated to the actual number of RDMA NICs on that node.
Remove one or more RDMA NICs from a node and restart the server. After the node rejoins the cluster and the cluster is stable, check that the number of RDMA NICs in the node's resources is updated to the actual number of RDMA NICs on the node.
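The check in the steps above could be scripted along these lines. This is a sketch under assumptions: it reads a node object in JSON form (for example the output of `kubectl get node <name> -o json`), `RDMAQuantities` is a hypothetical helper name, and the sample document and its value "2" are made up for illustration. The returned quantities are raw strings, to be compared against whatever value is expected for the node.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rdmaResource is the extended resource name mentioned in the PR description.
const rdmaResource = "koordinator.sh/rdma"

// nodeDoc decodes only the parts of a node object that this check needs.
type nodeDoc struct {
	Status struct {
		Capacity    map[string]string `json:"capacity"`
		Allocatable map[string]string `json:"allocatable"`
	} `json:"status"`
}

// RDMAQuantities (hypothetical helper) returns the raw capacity and
// allocatable quantities for the RDMA resource. An empty string means the
// resource is not registered on the node.
func RDMAQuantities(nodeJSON []byte) (string, string, error) {
	var n nodeDoc
	if e := json.Unmarshal(nodeJSON, &n); e != nil {
		return "", "", fmt.Errorf("decode node: %w", e)
	}
	return n.Status.Capacity[rdmaResource], n.Status.Allocatable[rdmaResource], nil
}

func main() {
	// Made-up sample; a real node object carries many more fields.
	sample := []byte(`{"status":{"capacity":{"koordinator.sh/rdma":"2"},"allocatable":{"koordinator.sh/rdma":"2"}}}`)
	capacity, allocatable, err := RDMAQuantities(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("capacity=%q allocatable=%q\n", capacity, allocatable)
}
```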
Ⅳ. Special notes for reviews
Updating RDMA device resources on nodes relies on the node-side koordlet component, so a koordlet version that supports RDMA must be installed on every node in the cluster. Only then can the effect of the koord-manager component be tested.
Ⅴ. Checklist
make test