manager: sync rdma resource to node #2249

Merged on Dec 5, 2024 (1 commit)

Conversation

ferris-cx
Contributor

[koord-manager-rdma]

Ⅰ. Describe what this PR does

To complete end-to-end support for RDMA devices, this PR adds an RDMA resource controller to koord-manager that keeps each node's status up to date, specifically the Capacity and Allocatable fields. It mainly updates the koordinator.sh/rdma resource.
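
The sync described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `rdmaDevice`, `nodeStatus`, `setResource`, and `syncRDMA` are hypothetical names, plain maps stand in for the Kubernetes node status types, and counting one unit of koordinator.sh/rdma per reported device is an assumption (the real plugin may use a different unit).

```go
package main

import "fmt"

// rdmaResourceName is the extended resource name from the PR description.
const rdmaResourceName = "koordinator.sh/rdma"

// rdmaDevice is a hypothetical stand-in for one RDMA device reported by koordlet.
type rdmaDevice struct {
	ID string
}

// nodeStatus is a hypothetical stand-in for the Capacity and Allocatable
// maps in a node's status.
type nodeStatus struct {
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// setResource writes value under name and reports whether the map changed.
// A value of zero removes the entry so that stale resources do not linger.
func setResource(m map[string]int64, name string, value int64) bool {
	old, ok := m[name]
	if value == 0 {
		if ok {
			delete(m, name)
			return true
		}
		return false
	}
	if ok && old == value {
		return false
	}
	m[name] = value
	return true
}

// syncRDMA updates Capacity and Allocatable from the reported devices.
// Assumption: one resource unit per device. It returns true only when the
// status actually changed, so a caller could skip an unnecessary update.
func syncRDMA(status *nodeStatus, devices []rdmaDevice) bool {
	if status.Capacity == nil {
		status.Capacity = map[string]int64{}
	}
	if status.Allocatable == nil {
		status.Allocatable = map[string]int64{}
	}
	count := int64(len(devices))
	capChanged := setResource(status.Capacity, rdmaResourceName, count)
	allocChanged := setResource(status.Allocatable, rdmaResourceName, count)
	return capChanged || allocChanged
}

func main() {
	status := &nodeStatus{}
	devices := []rdmaDevice{{ID: "rdma-0"}, {ID: "rdma-1"}}
	changed := syncRDMA(status, devices)
	fmt.Println(changed, status.Capacity[rdmaResourceName], status.Allocatable[rdmaResourceName])
}
```

Returning a changed flag keeps the sketch idempotent: syncing the same device list twice leaves the status untouched the second time, and removing all devices drops the resource entry.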

Ⅱ. Does this pull request fix one issue?

Fixes the missing functionality for registering RDMA resources on nodes and keeping them updated.

Ⅲ. Describe how to verify it

  1. In a Kubernetes cluster, prepare one or more servers with RDMA-capable network adapters as cluster nodes. Install the new version of koordlet on each node and the new version of koord-manager in the cluster. Check that the number of RDMA resources reported on each node matches the actual number of RDMA NICs on that node.

  2. Remove one or more RDMA NICs from a node and restart the server. After the node rejoins the cluster and the cluster is stable, check that the number of RDMA resources on the node has been updated to the actual number of RDMA NICs on that node.

Ⅳ. Special notes for reviews

Syncing RDMA device resources to nodes depends on the koordlet component running on the node side. You must therefore install a koordlet version that supports RDMA on every node in the cluster before testing the effect of the koord-manager component.

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@ZiMengSheng force-pushed the koord-manager-rdma branch 2 times, most recently from 2fa36c7 to b1c9e60, on November 20, 2024 07:55

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 95.38462% with 6 lines in your changes missing coverage. Please review.

Project coverage is 66.10%. Comparing base (b36d230) to head (30a4b7a).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...r/noderesource/plugins/rdmadevicereource/plugin.go 93.40% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2249      +/-   ##
==========================================
+ Coverage   66.07%   66.10%   +0.02%     
==========================================
  Files         454      456       +2     
  Lines       53451    53632     +181     
==========================================
+ Hits        35318    35451     +133     
- Misses      15593    15632      +39     
- Partials     2540     2549       +9     
Flag Coverage Δ
unittests 66.10% <95.38%> (+0.02%) ⬆️


@ZiMengSheng
Contributor

/lgtm

@ZiMengSheng changed the title from "add rdma device controller" to "manager: sync rdma resource to node" on Dec 4, 2024
Member

@saintube left a comment

/lgtm

@saintube added the lgtm label on Dec 5, 2024
@ZiMengSheng
Contributor

/approve

@koordinator-bot merged commit b8b1d89 into koordinator-sh:main on Dec 5, 2024
22 checks passed