koordlet: fix core sched conflicts with GI and revise API #1829


saintube
Member

@saintube saintube commented Jan 10, 2024

Ⅰ. Describe what this PR does

  1. koordlet: fix a compatibility issue between core scheduling and the group identity feature.
    • Background: The Anolis kernel (<=5.10.134-16.1) does not support enabling core scheduling while some cgroups still have bvt=-1 set. If it is enabled anyway, a kernel hard lockup can occur when bvt=-1 tasks are scheduled together with core-sched-enabled tasks on the same rq.
    • Solution: Before enabling core scheduling, the koordlet checks whether the group identity can be disabled via sysctl and temporarily resets the group identity cgroups as a clean-up. The group identity hook can skip reconciling the cgroups when its sysctl is already disabled. Conversely, before enabling the group identity, the koordlet checks whether core sched can be disabled via sysctl.
  2. apis: revise the core sched group API.
    • Goal: Separate the API for the core sched group ID from the (optional) policy that customizes core sched for the pod. Revise the defaults so that switching between core scheduling and the group identity is easy.
    • Solution: Add a new label API koordinator.sh/core-sched-policy that can either disable core sched for the pod or place the pod in an individual group that is exclusive to all others. When the label koordinator.sh/core-sched-group-id is not set, we no longer treat it as an exclusive policy; instead, we regard it as an empty group ID.
metadata:
  labels:
    # A missing or empty label now simply means an empty group id.
    # It no longer explicitly enables core sched for the pod (the nodeSLO must enable it instead).
    # Generating the group id from the pod UID has moved to the "exclusive" policy below.
    koordinator.sh/core-sched-group-id: ""
    # (optional)
    # Add a new label to define the pod-level policy.
    # - "none": explicitly disable core sched for the pod (reset to cookie 0).
    # - "exclusive": use the pod UID to overwrite the group id, so that the pod is SMT-isolated from all other pods.
    # - "" or other values: no additional behavior.
    koordinator.sh/core-sched-policy: "none" # options: "none", "exclusive", ""
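
The label semantics above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the koordlet implementation: the function `resolveCoreSched`, the type `coreSchedDecision`, and the constant names are hypothetical. Only the two label keys and the policy values "none" and "exclusive" come from this description.

```go
package main

// Label keys as given in the PR description; the Go constant names are assumptions.
const (
	labelCoreSchedGroupID = "koordinator.sh/core-sched-group-id"
	labelCoreSchedPolicy  = "koordinator.sh/core-sched-policy"
)

// coreSchedDecision is a hypothetical result type used only in this sketch.
type coreSchedDecision struct {
	Disabled bool   // true when the pod opts out of core sched (cookie reset to 0)
	GroupID  string // effective group id; "" means the empty group
}

// resolveCoreSched maps the pod labels to an effective decision.
// Whether core sched is enabled at all for a non-disabled pod is still
// decided by the nodeSLO, which this sketch does not model.
func resolveCoreSched(labels map[string]string, podUID string) coreSchedDecision {
	switch labels[labelCoreSchedPolicy] {
	case "none":
		// Explicitly disabled for this pod, regardless of the group id label.
		return coreSchedDecision{Disabled: true}
	case "exclusive":
		// The pod UID overwrites the group id, isolating the pod from all others.
		return coreSchedDecision{GroupID: podUID}
	default:
		// "" or any other value: use the group id label as-is (possibly empty).
		return coreSchedDecision{GroupID: labels[labelCoreSchedGroupID]}
	}
}
```

Because the policy is resolved before the group id, removing the policy label from a pod falls through to the default branch and yields the pod's original group id again.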

Ⅱ. Does this pull request fix one issue?

Part of #1728 (2.1).

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

See #1728 (2.2). Once the Anolis kernel fixes the compatibility problem and provides a more stable interface, the koordlet will remove the workaround that applies when the node's CPU QoS policy migrates from Group Identity to Core Scheduling.
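
For reviewers, the migration workaround described under Ⅰ.1 can be sketched roughly as below. This is an illustration only: the `node` interface, its method names, and `fakeNode` are hypothetical stand-ins for the real sysctl and cgroup operations, and running the cgroup reset before the sysctl is disabled is an assumption of this sketch rather than something stated here.

```go
package main

import "fmt"

// node is a hypothetical abstraction over the sysctl and cgroup operations.
type node interface {
	groupIdentityEnabled() (bool, error)
	resetGroupIdentityCgroups() error // e.g. clear any remaining bvt=-1 settings
	disableGroupIdentity() error      // fails if the sysctl cannot be disabled
	enableCoreSched() error
}

// switchToCoreSched cleans up the group identity before enabling core sched,
// so that no bvt=-1 task can share a rq with a core-sched-enabled task.
func switchToCoreSched(n node) error {
	enabled, err := n.groupIdentityEnabled()
	if err != nil {
		return fmt.Errorf("check group identity sysctl: %w", err)
	}
	if enabled {
		if err := n.resetGroupIdentityCgroups(); err != nil {
			return fmt.Errorf("reset group identity cgroups: %w", err)
		}
		if err := n.disableGroupIdentity(); err != nil {
			return fmt.Errorf("disable group identity sysctl: %w", err)
		}
	}
	return n.enableCoreSched()
}

// fakeNode is an in-memory node that records the order of operations.
type fakeNode struct {
	giEnabled bool
	calls     []string
}

func (f *fakeNode) groupIdentityEnabled() (bool, error) { return f.giEnabled, nil }

func (f *fakeNode) resetGroupIdentityCgroups() error {
	f.calls = append(f.calls, "reset-cgroups")
	return nil
}

func (f *fakeNode) disableGroupIdentity() error {
	f.calls = append(f.calls, "disable-gi")
	f.giEnabled = false
	return nil
}

func (f *fakeNode) enableCoreSched() error {
	f.calls = append(f.calls, "enable-core-sched")
	return nil
}
```

Calling `switchToCoreSched(&fakeNode{giEnabled: true})` records the clean-up steps before the enable step. With the group identity already disabled, only the enable step is recorded, which mirrors the group identity hook skipping reconciliation when its sysctl is off.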

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test


codecov bot commented Jan 10, 2024

Codecov Report

Attention: 64 lines in your changes are missing coverage. Please review.

Comparison is base (c70f410) 67.03% compared to head (6d83897) 67.06%.

| Files | Patch % | Lines |
|---|---|---|
| .../koordlet/runtimehooks/hooks/groupidentity/rule.go | 58.06% | 17 Missing and 9 partials ⚠️ |
| pkg/koordlet/util/system/core_sched.go | 0.00% | 18 Missing ⚠️ |
| pkg/koordlet/util/system/system_file.go | 0.00% | 11 Missing ⚠️ |
| pkg/koordlet/koordlet.go | 0.00% | 3 Missing ⚠️ |
| ...oordlet/runtimehooks/hooks/coresched/core_sched.go | 94.91% | 2 Missing and 1 partial ⚠️ |
| pkg/koordlet/util/node.go | 85.71% | 2 Missing ⚠️ |
| pkg/koordlet/runtimehooks/hooks/coresched/rule.go | 95.23% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1829      +/-   ##
==========================================
+ Coverage   67.03%   67.06%   +0.03%     
==========================================
  Files         407      407              
  Lines       45557    45716     +159     
==========================================
+ Hits        30538    30659     +121     
- Misses      12796    12820      +24     
- Partials     2223     2237      +14     
Flag Coverage Δ
unittests 67.06% <72.41%> (+0.03%) ⬆️


@saintube saintube force-pushed the koordlet-fix-coresched-conflict-with-gi branch from 943577a to 4eaeb49 Compare January 11, 2024 14:14
@saintube saintube changed the title koordlet: fix core sched conflicts with group identity koordlet: fix core sched conflicts with GI and revise API Jan 12, 2024
@saintube saintube force-pushed the koordlet-fix-coresched-conflict-with-gi branch from 4eaeb49 to bb19d80 Compare January 12, 2024 02:45
@hormes
Member

hormes commented Jan 15, 2024

"exclusive": use the pod UID to overwrite the group id so that taking SMT isolation to all other pods.

Why overwrite the core-sched-group-id above here, rather than modifying core-sched-group-id directly?

@saintube
Member Author

"exclusive": use the pod UID to overwrite the group id so that taking SMT isolation to all other pods.

Why overwrite the core-sched-group-id above here, rather than modifying core-sched-group-id directly?

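Both “disabling core sched” and “being mutually exclusive with all other pods” could in principle be supported by configuring the “group-id”, but maintaining such grouping strategies separately has a cost, and the policy label mainly offers special-case configurations built on top of the grouping mechanism. Another reason for a separate label is that in emergency operations and canary rollout scenarios, you can simply set the policy label to mark a pod as disabled or strongly isolated, then delete the label to restore the original grouping, with no need to maintain a temporary group.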

@saintube saintube force-pushed the koordlet-fix-coresched-conflict-with-gi branch 2 times, most recently from 7f7c869 to 1d6a742 Compare January 15, 2024 09:34
@zwzhang0107
Contributor

/lgtm
/approve

koordlet: revise core sched group api

Signed-off-by: saintube <[email protected]>
@saintube saintube force-pushed the koordlet-fix-coresched-conflict-with-gi branch from 1d6a742 to 6d83897 Compare January 16, 2024 06:07
@koordinator-bot koordinator-bot bot removed the lgtm label Jan 16, 2024
@hormes
Member

hormes commented Jan 16, 2024

/lgtm
/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes, zwzhang0107


@koordinator-bot koordinator-bot bot merged commit 10c2d6b into koordinator-sh:main Jan 16, 2024
20 checks passed
saintube added a commit to saintube/koordinator that referenced this pull request Feb 26, 2024
saintube added a commit that referenced this pull request Feb 26, 2024