operator self-node-remediation (0.7.1)
mshitrit authored Oct 25, 2023
1 parent 569f3c6 commit f65b2ad
Showing 11 changed files with 1,101 additions and 0 deletions.
@@ -0,0 +1,18 @@
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    control-plane: controller-manager
    self-node-remediation-operator: ""
  name: self-node-remediation-controller-manager-metrics-service
spec:
  ports:
  - name: https
    port: 8443
    targetPort: https
  selector:
    control-plane: controller-manager
    self-node-remediation-operator: ""
status:
  loadBalancer: {}
@@ -0,0 +1,27 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  labels:
    rbac.ext-remediation/aggregate-to-ext-remediation: "true"
    self-node-remediation-operator: ""
  name: self-node-remediation-ext-remediation
rules:
- apiGroups:
  - self-node-remediation.medik8s.io
  resources:
  - selfnoderemediationtemplates
  verbs:
  - get
- apiGroups:
  - self-node-remediation.medik8s.io
  resources:
  - selfnoderemediations
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
@@ -0,0 +1,19 @@
apiVersion: v1
data:
  controller_manager_config.yaml: |
    apiVersion: controller-runtime.sigs.k8s.io/v1alpha1
    kind: ControllerManagerConfig
    health:
      healthProbeBindAddress: :8081
    metrics:
      bindAddress: 127.0.0.1:8080
    webhook:
      port: 9443
    leaderElection:
      leaderElect: true
      resourceName: 547f6cb6.medik8s.io
kind: ConfigMap
metadata:
  labels:
    self-node-remediation-operator: ""
  name: self-node-remediation-manager-config
@@ -0,0 +1,12 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  labels:
    self-node-remediation-operator: ""
  name: self-node-remediation-metrics-reader
rules:
- nonResourceURLs:
  - /metrics
  verbs:
  - get
@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    self-node-remediation-operator: ""
  name: self-node-remediation-webhook-service
spec:
  ports:
  - port: 443
    targetPort: 9443
  selector:
    control-plane: controller-manager
    self-node-remediation-operator: ""
status:
  loadBalancer: {}

Large diffs are not rendered by default.

@@ -0,0 +1,173 @@
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.12.0
  creationTimestamp: null
  labels:
    self-node-remediation-operator: ""
  name: selfnoderemediationconfigs.self-node-remediation.medik8s.io
spec:
  group: self-node-remediation.medik8s.io
  names:
    kind: SelfNodeRemediationConfig
    listKind: SelfNodeRemediationConfigList
    plural: selfnoderemediationconfigs
    shortNames:
    - snrc
    - snrconfig
    singular: selfnoderemediationconfig
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: SelfNodeRemediationConfig is the Schema for the selfnoderemediationconfigs
          API in which a user can configure the self node remediation agents
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: SelfNodeRemediationConfigSpec defines the desired state of
              SelfNodeRemediationConfig
            properties:
              apiCheckInterval:
                default: 15s
                description: the frequency for api-server connectivity check Valid
                  time units are "ms", "s", "m", "h". the frequency for api-server
                  connectivity check
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              apiServerTimeout:
                default: 5s
                description: Valid time units are "ms", "s", "m", "h". timeout for
                  each api-connectivity check
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              customDsTolerations:
                description: CustomDsTolerations allows to add custom tolerations
                  snr agents that are running on the ds in order to support remediation
                  for different types of nodes.
                items:
                  description: The pod this Toleration is attached to tolerates any
                    taint that matches the triple <key,value,effect> using the matching
                    operator <operator>.
                  properties:
                    effect:
                      description: Effect indicates the taint effect to match. Empty
                        means match all taint effects. When specified, allowed values
                        are NoSchedule, PreferNoSchedule and NoExecute.
                      type: string
                    key:
                      description: Key is the taint key that the toleration applies
                        to. Empty means match all taint keys. If the key is empty,
                        operator must be Exists; this combination means to match all
                        values and all keys.
                      type: string
                    operator:
                      description: Operator represents a key's relationship to the
                        value. Valid operators are Exists and Equal. Defaults to Equal.
                        Exists is equivalent to wildcard for value, so that a pod
                        can tolerate all taints of a particular category.
                      type: string
                    tolerationSeconds:
                      description: TolerationSeconds represents the period of time
                        the toleration (which must be of effect NoExecute, otherwise
                        this field is ignored) tolerates the taint. By default, it
                        is not set, which means tolerate the taint forever (do not
                        evict). Zero and negative values will be treated as 0 (evict
                        immediately) by the system.
                      format: int64
                      type: integer
                    value:
                      description: Value is the taint value the toleration matches
                        to. If the operator is Exists, the value should be empty,
                        otherwise just a regular string.
                      type: string
                  type: object
                type: array
              endpointHealthCheckUrl:
                description: EndpointHealthCheckUrl is an url that self node remediation
                  agents which run on control-plane node will try to access when they
                  can't contact their peers. This is a part of self diagnostics which
                  will decide whether the node should be remediated or not. It will
                  be ignored when empty (which is the default).
                type: string
              isSoftwareRebootEnabled:
                default: true
                description: IsSoftwareRebootEnabled indicates whether self node remediation
                  agent will do software reboot, if the watchdog device can not be
                  used or will use watchdog only, without a fallback to software reboot
                type: boolean
              maxApiErrorThreshold:
                default: 3
                description: after this threshold, the node will start contacting
                  its peers
                minimum: 1
                type: integer
              peerApiServerTimeout:
                default: 5s
                description: Valid time units are "ms", "s", "m", "h".
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              peerDialTimeout:
                default: 5s
                description: Valid time units are "ms", "s", "m", "h". timeout for
                  establishing connection to peer
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              peerRequestTimeout:
                default: 5s
                description: Valid time units are "ms", "s", "m", "h". timeout for
                  each peer request
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              peerUpdateInterval:
                default: 15m
                description: Valid time units are "ms", "s", "m", "h".
                pattern: ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
                type: string
              safeTimeToAssumeNodeRebootedSeconds:
                default: 180
                description: SafeTimeToAssumeNodeRebootedSeconds is the time after
                  which the healthy self node remediation agents will assume the unhealthy
                  node has been rebooted, and it is safe to recover affected workloads.
                  This is extremely important as starting replacement Pods while they
                  are still running on the failed node will likely lead to data corruption
                  and violation of run-once semantics. In an effort to prevent this,
                  the operator ignores values lower than a minimum calculated from
                  the ApiCheckInterval, ApiServerTimeout, MaxApiErrorThreshold, PeerDialTimeout,
                  and PeerRequestTimeout fields.
                minimum: 0
                type: integer
              watchdogFilePath:
                default: /dev/watchdog
                description: WatchdogFilePath is the watchdog file path that should
                  be available on each node, e.g. /dev/watchdog
                type: string
            type: object
          status:
            description: SelfNodeRemediationConfigStatus defines the observed state
              of SelfNodeRemediationConfig
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
status:
  acceptedNames:
    kind: ""
    plural: ""
  conditions: null
  storedVersions: null
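
Illustration only, not part of this commit: a minimal sketch of a SelfNodeRemediationConfig resource that should validate against the schema above. The metadata name and namespace and the toleration entry are assumptions made for the example. The remaining values simply repeat the defaults declared in the schema, so omitting those fields should give the same result.

```yaml
apiVersion: self-node-remediation.medik8s.io/v1alpha1
kind: SelfNodeRemediationConfig
metadata:
  # Assumed name and namespace; the CRD itself only requires a namespaced object.
  name: self-node-remediation-config
  namespace: example-namespace
spec:
  # Duration strings must match ^(0|([0-9]+(\.[0-9]+)?(ms|s|m|h)))$
  apiCheckInterval: 15s
  apiServerTimeout: 5s
  peerApiServerTimeout: 5s
  peerDialTimeout: 5s
  peerRequestTimeout: 5s
  peerUpdateInterval: 15m
  # Integer with minimum 1.
  maxApiErrorThreshold: 3
  # Integer with minimum 0; per the field description, values below the
  # operator's calculated minimum are ignored.
  safeTimeToAssumeNodeRebootedSeconds: 180
  isSoftwareRebootEnabled: true
  watchdogFilePath: /dev/watchdog
  # Hypothetical toleration so agents can also run on tainted nodes.
  customDsTolerations:
  - key: example.com/dedicated
    operator: Exists
    effect: NoSchedule
```

Because the CRD registers the short names snrc and snrconfig, such an object could be listed with either of them once the CRD is installed.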