-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wasmplugin controller: new approach using Kuadrant Topology (DAG) #317
Conversation
dcade7a
to
b024fbf
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #317 +/- ##
==========================================
- Coverage 81.91% 80.81% -1.11%
==========================================
Files 54 57 +3
Lines 4452 4738 +286
==========================================
+ Hits 3647 3829 +182
- Misses 534 601 +67
- Partials 271 308 +37
Flags with carried forward coverage won't be shown. Click here to find out more.
|
b2734c0
to
2823c1b
Compare
2823c1b
to
010c402
Compare
Relates to Kuadrant/architecture#29 |
2844569
to
a95be73
Compare
a95be73
to
2f943d0
Compare
) | ||
|
||
const ( | ||
HTTPRouteParents = ".metadata.parent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe using a key name that makes it clear we only care about gateway parents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking something like .metadata.gateway
, because the filter function added by controllers.addHTTPRouteByGatewayIndexer
only cares about route parentRefs
that are of kind Gateway
. If one tries to use the filter with a Service
route parent, even if the route declares such parent, it won't return the route.
return wasmPlugin, nil | ||
} | ||
|
||
func (r *RateLimitingWASMPluginReconciler) gatewayAPITopologyFromGateway(ctx context.Context, gw *gatewayapiv1.Gateway) (*common.KuadrantTopology, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this func could be enhanced for more kinds of policies, maybe using generics. The fact that it fetches a RateLimitPolicyList
, which then it iterates through to cast each RateLimitPolicy
to KuadrantPolicy
, seems to be a minor detail. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Gui's comment, we could benefit from using generics and retrieving a PolicyList
. I understand we will be requiring it heavily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation of the "Kuadrant Topology" does not use a k8s client to query Gateway API and kuadrant policies. Instead, fetching resources must be done externally and then the kuadrant topology accepts list of resources. It could be done otherwise and have generic function with a k8s client to build topology for a given policy of one kind (for example RLP or KAP). The main reason for the current implementation is to decouple kuadrant topology and the cluster. Meaning that I can build kuadrant topology from a set of resources filtered with some criteria. The usecase used here is that I want to build a kuadrant topology only for one gateway and all policies and routes involved. But only for one gateway.
Interesting. Worth exploring in future PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm all right with the KuadrantTopology
being just a data structure, decoupled from the mechanisms to communicate with the cluster (e.g. API client.)
I still don't see this function as belonging to the WasmPlugin reconciler nevertheless. Despite the RLP kind hard-coded in the core of the function's (current) implementation, it's surrounded by generic code and, in the end, returns something that is not RLP-specific. It feels like the specific policy kind should be a parameter of the function.
}, nil | ||
} | ||
|
||
func (r *RateLimitingWASMPluginReconciler) RouteFromRLP(ctx context.Context, t *common.KuadrantTopology, rlp *kuadrantv1beta2.RateLimitPolicy, gw *gatewayapiv1.Gateway) (*gatewayapiv1.HTTPRoute, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this func belong to the KuadrantTopology
type? I imagine that extracting the scope HTTPRoute for an AuthPolicy would look like pretty much the same, but I could be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method logic is simple: when the policy is targeting a (valid) route, return that route. If the policy is targeting a (valid) gateway, compute an "imaginary" route made up from all the "free" routes attached to the gateway.
I do not think that this logic belongs to the kuadrant topology type which is only, and nothing else than a representation or view of the cluster state. The logic above is the implementation of the current kuadrant's design of policies targeting a gateway and its semantics.
I agree, though, that this RouteFromRLP
method can be generalized to accomodate any kuadrant policy as long as kuadrant implements the same approach for any policy targeting a gateway.
Let me stick with this implementation for now and let's refactor and generalize when the authpolicy controller leverages the kuadrant policy here presented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue the function already is generic enough for all kinds of Kuadrant policies. I think it's literally only the signature that makes it RLP-specific now, no?
|
||
func (r *RateLimitingWASMPluginReconciler) WASMRateLimitPolicy(ctx context.Context, t *common.KuadrantTopology, rlp *kuadrantv1beta2.RateLimitPolicy, gw *gatewayapiv1.Gateway) (*wasm.RateLimitPolicy, error) { | ||
gwHostnamesTmp := common.TargetHostnames(gw) | ||
gwHostnames := common.Map(gwHostnamesTmp, func(str string) gatewayapiv1.Hostname { return gatewayapiv1.Hostname(str) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be the reverse of
kuadrant-operator/pkg/common/common.go
Line 187 in d475858
func HostnamesToStrings(hostnames []gatewayapiv1.Hostname) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I am aware there is a little mess around the type of the hostnames values. Some functions/APIs require string
while other require gatewayapiv1.Hostname
.
In this particular case, The gateway holds gatewayapiv1.Hostname
, but the method TargetHostnames
converts them to string
. The wasm plugin wants string
but before that we need to process them and filter valid subdomains and the method FilterValidSubdomains
requires gatewayapiv1.Hostname
.
Too many back and forth conversions and copies. Unacceptable for a processing in the data plane. I guess not a big issue in the control plane.
rules := rlptools.WasmRules(rlp, route) | ||
if len(rules) == 0 { | ||
// no need to add the policy if there are no rules; a rlp can return no rules if all its limits fail to match any route rule | ||
return nil, nil | ||
} | ||
|
||
// narrow the list of hostnames specified in the route so we don't generate wasm rules that only apply to other gateways | ||
// this is a no-op for the gateway rlp | ||
hostnames := common.FilterValidSubdomains(gwHostnames, route.Spec.Hostnames) | ||
if len(hostnames) == 0 { // it should only happen when the route specifies no hostnames | ||
hostnames = gwHostnames | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the wasm rules be computed after fixing the hostnames for rules that don't declare any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. When any route selector had not empty hostnames
fields, the routes without hostname were skipped. Added e2e test to cover that use case as well.
Sharing here a few thoughts to continue the conversation online:
|
if pluginConfig == nil || len(pluginConfig.RateLimitPolicies) == 0 { | ||
common.TagObjectToDelete(wasmPlugin) | ||
return wasmPlugin, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: nil
wasmplugin seems to be only returned when len(wasmPlugin.RateLimitPolicies) == 0
already at
kuadrant-operator/controllers/ratelimitpolicy_istio_wasmplugin.go
Lines 231 to 234 in e76cd28
// avoid building a wasm plugin config if there are no rules to apply | |
if len(wasmPlugin.RateLimitPolicies) == 0 { | |
return nil, nil | |
} |
so I think there's no need to check this again here 🤔
if pluginConfig == nil || len(pluginConfig.RateLimitPolicies) == 0 { | |
common.TagObjectToDelete(wasmPlugin) | |
return wasmPlugin, nil | |
} | |
if pluginConfig == nil { | |
common.TagObjectToDelete(wasmPlugin) | |
return wasmPlugin, nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment no longer applies to the current codebase in this PR. Let me know if it still does.
type policyNode struct { | ||
Policy KuadrantPolicy | ||
TargetedGateway *gatewayNode | ||
TargetedRoute *routeNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could abstract the TargetedGateway and TargetedRoute under an interface like TargetResource or NetworkResource, then implement the corresponding functions to return the actual type (gatewayNode, routeNode and future ones). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly using generics too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is internal implementation is going to change.
return client.ObjectKeyFromObject(g.Gateway) | ||
} | ||
|
||
type routeNode struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if worth it or not, leaving it to you to decide or a future improvement. Maybe we could have an interface networkNode and implement the different GW, routes, etc.. structs so the topology object could be "leaner" and not change much when we have to implement possible future nodes (grpcRoute, Service, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is internal implementation is going to change.
2f943d0
to
a733363
Compare
d8681ba
to
3ad9e4d
Compare
3ad9e4d
to
c05b0f4
Compare
Sure. That is the next logical step forward. The "Gateway API policy Topology". The topology only needs to know about the
I have been working on this. TL;DR is fortunately yes, the resources are cached across reconciliation events and, even more, across controllers using the same manager. But we must be careful, as this could not be true if we change things here.
To be clear, the topology is not cached, the resources used to build the topology (a set of indexes) are cached. The topology needs to be rebuild (the indexes are re-created) on each reconciliation event. I consider this procedure lightweight as long as the resources are cached and no I/O is involved. It's very clear from controller-runtime doc that for "get" and "list" methods the interface implementation uses a cache. This, however, for the default options when building the client. The kuadrant operator uses (so far) the default options when building the client, via the manager: https://github.com/Kuadrant/kuadrant-operator/blob/main/main.go#L127-L140 options := ctrl.Options{
Scheme: scheme,
Metrics: metricsserver.Options{BindAddress: metricsAddr},
WebhookServer: webhook.NewServer(webhook.Options{Port: 9443}),
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "f139389e.kuadrant.io",
}
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), options)
if err != nil {
setupLog.Error(err, "unable to start manager")
os.Exit(1)
} I have tested the cache using the k8s auditing policy together with I have updated the diff --git a/utils/kind-cluster.yaml b/utils/kind-cluster.yaml
index 4697238..4caa8f1 100644
--- a/utils/kind-cluster.yaml
+++ b/utils/kind-cluster.yaml
@@ -4,3 +4,28 @@ apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
image: kindest/node:v1.27.3
+ kubeadmConfigPatches:
+ - |
+ kind: ClusterConfiguration
+ apiServer:
+ # enable auditing flags on the API server
+ extraArgs:
+ audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
+ audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
+ # mount new files / directories on the control plane
+ extraVolumes:
+ - name: audit-policies
+ hostPath: /etc/kubernetes/policies
+ mountPath: /etc/kubernetes/policies
+ readOnly: true
+ pathType: "DirectoryOrCreate"
+ - name: "audit-logs"
+ hostPath: "/var/log/kubernetes"
+ mountPath: "/var/log/kubernetes"
+ readOnly: false
+ pathType: DirectoryOrCreate
+ # mount the local file on the control plane
+ extraMounts:
+ - hostPath: utils/audit-policy.yaml
+ containerPath: /etc/kubernetes/policies/audit-policy.yaml
+ readOnly: true And then created the audit policy to log only metadata about request for HTTPRoutes, Gateways and kuadrant policies. I also added one more filter to be the user being the kuadrant operator service account. This way the kubectl and k9s requests are not being logged. cat <<EOF > utils/audit-policy.yaml
---
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
users:
- "system:serviceaccount:kuadrant-system:kuadrant-operator-controller-manager"
resources:
- group: "gateway.networking.k8s.io"
resources:
- httproutes
- gateways
- group: "kuadrant.io"
resources:
- ratelimitpolicies
- authpolicies
EOF Then, follow the guide for Simple Rate Limiting for Application Developers Read the logs from API Server
Read the logs from the operator
Let's update RLP deployed from the Simple Rate Limiting for Application Developers
The kuadrant operator logs show two reconciliation events handled by the wasm controller.
And the API server logs show no traffic 🎉 . After a while, the API server will get requests from the operator to renew the cache
Any event to any resource directly (gateway or httproute) or indirectly related (rlp) to a gateway, will trigger a reconciliation event for that gateway. The "desired" wasm plugin will be computed and compared with the existing one. If no changes are required, the wasm plugin will not be updated. |
rateLimitPoliciesSorted := rateLimitPolicies | ||
|
||
// Sort RLPs for consistent comparison with existing objects | ||
sort.Sort(common.PolicyByKey(rateLimitPoliciesSorted)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. It could be a breaking change (depending on the behaviour implemented by the wasm shim.)
We definitely need to order the policies in the config. That's for sure! However, in the previous implementation, the order was dictated by the how the policy refs were stored in the gateway annotation. This was closer to the order of creation of the policies than alphabetical order on the qualified name of the policy.
If, in the wasm shim, the order of the policies in the config affects (today or in the future) the behaviour in the data plane, then depending what name one chooses for one's RLP/RLP namespace, users may observe different behaviours. To avoid that, I would stick with the creation timestamp for sorting the policies instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wasm shim applies the policy with the longest match of the hostname.
Ok, After reading the Gateway API spec, it seems that in the event of multiple policies with exact hostnames,
* The oldest Route based on creation timestamp.
* The Route appearing first in alphabetical order by “{namespace}/{name}”.
I will follow the same approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, ready for review
Logger logr.Logger | ||
} | ||
|
||
func (m *HTTPRouteToParentGatewaysEventMapper) Map(_ context.Context, obj client.Object) []reconcile.Request { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for the type it returns ([]reconcile.Request
vs []string
), isn't this function practically the same as the one defined by controllers.addHTTPRouteByGatewayIndexer
for the list filtering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! I will refactor and avoid DRY
const GatewayProgrammedConditionType = "Programmed" | ||
const ( | ||
GatewayProgrammedConditionType = "Programmed" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter? I actually prefer the one-line version. Am I the only the one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a matter of personal taste.
I would rather go for grouping several constant declarations, even if it's only one. If another constant needs to be added, the git diff is cleaner.
|
||
requests := make([]reconcile.Request, 0) | ||
|
||
for _, parentRef := range route.Spec.ParentRefs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same filter again? https://github.com/Kuadrant/kuadrant-operator/pull/317/files#r1502498973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be refactored to avoid DRY
pkg/common/kuadrant_topology.go
Outdated
|
||
// freeRoutes is an index of gateways mapping to HTTPRoutes not targeted by a kuadrant policy | ||
// Gateway -> []HTTPRoute | ||
freeRoutes map[client.ObjectKey][]*gatewayapiv1.HTTPRoute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
freeRoutes map[client.ObjectKey][]*gatewayapiv1.HTTPRoute | |
untargetedRoutes map[client.ObjectKey][]*gatewayapiv1.HTTPRoute |
?
|
||
type routeNode struct { | ||
Parents map[client.ObjectKey]*gatewayNode | ||
DirectPolicy *policyNode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I have a better suggestion for it, but I can see the use of "Direct Policy" becoming confusing, especially in light of the new defs proposed by kubernetes-sigs/gateway-api#2813.
Technically, Kuadrant RLPs and APs are always "Inherited Policies" now, I reckon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing the implementation of the topolocy and trying to run away from "kuadrant design decisions" to Gateway API topology with policies included, I think we do not need that "direct policy" field in either routeNode
and gatewayNode
.
I will give it a try and come back to this later.
7309f5c
to
88af9f2
Compare
728d7b5
to
c58450a
Compare
superseded by #447 |
What
Dedicated WASMPlugin controller
Description about the new approach using Kuadrant Topology.
Main object type
The type of object being reconciled is the
Gateway
. There is one-to-one mapping of gateway and wasmplugin. Another approach would be the type of object being reconciled being the WASMPlugin. Then it would be needed one way to get the gateway name from{name, namespace}
of the wasmplugin. The event mappers would be more complex. Worth exploring anyway.Watched resources and event mappers
The controller watches all resources that affect the configuration (spec) of the wasmplugin. I.e. HTTPRoutes, Gateways and RateLimitPolicies. On events of those resources, the event mapper implemented walks the Gateway API topology, via targetRef and parentRefs up to the peak of the topology, as far as kuadrant is concerned today, i.e. the Gateway. In other words, if you change a rate limit policy, it will generate events on all gateways. If you change a httproute, it will generate events on all gateways involved. Interesting to note that if you change the parentRef of a HTTProute, it will generate events on the gateways that were targeted and also on the new gateways currently being targeted. That allows doing cleaning for those gateways where a RLP is no longer affecting.
The Kuadrant Topology
On gateway reconciliation event, the controller builds what it is named "The Kuadrant Topology". It reads all HTTPRoutes targeting the given gateway as well as ALL rate limit policies in the cluster. With all that info, it builds a set of indexes that can be queried to provide information to build the wasmplugin. Interesting to note here that a field indexer is being added to the manager to index all httproute parenting a gateway. Basically you can use this manager index (not the kuadrant topology index), to query "give me all the HTTPRoutes parenting Gateway A".
The implementation assumes "cheap" operations of reading ALL the ratelimitpolicies and all routes parenting a given gateway. That assumption comes from the controller runtime k8s client that holds a cache. At least for
list
andget
operations. That cache is being kept up to date in the background when there are updates/deletes in the objects triggered by kuadrant controllers or any other third party controllers or any other k8s API server user.TODO: Integration tests:
ratelimitpolicy_controller_test.go
torate_limiting_wasmplugin_controller_test.go