Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: report dependency not installed to kuadrant & policy status #994

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

KevFan
Copy link
Contributor

@KevFan KevFan commented Nov 8, 2024

Description

Closes: #730

Report in the Kuadrant and Policy status when required dependencies are not installed

  • Kuadrant CR
    • Gateway API not installed
    • Gateway Provider (istio / envoy gateway) not intalled
    • Limitador Operator not installed
    • Authorino Operator not installed
  • AuthPolicy CR
    • Gateway API not installed
    • Gateway Provider (istio / envoy gateway) not intalled
    • Authorino Operator not installed
  • RateLimitPolicy CR
    • Gateway API not installed
    • Gateway Provider (istio / envoy gateway) not intalled
    • Limitador Operator not installed
  • TLSPolicy CR (does not require a Gateway Provider)
    • Gateway API not installed
    • CertManager not installed
  • DNSPolicy CR (does not require a Gateway Provider)
    • Gateway API not installed
    • DNS Operator not installed

Verification

  • Gateway API and Gateway API provider status are tested by integration tests, so passing integration tests should be enough

To verify other status:

  • Create cluster
make local-env-setup
  • Delete CRDs to simulate not installed
# DNS Operator
kubectl delete crd dnsrecords.kuadrant.io   
# Authorino Operator                                                                    
kubectl delete crd authconfigs.authorino.kuadrant.io
kubectl delete crd authorinos.operator.authorino.kuadrant.io
# Limitador Operator
kubectl delete crd limitadors.limitador.kuadrant.io
# Cert Manager
kubectl delete crd certificates.cert-manager.io
  • Run operator
make run
  • Apply CRs
kubectl apply -f config/samples/kuadrant_v1_authpolicy.yaml                                      
kubectl apply -f config/samples/kuadrant_v1_dnspolicy.yaml
kubectl apply -f config/samples/kuadrant_v1_ratelimitpolicy.yaml
kubectl apply -f config/samples/kuadrant_v1_tlspolicy.yaml
kubectl apply -f config/samples/kuadrant_v1beta1_kuadrant.yaml
  • Check status
kubectl get kuadrant kuadrant-sample -o yaml | yq '.status'
kubectl get authpolicy authpolicy-sample -o yaml | yq '.status'
kubectl get ratelimitpolicy ratelimitpolicy-sample -o yaml | yq '.status'
kubectl get dnspolicy dnspolicy-sample -o yaml | yq '.status'
kubectl get tlspolicy tlspolicy-sample -o yaml | yq '.status'

image

@KevFan KevFan force-pushed the issues/730 branch 7 times, most recently from d01d7fc to 90b4385 Compare November 15, 2024 09:18
@KevFan KevFan self-assigned this Nov 15, 2024
@KevFan KevFan added kind/enhancement New feature or request size/medium labels Nov 15, 2024
@KevFan KevFan changed the title Issues/730 enhancement: report dependency not installed to kuadrant & policy status Nov 15, 2024
@KevFan KevFan marked this pull request as ready for review November 15, 2024 15:37
@KevFan KevFan force-pushed the issues/730 branch 2 times, most recently from ae8b800 to 997a078 Compare December 2, 2024 12:52
@KevFan KevFan requested a review from a team December 2, 2024 15:50
@Boomatang
Copy link
Contributor

I think there is some improvement needed here. Firstly I rebased this to current main, and everything worked as expected.

The simpler issue is around the message in the status block Limitador Operator is not installed, please restart pod once dependency has been installed. When I first seen this message I ask what pod do I restart. I think we need to be clear that it is the kuadrant operator that the user should restart. This might not be as important for the kuadrant CR directly, but on the policy CR's I think it is need.

The second issue is are clarity, but raises its own issue. Doing the example above the kuadrant CR states that Limitador Operator is missing, but all the resources are missing. The user would fix one issue, restart the operator only to find the next issue. Where they would have to repeat the process over and over. This is not great. I think we should be telling the user all the dependencies that are missing at the same time.

The third point and I don't think it is an issue but more of an opportunity. When I tried the example give, at first I didn't make every CR type, I only made the one for auth. As a user the only on thing I am currently using kuadrant for is auth. Why do I care if limitador and DNS operators are not working. Wouldn't it be nice if kuadrant only told you had a dependency issue if you apply resources to the cluster that depends on that dependency. I also think this could help us on the part of solving this issue: #71

While I am at it I think we might need more log level than info & error for production. I think there needs to be a warning level.

{"level":"info","ts":"2025-01-08T14:21:52Z","logger":"kuadrant-operator","msg":"envoygateway is not installed, skipping related watches and reconcilers","err":null}

The above message is from the logs in pr, and I think it should be a warning over info level log. This possible is out side the scope of this PR, but it is a good example of what would be a warning.

One last point, which is splitting hairs, technically the Limitador Operator is not installed is wrong. It is the limitador CRD that is not installed. Do we want to get that fine grained, I don't think so.

@KevFan
Copy link
Contributor Author

KevFan commented Jan 9, 2025

The simpler issue is around the message in the status block Limitador Operator is not installed, please restart pod once dependency has been installed. When I first seen this message I ask what pod do I restart. I think we need to be clear that it is the kuadrant operator that the user should restart. This might not be as important for the kuadrant CR directly, but on the policy CR's I think it is need.

I think this is a good shout. Are we thinking of just calling it "Kuadrant Operator pod"? 🤔

The second issue is are clarity, but raises its own issue. Doing the example above the kuadrant CR states that Limitador Operator is missing, but all the resources are missing. The user would fix one issue, restart the operator only to find the next issue. Where they would have to repeat the process over and over. This is not great. I think we should be telling the user all the dependencies that are missing at the same time.

Yeah, this makes sense 👍

The third point and I don't think it is an issue but more of an opportunity. When I tried the example give, at first I didn't make every CR type, I only made the one for auth. As a user the only on thing I am currently using kuadrant for is auth. Why do I care if limitador and DNS operators are not working. Wouldn't it be nice if kuadrant only told you had a dependency issue if you apply resources to the cluster that depends on that dependency. I also think this could help us on the part of solving this issue: #71

This one, I'm not sure I understand the scenario you are saying. If you are doing Auth only, this missing dependency will only be reported in Kuadrant and AuthPolicy CR status. If you didn't apply Kuadrant CR, then it will be reported in AuthPolicy CR status only. Kuadrant CR cares about limitador and authorino since it creates an instance of those CRs. The current changes should already provide what you are saying - CRs that kuadrant provide should only report missing dependencies that each CR depend on - unless I'm missing something and you are saying this is not the case 🤔

While I am at it I think we might need more log level than info & error for production. I think there needs to be a warning level.

{"level":"info","ts":"2025-01-08T14:21:52Z","logger":"kuadrant-operator","msg":"envoygateway is not installed, skipping related watches and reconcilers","err":null}

The above message is from the logs in pr, and I think it should be a warning over info level log. This possible is out side the scope of this PR, but it is a good example of what would be a warning.

This would be outside of scope for sure. I don't believe the logger that we use even has a warning log type currently.

One last point, which is splitting hairs, technically the Limitador Operator is not installed is wrong. It is the limitador CRD that is not installed. Do we want to get that fine grained, I don't think so.

This is true, I don't think we want to be that fine grained. Might be able to improve this seperately, but outside of scope for this PR I would say. Kuadrant depends more on the CRD but would also have issues if the Operator is not running. Since we generally report status based on the status of the dependant CRs, I think we indicate something is unexepcted from the environment as our CR status will never go into a Ready/Enforced state 🤔 I can rename if that is our general preference everywhere. Will be a seperate issue/PR though.

@Boomatang
Copy link
Contributor

The third point and I don't think it is an issue but more of an opportunity. When I tried the example give, at first I didn't make every CR type, I only made the one for auth. As a user the only on thing I am currently using kuadrant for is auth. Why do I care if limitador and DNS operators are not working. Wouldn't it be nice if kuadrant only told you had a dependency issue if you apply resources to the cluster that depends on that dependency. I also think this could help us on the part of solving this issue: #71

This one, I'm not sure I understand the scenario you are saying. If you are doing Auth only, this missing dependency will only be reported in Kuadrant and AuthPolicy CR status. If you didn't apply Kuadrant CR, then it will be reported in AuthPolicy CR status only. Kuadrant CR cares about limitador and authorino since it creates an instance of those CRs. The current changes should already provide what you are saying - CRs that kuadrant provide should only report missing dependencies that each CR depend on - unless I'm missing something and you are saying this is not the case 🤔

Okay what i was meaning is the kuadrant CR only gave an error related to Limitador and no error related to authorino. Authorino was what I was trying to use. The issue really is the kuadrant CR status will only list one error at a time. I the case that I looked at the user would have seen the limitador error fixed that error, and restarted kuadrant to only find out there was more errors. If there is more than one error these errors need to be list at the same time.

@KevFan KevFan requested a review from Boomatang January 15, 2025 13:01
Copy link
Contributor

@Boomatang Boomatang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one nit pick around the exit logging statement and few questions around styling that you might change.

Every thing is working as expected, and I think the messaging is much better now.

controllers/state_of_the_world.go Show resolved Hide resolved
controllers/state_of_the_world.go Show resolved Hide resolved
},
Postcondition: finalStepsWorkflow(b.client, b.isIstioInstalled, b.isGatewayAPIInstalled).Run,
Postcondition: finalStepsWorkflow(b.client, b.isGatewayAPIInstalled, b.isIstioInstalled, b.isEnvoyGatewayInstalled).Run,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a future design issue here, the order of the boolean values mean something. I am wonder is a list of options a better thing here. Not asking for this to be changed, more asking what do you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed, I don't quite like this. I had an initial thought of potentially separating the dependencies flags in BootOptionsBuilder into it's own struct:

type BootOptionsBuilder struct {
	logger  logr.Logger
	manager ctrlruntime.Manager
	client  *dynamic.DynamicClient
	dependencies *Dependencies 
}

type Dependencies struct {
	// Internal configurations
	isGatewayAPIInstalled        bool
	isEnvoyGatewayInstalled      bool
	isIstioInstalled             bool
	isCertManagerInstalled       bool
	isConsolePluginInstalled     bool
	isDNSOperatorInstalled       bool
	isLimitadorOperatorInstalled bool
	isAuthorinoOperatorInstalled bool
}

And potentially just pass this into the subsequent reconcilers that require this information. Granted just doing this, would leak unnecessary info to respective reconcilers (e.g. auth would not care about limitador for reporting).

What's your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this, and I am wondering what map[string]bool would be like. It would remove the leaking of unnecessary information to the respective reconcilers. How they are created that I am not sure.

@@ -40,9 +40,9 @@ var (
//+kubebuilder:rbac:groups="cert-manager.io",resources=clusterissuers,verbs=get;list;watch;
//+kubebuilder:rbac:groups="cert-manager.io",resources=certificates,verbs=get;list;watch;create;update;patch;delete

func NewTLSWorkflow(client *dynamic.DynamicClient, scheme *runtime.Scheme, isCertManagerInstalled bool) *controller.Workflow {
func NewTLSWorkflow(client *dynamic.DynamicClient, scheme *runtime.Scheme, isGatewayAPIInstalled, isCertManagerInstalled bool) *controller.Workflow {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to expand on the comment about the options object, this is an example of where I think we could be running to an issue. There is a lot of passing around of flags, and I think when we start to expand the support, this gets messy. Again not asking for a change. Looking for your feedback.

controllers/tlspolicies_validator.go Outdated Show resolved Hide resolved
Comment on lines +24 to +36
func IsAuthorinoOperatorInstalled(restMapper meta.RESTMapper, logger logr.Logger) (bool, error) {
if ok, err := utils.IsCRDInstalled(restMapper, kuadrantv1beta1.AuthorinoGroupKind.Group, kuadrantv1beta1.AuthorinoGroupKind.Kind, authorinooperatorv1beta1.GroupVersion.Version); !ok || err != nil {
logger.V(1).Error(err, "Authorino Operator CRD was not installed", "group", kuadrantv1beta1.AuthorinoGroupKind.Group, "kind", kuadrantv1beta1.AuthorinoGroupKind.Kind, "version", authorinooperatorv1beta1.GroupVersion.Version)
return false, err
}

if ok, err := utils.IsCRDInstalled(restMapper, AuthConfigGroupKind.Group, AuthConfigGroupKind.Kind, authorinov1beta3.GroupVersion.Version); !ok || err != nil {
logger.V(1).Error(err, "Authorino Operator CRD was not installed", "group", AuthConfigGroupKind.Group, "kind", AuthConfigGroupKind.Kind, "version", authorinov1beta3.GroupVersion.Version)
return false, err
}

return true, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the same issue of only reporting one missing CRD, does that matter in this case? It is seeming like we don't use the error in reporting in CRs, only a log message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, still probably better to still log all the missing CRDs but yes, we do not use/report the error in the CR status

Copy link
Contributor

@Boomatang Boomatang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with how this is working now. There was that on debug log that we talked about. If you want to change that here, I am happy to look at it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement New feature or request size/medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[state-of-the-world] common policy status controller
2 participants