-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transition GCP config-connector to terraform - cleanup cloudResources
#1046
Comments
Creating the scratch bucket is something the config connector should be able to do (on GKE), but I never got it to work #669 We should scope another project, similar to the CI/CD one, that refactors/overhauls our terraform infra to make it consistent. |
It feels like there are two questions to answer here:
For 1, I am not sure what the answer is, but my instinct is that we should document it as a feature and let people ask for it explicitly so we aren't adding unnecessary complexity that may not be used. For 2, I agree that setting up a broader project around standardizing our terraform is a great idea. @sgibson91 do you want to create an issue to start writing down a list of things that we can shape into a broader project? |
I will put it on my to-do list and get to it when I can |
sounds good - I don't think there's a strong hurry there, just want to make sure that we don't forget about it. For this issue, it then sounds like we can close it simply by documenting in |
I would bet this is the underlying issue!
IIUC, we are currently creating that bucket with terraform, ie: https://github.com/2i2c-org/infrastructure/blob/master/terraform/gcp/projects/pangeo-hubs.tfvars#L16-L18 |
I'm not sure there is any documentation yet. I caught this in the LEAP config though. #1074 (comment) |
The GKE config connector was helpful in letting us deploy Google Cloud Service Accounts with permissions for cloud storage directly just from helm. However, it has been difficult to debug, and in 2i2c-org#669 we decided to move away from it and towards creating these cloud resources via Terraform. This commit adds: - Terraform code that will create a Google Service Account, bind it to a given Kubernetes Service Account, for a list of hub namespaces passed in. This means that some hub initial deployments now *can not be done just with CD*, but need manual work with terraform. I think this would be any hub that wants to use requestor pays or scratch buckets. This would need to be documented. - Move meom-ige to use this new scheme. metadata concealment (https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment) which is what we were using earlier as alternative to config-connector + workload identity, is no longer supported by the terraform google provider. In b7b42ce, we changed the default from 'SECURE' to 'UNSPECIFIED', but it looks like 'UNSPECIFIED' really means 'use workload identity' haha. When 2i2c-org#1124 was deployed to meom-ige yesterday, it seems to have enabled workload identity, causing cloud access to stop working, leading to https://2i2c.freshdesk.com/a/tickets/107. Further investigation on what happened here is needed, but I've currently fixed it by just deploying this change for meom-ige. - All hubs are given access to all buckets we create. This is inadequete, and needs to be more fine grained. Ref 2i2c-org#669 Ref 2i2c-org#1046
#1130 is relevant |
The GKE config connector was helpful in letting us deploy Google Cloud Service Accounts with permissions for cloud storage directly just from helm. However, it has been difficult to debug, and in 2i2c-org#669 we decided to move away from it and towards creating these cloud resources via Terraform. This commit adds: - Terraform code that will create a Google Service Account, bind it to a given Kubernetes Service Account, for a list of hub namespaces passed in. This means that some hub initial deployments now *can not be done just with CD*, but need manual work with terraform. I think this would be any hub that wants to use requestor pays or scratch buckets. This would need to be documented. - Move meom-ige to use this new scheme. metadata concealment (https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment) which is what we were using earlier as alternative to config-connector + workload identity, is no longer supported by the terraform google provider. In b7b42ce, we changed the default from 'SECURE' to 'UNSPECIFIED', but it looks like 'UNSPECIFIED' really means 'use workload identity' haha. When 2i2c-org#1124 was deployed to meom-ige yesterday, it seems to have enabled workload identity, causing cloud access to stop working, leading to https://2i2c.freshdesk.com/a/tickets/107. Further investigation on what happened here is needed, but I've currently fixed it by just deploying this change for meom-ige. - All hubs are given access to all buckets we create. This is inadequete, and needs to be more fine grained. Ref 2i2c-org#669 Ref 2i2c-org#1046
We need to basically rip out the config-connector stuff completely, as we've moved all scratch bucket stuff to terraform. Unassigning myself as I'm not currently working on it. |
This comment was marked as duplicate.
This comment was marked as duplicate.
daskhub
didn't enable scratchBuckets by default, now silently ignored faulty config has been removedcloudResources
feature for GCP
cloudResources
feature for GCPcloudResources
Updated issue
The "config connector" system enabled GKE clusters to define resources of custom types representing GCP resources, where GCP then would create these for us. In practice it enabled our chart's to emit k8s resources that led to the creation of GCP resources outside the k8s cluster, such as a GCP service account and a GCP storage bucket.
We have now transitioned to create such service accounts and buckets via terraform instead, but some clusters still rely on the config connector system. If we can transition to using terraform entirely, we would avoid the complexity of retaining config connector related config that is only relevant to some clusters, and we would avoid the quite significant CPU/Memory overhead requested inside the k8s cluster by the config connector system.
Action points
From Yuvi's comment #1046 (comment)
helm-charts/
values.yaml
to support thisOriginal issue
In #1045 I removed this section.
infrastructure/helm-charts/daskhub/values.yaml
Lines 1 to 6 in 689db10
It wasn't functional and instead silently ignored. It should be located under
basehub.jupyterhub.custom.cloudResources.scratchBucket.enabled
to be functional and not ignored.In #1045 I opted to not change any behavior but instead remove the misconfiguration that was silently ignored and open this issue to raise awareness about the situation: daskhub won't create a scratch bucket by default etc, and you must enable this for each hub that wants it.
I lean towards thinking that any custom resources that one may want, should be explicit configuration for a cluster's hubs rather than default configuration. If there is agreement about that, there may be no action point at all but to close this issue.
Action points
Outlined by Yuvi in #1046 (comment)
The text was updated successfully, but these errors were encountered: