Misc documentation updates I'd like to make #2041

Closed
consideRatio opened this issue Jan 11, 2023 · 4 comments
Labels
Documentation A change to our documentation. nominated-to-be-resolved-during-q4-2023 Nomination to be resolved during q4 goal of reducing the technical debt

Comments

@consideRatio
Contributor

consideRatio commented Jan 11, 2023

What remains to be resolved

9

In this aws step about granting eksctl access to other users.

I wonder if we should even do that if we can use the deployer script to get credentials though, hmm. Also, I have now not created additional users for that account.

UPDATE: Yes, because we also need such credentials for operations like adding or removing node pools. One may ask why; the answer is that, for example, kubectl drain is used, which is a k8s API interaction. The action point is to make it clear why we add this permission.

14

It's unclear to me in the Enable authentication section whether we are supposed to add 2i2c members so that they can authenticate, and if so, with what identity provider (GitHub team? Google email accounts from 2i2c.org?).

23

I saw no mention of cleaning up scratch buckets, but I think we should consider that as well in the decommissioning process.

24

This section didn't link to how to create an incident response issue. I asked myself, where? In 2i2c-org/infrastructure?

https://team-compass.2i2c.org/en/latest/projects/managed-hubs/incidents.html#key-terms

25

I'm not sure if /pd trigger works, or in what channel, or similar. I never managed to see a popup like described in https://team-compass.2i2c.org/en/latest/projects/managed-hubs/incidents.html#incident-response-process.

32

In this comment, in step 1, I ask the community reps to help authorize the github oauth application to receive organizational membership info from users instead of asking to become an owner and do it for them. With it, I provided a screenshot example.

#2323 (comment)

33

Setting up a new GCP Project with the existing billing account should make it clear that cost exports only need to be configured for new billing accounts, not for new GCP projects already linked to the 2i2c billing account.

https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project/#create-a-new-gcp-project

34

There should be a final step linking to setting up quotas in https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project

35

The docs for creating a new GCP project don't mention the ability to generate that, along with the cluster config etc., via deployer generate-gcp-cluster

https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project

36

When generating a new GCP cluster:

  • the template .tfvars could benefit from an update so that it doesn't carry as many notes about what the defaults are, etc.
  • the cluster.yaml has a note saying "we default to regional cluster", but the zone field is populated, which is confusing
  • the fixed k8s versions in the .tfvars aren't set, which could be fine, but we should perhaps pre-populate them with comments or similar, to be updated
  • there are places in copy-paste scripts where the literal CLUSTER is used and the $CLUSTER env could be used instead

37

The gcp cluster variable prefix is used to generate resource names, and if it's more than 20 letters, generated names such as <prefix>-cluster-sa become longer than accepted. We could add validation to avoid this.

Is it okay to make catalystproject-latam become latam for example?

I think so, because the following resources seemed to include it in their names

+ cluster      = "this-is-prefix-cluster"
+ account_id   = "this-is-prefix-cluster-sa"
+ account_id   = "this-is-prefix-cd-sa"
+ name         = "this-is-prefix-homedirs"
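The validation suggested above could be sketched roughly as follows. This is a minimal illustration, not the deployer's actual code: the function name too_long_account_ids is hypothetical, the suffixes are copied from the resource names listed above, and the 30-character limit is an assumption about GCP service account IDs that should be checked against GCP's documentation.

```python
# Suffixes copied from the account_id values listed above.
SERVICE_ACCOUNT_SUFFIXES = ("-cluster-sa", "-cd-sa")

# Assumption: GCP service account IDs may be at most 30 characters long.
MAX_ACCOUNT_ID_LENGTH = 30


def too_long_account_ids(prefix, max_length=MAX_ACCOUNT_ID_LENGTH):
    """Return the generated account IDs that would exceed max_length."""
    candidates = [prefix + suffix for suffix in SERVICE_ACCOUNT_SUFFIXES]
    return [name for name in candidates if len(name) > max_length]


if __name__ == "__main__":
    for prefix in ("latam", "catalystproject-latam"):
        offenders = too_long_account_ids(prefix)
        status = "ok" if not offenders else f"too long: {offenders}"
        print(f"{prefix}: {status}")
```

Running a check like this before terraform is invoked would surface the problem at generation time instead of as a failed apply.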

38

Wrong directory mentioned in leading comment at https://infrastructure.2i2c.org/hub-deployment-guide/new-cluster/new-cluster/#exporting-and-encrypting-the-cluster-access-credentials. It's really about making sure that the deployer gets credentials to the cluster in a file enc-deployer-credentials.secret.json put in config/clusters/cluster_name.

39

From https://infrastructure.2i2c.org/hub-deployment-guide/hubs/new-hub/

is a Zero to JupyterHub configuration

Should be a "Helm chart configuration"

40

Your notebook server is a linux “virtual machine” with its own filesystem. You are not on a shared server; you are on your own private server.

Actually, user servers are "containers" running in isolation from each other, but possibly on the same physical machine.

https://docs.2i2c.org/user/topics/data/filesystem/

41

Mention that https://cloud.google.com/logging/docs/view/query-library is a good reference for queries in GCP

42

In GPU setup, mention a check for GPU availability in zones

aws ec2 describe-instance-type-offerings --location-type availability-zone --filters="Name=instance-type,Values=g4dn.xlarge" --region us-east-1 --output table

-------------------------------------------------------
|            DescribeInstanceTypeOfferings            |
+-----------------------------------------------------+
||               InstanceTypeOfferings               ||
|+--------------+--------------+---------------------+|
|| InstanceType |  Location    |    LocationType     ||
|+--------------+--------------+---------------------+|
||  g4dn.xlarge |  us-east-1f  |  availability-zone  ||
||  g4dn.xlarge |  us-east-1d  |  availability-zone  ||
||  g4dn.xlarge |  us-east-1b  |  availability-zone  ||
||  g4dn.xlarge |  us-east-1c  |  availability-zone  ||
||  g4dn.xlarge |  us-east-1a  |  availability-zone  ||
|+--------------+--------------+---------------------+|
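To script the same check, a minimal Python sketch could parse what the command prints when run with --output json instead of --output table. The function name zones_offering and the sample payload are illustrative assumptions; the InstanceTypeOfferings, InstanceType and Location field names mirror the table headers above.

```python
import json


def zones_offering(instance_type, offerings_json):
    """Return the sorted zones that offer instance_type.

    offerings_json is the text printed by the aws command above when it is
    run with --output json instead of --output table.
    """
    offerings = json.loads(offerings_json)["InstanceTypeOfferings"]
    return sorted(
        o["Location"] for o in offerings if o["InstanceType"] == instance_type
    )


# Illustrative sample shaped like the table above, not captured output.
SAMPLE_OUTPUT = json.dumps(
    {
        "InstanceTypeOfferings": [
            {
                "InstanceType": "g4dn.xlarge",
                "LocationType": "availability-zone",
                "Location": zone,
            }
            for zone in ("us-east-1f", "us-east-1d", "us-east-1b")
        ]
    }
)

if __name__ == "__main__":
    print(zones_offering("g4dn.xlarge", SAMPLE_OUTPUT))
```

The resulting list can then be compared against the zones configured for the GPU node group before creating it.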
@consideRatio
Contributor Author

consideRatio commented Jan 23, 2023

Stuff resolved or tracked elsewhere

1 - resolved

# documented like this
python3 deployer generate-cluster <cluster-name> aws

# in practice done like this
pip install -e .

deployer generate-aws-cluster --cluster-name=ubc-eoas --hub-type=basehub --cluster-region=ca-central-1

2 - resolved

After creating a .jsonnet file, the zones I got didn't match the available availability zones: 1a, 1b, 1c were generated, but only 1a, 1b, 1d existed.

4 - resolved by #2056

Document experiences in upgrading beyond 1.22. I wrote about this in https://2i2c.slack.com/archives/CKJS000F4/p1671374097438499.

// Warning: version 1.23 introduces some breaking changes
// Checkout the docs before upgrading
// ref: https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi-migration-faq.html
version: '1.22'

5 - resolved

jsonnet needs to be installed, but I wasn't asked to install it at any point as a prerequisite to deploying a new hub.

6 - resolved

There’s no requirement to commit the *.eksctl.yaml file to the repository since we can regenerate it using the above jsonnet command.

We .gitignore them, so one can't.

7 - resolved

Following a terraform step, I got a .terraform.lock.hcl file in terraform/aws, and I didn't understand what it was for.

We shouldn't version control the .terraform.lock.hcl file, right? See Terraform's documentation about the Dependency Lock File.

Action: add to .gitignore

8 - resolved

I've seen python deployer use-cluster-credentials and python3 deployer and just deployer mentioned. I think we should stick with assuming pip install -e . has already been run.

10 - resolved

There is a link that is outdated and points to the wrong lines of relevance:

https://github.com/2i2c-org/infrastructure/tree/HEAD/.github/workflows/deploy-hubs.yaml#L31-L36

In practice, we seem to need to add things to the output list of the upgrade-support-and-staging job in deploy-hubs.yaml and in the matrix for the deploy_grafana_dashboards job in deploy-grafana-dashboards.yaml

11 - resolved

Once the deploy chart was deployed and you were able to log into grafana as the admin user, you can generate an API key.

"deploy chart" should be "support chart"

12 - resolved

the following sops-ecrypted file config/clusters//enc-grafana-token.secret.yaml, with a content similar to:

The subsequent indentation is off.

13 - resolved

Typo in:

for a hub are simmilar with the ones for

15 - resolved by #2045

It's unclear to me in CILogon authentication if I should use CILogon directly or involve auth0 somehow.

16 - resolved

It's unclear to me how to configure shown_idps for CILogon. It seems that the listed shown_idps are not always the EntityID name listed in https://cilogon.org/idplist/.

From discussion and trial, it seems that each shown_idps entry should reference the EntityID as listed via https://cilogon.org/idplist/ and exactly that.
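A check for that exact-match rule could look something like the sketch below. It assumes the list at https://cilogon.org/idplist/ has already been fetched and parsed into a list of dicts carrying an "EntityID" key; the function name unknown_idps and the sample entries are hypothetical and only illustrate the comparison.

```python
def unknown_idps(shown_idps, idplist):
    """Return the shown_idps entries that match no EntityID exactly.

    idplist is assumed to be the parsed identity provider list, where each
    entry is a dict with an "EntityID" key.
    """
    known = {entry["EntityID"] for entry in idplist}
    return [idp for idp in shown_idps if idp not in known]


# Hypothetical entries, for illustration only.
SAMPLE_IDPLIST = [
    {"EntityID": "https://example.edu/idp", "DisplayName": "Example University"},
    {"EntityID": "https://example.org/idp", "DisplayName": "Example Org"},
]

if __name__ == "__main__":
    configured = ["https://example.edu/idp", "Example University"]
    print(unknown_idps(configured, SAMPLE_IDPLIST))
```

Here a display name is flagged while the exact EntityID passes, which is the distinction the note above is about.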

17 - tracked in 2i2c-org/default-hub-homepage#19

This page seems outdated, referencing https://2i2c.org/pilot and getting redirected to https://docs.2i2c.org/

image

18 - tracked in 2i2c-org/default-hub-homepage#18

logout button issue in default login page template

19 - previously resolved

The central grafana URL is not clear where it is at this point.

20 - resolved

specific GitHub organization you wish to allow login.

Followed by incorrect indentation

Decommissioning

21 - resolved

This section didn't provide a link to the list of github applications and had incorrect formatting.

https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/hubs/other-hub-ops/delete-hub.html#github-oauth-application

22 - resolved

Mention that deleting data can take a long time in https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/hubs/other-hub-ops/delete-hub.html#delete-data.

24 - resolved

In https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/hubs/other-hub-ops/delete-hub.html#remove-the-hub-values-file

Steps 3 and 4 can be actioned while this PR is reviewed and merged.

"After" merged

26 - resolved

Typo at https://infrastructure.2i2c.org/en/latest/howto/grafana-github-auth.html (For example, ghttps://grafana.pilot.2i2c.cloud)

27 - resolved

I had to create a directory for my new cluster in config/clusters before letting terraform write a file to that location.

28 - resolved

I saw this, but CLUSTER_NAME wasn't prefixed with $

deployer deploy-support CLUSTER_NAME

29 - resolved

When configuring domain names via namecheap, it would be good to link directly to where this is done: https://ap.www.namecheap.com/Domains/DomainControlPanel/2i2c.cloud/advancedns instead of namecheap.com.

See https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/deploy-support/configure-support.html#setting-dns-records

30 - resolved

This command should be without create as a standalone arg

deployer cilogon-client-create create 2i2c dask-staging daskhub dask-staging.2i2c.cloud

31 - resolved

Update AWS account creation docs to suggest use of email sub-addressing, like support+aws-<account name>@2i2c.org instead of creating new emails.

Related to https://github.com/2i2c-org/meta/issues/535

@consideRatio
Contributor Author

3 - resolved by #2082

Verification that this generated .jsonnet logic is correct and makes sense.

This looks incorrect but I'm not sure. Or maybe this generates correctly even if basehub != daskhub, giving daskNodes a null value?

image

@damianavila damianavila moved this from Todo 👍 to In Progress ⚡ in Sprint Board Jan 26, 2023
@damianavila damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog Jan 26, 2023
@damianavila damianavila moved this from In progress to Waiting in DEPRECATED Engineering and Product Backlog Jul 5, 2023
@damianavila
Contributor

Most of the things, if not all, are resolved or tracked somewhere else.
Closing now, re-open if you disagree.

@consideRatio
Contributor Author

The top post is up to date, as I've removed things from it as I've resolved them. I can have this closed or manage it privately though, as it doesn't really merit attention from others.

I'll open it for now.

@consideRatio consideRatio reopened this Aug 28, 2023
@consideRatio consideRatio moved this from Waiting to Ready to work in DEPRECATED Engineering and Product Backlog Oct 11, 2023
@consideRatio consideRatio added the nominated-to-be-resolved-during-q4-2023 Nomination to be resolved during q4 goal of reducing the technical debt label Oct 11, 2023
@consideRatio consideRatio changed the title Draft notes on documentation updates I'd like to make Misc documentation updates I'd like to make Oct 11, 2023
@consideRatio consideRatio added Documentation A change to our documentation. and removed Engineering:SRE Cloud infrastructure operations and development. labels Oct 11, 2023
Status: Ready to work