Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply spawner overrides from group properties. #4715

Open
wants to merge 1 commit into
base: staging
Choose a base branch
from

Conversation

ryanlovett
Copy link
Collaborator

An admin must set group properties of the form kubespawner_override.{property}, such as kubespawner_override.mem_limit. This can be performed in /hub/admin > Manage Groups.

It will be important to make sure course staff have sufficient privileges to do what they need (stop/start servers, view server lists, etc.) while at the same time ensuring the cluster does not scale more than it needs to. In other words, we may want to create a role for course staff with fewer privileges than admin.

An admin must set group properties of the form kubespawner_override.{property}, such as kubespawner_override.mem_limit. This can be performed in /hub/admin > Manage Groups.

It will be important to make sure course staff have sufficient privileges to do what they need (stop/start servers, view server lists, etc.) while at the same time ensuring the cluster does not scale more than it needs to. In other words, we may want to create a role for course staff with fewer privileges than `admin`.
@ryanlovett
Copy link
Collaborator Author

I tested this on a11y-staging where you can see that for the datahub tech dev's bcourse group course::1524699::group::dev-admins, there are limits for memory:
Screenshot 2023-07-07 at 12 48 27 PM

This would work right now if I merged. One concern is that course staff who happen to have admin privileges could navigate to a group properties page and change limits, if they knew how. So this can be merged now and we'd be protected by security through obscurity, or we can merge after we assign course staff some role other than admin. We would need to figure out exactly what jupyterhub privileges course staff actually need though, create that role, then change the spawner to apply that role to the people we choose.

Hubs that assign full admin rights to people other than datahub tech staff are biology, datahub, dlab, eecs, ischool, prob140, stat159, stat20.

@shaneknapp
Copy link
Contributor

this is great, thanks @ryanlovett ! i also agree that we'll need another level of admin-like granularity to manage access to stuff like this.

let's chat about this next week! should be pretty quick to come up w/a name and unravel perms (i hope). :)

@ryanlovett
Copy link
Collaborator Author

ryanlovett commented Jul 8, 2023

@balajialg shared in slack:

Responding what GSIs do with the admin access - I had detailed user interviews with Data8, Data 100, Public Health, Ischool, and Political Science GSIs sometime back to ask about their instructional workflow. I asked about how they use the admin privileges for the hubs - almost all of them who had admin access replied that they use it for debugging student servers related to issues in their code.

I reviewed the responses. Course staff said they used admin access for:

  1. Viewing the admin panel
  2. Restarting or stopping student servers
  3. Looking at student notebooks
  4. Duplicate and make edits to student notebooks
  5. Fix issues with nbgitpuller (presumably, access student servers to move aside git repos)

Interestingly, one course staff requested:

Better hierarchy of users - Either an admin or not an admin
~Admin at Datahub - Go into anyones datahub and see what they are working on
But problem is admins can see other admins stuff easily
Lot of students don’t know that this is possible
The admin role is too powerful. Access server and we have data
Top of the staff has admin access (People don’t know that this exist)
Have more roles with a different sets of privileges

So it makes sense to set up something like the following: role for staff for a course with a given bcourse ID:

 {
    "name": f"course-staff-{bcourse_id}",
    "scopes": [
        "admin-ui",
        f"list:users!group=course::{bcourse_id}",
        f"admin:servers!group=course::{bcourse_id}",
        f"access:servers!group=course::{bcourse_id}",
    ],
    "groups": [f"course::{bcourse_id}::enrollment_type::teacher", f"course::{bcourse_id}::enrollment_type::tas"]
}

See available scopes, which interestingly contains nearly this exact example for some random course named "data8". (go figure; authored by @minrk!)

The role above gives course staff the ability to access other course staff servers in the same way they do student servers. We could limit it to student servers only, but I feel like course staff don't necessarily want to hide things from other course staff, and may in fact benefit from being able to test things, e.g. starting another course staff's server.

So the idea would be that for each course that opts into elevated privileges, we'd add them to a config list that would then be passed in c.JupyterHub.load_roles. (as we do for stat159)

@ryanlovett
Copy link
Collaborator Author

If we wanted to automatically create course staff roles for every course, we would need to invoke c.JupyterHub.load_roles as above.

If we only want to create such roles for specific courses that ask for "admin privileges", we can use the z2jh hub configuration for this. For each course we would manually add:

# `123456` is an imaginary canvas course ID
hub:
  loadRoles:
    course-staff-123456:
      description: Enable course staff to view and access servers.
      # this role provides permissions to...
      scopes:
        - admin-ui
        - list:users!group=course::123456
        - admin:servers!group=course::123456
        - access:servers!group=course::123456
      # this role will be assigned to...
      groups:
        - course::123456::enrollment_type::teacher
        - course::123456::enrollment_type::tas

This configuration would obviate the need for jupyterhub.custom.group_profiles profiles that only set admin: true, and also restricts course staff from being able to increase resources without tech team oversight. In the short term I think this would be sufficient.

In the long term, I think it'd be better to augment hub/values.yaml so that it dynamically loads roles for all courses on the hub. This would eliminate the need for course staff to ask for it, and is a more natural fit for instructional software where course staff automatically have the privileges they need.

@shaneknapp
Copy link
Contributor

This configuration would obviate the need for jupyterhub.custom.group_profiles profiles that only set admin: true, and also restricts course staff from being able to increase resources without tech team oversight. In the short term I think this would be sufficient.

In the long term, I think it'd be better to augment hub/values.yaml so that it dynamically loads roles for all courses on the hub. This would eliminate the need for course staff to ask for it, and is a more natural fit for instructional software where course staff automatically have the privileges they need.

this is GREAT! thanks @ryanlovett for writing this up.

i honestly believe that we should get this sorted for a course or two asap, and then once we're happy w/how it all works move the config to hub/values.yaml sooner rather than later...

@ryanlovett
Copy link
Collaborator Author

@shaneknapp I'll swap out the admin: true group profiles for data8 and data100 for this new approach as they're the only two using that config. I'm going to first test with a temporary group I'll create in the tech infra bcourse site.

@ryanlovett
Copy link
Collaborator Author

I tested loadRoles and it worked just fine. I'll convert the two courses to this in a PR.

@balajialg
Copy link
Contributor

@ryanlovett Just read the scopes document in detail. This is awesome hacking and excited to see the custom scopes live in Data 8 & 100 setting.

I am assuming that course staff will still interpret the customized scopes as admin access and as a result, our instructor-facing documentation + issue template will not undergo any change (except for the previous template change to update bcourses is?).

@ryanlovett
Copy link
Collaborator Author

@balajialg Course staff should still be able to do what they have been doing, i.e. starting, stopping, accessing servers, etc. Their scope will no longer be what jupyterhub considers "admin" which is everything, but they should be able to function the same way. If not, we should identify what scopes they need and add them.

At some point we should adjust the documentation to be more precise. For example we should change the guide so that it uses "Course Staff Privileges" instead of "Hub Admin Privileges", and replace the word admin where appropriate. We would also be able to remove the caveat on that page about shared admin access on the same hub since the new roles will be safer.

Similarly we should eventually change the wording on the issue template to be "Course Staff Access Request" instead of "Admin Access Request".

If we ultimately go with the more dynamic configuration, where we automatically give course staff expanded roles, then we can eliminate the issue template, and change the docs so that it just mentions how to make use of the access rather than also include how to acquire the privileges.

@balajialg
Copy link
Contributor

balajialg commented Jul 11, 2023

@ryanlovett I am aligned with replacing the word "admin" with "course staff" related verbiage in the curriculum guide and issue template. I will do that once our first few requests are resolved successfully.

Re: Dynamic configuration - Sharing couple of hypotheses in my mind,

  • We will not be receiving admin access requests from courses like Data 8 and Data 100 going forward unless we clear the configs at the end of the semester. That way, every request we serve currently will be the last time serving those requests. (Assuming we don't clear configs at the end of every semester)
  • If we have the bcourses id of all the courses that use a particular hub then we can provide admin access by default to instructional staff from all these courses. Can we eliminate the request to receive admin access through this approach?

@ryanlovett
Copy link
Collaborator Author

We will not be receiving admin access requests from courses like Data 8 and Data 100 going forward unless we clear the configs at the end of the semester. That way, every request we serve currently will be the last time serving those requests. (Assuming we don't clear configs at the end of every semester)

I can confirm that the static configuration stanzas only serve that semester's course, so next semester's Data 8/100 won't have elevated privs automatically. (bcourse IDs are only use during a single term)

We don't make use of the bcourse course end date, so if someone was a member of a course, the hub still considers them to be a member. (just like how bcourses shows you all of your expired courses) This means that if you were once course staff for Data 100, and several semesters on if that static configuration remains, you will still have elevated privileges.

At the end of every semester, we should remove the loadRoles configs that grant elevated privileges. Alternatively, we could augment the CanvasOAuthenticator to pay attention to the end data of the course, and then not elevate privs for course staff for courses that have expired. This is not a great option because sometimes courses are still active for a little while after the term.

We also need to reset the jupyterhub.sqlite files at the end of each term, since role and scope data is kept there after it has been loaded once.

If we have the bcourses id of all the courses that use a particular hub then we can provide admin access by default to instructional staff from all these courses. Can we eliminate the request to receive admin access through this approach?

Yes, if we knew the bcourse IDs in advance, we could elevate privileges for course staff and eliminate admin requests.

@balajialg
Copy link
Contributor

balajialg commented Jul 12, 2023

Thanks for the clarity about the validity of bcourses id and the complication with identifying the end date of the course. Claring sqlite db at the end of semester makes sense.

I am curious if I can write a python program that queries bcourses APIs to fetch the bcourses id for all the courses listed in this sheet? I remember doing that sometime ago but not able to exactly identify the context

Edit: I did the reverse in this notebook/ Queried the name of the course from the bcourses API by sharing the bcourses id.

@ryanlovett
Copy link
Collaborator Author

There is an API that lets you search for all courses, https://canvas.instructure.com/doc/api/search.html#method.search.all_courses. If that output included course names, you'd be able to match it with courses in your spreadsheet and then extract the IDs.

@shaneknapp
Copy link
Contributor

pinging @ryanlovett -- i recall that we spoke about this at some length last summer and decided that 'security through obscurity' for some decidedly powerful abilities wasn't something that we wanted to deploy.

if so, we can probably close this. if not, we could potentially discuss more.

@ryanlovett
Copy link
Collaborator Author

@shaneknapp There used to be a security through obscurity issue, however we no longer give admin access to course staff.

So this can be merged now and we'd be protected by security through obscurity, or we can merge after we assign course staff some role other than admin.

We do now assign course staff a role other than full admin.

Changes made through the jupyterhub UI can't be easily communicated back to the deployment -- they would only exist in the jupyterhub ORM. Should the database file get deleted, any changes made from the web UI and not also set in our deployment configuration would be lost. This is probably not a big deal though. The databases are not intentionally deleted until between terms, and even then it has not been completely regular. I think a database was only ever accidentally corrupted once by someone who was editing it with sqlite.

Lastly, and perhaps more importantly, there isn't a great story around oversight. While it'd be great to give course staff access to temporarily bump resources, reducing the involvement of DataHub staff and the time it takes for changes to go through CI, course staff would be able to change singleuser params like memory allocations without fully understanding costs issues, node resources, etc. There is no feature which could help us set memory ranges or limit time periods, or just straight up communicates the costs involved. Perhaps if/when the DataHub financial model changes this could be re-evaluated.

All that being said, if there was a singleuser server param where it'd be convenient for course staff to be able to change directly that would not also impact compute/financial resources (like env vars for gh scoped creds 😄 ), we could enable the feature but filter for the safe params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants