Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User PII dictionary table #529

Merged
merged 10 commits into from
Dec 11, 2023
Merged

User PII dictionary table #529

merged 10 commits into from
Dec 11, 2023

Conversation

pomegranited
Copy link
Contributor

@pomegranited pomegranited commented Nov 29, 2023

Adds event_sink.user_pii dictionary table created by joining external_id and user_profile tables, using the most recent user_profile row. This dictionary is cached in memory, refreshing every ASPECTS_PII_CACHE_LIFETIME seconds.

Also partitions the user_profile (and external_id) tables to prepare for user retirement deletions.

Note: A user may have more than one external_user_id, and so event_sink.user_pii may show more than one row per user.

Related information

Addresses #449
Depends on openedx-unsupported/openedx-event-sink-clickhouse#63
Based off #521 which adds the event_sink.external_id table.

Testing instructions

  1. Enable Aspects PII data sinks:
    tutor config save -s ASPECTS_ENABLE_PII=True
  2. Shorten the lifetime of the user_pii dict to 10s to make testing data changes easier:
    tutor config save -s ASPECTS_PII_CACHE_LIFETIME=10
  3. Apply migrations from this PR:
    tutor dev do alembic -c "upgrade head"
  4. Login to Superset as a superuser, and navigate to the SQL Lab
  5. Check that the user_pii table is empty:
    select * from event_sink.user_pii
  6. Create a new user:
    tutor dev do createuser any [email protected]
  7. Re-check the user_pii table -- should have an entry for the new user
  8. Login to the LMS as the new user and change some of the Account details
  9. Re-check the user_pii table -- should show the updated data for the user within 10s.
  10. Enrol in a course (this should create an external_id row).
  11. Re-check the user_pii table -- should show the updated external_user_id for the user within 10s.
  12. Check migration rollback works:
    tutor dev do alembic -c "downgrade -3"

To do before merge:

Ian2012 and others added 3 commits November 17, 2023 15:11
fix: add missing external_id to list of models

fix: add username for event sink

chore: reformat for quality checks
@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Nov 29, 2023
@openedx-webhooks
Copy link

Thanks for the pull request, @pomegranited! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@itsjeyd itsjeyd added the core contributor PR author is a Core Contributor (who may or may not have write access to this repo). label Nov 29, 2023
@Ian2012 Ian2012 force-pushed the cag/create_external_id_table branch from bcd308f to 3c50bd7 Compare November 29, 2023 16:11
Base automatically changed from cag/create_external_id_table to main November 29, 2023 17:21
@Ian2012 Ian2012 changed the base branch from main to bmtcril/bump_version November 29, 2023 17:24
@Ian2012 Ian2012 changed the base branch from bmtcril/bump_version to main November 29, 2023 17:24
Copy link
Contributor

@Ian2012 Ian2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll approve, but I think @bmtcril would have an opinion on this one

@pomegranited pomegranited changed the title User PII table User PII dictionary table Nov 30, 2023
run off an in-memory dictionary joining the PII tables.
password '{{ CLICKHOUSE_ADMIN_PASSWORD }}'
db '{{ ASPECTS_EVENT_SINK_DATABASE }}'
query "
with most_recent_user_profile as (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fan out, I don't see any limiting that only pulls the most recent profile or most recent external id in either of these queries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @bmtcril -- I lost this clause somewhere along the line, but I've re-added it, see this line and this one.

I'm only pulling the most recent user_profile row, and all the external_id rows, since a single user may have more than one external ID (depending on the external_id_type, or if external IDs get regenerated with a new secret key).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I didn't know external_id's could be changed. I'm reasonably sure they are currently immutable (or at least nothing updates them) 1:1 with user id / type. If nothing else we want to limit the external_id_type to xapi, but if there's a place where secret key is involved can you point it out? That would complicate several things in unexpected ways. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I didn't know external_id's could be changed. I'm reasonably sure they are currently immutable (or at least nothing updates them) 1:1 with user id / type.

Ah, my apologies.. I was thinking of the old anonymous user ids, which were hashed on the Django app SECRET_KEY. External IDs are immutable, as you expected.

If nothing else we want to limit the external_id_type to xapi

Ok that makes sense. I wasn't sure if there were any other types of external_ids used in our events. Will update the query that generates the dict/table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see 57a4f56.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I also added a dataset for the new user_pii table with 9ceb57b.

so that user retirements can delete user PII records.
Copy link
Contributor

@Ian2012 Ian2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase on main?

@pomegranited
Copy link
Contributor Author

FYI @Ian2012 I've rebased , but I want to address this comment before this gets merged.

so the event sink plugin knows which PII tables are in use.
or NULL if the user is not yet assigned an external_id.
sourced from the event_sink.user_pii table.
Copy link
Contributor

@bmtcril bmtcril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, looks good to me! 🎉

@Ian2012 Ian2012 merged commit 29566cf into main Dec 11, 2023
5 checks passed
@Ian2012 Ian2012 deleted the jill/user_pii branch December 11, 2023 22:04
@openedx-webhooks
Copy link

@pomegranited 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core contributor PR author is a Core Contributor (who may or may not have write access to this repo). open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants