-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance problem deleting users from moodle system #509
base: main
Are you sure you want to change the base?
Performance problem deleting users from moodle system #509
Conversation
It was observed, that deleting users took a very long time. The observed time ranges were 5-14 hours. The problem was brought to our attention because it caused other adhoc task to pile up behind it. Checking the mysql processlist consistently showed the same queries and these could be tracked down into the mod_studentquiz dataprivacy integration. Running the SQLs manually against our staging system with a mostly idleing MariaDB showed the same behavior (the manual tests were interrupted after 1/2 hour). To give an insight into the order of magnitude of our DB, here are the numbers for the affected tables: mdl_context 1726200 mdl_studentquiz 132 mdl_question_categories 89541 mdl_question_bank_entries 690319 mdl_question_versions 748974 mdl_question 748974 mdl_question_references 168766 mdl_studentquiz_question 1227 mdl_studentquiz_rate 16953 mdl_studentquiz_comment 931 mdl_studentquiz_progress 19742 mdl_studentquiz_attempt 6468 mdl_studentquiz_comment_history 971 mdl_studentquiz_notification 335 mdl_studentquiz_state_history 2391 Looking at the SQL queries and the postprocessing code, it was observed that there are left joins, that could directly be eliminated because they can't yield usable results anyway. These leafs are: - commenthistory - progress - attempt - notificationjoin - statehistory Instead of first doing the join and then filter for the targetuser via where this change moves filtering for the targetuser into the join.
Thanks for creating the pull request. Could you please help us with more details on how much improvement did you see after these changes in the query and also, would like to confirm if you have verified the results of query before and after the changes were done. This will help build confidence that the changes aren't inadvertently affecting the results. Thanks, |
@AnupamaSarjoshi thank you for looking into this. Indeed I ran a test on our stage system and verified, that the problem is present there. I build a reproducer and got this (sorry for the german output!) for the unmodified run (get all data for a freshly created user):
and this for the modified case (date is before the base run, as I used the already patched stage system to gather it):
As you can see in the failing case the MariaDB Server cuts the connection after about 2 hours with the collection query for mod_studentquiz as the last statement executed. In contrast to that the second run shows, that the query is fully processed. 80s is still a long time to process a single user, but at least manageable. My reasoning why I modified the queries as I did: in both cases ( I pulled an explain from the database and got this: Before:
After:
What I notice:
|
@AnupamaSarjoshi did the reply answer your questions? |
@matthiasblaesing for me, there is one bit that is still not obvious. Does the query, after your changes, return identical data to the original query in all situations? You say: "I only modified LEFT JOIN entries. Even before my change these left joins were filtered afterwards by the WHERE condition and in the case of export_user_data by PHP. The change moves the filtering from WHERE/PHP into the join condition." But, it is not that simple. You did not move the filtering. You added filtering to the LEFT JOINs, wihtout removing it from the WHERE. And, there WHERE involves a combination of AND and OR. So, as I say, it is not clear that the SQL is equivalent. I am not saying that it is definitely different, just that it is not clear it is the same - and we need to be sure the query results have not change before it is safe to merge them. And, possibly we could work this out for ourselves by staring at the SQL for a few hours, but we have not had time to do that yet. (I do think that your approach to fixing this, by making the LEFT JOINs more precise looks like the right thing to do - if it works in all cases.) |
@timhunt I'm not to interested, whether the SQLs return the same set of rows, I'm interest in the question whether they return the relevant data. Lets start with
In the SQL only the context ID is queried. The leaf "LEFT JOINS" can in this case only act as filters. The "information for the specified user" part is now crucial. The target of the leaf join is only relevant when it satisfies that requirement, which is coded as the filter over the userid. For
The general idea is the same as for the contexts. The difference is, that in this case we might be interested in the data from the join. In this case the post processing becomes relevant. When you look at the new lines 376-449 you notice the pattern, that each block for Example for the comments: moodle-mod_studentquiz/classes/privacy/provider.php Lines 392 to 401 in 395a189
The first filter in line 393 is only relevant if no comment can be matched, missing data can't be exportet, so lets ignore that. The core is the second condition. Only comments matching the target user id will be exportet. That is the reason, that I think modifying the "LEFT JOIN"-On Condition to directly filter on the userid is safe. Before this PR we would have seen rows, that might hold comments from other users, the PHP condition would have filtered this data out. In theory the second condition could be removed, as filtering is now done on the SQL level, but I think it is better to keep the PHP code consistent. What is more the change should be kept minimally invasive. Does that make sense? |
FWIW: We decided to ship the patch in production. We decided, that we have to be able to delete users and accept data requests and would not be possible with the current runtimes. |
Anything more I can do to help? |
It was observed, that deleting users took a very long time. The observed time ranges were 5-14 hours. The problem was brought to our attention because it caused other adhoc task to pile up behind it.
Checking the mysql processlist consistently showed the same queries and these could be tracked down into the mod_studentquiz dataprivacy integration.
Running the SQLs manually against our staging system with a mostly idleing MariaDB showed the same behavior (the manual tests were interrupted after 1/2 hour).
To give an insight into the order of magnitude of our DB, here are the numbers for the affected tables:
Looking at the SQL queries and the postprocessing code, it was observed that there are left joins, that could directly be eliminated because they can't yield usable results anyway.
These leafs are:
Instead of first doing the join and then filter for the targetuser via where this change moves filtering for the targetuser into the join.