-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add contest exporter #142
Add contest exporter #142
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a couple of proposed changes (completely untested), let me kow what you think:
- add-contest-exporter-v3a-small-fixes-wip @ c11d30e [multiple commits]
- add-contest-exporter-v3b-refactor-get-submissions-wip @ d51c5df
- add-contest-exporter-v3c-efficient-submissions-export-wip @ 88786ea
What are the use cases for the export feature? The format is very rough:
- The JSON files are a raw database dump using the internal representation, which is not user-friendly and seems inconvenient when using from a script. Would it be worthwhile to design a better format?
- The submissions are currently in a single flat directory (with the submission ID as the filename), which is not user-friendly. It would be much easier to browse if they were organised into subdirectories by user and problem.
Would it be useful to create separate download links for just scoreboard.csv (and unofficial_scoreboard.csv)? (Most of the files in the zip file are machine-readable and not user friendly, while the scoreboard CSV is pretty much the opposite; so to me it makes sense to separate them.)
I'm curious about the commit 1d4e205 "Disallow organisers from exporting contests", what's the reasoning behind that?
app/models/contest.rb
Outdated
def get_submissions(user_id, problem_id) | ||
relation = self.contest_relations.where(:user_id => user_id).first | ||
Submission.where(:user_id => user_id, :problem_id => problem_id, :created_at => relation.started_at..relation.finish_at) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this fails if the user is not in the contest (relation
will be nil so relation.started_at
will fail).
I'm not sure if that's good or bad; I'm inclined to leave it as is (even though it's slightly inconsistent with the other methods, which return nil).
contest.scoreboard.each do |record| | ||
is_ranked = !contest.only_rank_official_contestants || (record.school && record.school_year) | ||
next unless is_ranked || include_unranked_contestants | ||
|
||
if is_ranked | ||
num_ranked += 1 | ||
rank = num_ranked unless previous_record && previous_record.score == record.score && previous_record.time_taken == record.time_taken | ||
previous_record = record | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This complex logic is the same as in app/views/contests/scoreboard.html.erb, it should really be refactored.
But it can wait for separate PR in the interest of moving forward with other things.
<% if policy(Contest).export? %> | ||
<br> | ||
<b> Export Contest</b> | ||
<%= form_tag( export_contest_path(@contest), :method => :get) do %> | ||
<%= submit_tag "Download" %> | ||
<% end %> | ||
<% end %> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer for the export UI and the existing admin UI to move into "info" tab (#123 (comment)), I'll make a separate PR.
contest.scoreboard.each do |record| | ||
next if record.user.nil? | ||
contest.problem_set.problems.each do |prob| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to compute the scoreboard for getting the users and submissions, we can just use
contest.contestant_records.pluck(:user_id).compact
(though actually that might incur an extra DB query if the query from the scoreboard
method is cached... need to investigate).
Ideally we'd save the scoreboard in local variable regardless (used twice above).
submissions += contest.get_submissions(record.user.id, prob.id) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The underlying query in get_submissions
supports an array of problems/IDs, so we can use one query per user:
submissions += contest.get_submissions(record.user.id, contest.problem_set.problems)
(This is a little sneaky. We could rename the parameter problem_id
of get_submissions
to problem_ids
or problems
to document this.)
Thanks for reviewing. The proposed changes in add-contest-exporter-v3c-efficient-submissions-export-wip all look reasonable to me; I can try testing them in the weekend. This feature is pretty much only being used for NZIC, as far as I'm aware. The JSON files are intended to provide a dump of all the contest metadata that could be used for various scripts (e.g. to generate statistics, detect cheating), but they do include some sensitive data (full names, email addresses) which organisers don't have access to, which was the reason for 1d4e205. We already have a few scripts that work with the current format so I'd prefer to leave them as is, but I wouldn't mind if we changed the directory structure for submissions to |
The directories directly under app/ are root directories and their names aren't part of the namespace. So app/exporters/base_exporter.rb should implement BaseExporter, not Exporters::BaseExporters. See: https://guides.rubyonrails.org/autoloading_and_reloading_constants.html#introduction https://guides.rubyonrails.org/autoloading_and_reloading_constants.html#config-autoload-paths
Whe selecting the submissions made during the contest, change the time interval from `..` (inclusive-inclusive) to `...` (inclusive-exclusive) for consistency with the existing logic in other places (e.g. app/models/contest_score.rb has a very similar query).
…contest exporter They might be different if the user changes school after the contest.
The previous approach loaded the source code of all the submissions into memory at once, which may be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested my proposed changes, fixed a typo (5be2eb0), then rebased and pushed to this PR's branch.
I'm approving this version, and will use separate PRs for the bigger changes.
I propose to squash-and-merge with the following commit message:
Add contest exporter (#141)
Co-authored-by: Tom Levy <[email protected]>
The "deployed" branch currently has several commits that are not in "master". They consist of the first commit from this PR, and several temporary commits for email workarounds that have been reverted.
The server is at the "deployed" branch but also an uncommitted patch to fix a divide-by-zero in the progress bars; I created #160 to mirror that patch (+ small refactoring).
So if we merge both PRs, we can safely hard-reset the "deployed" branch (and the server) to "master" without losing anything. The only changes that will be introduced are small fixes and refactoring.
We already have a few scripts that work with the current format so I'd prefer to leave them as is
The biggest issue I have with the current representation is how the individual scores are stored (details below). I'd very much like to change that (in a separate PR). The scripts will need to be tweaked a little, I hope that's not a problem; I'd be happy to help.
Currently the ContestScore fields (which are per-problem-per-user) are embedded in the parent ContestRelation object (which is per-user), with the problem ID appended to each field name. Example (pseudo-JSON, with two problems whose IDs are 111 and 222):
scoreboard: [
{
user_id, score, time_taken, school_id,
score_111, attempt_111, attempts_111, sub_111,
score_222, attempt_222, attempts_222, sub_222
},
{
user_id, ...
}
]
I think it would be much nicer (both for the app's code and for the scripts) if the ContestScore objects were stored in an array, in the same order as the problems:
scoreboard: [
{
user_id, score, time_taken, school_id,
scores: [
{ score, attempt, attempts, sub },
{ score, attempt, attempts, sub }
]
},
{
user_id, ...
}
]
Such an array can be produced using the following snippet:
contest_relation.contest_scores.index_by(&:problem_id).values_at(*problem_ids)
Another option is to store them in a hash, with the problem ID as the key. This corresponds very closely to the old representation.
Yet another option (which I really don't like) is to use an unordered array of ContestScore objects. I don't like it because it is awkward to use from scripts, but it is worth mentioning because it directly corresponds to the existing contest_scores
association.
Oh sure, feel free to make changes to anything under the contest JSON. I thought it could be useful at some point but we're currently only using the submissions JSON. |
Shall I deploy? (Hard-reset etc.) |
Sure |
Done! |
Co-authored-by: Tom Levy <[email protected]>
No description provided.