Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add contest exporter #142

Merged
merged 8 commits into from
Feb 10, 2023
Merged

Add contest exporter #142

merged 8 commits into from
Feb 10, 2023

Conversation

Holmes98
Copy link
Member

No description provided.

Copy link
Contributor

@tom93 tom93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a couple of proposed changes (completely untested), let me kow what you think:

  • add-contest-exporter-v3a-small-fixes-wip @ c11d30e [multiple commits]
  • add-contest-exporter-v3b-refactor-get-submissions-wip @ d51c5df
  • add-contest-exporter-v3c-efficient-submissions-export-wip @ 88786ea

What are the use cases for the export feature? The format is very rough:

  • The JSON files are a raw database dump using the internal representation, which is not user-friendly and seems inconvenient when using from a script. Would it be worthwhile to design a better format?
  • The submissions are currently in a single flat directory (with the submission ID as the filename), which is not user-friendly. It would be much easier to browse if they were organised into subdirectories by user and problem.

Would it be useful to create separate download links for just scoreboard.csv (and unofficial_scoreboard.csv)? (Most of the files in the zip file are machine-readable and not user friendly, while the scoreboard CSV is pretty much the opposite; so to me it makes sense to separate them.)

I'm curious about the commit 1d4e205 "Disallow organisers from exporting contests", what's the reasoning behind that?

Comment on lines 99 to 102
def get_submissions(user_id, problem_id)
relation = self.contest_relations.where(:user_id => user_id).first
Submission.where(:user_id => user_id, :problem_id => problem_id, :created_at => relation.started_at..relation.finish_at)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this fails if the user is not in the contest (relation will be nil so relation.started_at will fail).
I'm not sure if that's good or bad; I'm inclined to leave it as is (even though it's slightly inconsistent with the other methods, which return nil).

Comment on lines +18 to +26
contest.scoreboard.each do |record|
is_ranked = !contest.only_rank_official_contestants || (record.school && record.school_year)
next unless is_ranked || include_unranked_contestants

if is_ranked
num_ranked += 1
rank = num_ranked unless previous_record && previous_record.score == record.score && previous_record.time_taken == record.time_taken
previous_record = record
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This complex logic is the same as in app/views/contests/scoreboard.html.erb, it should really be refactored.
But it can wait for separate PR in the interest of moving forward with other things.

Comment on lines +19 to +26
<% if policy(Contest).export? %>
<br>
<b> Export Contest</b>
<%= form_tag( export_contest_path(@contest), :method => :get) do %>
<%= submit_tag "Download" %>
<% end %>
<% end %>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer for the export UI and the existing admin UI to move into "info" tab (#123 (comment)), I'll make a separate PR.

Comment on lines +59 to +61
contest.scoreboard.each do |record|
next if record.user.nil?
contest.problem_set.problems.each do |prob|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to compute the scoreboard for getting the users and submissions, we can just use
contest.contestant_records.pluck(:user_id).compact
(though actually that might incur an extra DB query if the query from the scoreboard method is cached... need to investigate).
Ideally we'd save the scoreboard in local variable regardless (used twice above).

Comment on lines 62 to 63
submissions += contest.get_submissions(record.user.id, prob.id)
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The underlying query in get_submissions supports an array of problems/IDs, so we can use one query per user:

submissions += contest.get_submissions(record.user.id, contest.problem_set.problems)

(This is a little sneaky. We could rename the parameter problem_id of get_submissions to problem_ids or problems to document this.)

@Holmes98
Copy link
Member Author

Holmes98 commented Feb 8, 2023

Thanks for reviewing. The proposed changes in add-contest-exporter-v3c-efficient-submissions-export-wip all look reasonable to me; I can try testing them in the weekend.

This feature is pretty much only being used for NZIC, as far as I'm aware. The JSON files are intended to provide a dump of all the contest metadata that could be used for various scripts (e.g. to generate statistics, detect cheating), but they do include some sensitive data (full names, email addresses) which organisers don't have access to, which was the reason for 1d4e205.

We already have a few scripts that work with the current format so I'd prefer to leave them as is, but I wouldn't mind if we changed the directory structure for submissions to /{user_id}/{problem_id}/{submission_id}.{extension} or something similar. I also wouldn't be opposed to having extra download links for just the scoreboards, but it wouldn't make a difference for NZIC since we'd always do a full export.

The directories directly under app/ are root directories and their
names aren't part of the namespace. So app/exporters/base_exporter.rb
should implement BaseExporter, not Exporters::BaseExporters.

See:
https://guides.rubyonrails.org/autoloading_and_reloading_constants.html#introduction
https://guides.rubyonrails.org/autoloading_and_reloading_constants.html#config-autoload-paths
Whe selecting the submissions made during the contest, change the time
interval from `..` (inclusive-inclusive) to `...`
(inclusive-exclusive) for consistency with the existing logic in other
places (e.g. app/models/contest_score.rb has a very similar query).
…contest exporter

They might be different if the user changes school after the contest.
The previous approach loaded the source code of all the submissions
into memory at once, which may be problematic.
Copy link
Contributor

@tom93 tom93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested my proposed changes, fixed a typo (5be2eb0), then rebased and pushed to this PR's branch.

I'm approving this version, and will use separate PRs for the bigger changes.

I propose to squash-and-merge with the following commit message:

Add contest exporter (#141)
Co-authored-by: Tom Levy <[email protected]>

The "deployed" branch currently has several commits that are not in "master". They consist of the first commit from this PR, and several temporary commits for email workarounds that have been reverted.

The server is at the "deployed" branch but also an uncommitted patch to fix a divide-by-zero in the progress bars; I created #160 to mirror that patch (+ small refactoring).

So if we merge both PRs, we can safely hard-reset the "deployed" branch (and the server) to "master" without losing anything. The only changes that will be introduced are small fixes and refactoring.


We already have a few scripts that work with the current format so I'd prefer to leave them as is

The biggest issue I have with the current representation is how the individual scores are stored (details below). I'd very much like to change that (in a separate PR). The scripts will need to be tweaked a little, I hope that's not a problem; I'd be happy to help.

Currently the ContestScore fields (which are per-problem-per-user) are embedded in the parent ContestRelation object (which is per-user), with the problem ID appended to each field name. Example (pseudo-JSON, with two problems whose IDs are 111 and 222):

scoreboard: [
    {
        user_id, score, time_taken, school_id,
        score_111, attempt_111, attempts_111, sub_111,
        score_222, attempt_222, attempts_222, sub_222
    },
    {
        user_id, ...
    }
]

I think it would be much nicer (both for the app's code and for the scripts) if the ContestScore objects were stored in an array, in the same order as the problems:

scoreboard: [
    {
        user_id, score, time_taken, school_id,
        scores: [
            { score, attempt, attempts, sub },
            { score, attempt, attempts, sub }
        ]
    },
    {
        user_id, ...
    }
]

Such an array can be produced using the following snippet:

contest_relation.contest_scores.index_by(&:problem_id).values_at(*problem_ids)

Another option is to store them in a hash, with the problem ID as the key. This corresponds very closely to the old representation.

Yet another option (which I really don't like) is to use an unordered array of ContestScore objects. I don't like it because it is awkward to use from scripts, but it is worth mentioning because it directly corresponds to the existing contest_scores association.

@Holmes98
Copy link
Member Author

The biggest issue I have with the current representation is how the individual scores are stored (details below). I'd very much like to change that (in a separate PR).

Oh sure, feel free to make changes to anything under the contest JSON. I thought it could be useful at some point but we're currently only using the submissions JSON.

@Holmes98 Holmes98 merged commit c741e3e into master Feb 10, 2023
@Holmes98 Holmes98 deleted the add-contest-exporter-v2 branch February 10, 2023 06:04
@tom93
Copy link
Contributor

tom93 commented Feb 10, 2023

Shall I deploy? (Hard-reset etc.)

@Holmes98
Copy link
Member Author

Sure

@tom93
Copy link
Contributor

tom93 commented Feb 10, 2023

Done!

bagedevimo pushed a commit to bagedevimo/nztrain that referenced this pull request Feb 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants