Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1529990 - Fix Mobile bustage #313

Merged
merged 2 commits into from
Mar 5, 2019

Conversation

JohanLorenzo
Copy link
Contributor

And add more checks to detect PRs

@JohanLorenzo JohanLorenzo marked this pull request as ready for review February 25, 2019 13:36
@JohanLorenzo JohanLorenzo reopened this Feb 25, 2019
@coveralls
Copy link

coveralls commented Feb 25, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling df97b06 on JohanLorenzo:more-pr-checks into dc088ba on mozilla-releng:master.

@JohanLorenzo JohanLorenzo changed the title [WIP] Fix Mobile bustage Bug 1529990 - Fix Mobile bustage Feb 25, 2019

"""
if isinstance(url, str):
return url.startswith('https://github.com/')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we also need to support the [email protected] ssh urls. This should work if we don't. I'm fine with addressing this in a followup if we need to support ssh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of not, I'm not aware of the usage of [email protected] nor ssh:// within chain of trust. Let's add it whenever the need comes up, I believe

to keep this.

This checks for the following things::

* ``task.extra.env.tasks_for`` == "github-pull-request"
* ``task.payload.env.MOBILE_HEAD_REPOSITORY`` doesn't come from an official repo
* ``task.metadata.source`` doesn't come from an official repo, either
* The last 2 items are landed on the official repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good check. A thought: we may want to look into restricting by branch as well (must have landed on official release branches on the official repo). I'm aware that may be difficult -- we may have to restrict it to the last N commits on a branch. If this is doable and wanted, this is probably a followup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This implementation paves the way to #273. We just need to parse the bits of HTML returned by the branch_commits endpoint and we can know what branch the commit lives on.

I'm not sure how we can check if it's part of the last N commits of a branch, yet. The real github APIv3 may have something already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought it might be easier to detect the last N commits of a branch, rather than just knowing which branch(es) a commit is on. If the latter is easier, it's preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, I thought there was a way to get the last N commits of a branch, but this doesn't seem to be exposed after all: https://developer.github.com/v3/repos/branches/

branch_commits is our best alternative.

"""
return any((
is_try(task, context.config['source_env_prefix']),
is_pull_request(context, task),
# XXX Cron tasks don't usually define 'taskcluster-github' as their schedulerId as they
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we use standalone taskgraph in our github repos, we can look at standardizing how the taskgraph generates cron tasks so we can detect it here.

return official_repo_owner == repo_owner


def _is_git_full_hash(commit_hash):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since the input of this function isn't always a commit hash (it may be a branch or a tag), perhaps this should be called "revision" :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Fixed.

Copy link
Contributor

@mitchhentges mitchhentges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 :)

"""
# Revision may be a tag name. `branch_commits` doesn't work on tags
if not _is_git_full_hash(commit_hash=revision):
revision = self.get_tag_hash(tag_name=revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that in this case, revision isn't a tag_name but a branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, that's not a supported case. get_tag_hash() will raise an error because the tag doesn't exist. I'm not aware of a case where we want to support it. If that comes up, I'm fine adding this support.


repo = self._github_repository.html_url

url = '/'.join([repo.rstrip('/'), 'branch_commits', revision])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I probably went on the same goose chase as you:

  1. "The Github API probably has a way to look up if a commit is in my repository (not including PRs)"
  2. "No, but maybe the Github API has a way to find out what branch(es) a commit is on?"
  3. "Ugh, no, but this guy pointed out the semi-API/semi-HTML-y "https://github.com/:user/:repo/branch_commits/:commit"

It's crazy that that's the nicest way of doing this 😠 😛

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's exactly what happened! 😅


def _check_github_url_is_supported(url):
if not is_github_url(url):
raise ValueError('"{}" is not a supported GitHub URL!'.format(url))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit that isn't relevant for this PR: I wonder if it would make sense to have earlier validation for things like this?
Currently, we don't need if checks to assert that our variables are strings/numbers/whatever deep within our functions because we perform earlier validation of the task, and that allows the functions that operate on the task to be so much easier to read (and write).

In the same way. it could be beneficial to have validation of github urls in advance? It could make logic cleaner, since we could leverage expectations with a data type (e.g.: "my function parameter is GithubUrl, so it better be a Github URL") rather than a function call ("what does _check_github_url_is_supported(...) do? Oh, it can throw an error, gotcha").


This is a potentially subjective note. Not that big of a concern :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If, as you suggest, we weren't using strings everywhere (particularly if we could lean on type-checking), we could get away without these checks.

However, given that this is security sensitive code, I'd rather err on the side of having more sanity checks, rather than less.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm not recommending removing these checks, but rather moving them to the entry-point of untrusted data and doing validation there.
To represent that the string has been validated as the specialized type (github url string), we could use a type at that point.

But yeah, I agree, I don't think that we should lessen our sanity checks :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on keeping this check here, for now.


Args:
context (scriptworker.context.Context): the scriptworker context.
task (dict): the task definition to check.

Returns:
bool: True if it's a pull-request
bool: True if it's a pull-request. False if it either comes from the official repos or if
the origin can't be determined. In fact, some valid scriptworker tasks don't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this comment is cut off?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed.

)
for repo_url, revision, can_skip in urls_revisions_and_can_skip:
# XXX In the case of scriptworker tasks, neither the repo nor the revision is defined
if not repo_url and can_skip:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if not repo_url and can_skip is False? We expect extract_github_repo_and_name(...) to throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. repo_url won't be a valid github URL and then _check_github_url_is_supported() raises a ValueError


github_repository = GitHubRepository(repo_owner, repo_name, context.config['github_oauth_token'])
conditions.append(not await github_repository.has_commit_landed_on_repository(context, revision))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

@JohanLorenzo
Copy link
Contributor Author

Chatted with @tomprince, yesterday on Slack: he's fine with me landing the patch.

@JohanLorenzo JohanLorenzo removed the request for review from tomprince March 5, 2019 14:51
@JohanLorenzo JohanLorenzo merged commit 5876847 into mozilla-releng:master Mar 5, 2019
@JohanLorenzo JohanLorenzo deleted the more-pr-checks branch March 5, 2019 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants