Move repository to the dependency-check organization #7235

Open
jeremylong opened this issue Dec 9, 2024 · 21 comments

@jeremylong
Owner

To better support a growing team of contributors we plan to move the project from a personal repository to the https://github.com/dependency-check organization. The Gradle plugin is already there, along with other related projects.

This means the documentation pages will move to a new URL. I believe GH puts in redirects when the project moves. The possible issues:

  1. Many users have automation to check the current release and download the latest from this repo. Even if redirects exist it could break the automation
  2. The schemas all point to the personal repo:
    targetNamespace="https://jeremylong.github.io/DependencyCheck/dependency-check.4.1.xsd" xmlns:dc="https://jeremylong.github.io/DependencyCheck/dependency-check.4.1.xsd">

@aikebah any concerns with this move? Can you think of any other issues?
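To make the automation concern in item 1 concrete, here is a minimal sketch of a "latest release" lookup that is less brittle across a move. It relies on two assumptions that would need checking: that scripts use the GitHub REST endpoint `/repos/{owner}/{repo}/releases/latest`, and that the API answers with a redirect once a repository has been transferred. The class and method names are made up for illustration. Java's `HttpClient` does not follow redirects unless told to, which is one way such automation can break even when redirects exist.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Illustrative sketch only; names are hypothetical and not part of ODC.
final class LatestReleaseLookup {

    // Owner and repo are parameters so a script can switch location without code changes.
    static URI latestReleaseUri(String owner, String repo) {
        return URI.create("https://api.github.com/repos/" + owner + "/" + repo + "/releases/latest");
    }

    // HttpClient defaults to Redirect.NEVER, so redirect handling must be enabled explicitly.
    static HttpClient redirectFollowingClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Builds the request without sending it; the caller decides when to hit the network.
    static HttpRequest latestReleaseRequest(String owner, String repo) {
        return HttpRequest.newBuilder(latestReleaseUri(owner, repo))
                .header("Accept", "application/vnd.github+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = latestReleaseRequest("jeremylong", "DependencyCheck");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `redirectFollowingClient().send(request, HttpResponse.BodyHandlers.ofString())` would then follow a redirect if the server issues one. Scripts that use plain `curl` without `-L` would need the equivalent change.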

@aikebah
Collaborator

aikebah commented Dec 9, 2024

On item 2: strictly speaking, the schemas don't point to your personal repo. An xmlns is simply a resource identifier that uniquely identifies a schema, but many people get that wrong when URL-conforming URIs are used, so I think it might even be beneficial to update those. That would require a new major version, as people would need to update the namespace of their filter/hint files to migrate them to the new namespace.

For retrieval location specification there is the schemaLocation attribute in the XML Schema Instance namespace (https://www.w3.org/TR/xmlschema-1/#schema-loc)
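A small sketch of the distinction, using only the JDK's XML parser: the namespace is read back as an opaque string, while the retrieval hint lives in the separate `xsi:schemaLocation` attribute. The root element name `example` and the file name `local-copy.xsd` are placeholders invented for this illustration; only the namespace URI comes from this thread. A default (non-validating) `DocumentBuilder` parses this without contacting the host named in the namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative sketch only; not ODC code.
final class NamespaceIsAnIdentifier {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Namespace URI quoted earlier in this thread.
    static final String REPORT_NS =
            "https://jeremylong.github.io/DependencyCheck/dependency-check.4.1.xsd";

    // HYPOTHETICAL document: root element name and local schema file are placeholders.
    static final String SAMPLE =
            "<example xmlns=\"" + REPORT_NS + "\""
            + " xmlns:xsi=\"" + XSI_NS + "\""
            + " xsi:schemaLocation=\"" + REPORT_NS + " local-copy.xsd\"/>";

    // Namespace-aware, non-validating parse of an in-memory string.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // The namespace is just a name attached to the element.
    static String namespaceOf(String xml) {
        return parseRoot(xml).getNamespaceURI();
    }

    // schemaLocation holds "namespace location" pairs; the second token is the retrieval hint.
    static String schemaLocationOf(String xml) {
        return parseRoot(xml).getAttributeNS(XSI_NS, "schemaLocation");
    }

    public static void main(String[] args) {
        System.out.println("namespace:      " + namespaceOf(SAMPLE));
        System.out.println("schemaLocation: " + schemaLocationOf(SAMPLE));
    }
}
```

If the namespace string were changed for the new organization, documents carrying the old string would simply be in a different namespace, which is why that change would be a breaking one.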

Besides ensuring all our docs are updated for the new location, I think the main part is making sure the various integrations we know about (from memory: Azure DevOps, the Jenkins plugin, the SonarQube plugin) are aligned on the project location change-over, so that they can publish an update once the change-over is completed with a release at the new location and don't have to rely on GitHub forwarding them.

Can't think of further items to take into account (apart from me joining up on the dependency-check org on github in an appropriate role), but have never had to deal with a GH project move, so there might be some gotchas we only find out about when the move is done.

@marcelstoer
Contributor

marcelstoer commented Dec 9, 2024

https://docs.github.com/en/repositories/creating-and-managing-repositories/transferring-a-repository might have more hints, I only scanned it briefly.

The term "move" hasn't been clearly defined yet. You can transfer the GitHub project as a whole, or you can selectively move assets (e.g. the Git repository).

The latter means creating a new empty project in the target location leaving the old one as-is but locked. You would then push the Git repository incl. its history to the new location. New releases would only be created off the new project. Issues would need to be replicated to the new location - not sure if GitHub offers something out of the box or whether a custom script is required. There are obvious downsides to this approach but the big advantage is that nothing will break for integrators. Users can do the switch at their own pace.

@jeremylong
Owner Author

I've transferred a repository before. It isn't complicated. In the project settings, under "danger zone" you'll see:
[Screenshot of the "Danger Zone" settings; image not preserved]

When you do this, GH automatically creates some redirects from the old repo to the new one. This is likely going to delay the 12.0 release by a few days (I was planning on releasing today - but held off because of this issue). It would be best to sync the 12.0 release with the move.

@jeremylong jeremylong added this to the 12.0.0 milestone Dec 9, 2024
@jeremylong jeremylong pinned this issue Dec 9, 2024
@chadlwilson
Contributor

I've transferred repos a number of times too. There is some kind of redirect kept in place but not sure if it applies to everything (raw repo links, artifacts, releases etc) and for how long.

@chadlwilson
Contributor

> Can't think of further items to take into account (apart from me joining up on the dependency-check org on github in an appropriate role), but have never had to deal with a GH project move, so there might be some gotchas we only find out about when the move is done.

There is also the URI for fetching the publishedSuppressions, which is served from GitHub Pages and is not a raw repo link.

hosted.suppressions.url=https://jeremylong.github.io/DependencyCheck/suppressions/publishedSuppressions.xml

public static final String DEFAULT_SUPPRESSIONS_URL = "https://jeremylong.github.io/DependencyCheck/suppressions/publishedSuppressions.xml";

From the docs:

> If the transferred repository contains a GitHub Pages site, then links to the Git repository on the Web and through Git activity are redirected. However, we don't redirect GitHub Pages associated with the repository.

This makes it sound to me as if all the GitHub Pages links will be broken and only direct repository links are redirected. I've never moved a GH Pages repository, so I don't know empirically. If that's the case, this move would probably be quite bad, as lots of things will break, including old versions fetching the published suppressions, engine.version.url retrievals from GH Pages, etc.

We might want to do a test somewhere unless we can find something more definite.

Is this a worry with backward compat as well?

if (systemId != null && systemId.startsWith("https://jeremylong.github.io/DependencyCheck/")) {
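One way the backward-compatibility concern could be handled is to accept both the old prefix and a new one when deciding whether a schema is "ours". The sketch below is not the project's actual resolver; the class name is hypothetical, and `NEW_PREFIX` is an assumed post-move Pages URL that has not been confirmed anywhere in this thread.

```java
import java.util.List;

// Illustrative sketch only; not the project's actual resolver code.
final class SchemaLocationCheck {
    // Prefix taken from the check quoted above.
    static final String OLD_PREFIX = "https://jeremylong.github.io/DependencyCheck/";

    // ASSUMPTION: hypothetical GitHub Pages location after a move to the organization.
    static final String NEW_PREFIX = "https://dependency-check.github.io/DependencyCheck/";

    static final List<String> KNOWN_PREFIXES = List.of(OLD_PREFIX, NEW_PREFIX);

    // True when the systemId starts with any known prefix; null never matches.
    static boolean isKnownSchemaLocation(String systemId) {
        if (systemId == null) {
            return false;
        }
        return KNOWN_PREFIXES.stream().anyMatch(systemId::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isKnownSchemaLocation(OLD_PREFIX + "dependency-check.4.1.xsd"));
        System.out.println(isKnownSchemaLocation(NEW_PREFIX + "dependency-check.4.1.xsd"));
        System.out.println(isKnownSchemaLocation("https://example.org/other.xsd"));
    }
}
```

Keeping the old prefix in the list means files written against older releases would still be recognized after the move, at the cost of carrying the personal URL in the code indefinitely.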

@marcelstoer
Contributor

Another reason why I proposed to consider an alternative approach is to reduce pressure on the current maintainer(s). Since it is far less risky, it would likely keep the stress level far lower. You could take your time building the new highway before rerouting traffic to it.

@jeremylong
Owner Author

if we forked the repo for the move the issues/PRs/history/etc would not move - unless I'm misunderstanding the proposal. In most cases, the redirects GH puts in should handle things (except GH pages...).

I vote we pick a day, open issues on the repos we know about announcing the upcoming move and just transfer the repo on that date. It might be a minor disruption to some - but this is a one-time issue. The only issue is timing with the holidays coming up. Any objection to Dec 22nd? It is a Sunday, so if this does break for people it is going into the work week (unfortunately that is Christmas week). Or should we hold off until after the 1st? I feel like we need to get 12.0.0 out due to the CVSSv4 issues, but I'd also like to combine the transfer. Thoughts?

@chadlwilson
Contributor

> if we forked the repo for the move the issues/PRs/history/etc would not move - unless I'm misunderstanding the proposal. In most cases, the redirects GH puts in should handle things (except GH pages...).

Not sure if you're talking to me or @marcelstoer but I think if indeed the GH pages links will all be broken, it's possibly not a good idea to do this change as it'd break all old versions (again) when they cannot retrieve suppressions.

> I vote we pick a day, open issues on the repos we know about announcing the upcoming move and just transfer the repo on that date. It might be a minor disruption to some - but this is a one-time issue. The only issue is timing with the holidays coming up. Any objection to Dec 22nd? It is a Sunday, so if this does break for people it is going into the work week (unfortunately that is Christmas week). Or should we hold off until after the 1st? I feel like we need to get 12.0.0 out due to the CVSSv4 issues, but I'd also like to combine the transfer. Thoughts?

If indeed we are going to break everyone on older versions or think we might break everyone, probably not a good idea to do this over this Decemberish time period?

@marcelstoer
Contributor

marcelstoer commented Dec 10, 2024

> if we forked the repo for the move the issues/PRs/history/etc would not move

By "history" you didn't mean "Git history", did you (as that is preserved)? For PRs and issues there are tools to mitigate the cost of transferring them "manually": https://jloh.co/posts/bulk-migrate-issues-github-cli/, https://github.com/NicholasBoll/github-migration. Pages would be automatically rebuilt at the new place by the GH actions.

I don't oppose a big-bang transfer, of course. For a project with such a large community, though, I just wouldn't be brave enough to pull this off if it were mine.

@jeremylong
Owner Author

@marcelstoer Honestly, this is why I have never moved the repo - fear of breaking things. But sometimes you just have to "rip the bandaid off". Yeah, it might hurt. Some users might get upset; but for the health, sanity, etc. of the maintainers (mainly me at this point) - the repo needs to move so we can attempt to build a bigger community of maintainers.

I agree with both @marcelstoer and @chadlwilson that we risk breaking a lot of existing users because of gh-pages. I'm going to do some testing on options - transfer a repo, then fork it back and recreate the releases and gh-pages. My gut feeling is that to maintain links, etc. we will need to fork it over to the dependency-check organization and then copy the issues over - but I'm going to run some tests.

Any opinions on Jan 5th for the "move" and release of 12.0.0?

@chadlwilson
Contributor

chadlwilson commented Dec 10, 2024

Well, if indeed the GH pages don't work ... another (slower) alternative might be to attach a custom domain to the GH pages site so that changes can be made on 11.x releases, or communicated to folks in advance to update their links with overrides to the URLs if they intend to stay on older versions, and then the switch can be done without downtime since you'd control the indirection via CNAMEs :-) Although domains cost money too. So maybe not a great plan.

If you recreate a repo with the same name back under the jeremylong account (either a standalone repo with the same name or a fork) so that you can create a new Pages site with a redirect, then the GH-level redirects of the raw content, release links, web visits, etc. will definitely stop working (the GH documentation explicitly notes that this is how it works).

I'd much prefer transferring the repo with all issues, actions etc attached (and breaking things for versions at some point) than contemplating forking into the new org and having to copy issues. Apart from being messy, that's likely to create more confusion and still no redirect magic for raw content links or github.com links.

As a side note, do you really get more "control" managing contributor permissions via an org, rather than adding outside collaborators to an individual repo such as this? Although I have consumed and administered both, I wasn't aware of that (except for the convenience of granting permissions to multiple repos via an org). Just to make sure we are doing it for the right reasons!

> Any opinions on Jan 5th for the "move" and release of 12.0.0?

If we want to go ahead and GH pages is the main thing that'll be broken, I think that'd be pretty reasonable. We can probably prep a pinned GitHub issue showing people how to override the URL links on older versions for various releases/plugins in advance to "soften the blow" for people who cannot upgrade to v12 for whatever reason. Possible for the hosted suppressions URL at least, not sure about all others :-)
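As a rough illustration of what "overriding the URL" amounts to, the sketch below resolves the hosted suppressions URL from a user-supplied property and falls back to the built-in default. The property key and the default value are the ones quoted earlier in this thread; the class, the method name, and the replacement URL in `main` are hypothetical, and this is not how ODC's own settings handling is implemented.

```java
import java.util.Properties;

// Illustrative sketch only; not ODC's settings implementation.
final class HostedSuppressionsUrl {
    // Key and default as quoted earlier in this thread.
    static final String KEY = "hosted.suppressions.url";
    static final String DEFAULT_SUPPRESSIONS_URL =
            "https://jeremylong.github.io/DependencyCheck/suppressions/publishedSuppressions.xml";

    // A non-blank override wins; otherwise the built-in default is used.
    static String resolve(Properties overrides) {
        String value = overrides.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_SUPPRESSIONS_URL;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        // HYPOTHETICAL new location; the real post-move URL is not known from this thread.
        overrides.setProperty(KEY,
                "https://dependency-check.github.io/DependencyCheck/suppressions/publishedSuppressions.xml");
        System.out.println(resolve(overrides));
        System.out.println(resolve(new Properties()));
    }
}
```

A pinned issue would then only need to tell users of older versions which value to set for this key in whichever plugin or CLI they use.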

@jeremylong
Owner Author

With an org you have more control over what contributors can do: triage, contributor, admin, etc. With a personal repo you just get to add people as a contributor.

@jeremylong
Owner Author

I mean, I'm okay just adding people as contributors if I also added a codeowners file (the current contributors) and added a code owner's approval rule to the branch protection. This might be the safest, easiest solution for now.

@nhumblot
Collaborator

nhumblot commented Dec 12, 2024

Hi 👋

I am all for a repository transfer, but as of now it seems it would not be in line with our security policy of supporting the previous major version: we would break version 11. If I were the owner of such a task, I think I would tend to investigate and prepare the migration, or ensure backward compatibility, to prevent any disruption. Maybe linking it to the 12.0.0 release is a bit of a rush if we have alternatives.

@aikebah and @chadlwilson already shared great insights on what could be impacted.

Some links I found regarding github pages redirection:

Also to prevent breaking things, maybe we could consider having a mirror with the old name once the transfer is made? 🤔

@chadlwilson
Contributor

chadlwilson commented Dec 12, 2024

> Also to prevent breaking things, maybe we could consider having a mirror with the old name once the transfer is made? 🤔

There is no way to create a mirror with the old name without breaking the Github/repo level redirects, issue link redirects etc - to my knowledge.

People often won't mention this, possibly because they are not using a single repo (like this one) as a hub for a wider OSS community and set of tools, including raw githubusercontent links in released code/config as ODC does. In their case it is often a simple standalone static website in its own repo, for which breaking issue/code links etc. isn't a big concern.

Seems for a given repository you pick 1 of the below

  1. use meta-refresh to redirect top level links to new GH pages
  2. have automagic GH redirects for the repository and its content, issues etc.
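For reference, option 1 boils down to publishing a tiny HTML page at each old Pages path. The sketch below generates such a page; the class name and the target URL used in `main` are hypothetical. Note that a meta-refresh is only honoured by browsers: an HTTP client fetching a file such as publishedSuppressions.xml would receive this HTML instead of the XML it expects, so this approach would not help older ODC versions.

```java
// Illustrative sketch only; generates a static redirect page for a GitHub Pages site.
final class MetaRefreshPage {

    // Minimal escaping so the URL is safe inside HTML attributes and text.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("\"", "&quot;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Builds a page that sends browsers to targetUrl immediately, with a visible fallback link.
    static String redirectPage(String targetUrl) {
        String target = escapeHtml(targetUrl);
        return "<!DOCTYPE html>\n"
                + "<html>\n"
                + "<head>\n"
                + "  <meta charset=\"utf-8\">\n"
                + "  <meta http-equiv=\"refresh\" content=\"0; url=" + target + "\">\n"
                + "  <link rel=\"canonical\" href=\"" + target + "\">\n"
                + "</head>\n"
                + "<body>\n"
                + "  <p>This page has moved to <a href=\"" + target + "\">" + target + "</a>.</p>\n"
                + "</body>\n"
                + "</html>\n";
    }

    public static void main(String[] args) {
        // HYPOTHETICAL target; the real post-move Pages URL is not known from this thread.
        System.out.print(redirectPage("https://dependency-check.github.io/DependencyCheck/"));
    }
}
```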

@chadlwilson
Contributor

chadlwilson commented Dec 16, 2024

@FyiurAmron can you please share your specific experience? Why is transferring a repo 'rewriting history' (it actually retains it, including for issues, PRs, releases, etc.)? Did you have a project with a history of thousands of issues, with hundreds of open ones to migrate across? And a community of thousands of users?

I see an opinion and an assertion of experience, but would like specific reasoning or storytelling to back that up, or it's just an opinion.

For me, forking 'won't get [me] what [I] want', since abandoning the issues here in one repo while starting a new one won't lead to good outcomes for my own effort to engage in soon-to-be-abandoned issues, and neither will manually migrating issues in a way that removes all user subscriptions and thus severs whatever community there is. I do not consider that low risk.

Personally I can't stand it when projects do that to their communities. I'd rather just leave the repo here, if it were my choice and that's all we have as an option if transfer is too painful.

@FyiurAmron

FyiurAmron commented Dec 16, 2024

@chadlwilson the issue with the issues is that there are 500 open ones ATM, virtually all of them old (only ~100 are more recent than 6 months). I don't think that anyone will ever be able to triage them effectively without any monetary incentive, especially since new ones will be coming as well. I also don't think they create any kind of "community" - they are noise at this moment, most likely. Community is created by the will of people to collaborate, not an issue or a ticket in a tracker or any other electronic medium. "Abandoning" (kind of) the issues is actually what was being proposed before in #7231 (i.e. having a bot that would close the old ones). The PRs are: bot, one draft, one actual PR. I see no harm here. This is mainly a code repo, not a discussion forum.
As to "rewriting history" - I probably used the wrong wording, sigh. I haven't used GH transfer in a long time. I deleted my original comment (yes, that's rewriting history :}). Still, the rest of my point stays. It's often better to start with a clean slate.

@chadlwilson
Contributor

Probably overindexing on the word 'community' and its meaning there - even having a single place to collaborate and find things is useful, rather than splitting things. People help each other. Old false positives have been looked into and provide guidance to folks. It's not the ideal knowledge repository, but it's better than none and leaving it behind would be negative in my opinion.

FWIW transferring a repo doesn't maintain a fork in old location, it redirects links and repo pulls (but not GH pages) as discussed above.

@FyiurAmron

FyiurAmron commented Dec 16, 2024

> Probably overindexing on the word 'community' and its meaning there

maybe. Maybe not. Who's to say what "community" really is, who has the mandate for that? :D I think that community is made by people, not places or machines, and it's the community that creates and manages those places and machines and not the other way around, and thus the places and ways should be adapted to the communities, and not the other way around. Don't you agree?

> even having a single place to collaborate and find things is useful, rather than splitting things.

Well, yes, but no, but yes, but no. There would still be a single place, as an archived repo is no longer accessible for collaboration. Nothing would be "split", apart from the backlog of outdated and impractical issues maybe vs actionable ones. The old ones would still be indexed and searchable, Google doesn't forget about us.

> People help each other.

This they do sometimes, yes. There is definitely value in that, I'm not arguing it.

> Old false positives have been looked into and provide guidance to folks.

They would still do. Nothing would be removed, I think.

> It's not the ideal knowledge repository, but it's better than none and leaving it behind would be negative in my opinion.

Some things should be left behind. Treating the issue tracker as a knowledge base makes it impossible for it to do the main thing it's supposed to do, namely track currently occurring and actionable/solvable issues.

> FWIW transferring a repo doesn't maintain a fork in old location, it redirects links and repo pulls (but not GH pages) as discussed above.

Good to know, maybe I'll even have time to check it out myself in the near future.


BTW, you asked for proof other than anecdotes before. Off the top of my head: https://github.com/yidongnan/grpc-spring-boot-starter , plus a couple of other projects that got migrated under the Kotlin and Spring/Spring Boot umbrellas. It's still anecdotal TBH, since it's never strict engineering with those things.

@jeremylong
Owner Author

I really think we can:

  1. transfer the repo to the dependency-check org (and update in the code/MD any URIs)
  2. fork the repo back to jeremylong
  3. recreate the releases with the files
  4. update the README on the fork to indicate it's archived, use the other repo
  5. create and pin an issue and a PR that only state something like "All [PR/Issues] can be found on https://github.com/dependency-check/DependencyCheck"
  6. archive the fork - leaving the GH pages and releases in place so older versions of ODC don't break

This will break any existing hard links to PRs and Issues. Would there be any other concerns?

If we really wanted to keep the hard links to PRs/Issues working we could just generate a bunch of issues and PRs that have the same title and simply contain a description with a link to the PR/Issue on the new repo.

@jeremylong
Owner Author

Several personal commitments have interfered with my timeline. 12.0.0 will be released soon. Following that I will work on seamlessly transferring the repo to the organization.


6 participants