Better handling of /index.html URLs #52

Open
anjackson opened this issue Nov 28, 2019 · 2 comments
Labels
question Further information is requested

Comments

@anjackson
Contributor

We have apparent 'gaps' under OutbackCDX + pywb, as records for

https://www.webarchive.org.uk/wayback/archive/*/http://www.webarchive.org.uk/index.html

are separate from records for

https://www.webarchive.org.uk/wayback/archive/*/http://www.webarchive.org.uk/

Our users expect to see these together. This seems to be a URL canonicalisation issue with OutbackCDX, but I'm recording the issue here for now.
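
To illustrate the kind of canonicalisation rule that would group these two URLs, here is a minimal Python sketch. It is not OutbackCDX's or pywb's actual canonicaliser; the function name `strip_index` and the `INDEX_NAMES` list are hypothetical, and whether a given index filename really is equivalent to the bare directory depends on the origin server.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of directory-index filenames. Treating these as
# equivalent to the bare directory is an assumption, not a guarantee.
INDEX_NAMES = ("index.html", "index.htm")


def strip_index(url: str) -> str:
    """Return url with a trailing index filename removed from its path."""
    parts = urlsplit(url)
    # Split the path into everything before the last "/" and the final segment.
    head, _, tail = parts.path.rpartition("/")
    if tail.lower() in INDEX_NAMES:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


if __name__ == "__main__":
    for u in (
        "http://www.webarchive.org.uk/index.html",
        "http://www.webarchive.org.uk/",
    ):
        print(u, "->", strip_index(u))
```

With a rule like this applied at both index and lookup time, the two URLs above would share one key. Applying it globally risks merging pages that are genuinely different on some sites, which is why it is only a sketch.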

@ldbiz ldbiz added the question Further information is requested label Sep 19, 2023
@ldbiz
Contributor

ldbiz commented Sep 19, 2023

This issue is a few years old. Does it still need to be investigated here?

@anjackson
Contributor Author

This is related to ukwa/ukwa-services#81 and ukwa/w3act#614

I experimented with a solution to this for ukwa/ukwa-services#81, by adding an alias record to OutbackCDX. But I got stuck because I wasn't sure which URLs were aliases. This might be a good thing to work through with @nicolabingham ?
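
For reference, a rough Python sketch of how such alias records might be generated. It assumes the `@alias <source-url> <target-url>` line format described in the OutbackCDX README; the helper name `alias_line` and the example pair are hypothetical, and which URLs genuinely are aliases remains the open question.

```python
def alias_line(source_url: str, target_url: str) -> str:
    """Build one alias record, assuming the '@alias <source> <target>' format."""
    for url in (source_url, target_url):
        # The record is space-delimited, so embedded whitespace would corrupt it.
        if not url or any(ch.isspace() for ch in url):
            raise ValueError(f"URL must be non-empty and contain no whitespace: {url!r}")
    return f"@alias {source_url} {target_url}"


if __name__ == "__main__":
    # Hypothetical candidate pair taken from the URLs in this issue.
    pairs = [
        ("http://www.webarchive.org.uk/index.html", "http://www.webarchive.org.uk/"),
    ]
    print("\n".join(alias_line(src, dst) for src, dst in pairs))
```

As I understand it, these lines would be submitted to the OutbackCDX collection in the same way as ordinary CDX lines, but that step is left out here because it depends on the deployment.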
