Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonical ref does not match URL for builds that use dirhtml builder #1406

Closed
benjaoming opened this issue Nov 17, 2022 · 21 comments
Closed
Assignees
Labels
Bug A bug Needed: more information A reply from issue author is required

Comments

@benjaoming
Copy link
Contributor

benjaoming commented Nov 17, 2022

Details

Upstream issue: sphinx-doc/sphinx#9730

Expected Result

<link rel="canonical" href="[https://packaging.python.org/en/latest/tutorials/packaging-projects.html" />

Actual Result

<link rel="canonical" href="[https://packaging.python.org/en/latest/tutorials/packaging-projects/" />
@benjaoming benjaoming added the Bug A bug label Nov 17, 2022
@benjaoming
Copy link
Contributor Author

All this might be a Sphinx issue though. I haven't had the time to check.

@benjaoming benjaoming added the Needed: more information A reply from issue author is required label Nov 17, 2022
@stsewd
Copy link
Member

stsewd commented Nov 17, 2022

I think this was a problem when installing an outdated version of sphinx, and then installing a new one over it. They have their requirements set explicitly https://github.com/pypa/packaging.python.org/blob/main/requirements.txt, so we should be fine enabling the install latest feature flag.

@benjaoming
Copy link
Contributor Author

I think this was a problem when installing an outdated version of sphinx, and then installing a new one over it.

So the issue originates from this behavior?

https://github.com/readthedocs/readthedocs.org/blob/fc7f68b9848e4c6a39014801829d395df2aede86/readthedocs/doc_builder/python_environments.py#L203-L222

How does sphinx==4.5.0 end up having its build outputs affected by the previous Sphinx installation?

@benjaoming
Copy link
Contributor Author

benjaoming commented Nov 17, 2022

Seems that the theme, python-docs-theme, is the reason for being stuck with sphinx 4.x until the work by Adam is merged - https://github.com/python/python-docs-theme/pull/99

@stsewd
Copy link
Member

stsewd commented Nov 17, 2022

o we should be fine enabling the install latest feature flag.

And... that didn't work :D

@benjaoming
Copy link
Contributor Author

Similar issue here: https://test-builds.readthedocs.io/en/htmldir/folder/

@benjaoming
Copy link
Contributor Author

I upgraded Sphinx to the latest version in https://test-builds.readthedocs.io/en/htmldir/ but the issue persists.

@ericholscher
Copy link
Member

Yea, this is a bug, we should prioritize fixing it. It's likely causing lots of search issues.

@benjaoming
Copy link
Contributor Author

@ericholscher I just saw that the issue description here has broken down where the bug might be found: #1303

@benjaoming
Copy link
Contributor Author

This is indeed an issue in the theme. .html is hard-coded regardless of the builder.

https://github.com/readthedocs/sphinx_rtd_theme/blob/master/sphinx_rtd_theme/layout.html#L54

Moving this issue and opening a PR since I'm already in the area 👍

@benjaoming benjaoming transferred this issue from readthedocs/readthedocs.org Jan 11, 2023
@benjaoming
Copy link
Contributor Author

benjaoming commented Jan 11, 2023

This is the upstream issue by the way: sphinx-doc/sphinx#9730

@benjaoming
Copy link
Contributor Author

This is only fixing the theme.

If we want to implement a work-around on everything hosted on RTD, we should figure out why canonical_url as defined here doesn't seem to be applied: https://github.com/readthedocs/readthedocs-sphinx-ext/blob/main/readthedocs_ext/_templates/readthedocs-insert.html.tmpl

@ericholscher ericholscher moved this from Planned to In progress in 📍Roadmap Feb 14, 2023
@ericholscher
Copy link
Member

I was able to fix teh canonical issue with a HTML -> HTMLDir redirect: https://docs.readthedocs.io/en/latest/user-defined-redirects.html#sphinx-redirects

@humitos
Copy link
Member

humitos commented Mar 22, 2023

The easiest fix for users here is to remove the theme configuration canonical_url (which is already deprecated) and use html_baseurl Sphinx's option instead. When html_baseurl is defined, Sphinx automatically generates the pageurl and inject it into the context.

Read the Docs already does this "automagically" at https://github.com/readthedocs/readthedocs.org/blob/4b2eb75bebfce04390ac3f84d1c57ef7f7126fe8/readthedocs/doc_builder/templates/doc_builder/conf.py.tmpl#L151-L156

Besides, I think that now Read the Docs is exposing READTHEDOCS_CANONICAL_URL we should use that environment variable from this theme to automatically set html_baseurl from this theme instead of relying on Read the Docs itself.

@benjaoming
Copy link
Contributor Author

@humitos did something get fixed in Sphinx? The upstream issue here is that Sphinx appends .html to pageurl when it shouldn't do that for the DirHTML builder.

If we want to fix the issue quickly in the theme, I proposed #1407 as a backwards and forwards compatible workaround.

@humitos
Copy link
Member

humitos commented Mar 22, 2023

@benjaoming gotcha! I miss-read the Sphinx code and thought that self.out_suffix on its code will do exactly what we need, but it will probably just adds .html always instead of only when it's required.

@benjaoming
Copy link
Contributor Author

@humitos yes, that's it.. and there's also a PR to fix exactly that part... I'd put in a proposal to add a canonical_url to the page context.. but holding back with more work, since the simple fix in that PR has been sitting stale for a long time now.

@humitos
Copy link
Member

humitos commented Jan 29, 2024

I'm a little lost here about how to solve this issue. What are the actionable steps to move forward from our side?

@ericholscher
Copy link
Member

I think it's a Sphinx issue that we could work around, but I think Sphinx should likely fix it.

@humitos
Copy link
Member

humitos commented Sep 10, 2024

I think we should revisit this now that we are working on a 3.0.0rc1 release. I think this could also be related with the Sphinx context change we are doing; we should double check that we are setting the canonical link by default:

@humitos humitos self-assigned this Sep 10, 2024
@humitos
Copy link
Member

humitos commented Sep 12, 2024

I checked this locally by using html_baseurl and building the documentation with -b html and -b dirhtml. It works as expected 💯

However, the deprecated canonical_url does not work correctly with -b dirhtml because that's the chunk of code that has the .html hardcoded.

Since that option is deprecated since version 0.6.x we can call this a non-issue.


I perform the tests with those projects, and I see it works fine:

▶ curl -s https://test-builds.readthedocs.io/en/htmldir/ | grep canonical                            
    <link rel="canonical" href="https://test-builds.readthedocs.io/en/htmldir/index.html" />
...
▶ curl -s https://test-builds-disable-sphinx-manipulation.readthedocs.io/en/htmldir/ | grep canonical
    <link rel="canonical" href="https://test-builds-disable-sphinx-manipulation.readthedocs.io/en/htmldir/index.html" />

@humitos humitos closed this as completed Sep 12, 2024
@github-project-automation github-project-automation bot moved this from In progress to Done in 📍Roadmap Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A bug Needed: more information A reply from issue author is required
Projects
Archived in project
Development

No branches or pull requests

4 participants