Certbot fails to generate Let's Encrypt certs on the first attempt #173

Open
jessuppi opened this issue Aug 30, 2022 · 11 comments

Comments

@jessuppi
Member

jessuppi commented Aug 30, 2022

This has been an ongoing issue for several months, and it's confusing many new users.

We discovered that many users who are new to SlickStack and LEMP don't realize that OpenSSL works fine and is much easier, especially behind Cloudflare. They have apparently been choosing the letsencrypt option during setup, seeing that error, assuming SlickStack doesn't work, and then ditching it altogether.

After feedback on this confusion in our Discord chat room, we decided to default to openssl going forward AND hide the option from the setup wizard to avoid frustrating newbies.

However, this doesn't solve the underlying issue: Certbot fails to issue the certificates on the first attempt, which seems to happen on fresh installations. The first time you run ss-install on a brand new server, everything tends to work fine except for Certbot, which "hangs" and then returns an "unauthorized" error. After running ss-install again, however, the certificates are issued properly with a SUCCESS message.

We've tried for a while to figure out what's causing this... we suspected it was IPv6 / Cloudflare related because of several other related cases on the forums and around the web, but it might be this:

"However, you should keep an eye on whether there are any web forwards configured (some DNS providers allow this) e.g. if you forward www to non-www or vice-versa, this may trip up Certbot. In which case remove the domain you are forwarding using DNS from your certificate. This should resolve the issue."

Ref: https://webdock.io/en/docs/webdock-control-panel/ssl-certificate-guides/common-certbot-errors

I can personally confirm this issue still happens even when choosing the "Full SSL" setting in Cloudflare SSL tab, and even when IPv6 exists in the DNS records and resolves in the Nginx server, so this seems unrelated:

Ref: https://support.plesk.com/hc/en-us/articles/360016816274-Could-not-issue-a-Let-s-Encrypt-certificate-DNS-zone-contains-an-AAAA-record-but-the-domain-is-not-assigned-an-IPv6-address-in-Plesk

@jessuppi
Member Author

jessuppi commented Sep 3, 2022

Example error message on new installs:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Account registered.
Requesting a certificate for example.com and www.example.com

Certbot failed to authenticate some domains (authenticator: webroot). The Certificate Authority reported these problems:
  Domain: example.com
  Type:   unauthorized
  Detail: 2606:4700:3034::6815:238a: Invalid response from http://example.com/.well-known/acme-challenge/5vnlI6sdSN5ixd0467ij9wZgoaWr2NiS3dsmdmj54k4: 404

  Domain: www.example.com
  Type:   unauthorized
  Detail: 2606:4700:3034::6815:238a: Invalid response from http://example.com/.well-known/acme-challenge/1SgtlSd0B60jZWGy2LEUlHZ4jgBIhjouVeqH65OS44Q: 404

Hint: The Certificate Authority failed to download the temporary challenge files created by Certbot. Ensure that the listed domains serve their content from the provided --webroot-path/-w and that files created there can be downloaded from the internet.

Some challenges have failed.
Ask for help or search for solutions at https://community.letsencrypt.org. See the logfile /var/log/letsencrypt/letsencrypt.log or re-run Certbot with -v for more details.

This is strange, because unless the Certbot team has carelessly written their error messages (unlikely) then it means the verification tests for both example.com and www.example.com are trying to load from http://example.com which doesn't make very much sense to me... I would expect separate domain verifications.
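To make the mismatch above easier to see, here is a minimal sketch that pulls the requested domain and the host that was actually fetched out of the report. The regular expression and the `parse_failures` helper are my own illustration, not part of Certbot, and they assume the report layout shown in this comment. One plausible reading (an assumption, not confirmed here) is that the www request was redirected to the bare domain before the challenge file was requested, so the error shows the final URL.

```python
import re
from urllib.parse import urlsplit

# Failure section copied from the Certbot output above.
REPORT = """\
  Domain: example.com
  Type:   unauthorized
  Detail: 2606:4700:3034::6815:238a: Invalid response from http://example.com/.well-known/acme-challenge/5vnlI6sdSN5ixd0467ij9wZgoaWr2NiS3dsmdmj54k4: 404

  Domain: www.example.com
  Type:   unauthorized
  Detail: 2606:4700:3034::6815:238a: Invalid response from http://example.com/.well-known/acme-challenge/1SgtlSd0B60jZWGy2LEUlHZ4jgBIhjouVeqH65OS44Q: 404
"""

# Hypothetical pattern: assumes the Domain/Type/Detail layout shown above.
BLOCK = re.compile(
    r"Domain:\s*(?P<domain>\S+)\s+"
    r"Type:\s*(?P<type>\S+)\s+"
    r"Detail:.*?Invalid response from (?P<url>\S+): (?P<status>\d{3})"
)


def parse_failures(report):
    """Return one dict per failed domain, including the host that was fetched."""
    failures = []
    for match in BLOCK.finditer(report):
        failures.append({
            "domain": match.group("domain"),
            "type": match.group("type"),
            "fetched_host": urlsplit(match.group("url")).hostname,
            "status": int(match.group("status")),
        })
    return failures


for failure in parse_failures(REPORT):
    redirected = failure["domain"] != failure["fetched_host"]
    print(failure["domain"], "->", failure["fetched_host"],
          failure["status"], "(host differs)" if redirected else "")
```

If the host differs for www only, that points at a www to non-www redirect happening before the challenge file is served, rather than at a careless error message.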

And because SlickStack runs over HTTPS by default and has HSTS enabled by default, the HTTP verification is going to fail which means we need a way to tell Certbot to run the tests over https://... instead I think.

@jessuppi
Member Author

jessuppi commented Nov 26, 2022

Also see:

https://community.letsencrypt.org/t/must-run-slickstack-install-twice-to-generate-lets-encrypt-certs/183023

As per this discussion, the Certbot Let's Encrypt gurus think that something is not redirecting to HTTPS version properly on the first attempt at generating certs... however the Certbot errors are HTTP, which makes even less sense. How could SlickStack be properly redirecting www to non-www scheme, but failing to redirect http to https?

Our production server block force redirects all requests to HTTPS, and HSTS is enabled too...

@rom0x

rom0x commented Nov 28, 2022

Hey. I can confirm the issue still exists (even after running the installation multiple times + a reboot). Just got a brand new server instance and tried installing SlickStack (Now using KVM at Hetzner.de). Kind regards.

Maybe it would be possible to provide SlickStack with "Full SSL Strict" enabled, using the SSL certs you can create directly under CloudFlare instead of having self-signed certificates through letsencrypt?

@jessuppi
Member Author

So part of my confusion came from discussing this issue with the Let's Encrypt community, who maybe were unaware of some Certbot-specific issues, specifically that port 80 is still required. I also failed to properly specify how SlickStack was integrating Certbot with our Nginx configuration and server blocks, i.e. only port 443 for canonical.

But after confirming HSTS was not an issue, and thoroughly discussing this and reviewing dozens of forum threads, Stack Exchange threads, blog posts, and beyond... I think this is the cause:

Ref: https://letsencrypt.org/docs/allow-port-80/

"We occasionally get reports from people who have trouble using the HTTP-01 challenge type because they’ve firewalled off port 80 to their web server. Our recommendation is that all servers meant for general web use should offer both HTTP on port 80 and HTTPS on port 443. They should also send redirects for all port 80 requests, and possibly an HSTS header (on port 443 requests).

Allowing port 80 doesn’t introduce a larger attack surface on your server, because requests on port 80 are generally served by the same software that runs on port 443."

Since SlickStack only allows port 80 on the catch-all Nginx server block (which is not domain-matched), then Certbot is very likely "hanging" because SlickStack doesn't 301 redirect those requests to HTTPS via port 80. In other words, simply redirecting HTTP to HTTPS via Cloudflare or Nginx is not enough...
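To illustrate the reasoning, here is a rough model of how a request on a given port and Host header gets matched to a server block. This is a simplification I wrote for this thread: `ServerBlock` and `select_block` are hypothetical names, the model ignores wildcard and regex names and listen addresses, and what SlickStack's catch-all block actually returns is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerBlock:
    port: int
    server_names: tuple  # exact names only; empty means no domain match
    action: str          # label for what the block does (illustrative)
    default: bool = False


def select_block(blocks, port, host):
    """Pick the block for (port, host): exact name first, then the default."""
    candidates = [b for b in blocks if b.port == port]
    if not candidates:
        return None  # nothing listens on this port
    for block in candidates:
        if host in block.server_names:
            return block
    for block in candidates:
        if block.default:
            return block
    # With no explicit default, the first block for the port acts as default.
    return candidates[0]


names = ("example.com", "www.example.com")

# Assumed layout before the change: port 80 only on the catch-all block.
blocks_before = [
    ServerBlock(80, (), "catch-all", default=True),
    ServerBlock(443, names, "serve site"),
]

# Assumed layout after adding a domain-matched port 80 block.
blocks_after = blocks_before + [ServerBlock(80, names, "301 to https")]

print(select_block(blocks_before, 80, "example.com").action)
print(select_block(blocks_after, 80, "example.com").action)
```

In this model the HTTP-01 request for example.com lands in the catch-all block before the change, and in the redirecting block afterwards, which is the difference described above.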

If we don't want to change our Nginx configuration, we would have to use TLS-ALPN-01 and switch to a different ACME client instead of Certbot.

Ref: https://community.letsencrypt.org/t/which-client-support-tls-alpn-challenge/75859
Ref: https://community.letsencrypt.org/t/confused-about-tls-alpn-01-authorization-type-for-certbot/170881

And for the record, this is not impossible, but it's more involved than I would hope for really:

https://samdecrock.medium.com/deploying-lets-encrypt-certificates-using-tls-alpn-01-https-18b9b1e05edf

@jessuppi
Member Author

TLDR I'm not entirely opposed to ditching Certbot for another ACME client, however, because Certbot is sponsored by EFF among other established organizations, there's perhaps stronger trust and longevity with their project... and Certbot is already supported in the Ubuntu packages and such.

What if Certbot or Nginx begin supporting TLS-ALPN-01 in a few years, are we going to switch back again? Although strong security is a top priority for SlickStack, and I hate having to support port 80, I think the logical solution here is the one with the fewest dependencies and complications... adding a port 80-specific server block in Nginx.

@jessuppi
Member Author

Update: adding the below snippet to production, staging, and development server blocks seems to have improved things, and the "404 Not Found" error is no longer returned by Certbot on brand new servers:

#### for Certbot only ##
server {
    listen 80;
    listen [::]:80;
    server_name @SITE_DOMAIN_INCLUDING_WWW @SITE_DOMAIN_EXCLUDING_WWW;
    return 301 https://@SITE_DOMAIN$request_uri;
}

However, Certbot still returned a 52x error on my last attempt on a fresh SlickStack server. This might be related to the Linux kernel issue we've been discussing separately.

@jessuppi
Member Author

jessuppi commented Feb 15, 2023

I checked the Nginx access log. No attempt by Certbot was shown until the 2nd install, which logged:

2400:cb00:397:1024::ac46:7fa0 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:98:1024::ac44:2312 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
172.71.147.30 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:28:1024::6ca2:f504 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
172.68.34.69 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
172.70.126.238 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:612:1024::ac47:fe30 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/2.0" 200 87 "http://example.com/.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:98:1024::ac44:2311 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/2.0" 200 87 "http://www.example.com/.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:98:1024::ac44:22f4 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/2.0" 200 87 "http://example.com/.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:398:1024::ac46:8285 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/2.0" 200 87 "http://www.example.com/.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:28:1024::6ca2:f56c - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM HTTP/2.0" 200 87 "http://www.example.com/.well-known/acme-challenge/q873S9W4yfmxiwLWs4Pu0ldxxSHK1m1L7zL5jkr33wM" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:543:1024::ac47:9670 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/2.0" 200 87 "http://example.com/.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"

So there are no longer 404 errors, but timeout errors like 522 instead.

I assume this means Nginx is not even properly active at the time of the first attempts...
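For anyone wanting to check their own access log the same way, here is a small sketch that counts challenge requests by protocol and status. The regular expression assumes the combined log format shown above, and `summarize` is a name I made up for this example. The two sample lines are copied from the log in this comment.

```python
import re
from collections import Counter

# Two lines copied from the access log above (one 301, one 200).
SAMPLE = """\
2400:cb00:397:1024::ac46:7fa0 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/1.1" 301 162 "-" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
2400:cb00:612:1024::ac47:fe30 - - [12/Feb/2023:15:43:25 +0000] "GET /.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY HTTP/2.0" 200 87 "http://example.com/.well-known/acme-challenge/73sWayf0TQJd_m8v956DrOM7cKZsGJnSJQjcrYQOmwY" "Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)"
"""

# Assumes the combined log format used in the lines above.
LINE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(log_text):
    """Count ACME challenge requests by (protocol, status code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match and match.group("path").startswith("/.well-known/acme-challenge/"):
            counts[(match.group("proto"), int(match.group("status")))] += 1
    return counts


for (proto, status), count in sorted(summarize(SAMPLE).items()):
    print(proto, status, count)
```

In the log above, the HTTP/1.1 requests got a 301 and the HTTP/2.0 requests with an http:// referer got a 200, which looks like the validator following the port 80 redirect to HTTPS. An empty result for the first install would fit the guess that nothing reached Nginx at that point.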

@jessuppi
Member Author

Changed server blocks a bit:

server {
	listen 80 default_server;
	listen [::]:80 default_server;
	server_name @SITE_DOMAIN_INCLUDING_WWW @SITE_DOMAIN_EXCLUDING_WWW;

	location /.well-known/acme-challenge/ {
		allow all;
		auth_basic off;
		default_type "text/plain";
		try_files $uri =404;
		root /var/www/html;
	}

    .....

Now getting "Timeout during connect (likely firewall problem)", but a scan of port 80 shows it as open. Not sure if this is IPv6 related; however, the Certbot logs show the connection attempt was to the IPv4 address, so it should be unrelated...
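One way to narrow this down is to test a plain TCP connect on port 80 for each address the hostname resolves to, separately for IPv4 and IPv6. Below is a minimal sketch using only the standard `socket` module; `probe` is a hypothetical helper name. Note the caveats: this runs from your own machine, not from the Let's Encrypt validators, and behind Cloudflare the hostname resolves to Cloudflare addresses, so you would need to pass the origin IP to test the server itself.

```python
import socket


def probe(host, port=80, timeout=5.0):
    """Try a TCP connect to every address host resolves to.

    Returns a list of (family_label, address, outcome) tuples.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return [("resolve", host, f"failed: {exc}")]

    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "open"
        except socket.timeout:
            outcome = "timeout"
        except OSError as exc:
            outcome = f"error: {exc}"
        results.append((label, sockaddr[0], outcome))
    return results


if __name__ == "__main__":
    # example.com is a placeholder; substitute your domain or origin IP.
    for label, address, outcome in probe("example.com", 80, timeout=3.0):
        print(label, address, outcome)
```

A "timeout" on one family and "open" on the other would suggest a firewall or routing difference between IPv4 and IPv6 rather than a Certbot problem.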

Really blows my mind how finicky Certbot is, not sure we should keep using it.
