Abstract

On May 21, 2019, LinkedIn’s URL shortener went down. The certificate had expired. Millions of people cried out in terror when they couldn’t click on AI link bait.

The interesting part: LinkedIn had renewed the certificate ten days earlier. The renewal succeeded. The certificate just never made it to the server. The renewed cert existed somewhere, but the server still served the old one.

Most certificate automation is built to prevent the “I forgot to renew” problem. Certbot solves that problem pretty well. The new problem is subtler: automation that appears to succeed while the old certificate keeps serving. No one notices until the old certificate expires and users get a browser error.

Renewed doesn’t mean deployed

Certificate issuance isn’t the same thing as certificate automation. When Certbot renews a certificate, it writes new files to disk. Your web server picking them up requires a separate reload or restart, and if you’re running multiple servers, you also have to distribute the cert to every node. There’s no built-in mechanism for that. You write the PowerShell and bash, and you own every way that it can fail.

There are a lot of ways this goes wrong. The deploy hook didn’t run. The web server reloaded but the new files weren’t in the expected location. The certificate deployed to one server in a load-balanced pool but not the others. The cert is on disk but the process has it cached in memory and won’t release it until a full restart.
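To make the first two failure modes concrete, here is a minimal sketch of a deploy hook that refuses to fail quietly. It assumes the script is registered with Certbot's `--deploy-hook` option, which sets the `RENEWED_LINEAGE` environment variable to the renewed certificate's live directory. The reload command (nginx under systemd), the exit codes, and the `deploy` function itself are illustrative assumptions, not anything Certbot ships.

```python
"""Sketch of a Certbot deploy hook that surfaces its own failures."""
import os
import subprocess
import sys

# Assumption: nginx managed by systemd. Swap in your own reload command.
RELOAD_CMD = ["systemctl", "reload", "nginx"]


def deploy(env, reload_cmd=RELOAD_CMD, run=subprocess.run):
    """Return 0 on success, or a non-zero code naming the step that failed."""
    # Certbot sets RENEWED_LINEAGE when it invokes a deploy hook.
    lineage = env.get("RENEWED_LINEAGE")
    if not lineage:
        print("RENEWED_LINEAGE not set; was this run by Certbot?", file=sys.stderr)
        return 2

    # Confirm the renewed file is where the web server expects it.
    fullchain = os.path.join(lineage, "fullchain.pem")
    if not os.path.isfile(fullchain):
        print(f"expected certificate missing: {fullchain}", file=sys.stderr)
        return 3

    # Reload the server and treat a failed reload as a failed deploy.
    result = run(reload_cmd)
    if result.returncode != 0:
        print(f"reload failed with exit code {result.returncode}", file=sys.stderr)
        return 4
    return 0

# As a real hook script, finish with: sys.exit(deploy(os.environ))
```

A non-zero exit only helps if something is watching for it, and even a clean exit says nothing about what the endpoint is serving. That is why the checks below run from outside the server.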

Unfortunately, there’s nothing in the Certbot logs that tells you whether the certificate you just renewed is actually being served. You have to verify that yourself, and there are three levels of verification:

1. Is the certificate valid and for the right domain?

This is the simplest possible monitoring: connect to the endpoint, read the certificate, and make sure that it actually includes the domain in the Subject Alternative Names list. Check the Not-After date. Alert if it’s expired or about to expire.

This catches simple certificate expirations and process failures where the wrong certificate gets installed. Maybe a wildcard was replaced with a single-domain cert, or a certificate from one environment got deployed to another.
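A rough sketch of this first check, using only Python's standard library. The function names (`fetch_cert`, `name_matches`, `check_cert`) and the 14-day warning window are assumptions made for illustration. Because `ssl.create_default_context()` also validates the certificate, an already-expired one raises `ssl.SSLCertVerificationError` inside `fetch_cert`, so treat that exception as an alert too.

```python
import socket
import ssl
import time


def fetch_cert(host, port=443, timeout=5.0):
    """Return the served certificate as the dict produced by getpeercert()."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def name_matches(pattern, hostname):
    """Match a SAN entry; a leading '*' covers exactly one leftmost label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == hostname


def check_cert(cert, hostname, warn_days=14, now=None):
    """Return a list of problems; an empty list means this level passes."""
    now = time.time() if now is None else now
    problems = []

    # Collect only the DNS entries from the Subject Alternative Names.
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if not any(name_matches(san, hostname) for san in sans):
        problems.append(f"{hostname} not in SAN list {sans}")

    # notAfter uses the format that ssl.cert_time_to_seconds() parses.
    days_left = (ssl.cert_time_to_seconds(cert["notAfter"]) - now) / 86400
    if days_left < 0:
        problems.append(f"expired {-days_left:.0f} days ago")
    elif days_left < warn_days:
        problems.append(f"expires in {days_left:.0f} days")
    return problems
```

Something like `check_cert(fetch_cert("example.com"), "example.com")` on a schedule covers the basics. The wildcard handling is what catches a wildcard quietly replaced by a single-domain cert.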

2. Is the certificate from a trusted chain?

A certificate can have a valid expiry date and still fail validation if the chain back to a trusted root is broken or incomplete.

On September 30, 2021, the DST Root CA X3 root certificate used to cross-sign Let’s Encrypt’s certificates expired. Certificates that had valid expiry dates stopped working for clients that couldn’t handle the chain transition. Services including Xero, Slack, and Fortinet were affected.

Chain validation checks that every certificate in the chain is valid and that the chain terminates at a trusted root. If your server is sending an incomplete chain, or if an intermediate was revoked, basic expiry monitoring won’t see it.
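One way to approximate this check is to attempt a fully verified handshake and report why it failed. This is a sketch under some assumptions: `check_chain` is a made-up name, validation runs against the trust store of the machine doing the monitoring (which may differ from what your users' devices trust), and Python's default context does not check revocation, so a revoked intermediate needs a separate check.

```python
import socket
import ssl


def check_chain(host, port=443, timeout=5.0):
    """Return (ok, detail) after attempting a fully verified TLS handshake."""
    # The default context requires a valid chain to a trusted root
    # and a certificate that matches the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "chain verified against local trust store"
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the validator's reason for rejecting the chain.
        return False, f"verification failed: {exc.verify_message}"
    except OSError as exc:
        # Refused connections, timeouts, and other TLS errors land here.
        return False, f"could not connect: {exc}"
```

Python's `ssl` module does not download missing intermediates on its own, so a server that sends an incomplete chain should fail here even when a browser papers over the gap.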

3. Is it the certificate you intended to deploy?

This is the check almost no one does, but it catches the silent deployment failures.

Every certificate has a thumbprint: a hash of the certificate’s contents. When you renew a certificate, the thumbprint changes. If you know what thumbprint you just issued, you can compare it against what the endpoint is actually serving. If they don’t match, the renewal didn’t deploy.

If you control certificate issuance, as CertKit does, you know the expected thumbprint before you ever check the endpoint. A mismatch means that a new certificate was issued, but the automation failed somewhere. And you know about it days before any user notices, you lucky dog.
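Here is a small sketch of that comparison. It assumes the thumbprint is a SHA-256 hash of the DER-encoded certificate (use whichever hash your issuer reports), and that each entry in `nodes` is a hostname you can reach individually rather than a load balancer address. All function names are hypothetical.

```python
import hashlib
import ssl


def thumbprint(der_bytes):
    """SHA-256 thumbprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def normalize(fingerprint):
    """Accept 'AB:CD:...' or 'abcd...' and return bare lowercase hex."""
    return fingerprint.replace(":", "").replace(" ", "").lower()


def served_thumbprint(host, port=443):
    """Fetch whatever certificate the endpoint serves, without validating it."""
    # Without ca_certs, get_server_certificate() skips validation, so this
    # still works when the served certificate is expired or untrusted.
    pem = ssl.get_server_certificate((host, port))
    return thumbprint(ssl.PEM_cert_to_DER_cert(pem))


def find_stale_nodes(expected, nodes, fetch=served_thumbprint):
    """Return the nodes whose served thumbprint differs from the issued one."""
    want = normalize(expected)
    return [node for node in nodes if normalize(fetch(node)) != want]
```

An empty list from `find_stale_nodes` means every node is serving the certificate you just issued. Anything else names the servers the deploy missed.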

What happens when you don’t verify

On December 26, 2025, the Bazel build system’s website went down. Their Google-managed SSL certificate had expired. The postmortem published in January 2026 describes what happened:

The auto-renewal had been silently failing for 30 days. A DNS record had been removed for one subdomain in the certificate. The managed certificate system required all domains to be reachable for renewal to succeed. “When one wasn’t, renewals failed, and the Google-managed SSL certificate renewal failures did not trigger any notifications.”

The outage lasted 13 hours, over a holiday week, because there was nothing to catch the silent failure before it became an expired certificate.

Right now, you’re probably renewing certificates once or twice a year. Each renewal is an opportunity for silent failure, but the gap between failure and expired certificate is long enough that someone might notice.

In March 2026, the maximum certificate lifetime drops to 200 days. In 2029, it’s down to 47 days. With 47-day certificates, that’s roughly eight renewals a year per certificate. Eight chances for a deploy hook to fail silently.
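The arithmetic is easy to check. The sketch below rounds up 365 divided by the lifetime. The `renew_at` parameter is an added assumption for setups that renew before expiry (for example at two-thirds of the lifetime), which pushes the count higher still.

```python
import math


def renewals_per_year(lifetime_days, renew_at=1.0):
    """Renewals per year if each cert is replaced after renew_at * lifetime."""
    return math.ceil(365 / (lifetime_days * renew_at))


for days in (200, 47):
    print(days, renewals_per_year(days), renewals_per_year(days, renew_at=2 / 3))
```

With 47-day certificates renewed right at expiry that comes to eight a year. Renewing at two-thirds of the lifetime makes it twelve.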

Verification makes certificate automation sustainable

The issue → deploy → verify loop isn’t just about catching failures. It’s about having confidence that your automation is actually working.

Certbot gets you most of the way to issue. Getting the certificate onto every server that needs it is deploy, and that’s a whole separate problem we’ll cover in another post. Verify is the feedback loop that closes the system: it’s how you know the other two steps did what you think they did.

CertKit monitors all three levels (expiry, chain, and thumbprint comparison against the certificate it issued) and alerts you when something doesn’t match. If your certificate renewed but isn’t serving, you’ll know before your users do.


CertKit automates certificate lifecycle management.
