Abstract
Certbot assumes every server that needs a certificate should also know how to request one, validate domain ownership, handle renewals, and manage failures.
This makes sense with a handful of servers. One server, one cert, done. But infrastructures grow. Now you’ve got web farms sharing wildcards, load balancers, mail servers, VPN appliances. The “every server for itself” model doesn’t scale and isn’t sustainable.
Even the Let’s Encrypt community knows it. When asked what would make Certbot scale better, a maintainer responded bluntly: “If someone has ‘a large number of certificates’ they should not be using Certbot… Certbot has been positioned as the ‘entry level’ and ‘swiss army knife’ of ACME clients.”
Entry level is not exactly a ringing endorsement for managing your production infrastructure.
How ACME breaks down
With Certbot, ACME is a distributed responsibility model. Each server handles its own validation, manages its own renewals, and deals with its own errors.
Certificate distribution
Dozens of servers often need to share a wildcard certificate. How do you handle that? One server requests it, then you distribute it to the others. The official Certbot guidance for this scenario is basically “figure it out yourself.”
So you build workarounds. Rsync cron jobs. NFS mounts. Ansible playbooks that copy certificates around. You’ve reinvented a centralized certificate manager, poorly.
The Let’s Encrypt forums are full of confused admins trying to solve this:
“I’ve read through a number of topics but can’t decide on the best approach to use when Let’s Encrypt is to be used with multiple servers.”
“The problem arrives when I tried to introduce a load balancer and additional nodes… I’m afraid the auto renew process will fail as the challenge might be distributed to a different node.”
Nobody has a great answer.
Monitoring a distributed system
Distributed certificate automation has two failure points: renewing the certificate, and getting the running service to use it. Both can fail silently, anywhere in your infrastructure. It’s on you to monitor every system to make sure neither breaks.
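To make the second failure point concrete, here is a minimal sketch of a check that asks the running service, not the file on disk, how long its certificate has left. The function names and the 14-day threshold are illustrative assumptions, not part of any tool mentioned here.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 11 00:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def served_cert_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return notAfter of the certificate the live service presents.

    The default context verifies the chain, so a certificate that has
    already expired raises ssl.SSLCertVerificationError here instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check(host: str, warn_days: float = 14.0) -> bool:
    """True if the certificate in use has more than warn_days left (assumed threshold)."""
    return days_until_expiry(served_cert_not_after(host)) > warn_days
```

A check like this has to run against every endpoint that serves the certificate. A renewed file on disk says nothing about what a long-running process still holds in memory.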
Epic Games had certificate monitoring. When a wildcard cert expired in April 2021, they identified the problem within 12 minutes. Recovery still took 5.5 hours.
Why? The certificate was used across “hundreds of internal back-end service-to-service calls.” Renewing it was just the first step. Then they had to roll it out to every service that needed it, verify each one picked up the new cert, and deal with the cascading failures that had already started. They later admitted they “believed that we were more protected than we actually were.”
Knowing about the problem isn’t the same as fixing it when your certificates are scattered across infrastructure.
The skeleton key problem
Distributed validation doesn’t just create operational headaches. It creates security exposure.
HTTP-01 validation requires every server to expose port 80 and serve challenge files. That’s attack surface multiplied across your entire infrastructure. In January 2026, researchers disclosed a Cloudflare WAF bypass that exploited ACME challenge paths, where security controls were deliberately relaxed to allow certificate validation.
DNS-01 validation is worse. Every server with DNS credentials holds keys to your entire domain. The EFF warns explicitly: “If the machine handling the process gets compromised, so will the DNS credentials, and this is where the real danger lies.”
DNS credentials don’t just issue certificates. They control email routing, traffic direction, everything. One compromised web server and an attacker can redirect your domain, issue valid certificates for it, or intercept email by modifying MX records.
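For context, the DNS-01 challenge value itself is cheap to compute and not secret. The sketch below follows the derivation in RFC 8555: the TXT record is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The function name is made up for illustration, and the token and thumbprint would come from the CA and the ACME account key respectively.

```python
import base64
import hashlib
from typing import Tuple


def dns01_txt_record(domain: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for an ACME DNS-01 challenge."""
    # Wildcard names are validated at the base domain.
    if domain.startswith("*."):
        domain = domain[2:]

    # Key authorization = challenge token + "." + account key thumbprint.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()

    # base64url without trailing "=" padding.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", value
```

The risk is not in this calculation. It is in the API credential needed to publish the record, which on many DNS providers can also edit every other record in the zone.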
Flipping ACME on its head
The issue is that certificate validation and certificate usage are different problems. Certbot conflates them.
Your nginx server doesn’t need to understand ACME. Your mail server doesn’t need DNS API credentials. Your VPN appliance probably can’t run Certbot.
They just need a certificate file.
CertKit separates these concerns. CertKit is the ACME client. One system handles domain validation, certificate requests, and renewals. Your servers never talk to the certificate authority. They never hold DNS credentials. They don’t need to understand ACME at all.
Instead of every server renewing its own certificates, CertKit lets your servers subscribe to the certificates they need and pull them automatically whenever they change. You install a lightweight agent that says “give me the cert for example.com.” When the cert renews, the agent gets the new one automatically. No special ports. No DNS credentials on the box. No ACME knowledge required.
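The pull model is simple enough to sketch in a few lines. This is not CertKit’s agent or API, just a hypothetical illustration of the idea: fetch the current certificate from somewhere central, install it only if it changed, and tell the service to reload. The names sync_certificate, fetch_pem, and on_change are all assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def sync_certificate(
    fetch_pem: Callable[[], bytes],
    dest: Path,
    on_change: Callable[[], None],
) -> bool:
    """Install the fetched certificate if it differs from the one on disk.

    Returns True when the file was replaced and on_change was called.
    """
    new_pem = fetch_pem()
    old_pem = dest.read_bytes() if dest.exists() else b""
    if new_pem == old_pem:
        return False  # nothing changed, nothing to reload

    # Write to a temp file in the same directory, then rename it into
    # place, so the service never reads a half-written certificate.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_pem)
        os.replace(tmp_name, dest)
    except BaseException:
        os.unlink(tmp_name)
        raise

    on_change()  # e.g. reload the web server so it picks up the new file
    return True
```

Run on a timer, with on_change wired to something like a service reload, a loop of this shape needs no inbound ports and no credentials beyond whatever authenticates the fetch.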
The certificates exist independently of your servers. Even if you never installed a single agent, CertKit would keep requesting, renewing, and storing valid certificates for your domains. Your infrastructure subscribes to certificates it needs.
This matters more every year
Certificate lifespans keep shrinking. 47-day certificates arrive in 2029. Manual renewal that’s merely annoying once a year becomes unworkable at that pace.
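The arithmetic is easy to run. This rough sketch assumes each certificate is replaced only at expiry (real setups renew earlier, which makes the numbers worse) and uses a fleet of 20 devices purely as an illustrative figure.

```python
import math


def renewals_per_year(lifetime_days: int) -> int:
    """Renewals needed per year if each certificate is replaced only at expiry."""
    return math.ceil(365 / lifetime_days)


def manual_installs_per_year(devices: int, lifetime_days: int) -> int:
    """Hands-on certificate installs per year across devices with no automation."""
    return devices * renewals_per_year(lifetime_days)


if __name__ == "__main__":
    for lifetime in (365, 47):
        print(lifetime, renewals_per_year(lifetime), manual_installs_per_year(20, lifetime))
```

Under those assumptions, 20 devices go from 20 manual installs a year to 160.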
One sysadmin’s reaction to the 47-day proposal captured the frustration: “This is somewhat nightmarish. I have about 20 appliance-like services that have no support for automation.” VPN servers, load balancers, proxy servers, network gear. None of these can run Certbot.
But they can all receive a certificate file.
CertKit automates certificate lifecycle management. Currently in beta.