Never move a private key

There is no good reason to ever move a private key. Your private keys should live their entire life, from birth to death, within a single trust domain. The only exception applies to people who are working on eensy weensy microcontrollers that don’t have a hardware noise source. Are you one of those people? No? Well in that case you shouldn’t be moving a private key.

Too many times, I’ve heard people say:

“I need to SSH into the bastion host, so I’ll send the sysadmin team my private key.”
“When a new node boots up, I’ll securely send it a keypair so that it can join the cluster.”
“I want to scale out my website, so I need to copy my TLS private key onto each of my frontend load balancers.”
“I got a new laptop, so I’ll copy my existing SSH keys onto the new one and everything will just work.”
“I’ll store the key for signing kernel modules in my password manager.”

All of these statements seem to have forgotten that the P in PKI stands for “public”. The public keys are the ones that move. The private keys stay put.

In public key cryptography, the only requisite ingredient should be a single integrity-protected packet sent during the setup phase of the communication. That’s the miracle of Diffie and Hellman’s invention: we never require a confidential channel, only an integrity-protected one. For instance, my operating system shipped with a set of trusted root certificates built-in; that trust bundle is the single integrity-protected packet which allows me to surf the web securely. I only need to get my OS from a trusted source, and the rest is gravy. Of course, if I’m getting my operating system from like…North Korea, then I have no way of knowing whether I’m getting the “real” Internet. Or maybe North Korea has the real Internet, and all of us westerners are stuck using a faux Internet 🤔. Anyway…

Example 1: SSH keys to the sysadmin team

Use your local entropy source to generate a private key. Send the public portion to the sysadmin team. They come up with some reason to trust you, like “he’s standing right in front of me and reciting the key’s fingerprint”, or “Josh’s key was signed by his manager’s key, and I have his manager’s key in the database already.” They then reconfigure the bastion host to trust your key, either by putting it into a database of trusted keys, or by signing it with their key, which turns it into a certificate.

Example 2: Node joining a cluster

Same deal: come up with a reason to trust the node, and it sends you its public key. Let’s say you’re working in the cloud, so: you send the node a challenge, and the node replies with a pre-signed request to the cloud’s identity service. You examine the pre-signed request carefully, validate that it matches your expectations, and then send it to the cloud provider. If the cloud provider responds in the affirmative, you now know the node’s identity, and you can mark the node’s public keys as trusted. Typically you’ll sign the node’s public key with your private key; the resulting certificate entitles the node to join the cluster.
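
The cloud identity round trip depends on the provider, so this sketch does not guess at its API. What it shows is the shape of the exchange: the node generates its own keypair, you send a fresh random challenge, and the only things that come back are a public key and a signature. Note the swap: this stand-in proves possession of the private key with a plain Ed25519 signature; it does not establish the node’s identity the way the cloud provider’s answer would. All type and function names are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// Node holds a keypair that was generated on the node itself.
type Node struct {
	pub  ed25519.PublicKey
	priv ed25519.PrivateKey
}

// NewNode generates the node's keypair locally.
func NewNode() (*Node, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	return &Node{pub: pub, priv: priv}, nil
}

// Respond signs the challenge and returns the public key plus signature.
// The private key is used here but never returned.
func (n *Node) Respond(challenge []byte) (ed25519.PublicKey, []byte) {
	return n.pub, ed25519.Sign(n.priv, challenge)
}

// newChallenge returns 32 fresh random bytes so old responses can't be replayed.
func newChallenge() ([]byte, error) {
	c := make([]byte, 32)
	_, err := rand.Read(c)
	return c, err
}

// verifyResponse checks that the signature covers this exact challenge.
func verifyResponse(challenge []byte, pub ed25519.PublicKey, sig []byte) bool {
	// ed25519.Verify panics on a wrong-sized key, so check the length first.
	return len(pub) == ed25519.PublicKeySize && ed25519.Verify(pub, challenge, sig)
}

func main() {
	node, err := NewNode()
	if err != nil {
		panic(err)
	}
	challenge, err := newChallenge()
	if err != nil {
		panic(err)
	}
	pub, sig := node.Respond(challenge)
	fmt.Println("fresh response accepted:", verifyResponse(challenge, pub, sig))

	// The same signature presented against a different challenge is rejected.
	other, err := newChallenge()
	if err != nil {
		panic(err)
	}
	fmt.Println("replayed response accepted:", verifyResponse(other, pub, sig))
}
```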

Example 3: Web PKI

This one is tragic. The way that web PKI should work is that the owner of any given domain name gets a certificate authority (CA) certificate of their very own, with the Name Constraints extension set to exactly the domains that were validated by the upstream CA. This would allow the single long-lived private key to sit in an enclave somewhere, and when new webservers join the cluster, they have their public keys signed by the enclave on a short-lived basis, with renewals every hour or so.

Under such a system, it would be pretty hard to compromise your key through the webserver itself. At worst, the attacker gets a private key that expires in an hour or so. Heartbleed? Who is she? We don’t know her: our devops team just updated libssl, did a rolling reboot, and then went out to lunch. They’ve forgotten about Heartbleed by happy hour, and they certainly aren’t whining about it on their blog a decade later.

Sadly, the Name Constraints extension has gathered dust ever since its original standardization by ITU (or was it IETF?) in June 1996 as part of the original set of extensions in version 3 of the X.509 standard. Most client software didn’t implement it correctly, and some didn’t implement it at all. The level of correctness shot way up around the time Ian Haken (Netflix) published BetterTLS in 2017.

At this point, the chief obstacle to this kind of usage is the CA/Browser Forum, whose Baseline Requirements (section 8.7) impose an auditing requirement on all CA certificates, even ones with strict name constraints. Let’s Encrypt (to name one example) can’t issue the average Jane a name-constrained CA certificate, on pain of being kicked out of the cool CAs club.

Even if we can’t have name-constrained certificates, we can still do better than hosting our TLS private keys within the same memory domain as our data plane traffic. OpenSSL supports configuring PKCS#11 engines, which means you can set it up to use AWS KMS with aws-kms-pkcs11 (more of Ian Haken’s work)¹. Cloudflare has supported this kind of split since 2014 (Keyless SSL), although you have to pay them beaucoup money.

Example 4: Backing up SSH keys

If you can back up your SSH keys, so can someone who compromises your system. Solve this problem with PKI: have your YubiKey (or any other secure enclave) sign your SSH logins. Or have your enclave sign a short-lived SSH certificate. Sadly, GitHub feature-gates SSH certificates into its enterprise tier. But pretty much every other form of SSH server that I interact with in my daily life will happily take an SSH certificate.

Example 5: Signing software

Code signing is the hard mode of digital signatures: the signatures are meant to last forever, and revocation isn’t frequently implemented. It’s no surprise that teams screw it up all the time (1, 2, 3, 4). For most people the right move is going to be to outsource the hard part of key management to professionals. If you can’t do that, outsource to your favorite KMS. Do this with aws-kms-pkcs11, or its equivalent that works on your favorite cloud provider.

Sadly, even a KMS isn’t a silver bullet: if you ever allow your attacker to sign certificates (e.g. because you misconfigured the KMS, or leaked a credential that accesses it, or because the KMS itself has a vulnerability), then you have to deal with the “I don’t know what I’ve signed” problem². It would actually be really cool if some popular KMS vendor offered a way to configure a KMS-hosted key in a “mandatory transparency” mode. That way you would always know what has been signed, and so would anyone you share the transparency ledger with.

¹ There are also ways to do this with HSMs, which I assume people do because they are masochists or have compliance obligations (what’s the difference, really?).
² Actually you always have to deal with the “I don’t know what I’ve signed” problem, because you’re never 100% sure you haven’t misconfigured your KMS key policies.