Software artifacts: hash, don't sign

In server-side software, cryptography is like war: a last resort for when you’ve tried all of your other options, and they all failed.

– myself, November 19, 2021

Despite my misgivings about cryptography, there is one kind of cryptography that I am happy to use in almost any circumstance: hashing. Cryptographic hashes are miraculous. They can summarize[1] the entire state of a large-scale distributed system, leaving zero degrees of freedom for an attacker (or misconfiguration) to exploit. Hashes have no keys to leak or misuse. Hashes are configuration-free: anyone can verify that a given input hashes to a certain output. There is no greater tool for protecting the integrity of a system.
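To make "summarize" concrete, here is a toy sketch of folding many inputs into one hash. The function name `merkle_root`, the one-byte leaf/node prefixes, and the rule of promoting an unpaired node are all choices made for this illustration; real systems such as git or Docker each define their own tree layout.

```python
import hashlib


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of byte strings into a single 32-byte SHA-256 summary."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be confused with an interior node.
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            pair = level[i] + level[i + 1]
            parents.append(hashlib.sha256(b"\x01" + pair).digest())
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node moves up unchanged
        level = parents
    return level[0]
```

Change, reorder, or drop any single input and the root changes; that is the whole trick.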

Signing

Once you have hashed a document, a little demon may appear and offer to “upgrade” that hash into a digital signature. (Every modern digital signature algorithm takes a cryptographic hash as its input; cryptographic hashing is always the precursor step to signing.[2]) In most cases you should send the imp packing.

Signing is hashing’s evil twin. A digital signature is proof that someone, at some point, had possession of a cryptographic key. Who is that someone? Beats me. Do we trust them? Maybe. Will they leak their key at some point in the future? Sorry, I misplaced my crystal ball. Were they bribed, tricked, or coerced into signing? I hope not. If they had the opportunity to sign again, would they? I don’t know…ask them.

I’ll admit that artifact signing has some limited usefulness when transferring software between organizations. In that context it can prevent an attacker who gains access to a software publisher’s website from undetectably modifying artifacts[3]. It is important that the signatures be time-bound: hosting a recent copy of the Ubuntu package archive is an incredible public service; hosting a stale copy is a replay attack (intentional or not).

As a defender, no matter what, incorporating a third-party artifact into your build should involve the same three-step procedure:

  1. verify the signature
  2. write a lockfile with the hash you verified
  3. build against that lockfile
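Steps 2 and 3 might look something like the sketch below. Step 1 is deliberately left out, because it depends entirely on the publisher's signature scheme. The JSON lockfile layout and the names `write_lockfile` and `check_against_lockfile` are invented for this example and do not correspond to any particular package manager.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_lockfile(lock_path: Path, artifacts: dict[str, bytes]) -> None:
    """Step 2: pin the hash of each artifact whose signature was just verified."""
    pins = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    lock_path.write_text(json.dumps(pins, indent=2, sort_keys=True))


def check_against_lockfile(lock_path: Path, name: str, blob: bytes) -> bool:
    """Step 3: the build accepts only bytes that match the pinned hash."""
    pins = json.loads(lock_path.read_text())
    return pins.get(name) == sha256_hex(blob)
```

Once the lockfile exists, the signature never needs to be consulted again; every later build checks only the hash.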

ACLs rule everything around me

Within an organization it is better to build artifact management around a plain-old RDBMS. When the CI system builds an artifact, it inserts a row into the database, storing information like: the current time, the configuration of the build environment, the SHA of the git commit it processed, the hash of the output artifact, etc. Plain-old ACLs (and stored procedures, if you like) can be used to ensure that only authorized CI control plane nodes can write into the DB.
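A minimal sketch of such a table, using SQLite only so that the example is self-contained. The table name `builds`, its columns, and the helper `record_build` are assumptions made for illustration. SQLite has no ACLs of its own; in a real deployment the write restriction would come from the database server's permission system.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per artifact produced by CI.
SCHEMA = """
CREATE TABLE IF NOT EXISTS builds (
    id              INTEGER PRIMARY KEY,
    built_at        TEXT NOT NULL,  -- UTC timestamp, ISO 8601
    build_env       TEXT NOT NULL,  -- description of the build environment
    git_commit      TEXT NOT NULL,  -- SHA of the commit that was built
    artifact_sha256 TEXT NOT NULL   -- hash of the output artifact
)
"""


def record_build(conn: sqlite3.Connection, build_env: str,
                 git_commit: str, artifact_sha256: str) -> int:
    """Insert one build record and return its row id."""
    built_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO builds (built_at, build_env, git_commit, artifact_sha256) "
        "VALUES (?, ?, ?, ?)",
        (built_at, build_env, git_commit, artifact_sha256),
    )
    conn.commit()
    return cur.lastrowid
```

The row is the provenance record: anything you would want an attestation to say can simply be another column.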

If you don’t trust DB ACLs, then the CI system can even (gasp!) sign the database row as a way of attesting to the entire build provenance and context, rather than just the contents of the output artifact. Honestly though, this doesn’t improve much, and may even make your situation worse. Yes, signing allows you to exclude your cloud provider’s relational database service (RDS) from your threat model. But you’re now adding your provider’s key management service (KMS) to the threat model (you are storing your key in at least a Level 1 enclave, right? CI machines don’t get direct access to the key material, right?).

It’s an ACL either way: you either build an ACL around access to the DB, or you build an ACL around access to the signing key. The problem with the latter approach is that if an attacker ever gets illicit access to the signing key, all rows in the DB are potentially suspect until eternity, and you have to start over with a new signing key. By contrast, if you misconfigure your DB ACLs and an attacker manages to stuff the DB with bad rows, the defender can simply restore from last night’s backup and move on with life.

At deploy time, the procedure is quite boring. Your deploy tool selects an artifact that it likes from the DB. It transmits the SHA-256 hash of that artifact to the workload machines over an integrity-protected channel (TLS will do). The workload machines download the artifact and verify its hash. Done.
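The check on the workload machine can be as small as the following sketch. The function name `verify_artifact` and the chunk size are illustrative; how the file is downloaded and how the expected hash arrives are outside the scope of the example.

```python
import hashlib
from pathlib import Path


def verify_artifact(path: Path, expected_sha256_hex: str,
                    chunk_size: int = 1 << 20) -> bool:
    """Return True only if the file's SHA-256 matches the expected hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large artifacts need not fit in memory.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256_hex.lower()
```

A workload that gets `False` back should refuse to start the artifact and report the mismatch; there are no keys, certificates, or trust stores involved in the decision.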


  1. Usually in the form of a Merkle tree. Docker images, git commits, and TPM PCRs are all Merkle trees. 

  2. There are some highly technical exceptions, like PureEdDSA, where the cryptographic hashing is performed as an inseparable part of the signing process. 

  3. It won’t do anything to stop someone from detectably corrupting artifacts, like Jia Tan did. In fact, a signature might even lure you into a false sense of security.