Shimming S3

This post summarizes my talk “Patterns in S3 data access: Protecting and enhancing access to data banks, lakes, and bases”1. Anyone who is interested in implementing any of these techniques should also watch Becky Weiss’s talk “Solving large-scale data access challenges with Amazon S3”2.

Complex-pattern data

Let’s imagine I want to build a photo sharing service, something like Google Photos. It will have millions of users, billions of photos, and sharing features that allow users to share photo albums with each other. We could also say that I want to build a data lake with granular ACLs: it turns out they’re the same problem.

The issue, in both cases, is the degree of complexity needed to describe the access policy for any given piece of data. If we were building a photo storage service, then our access policy would be simple: deliver photos only to their original uploader. In the case of a storage service, there’s really only one access policy, copy-pasted for each of my millions of users. For a photo sharing service, on the other hand, there will be millions of relationships between photos, albums, and authorized viewers.

Lacking any better terminology, I call this a “complex-pattern data” problem. It has these attributes:

  1. untrusted callers: we can’t build the authorization layer into the client itself
  2. authorization can’t follow data shape: we can’t just group all data with the same access policy under a certain prefix in the blobstore
  3. an authorization database that
    • …is non-trivial in size
    • …changes due to user actions

AWS IAM

The AWS IAM control-plane service lives in a single region (us-east-1), and is really good at solving its own narrow set of problems. Tracking a large and fast-moving authorization dataset is not one of them. Every IAM object has limits associated with the size of its authorization policies, and the IAM service as a whole will ratelimit you if you try to change anything too rapidly. IAM is meant to accept changes when your application’s relationship with the world changes. It isn’t meant to accommodate when your users’ relationships with the world change.

Capability-based security

Back in the 1970s, capability-based security was the solution for complex-pattern data. A capability is a “communicable, unforgeable token of authority” (Wikipedia). In the ‘70s the operating system provided the communicability and unforgeability. The most common example is file access: one process can use its authority to open a file, and then pass the resulting capability (a file descriptor) to another process for it to use.

With a little help from cryptography, we can implement capability-based security over the network. And pretty much every AWS API already has the requisite tools, built right in. Spoiler alert: the rest of this post will cover how to make capabilities flow through our network, and what we can do with that skill.

AWS SigV4

When requests are sent to AWS, they are authenticated using a process called Signature Version 4. SigV4 never sends the user’s credentials across the wire: instead it computes HASH(credentials || my request)3 and shoves the result into an HTTP Authorization header. The recipient service performs the same computation on its end. If the results match, then the request is considered authenticated. Eric Brandwine has a great talk4 that discusses some of the nuances here; it’s required watching if you want to understand the method behind the madness.
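
Footnote 3 glosses over a lot, but the heart of SigV4 is a short chain of HMACs. Here is a sketch of that key-derivation and signing step in Python, roughly following the published spec; it is illustrative only, and in practice you would let the SDK (e.g. botocore) do this for you.

import hashlib
import hmac

def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

def sigv4_signature(secret_key: str, date: str, region: str, service: str, string_to_sign: str) -> str:
    # Derive a signing key from the long-term secret via a chain of HMACs, then
    # sign the "string to sign", which commits to the canonical request
    # (method, path, signed headers, and payload hash).
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)  # e.g. "20230610"
    k_region = _hmac(k_date, region)                             # e.g. "us-west-2"
    k_service = _hmac(k_region, service)                         # e.g. "s3"
    k_signing = _hmac(k_service, "aws4_request")
    return hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()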

Sidenote: if you’re using another cloud object store that emulates S3 (Google Cloud Storage, Cloudflare R2, Backblaze B2), you’re also using SigV4. So anything that works for S3 also works for your blobstore.

Off-ramps

While learning about how AWS does request signing is fun and all, it isn’t the kind of complexity we should be reaching for if we have any alternatives. So, before we go implement anything bespoke, we should evaluate all of our available off-ramps to determine if there are other—simpler—ways to solve a given problem using built-in AWS primitives. Becky’s talk goes into detail on three such techniques:

  1. IAM policy over S3 prefixes
  2. S3 Access Points
  3. IAM (STS) session brokers

IAM policy with S3 prefixes can be quite useful, especially when combined with ABAC techniques. S3 Access Points provide another layer where policy can be enforced: they are a good option if you’re hitting the 20kB bucket policy limit, since you can have 10,000 access points per account-region.
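
To make the ABAC combination concrete, here is a hypothetical identity-policy statement (the bucket name and tag key are made up, not from the talk) that pins each caller to its own prefix via a principal-tag policy variable:

# Hypothetical policy: each principal may only touch keys under the prefix
# matching its "username" principal tag, evaluated per-caller at request time.
prefix_abac_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::photos-example/${aws:PrincipalTag/username}/*",
    }],
}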

STS session brokers are an extremely powerful technique that uses the Security Token Service (STS) AssumeRole API to mint new credentials on the fly. We mentioned earlier that IAM is a control-plane service; STS is its high-throughput data-plane counterpart. With STS, we gain two powerful capabilities:

  1. write an inline policy on the fly, and apply it to an existing IAM role5
  2. associate arbitrary session tags with a role. These tags can then be used to match against conditions in pre-existing IAM policy.

Each technique allows us to change IAM authorization behavior without making writes against the low-throughput IAM service. But there are two drawbacks:

  1. STS has an undisclosed account-wide ratelimit, after which you’ll be throttled.
  2. Inline policies can be at most 2kB. This is because the policy itself is actually embedded in the security token returned by STS, and must be sent in-band with every future API call.

If the STS session broker pattern fits your needs, you’d implement it by building a high-privilege authorization service that calls sts:AssumeRole to mint low-privilege credentials. You’d then return the low-privilege credentials to the caller (e.g. a data lake access job).
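
As a sketch of that broker’s core (the role ARN, bucket name, and tag key here are hypothetical, and a real broker would authenticate and authorize its caller first):

import json

import boto3

sts = boto3.client("sts")

def broker_credentials(username: str) -> dict:
    # The caller receives the intersection of the role's policy and this
    # inline policy, so downscoping only (see footnote 5). Must stay under ~2kB.
    inline_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::photos-example/{username}/*",
        }],
    }
    response = sts.assume_role(
        RoleArn="arn:aws:iam::111111111111:role/photo-reader",
        RoleSessionName=f"broker-{username}",
        Policy=json.dumps(inline_policy),
        # Session tags can be matched by conditions in pre-existing IAM policy.
        Tags=[{"Key": "username", "Value": username}],
        DurationSeconds=900,
    )
    # Hand these short-lived, low-privilege credentials back to the caller.
    return response["Credentials"]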

As a final off-ramp, we should mention permissions boundaries. Permissions boundaries are a mechanism for inter-team delegation within a company. Team A (often a Security team) writes a permissions boundary, and allows Team B to build whatever policy they want within it. This could be a good solution if the number of entities is small (~1000), and doesn’t change too often (~1 per hour).

Let’s proxy S3!

Please don’t. At any time, the S3 front door stands ready to handle more request rate than any proxy could ever manage. If we were to proxy S3, the proxy would quickly become the bottleneck in the data plane. We could try to pre-scale our proxy with enough capacity to match S3’s terabits per second, but then we’d just be burning a hole in our collective wallets with a cluster of proxies sized for the high watermark of throughput. Instead, we should recognize that we have two separable problems: (1) deciding whether to serve data to a given caller, and (2) serving it.

IMO proxying is a valid solution for most other services, just not massively scalable, throughput-oriented ones. DynamoDB and EC2 are fine candidates for proxying; S3 is not. If you don’t believe me, listen to Becky.

Signers

Now that we’ve exhausted our other options, let’s talk signing. In the typical case, an AWS SDK signs a request using SigV4 and then sends the signed payload directly to the AWS service. But what happens if the SDK sends that signed request somewhere else in the network? If that happens unintentionally, we call it a data leak: a major incident. If we do it intentionally, we’ve built a signer service that can provide for the needs of other services on the network.

The signer service could sit anywhere in the network: provided it has the requisite credentials to access S3, it can do its job. The signer could even be a Lambda, if you prefer to stay serverless.

The basic flow of a signer service is:

  1. authenticate your caller. This could occur with an HTTP cookie, in the case of the photo sharing service, or with something like an x509 certificate, in the case of a data lake job.
  2. evaluate your authorization logic in the context of their request
  3. sign a request to S3
  4. send the signature to the client

Steps 1 and 2 are highly pluggable, so you can integrate your organization’s own logic there, and tweak it over time to suit new needs. In the example code that accompanies this talk, I implemented authentication with HTTP basic auth.
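
Here is a minimal sketch of steps 2 through 4 using botocore directly; the bucket name, account ID, and prefix-per-user authorization rule are stand-ins rather than the talk’s actual example code.

import hashlib

import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

BUCKET = "permanent-quoic7ui7jhvtjt6"            # example bucket name
REGION = "us-west-2"
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()   # payload hash of an empty body

def sign_for_caller(username: str, method: str, key: str) -> dict:
    # Step 2: authorization logic. Here, callers may only touch keys under
    # their own username prefix.
    url = f"https://{BUCKET}.s3.{REGION}.amazonaws.com/{username}/{key.lstrip('/')}"
    request = AWSRequest(method=method, url=url, headers={
        # Commit the caller to the payload hash (empty for a GET; a PUT signer
        # would pin the hash of the expected upload body instead).
        "x-amz-content-sha256": EMPTY_SHA256,
        "x-amz-expected-bucket-owner": "111111111111",
    })

    # Step 3: sign the rewritten request with the signer's own credentials.
    credentials = botocore.session.get_session().get_credentials()
    SigV4Auth(credentials, "s3", REGION).add_auth(request)

    # Step 4: return the capability (URL plus signed headers) to the client.
    return {"url": url, "headers": dict(request.headers)}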

The signer gets complete control of the request being sent by the client. If the signer decides to add a header, then the client must send exactly that header to S3 when redeeming the capability. One such header is x-amz-content-sha256, which means that the signer can force a client to commit to the hash of the data being uploaded.


$ curl -s -u josh:password \
  -d '{
    "url":"https://s3.amazonaws.com/permanent/mykey",
    "method": "GET",
    "headers": {}
  }' \
  http://127.0.0.1:8000 | jq "."

{
  "url": "https://permanent-quoic7ui7jhvtjt6.s3.us-west-2.amazonaws.com/josh/mykey",
  "headers": {
    "x-amz-content-sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "x-amz-expected-bucket-owner": "111111111111",
    "X-Amz-Date": "20230610T124405Z",
    "Authorization": "AWS4-HMAC-SHA256 Credential=AKIA3LROMZGCV47J47W6/20230610/us-west-2/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-expected-bucket-owner, Signature=7415160f39168fdb4fa47f67702c578ea3bcc97471743ff69933d518103fe473"
  }
}

In the above example, a client sends a request for an S3 object to the signer. The signer has decided to:

  1. change the bucket name, and add an AWS region
  2. prefix the requested key with the caller’s username, giving the caller the illusion of having the bucket to themselves.
  3. add the x-amz-expected-bucket-owner header, which is a best practice that most clients leave out.

Shimming clients

One common—and extremely reasonable—objection to this approach is that it requires a custom client implementation. But it turns out that the client-side logic can be performed in a minimally invasive shim against common S3 libraries, leaving the surrounding code unchanged. In the example code, I override the native signer and implement a plugin for aws-cli in 36 lines (as displayed on a slide).
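
The 36-line plugin isn’t reproduced here, but the following sketch shows one way such a shim could work, using botocore’s event system to swap in the signer’s output just before the request leaves the host. It is not the talk’s exact code; the signer URL, credentials, and response shape are assumed to match the earlier example.

import boto3
import botocore
import requests
from botocore.config import Config

SIGNER = "http://127.0.0.1:8000"   # the signer from the example above

def resign_with_signer(request, **kwargs):
    # Ask the signer for a capability matching this S3 request, then rewrite
    # the outgoing request in place before botocore sends it.
    response = requests.post(SIGNER, auth=("josh", "password"), json={
        "url": request.url,
        "method": request.method,
        "headers": dict(request.headers),
    })
    response.raise_for_status()
    signed = response.json()
    request.url = signed["url"]
    for name, value in signed["headers"].items():
        del request.headers[name]        # Message-style headers append on assignment
        request.headers[name] = value
    # Returning None lets botocore send the (now re-signed) request as usual.

# Skip the SDK's own signing; the signer's Authorization header is the one we want.
s3 = boto3.client("s3", region_name="us-east-1",
                  config=Config(signature_version=botocore.UNSIGNED))
s3.meta.events.register("before-send.s3", resign_with_signer)

# The surrounding application code stays exactly the same:
s3.get_object(Bucket="permanent", Key="mykey")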

Signers also interact naturally and natively with S3 multipart requests (both upload and download): the multipart request APIs are SigV4-signed, after all. If you want to get fancy, you can even have a signer anticipate the next few parts the client might request, and proactively send signatures for them to the client to cache. With this approach the number of round trips to the signer can be minimized.

Performance and reliability

Speaking of performance and round-trips, what kind of degradation should we expect by intermediating all of our calls to S3 with a signer? For a well-architected signer, the answer is just about none. If we break the latency of the signer down into its constituent components, they are:

  1. Network latency
  2. Authorization
  3. Signing

Network-wise, S3’s TTFB is 15ms (at best), and a 100MB blob takes about 1 second to download, so spending a few hundred μs performing an intra-datacenter roundtrip is insignificant. Authorization is up to us: if we want it to be fast, we should probably maintain a local cache of the requisite authorization info, either as a total (pre-filled) cache or a partial (demand-filled) one. The final step (signing) is a non-issue: a SigV4 signature takes on the order of 1μs to compute, and has to happen somewhere no matter what.
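
A quick back-of-the-envelope check, using the rough figures above (assumptions, not measurements):

# Relative overhead of consulting a signer before a 100MB GET from S3.
s3_ttfb_ms = 15        # S3 time-to-first-byte, best case
transfer_ms = 1000     # ~100MB object download
signer_rtt_ms = 0.3    # intra-datacenter round trip to the signer
sigv4_ms = 0.001       # one SigV4 computation

overhead_ms = signer_rtt_ms + sigv4_ms
print(f"vs. TTFB alone:   {overhead_ms / s3_ttfb_ms:.1%}")                  # ~2%
print(f"vs. the full GET: {overhead_ms / (s3_ttfb_ms + transfer_ms):.3%}")  # ~0.03%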

So latency doesn’t really matter. The important concern is always going to be reliability: a signer’s uptime will directly determine the uptime of whatever service it powers.

Changing responses

By their very nature, signers produce a capability that is usable to make requests to S3, and S3 is going to respond according to its own logic. For that reason, a pure signer cannot change responses. But it turns out that there’s no law that binds the signer to solely refer its clients to S3. Just like we saw above when the signer changed the bucket being requested, the signer could equally decide to point the client towards a service within my network, like a caching proxy.

In the example above, the signer attempts to give the caller the illusion of having the bucket to themselves. But a simple ListObjectsV2 call would show the actual objects in the bucket, piercing the illusion. The solution is to have the signer selectively route ListObjectsV2 calls back to itself, so that it can change the response data in addition to the request data. Unlike the S3 dataplane calls (e.g. GetObject, PutObject), ListObjectsV2 is meant to send only hundreds or thousands of bytes in a single response, not billions. This makes it a much better candidate for proxying, compared to “all of S3”.
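
A sketch of what that selective handling might look like on the signer’s side (details assumed; pagination and error handling omitted, and the bucket name reuses the example above):

import boto3

s3 = boto3.client("s3", region_name="us-west-2")
BUCKET = "permanent-quoic7ui7jhvtjt6"

def list_for_caller(username: str, requested_prefix: str = ""):
    # Force the caller's namespace on the way in...
    scoped_prefix = f"{username}/{requested_prefix.lstrip('/')}"
    response = s3.list_objects_v2(Bucket=BUCKET, Prefix=scoped_prefix)
    objects = []
    for obj in response.get("Contents", []):
        objects.append({
            # ...and strip it on the way out, preserving the illusion that the
            # caller has the bucket to themselves.
            "Key": obj["Key"][len(username) + 1:],
            "Size": obj["Size"],
            "LastModified": obj["LastModified"].isoformat(),
        })
    return objects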

Maybe we want to redact some documents, but only a small minority of the objects in the bucket qualify. A signer can enact that distinction by sending some requests to an S3 Object Lambda and the rest straight to S3.

Data abstractions atop S3

Let’s face it: S3 has a lousy metadata layer. But with a signer, we can build our own metadata layer! Here are some ideas.

The thing I love about these techniques is that they allow a Security or Data team to build abstractions that can work for an entire fleet of services, and do not require the involvement of an application team to deploy or reconfigure. You can be in a situation where you have thousands of S3 buckets in the “wrong” AWS account, and instead of needing to involve every single service owner in a laborious process of hand-held migration, you can simply ship a shim for their S3 client. The rest happens automagically.


  1. fwd:CloudSec, 2023-06-12; slides, abstract 

  2. re:Invent, 2022-12-01; slides 

  3. Warning: this is DANGEROUSLY oversimplified. Click some of the links if you want the full story. 

  4. A day in the life of a billion requests, re:Invent, 2022-11-30; slides 

  5. The resulting policy is the intersection of the original policy and the inline policy, so only downscoping is possible: no upscoping.