When the API Gateway Stops Scaling With You

Written by Joachim Spalink | Jul 1, 2026 7:20:25 AM

A central API gateway manages authentication, routing, and rate limiting across services. At scale, it becomes a coordination bottleneck. Here, we cover how replacing it with per-team AWS API Gateways, backed by a Kafka-driven authorizer and a central developer portal, restores team ownership without sacrificing governance.

Key takeaways

A central API gateway becomes a business bottleneck at scale: it slows delivery, increases shared incident risk, and turns the platform team into a gating function for everyone else.
Per-team gateway instances let product teams deploy new APIs without depending on a shared release window, reducing onboarding to a single-team task.
Centralizing auth policy in a developer portal preserves governance without coupling compliance changes to infrastructure deployments.
Policy updates propagate to every gateway in seconds via Kafka events, so compliance and security changes land without coordinated releases.
Putting each team's gateway in their own AWS account makes API traffic costs visible where the product decisions that drive them are made.
The platform team stops being a bottleneck for every API launch and instead owns the module, the authorizer, and the event contract.

Your API gateway sits at the edge of your application environment. It is where traffic from mobile apps, partner platforms, internal tools, and other services is received before it reaches the backend systems that process it. For a long time, running one centrally makes sense: routing, throttling, authentication, and observability all go through one place, operated by one team, under one contract that every service team has to honor.

Then the organization grows, and the same gateway starts costing you in ways that aren't on the architecture diagram. The traffic chart climbs, and so does the license bill, because the commercial model is priced per request. The release calendar fills up, and because every change touches the same configuration, changes get queued. A misconfigured route can take down endpoints belonging to teams that have never met. The platform team that owns the gateway becomes the bottleneck no one wants to be, and the only team capable of unblocking everyone else.

At a certain scale, the gateway stops being a piece of infrastructure and starts being a coordination problem.

That is the point at which we, working with a large enterprise client whose engineering organization had grown to dozens of teams across multiple AWS accounts, decided to take it apart.

Figure 1: The same authorizer logic and policy source, two very different deployment topologies.

The problem wasn't the gateway: It was where it lived

Centralization is fine while there are five services behind it. It becomes expensive, financially and organizationally, when there are dozens, owned by teams who deploy on different cadences, in different AWS accounts, with very different latency and compliance profiles.

Three pressures pushed us toward a decentralized model:

Cost visibility. With a single shared gateway, every team's traffic vanished into one line item. There was no way for a product team to see that their new feature had doubled API throughput, and therefore no way for them to weigh that traffic against the business value it generated. Decisions that should have been local were being made far away from the people qualified to make them.
Blast radius. A typo in a route, an over-eager rate-limit, or an authorizer misconfiguration could affect APIs across the company. Shared infrastructure means shared fate, and shared fate eventually means shared incidents.
Change throughput. Every new service, auth rule update, and header rewrite went through one team. That team was good. It was still one team.

None of these are unusual problems. They tend to surface together, and they tend to surface around the same time a team starts running quarterly capacity reviews for the gateway itself. If your central gateway is showing up in too many architecture review meetings, you are probably already paying these costs without naming them.

What we changed

We replaced the central gateway with a small, repeatable building block: an AWS API Gateway plus a Lambda authorizer, packaged as a deployable Terraform module. Every service team deploys an instance into their own AWS account, in front of their own services. They own the routes, the deploys, and the bill.

That description is easy to write and harder to live with. The hard part isn't deploying gateways; it's making sure consistency doesn't collapse the moment you let go of the single shared one. Three design choices did most of that work.

The authorizer is the contract

Routing and rate-limiting can vary per team; authentication cannot. Every gateway runs the same authorizer logic, distributed as part of the same Terraform module. If a team consumes the module, they get the org-wide auth behavior for free. They cannot accidentally diverge from it. That is the lever that lets the platform team relax control of where gateways run without losing control of how they authorize.
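To make "the authorizer is the contract" concrete, here is a minimal sketch of the kind of decision logic every gateway would share. It assumes a REST API with a Lambda authorizer that returns an IAM policy document; the AuthConfig shape and the names own, decide, and toAuthorizerResponse are hypothetical, since the article does not describe the real module's schema or code.

```typescript
// Hypothetical policy shape; the real configuration schema is not described here.
interface AuthConfig {
  clientScopes: Record<string, string[]>; // client id -> scopes granted to it
  requiredScope: Record<string, string>;  // API name -> scope needed to call it
}

type Effect = "Allow" | "Deny";

// Own-property lookup, so keys like "toString" never resolve to inherited members.
function own<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Pure decision: the same config and inputs give the same answer on every gateway.
function decide(config: AuthConfig, clientId: string, api: string): Effect {
  const granted = own(config.clientScopes, clientId) || [];
  const needed = own(config.requiredScope, api);
  // Unknown clients and unknown APIs fall through to Deny.
  if (needed === undefined) return "Deny";
  return granted.indexOf(needed) !== -1 ? "Allow" : "Deny";
}

// Wraps the decision in the IAM policy document a Lambda authorizer returns.
function toAuthorizerResponse(clientId: string, effect: Effect, methodArn: string) {
  return {
    principalId: clientId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [{ Action: "execute-api:Invoke", Effect: effect, Resource: methodArn }],
    },
  };
}

// Example values are made up for illustration.
const exampleConfig: AuthConfig = {
  clientScopes: { "mobile-app": ["orders:read"] },
  requiredScope: { orders: "orders:read", payments: "payments:write" },
};
console.log(decide(exampleConfig, "mobile-app", "orders"));   // Allow
console.log(decide(exampleConfig, "mobile-app", "payments")); // Deny
```

Keeping the decision a pure function of configuration and request is what lets routing vary per team while the authorization outcome stays identical across the fleet.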

Configuration flows through the developer portal as events

Authorizer configuration is published from the central developer portal as Kafka messages. This covers which clients are valid, which scopes map to which APIs, and which keys have been rotated. Each authorizer subscribes to the topic and updates its in-memory view. A scope change made in the portal at 10:01 is in effect across every gateway within seconds, without a deploy.

The portal remains the single source of truth for what the policy is; the gateways are simply the places that enforce it. Governance doesn't have to be centralized infrastructure; it can be a centralized data flow over decentralized infrastructure. That distinction is the core of the approach.
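The "centralized data flow" can be pictured as a fold over policy events. The sketch below is an assumption-laden illustration: the event types, field names, and the PolicyView shape are hypothetical, and the Kafka consumer itself is left out; any client would simply hand decoded messages to applyEvent.

```typescript
// Hypothetical event types; the actual topic schema is not described in the article.
type PolicyEvent =
  | { type: "client-scopes-set"; clientId: string; scopes: string[] }
  | { type: "client-removed"; clientId: string }
  | { type: "api-scope-set"; api: string; scope: string }
  | { type: "key-rotated"; retiredKeyId: string; newKeyId: string };

// The authorizer's in-memory view of what the portal has published so far.
interface PolicyView {
  clientScopes: Record<string, string[]>;
  requiredScope: Record<string, string>;
  trustedKeyIds: string[];
}

const emptyView: PolicyView = { clientScopes: {}, requiredScope: {}, trustedKeyIds: [] };

// Returns a new view rather than mutating, so a request in flight sees a consistent snapshot.
function applyEvent(view: PolicyView, event: PolicyEvent): PolicyView {
  switch (event.type) {
    case "client-scopes-set":
      return { ...view, clientScopes: { ...view.clientScopes, [event.clientId]: event.scopes } };
    case "client-removed": {
      const remaining = { ...view.clientScopes };
      delete remaining[event.clientId];
      return { ...view, clientScopes: remaining };
    }
    case "api-scope-set":
      return { ...view, requiredScope: { ...view.requiredScope, [event.api]: event.scope } };
    case "key-rotated": {
      const retired = event.retiredKeyId;
      const kept = view.trustedKeyIds.filter((id) => id !== retired);
      return { ...view, trustedKeyIds: kept.concat(event.newKeyId) };
    }
  }
}

// A scope revocation in the portal is just one more event on the topic.
const portalEvents: PolicyEvent[] = [
  { type: "api-scope-set", api: "orders", scope: "orders:read" },
  { type: "client-scopes-set", clientId: "mobile-app", scopes: ["orders:read"] },
  { type: "client-scopes-set", clientId: "mobile-app", scopes: [] },
];
const liveView = portalEvents.reduce(applyEvent, emptyView);
console.log(JSON.stringify(liveView.clientScopes)); // {"mobile-app":[]}
```

Because every authorizer applies the same events in the same order, each gateway converges on the same view without a deploy.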

Configuration can also be hardcoded

Kafka availability is not the right thing to bet your authentication path on. The same authorizer module accepts a baked-in configuration at deploy time. This matters in three situations: cold starts, where the authorizer should not wait on a topic catch-up before serving its first request; Kafka outages, where the authorizer keeps working on its last known good configuration rather than failing closed for reasons unrelated to authentication; and edge or restricted environments where the developer-portal connection is not available.

The portal is the source of truth in steady state. The hardcoded config is the floor.
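One way to read "the hardcoded config is the floor" is as a holder that starts from the deploy-time configuration and only ever moves to a newer valid one. This is a sketch under assumptions: ConfigHolder, onPortalSnapshot, and the AuthConfig shape are hypothetical names, and portal updates are simplified to whole snapshots delivered by a Kafka consumer that is not shown.

```typescript
// Hypothetical, simplified config shape for illustration only.
interface AuthConfig {
  clientScopes: Record<string, string[]>;
}

type ConfigSource = "baked-in" | "portal";

class ConfigHolder {
  private current: AuthConfig;
  private source: ConfigSource = "baked-in";

  // The baked-in config is usable immediately, before any topic catch-up (cold start).
  constructor(bakedIn: AuthConfig) {
    this.current = bakedIn;
  }

  // Called by the (omitted) Kafka consumer with a decoded snapshot,
  // or with undefined when a message could not be decoded.
  onPortalSnapshot(snapshot: AuthConfig | undefined): void {
    if (snapshot === undefined) return; // keep the last known good config
    this.current = snapshot;
    this.source = "portal";
  }

  // During a Kafka outage no snapshots arrive, so this keeps returning
  // the last known good config instead of failing.
  get(): { config: AuthConfig; source: ConfigSource } {
    return { config: this.current, source: this.source };
  }
}

const holder = new ConfigHolder({ clientScopes: { "batch-job": ["reports:read"] } });
console.log(holder.get().source); // baked-in
holder.onPortalSnapshot({ clientScopes: { "batch-job": ["reports:read", "reports:export"] } });
console.log(holder.get().source); // portal
```

The request path only ever calls get(), so authorization never blocks on the state of the topic.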

Two authorizer variants, picked by workload

There are two builds of the authorizer. The Rust variant serves latency-sensitive paths, where every millisecond of authorization overhead is visible in product metrics or where cold-start time matters. The TypeScript variant is for teams whose APIs are not on the hot path and whose engineers are already fluent in Node. Both consume the same configuration and enforce the same rules. The choice is an ergonomics-versus-performance trade-off the team makes locally, not a decision the platform team has to make on their behalf.

Where this shows up in practice

The shift is most visible at three moments in a team's life.

Onboarding a new service. Previously, onboarding a new API meant a ticket to the platform team, a review cycle, a configuration change in the shared gateway, and a deploy coordinated against everyone else's release window. Now it is terraform apply against a module the team already understands, with auth behavior they don't have to reason about because it comes with the module. Onboarding becomes a single-team task rather than a multi-team coordination problem.

Rolling out an auth change. A scope that needs to be revoked across the estate used to be a coordinated rollout: a change in the central gateway, a window, a deploy, and a verification pass. Now it is a change in the developer portal that propagates as a Kafka event. Across dozens of gateways, the new policy is live within seconds. The platform team's job has moved up the stack. They no longer operate the gateway; they own the module, the authorizer code, and the event contract.

Reading the bill. Because each team's gateway lives in their account, their traffic is their cost. A team that ships a chatty client and triples its request volume sees the cost in their own AWS bill the same month. That signal lands with the people who can act on it, the team that owns the API, instead of being absorbed into a platform line item nobody reads. Meanwhile, the traffic-priced license on the old central gateway simply went away. That cost wasn't redistributed; it was eliminated.

Figure 2: A single change in the developer portal reaches every gateway in the fleet via a Kafka event, with a hardcoded fallback for cold starts and outages.

Trade-offs worth being honest about

Decentralization is not free. A few trade-offs are worth flagging if you are evaluating this pattern.

You take on a fleet. You no longer run one gateway; you run dozens. The module makes them homogeneous, but observability, version rollout, and end-to-end testing all need to be designed for the fleet, not for the single instance.

The Kafka topic between the developer portal and the authorizers becomes a critical dependency. Its schema, retention, and replay semantics all matter. The hardcoded fallback mitigates outages but doesn't replace the need to operate that contract carefully.

Coordination shifts, but doesn't disappear. Service teams now coordinate less with the platform team but more with each other on cross-cutting concerns like CORS policy, error formats, and observability conventions. The module is where these conventions live. Keeping it opinionated is more valuable than keeping it flexible.
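On the event contract specifically, the schema and replay concerns raised above can be sketched as an idempotent apply step. Everything here is an assumption made for illustration: the envelope fields (schemaVersion, sequence), the choice to skip and count unknown schema versions, and the names ScopeEvent and applyScopeEvent are hypothetical, not the contract used in the project.

```typescript
// Hypothetical envelope; field names are illustrative, not the real contract.
interface ScopeEvent {
  schemaVersion: number;
  sequence: number;   // assumed to increase with every published event
  clientId: string;
  scopes: string[];   // full replacement of the client's scopes
}

interface AuthorizerState {
  lastSequence: number;
  skipped: number;    // events ignored because their schema was not understood
  clientScopes: Record<string, string[]>;
}

const SUPPORTED_SCHEMA_VERSION = 1;
const initialState: AuthorizerState = { lastSequence: 0, skipped: 0, clientScopes: {} };

function applyScopeEvent(state: AuthorizerState, event: ScopeEvent): AuthorizerState {
  // Duplicates and replays of already-applied events are no-ops.
  if (event.sequence <= state.lastSequence) return state;
  // Unknown schema: do not guess at its meaning. It is counted so it can be alerted on.
  if (event.schemaVersion !== SUPPORTED_SCHEMA_VERSION) {
    return { ...state, lastSequence: event.sequence, skipped: state.skipped + 1 };
  }
  return {
    ...state,
    lastSequence: event.sequence,
    clientScopes: { ...state.clientScopes, [event.clientId]: event.scopes },
  };
}

// Replaying a log that contains redelivered events yields the same state as the clean log.
const grant: ScopeEvent = { schemaVersion: 1, sequence: 1, clientId: "mobile-app", scopes: ["orders:read"] };
const revoke: ScopeEvent = { schemaVersion: 1, sequence: 2, clientId: "mobile-app", scopes: [] };
const cleanState = [grant, revoke].reduce(applyScopeEvent, initialState);
const replayedState = [grant, revoke, revoke, grant].reduce(applyScopeEvent, initialState);
console.log(JSON.stringify(cleanState) === JSON.stringify(replayedState)); // true
```

The point of the sequence check is that a redelivered older event cannot resurrect a scope that a later event revoked. Whether to skip, halt, or fall back on an unknown schema version is a design decision for the contract owners; skipping is only one option.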

None of these are reasons not to do it. They are reasons to be deliberate about how you do it.

What this means for teams

For engineers, the day-to-day change is about ownership and friction. The gateway is no longer a piece of remote infrastructure they file tickets against; it is a module in their repo. They can read it, debug it locally, and reason about its failure modes. The authorizer is only a black box in the sense that teams don't have to think about it; they can still read and debug it when needed.

For decision-makers, the more important shift is structural. API costs become legible at the team level, letting engineering and product leadership have honest conversations about traffic, growth, and unit economics. Governance and infrastructure also stop being coupled: the developer portal can stay strict (one place, one policy) while the enforcement infrastructure can sprawl as widely as the organization needs. The platform team scales sub-linearly with the rest of engineering. They are no longer the gating function for every new service; instead, they maintain the module, the authorizer, and the event contract.

That last point is where the "scaling" in this story actually lives. Hardware and throughput scale easily. The thing that doesn't scale, in a growing engineering organization, is the human coordination overhead of any piece of infrastructure that every team has to share. Decentralizing the gateway is one of the cleaner ways to push that overhead out of the critical path.

Want to talk through a similar move?

If your central API gateway is showing up in capacity reviews, license renewals, or incident retrospectives more often than feels healthy, three moves are worth considering.

The most important move is decoupling policy from enforcement. Keep the developer portal as the source of truth and let enforcement points multiply. Events are a better distribution mechanism for policy than tickets are.

The second move is packaging the gateway as a module rather than a service the platform team operates: their product becomes the IaC, the authorizer, and the update path, not the running infrastructure.

Making cost a local signal is the third move. Putting gateways in team-owned accounts turns API traffic into a number the team that generates it can actually see, rather than a platform line item nobody reads.

The shape of the answer is straightforward. The hard work is in the contracts: the module's API, the authorizer's configuration schema, and the event flow between the developer portal and the fleet. Get those right, and the rest follows. Whether you are weighing a decentralization step, designing a developer portal, or rethinking how policy reaches your runtime, our DevOps and Platform Engineering and Cloud Engineering teams would be glad to compare notes.
