Designing OpenAPI-First Backend Systems

There are two ways to build an API. The first is “code-first”: write handlers, generate the spec from annotations, and ship it. The second is “design-first”: write the OpenAPI spec, get it reviewed, and let everything downstream — server skeletons, client SDKs, validation, mocks, tests — derive from that single artifact. Most teams claim to do the second and actually do the first, because the tooling around code-first is mature and the discipline around design-first is not.

The pay-off for getting design-first right is real and compounds over time. The spec becomes a contract that everyone agrees on before code exists, integration bugs surface in PR review rather than production, and the API surface stays consistent because no one is reinventing pagination patterns inside individual handlers. This post is about how to actually do that — what the workflow looks like, what tools are worth adopting, and where the trade-offs sit.

What “Design-First” Actually Means

It does not mean “write the spec before writing code.” It means “the spec is the authoritative source of truth, and code is derived or validated against it.” The distinction matters because the first phrasing collapses into code-first the moment anyone makes a change in the handler that does not appear in the spec.

A working design-first workflow:

  1. Engineer writes or updates openapi.yaml.
  2. PR review is on the spec itself — endpoints, schemas, examples, errors.
  3. Server scaffolding is generated or regenerated.
  4. Handlers implement business logic for the generated interfaces.
  5. CI validates that the running server matches the spec (contract tests).
  6. Client SDKs and documentation are regenerated automatically.

The spec is checked into the repository, versioned, and treated with the same review rigor as code. Any divergence between spec and implementation is a CI failure.
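
As a rough illustration of the "divergence is a CI failure" rule, the sketch below compares the operations declared in the source spec with those exposed by the running server. It assumes both documents have already been parsed into dictionaries; the function names `operations` and `drift` are hypothetical, and a real gate would also compare schemas, not only the operation list.

```python
from __future__ import annotations

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def operations(spec: dict) -> set[tuple[str, str, str | None]]:
    """Return (METHOD, path, operationId) for every operation in a parsed spec."""
    found = set()
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                found.add((method.upper(), path, operation.get("operationId")))
    return found


def drift(source: dict, served: dict) -> dict[str, list]:
    """Operations missing from the server, and operations the spec never declared."""
    declared, exposed = operations(source), operations(served)
    by_path = lambda op: (op[1], op[0])
    return {
        "missing_from_server": sorted(declared - exposed, key=by_path),
        "undeclared_in_spec": sorted(exposed - declared, key=by_path),
    }


# Hypothetical inputs: the server exposes a route the spec does not declare.
source_spec = {"paths": {"/orders/{orderId}": {"get": {"operationId": "getOrder"}}}}
served_spec = {
    "paths": {
        "/orders/{orderId}": {"get": {"operationId": "getOrder"}},
        "/debug": {"get": {"operationId": "debug"}},
    }
}
report = drift(source_spec, served_spec)
```

A CI job built on this idea would exit non-zero whenever either list in `report` is non-empty.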

The Spec as an Engineering Artifact

OpenAPI 3.1 (which aligns with JSON Schema 2020-12) is the standard worth targeting in 2026. A few practices separate a useful spec from a wall of YAML:

  • Components and $ref everywhere. Define every schema, parameter, response, and error once in components/. A spec where the same User schema is inlined in eight responses will diverge within a sprint.
  • Examples are part of the contract. Every request and response should have at least one realistic example. Consumers read examples first and schema second; mocks generate from examples.
  • Errors as schemas. Standardize on a single error envelope (Problem Details from RFC 9457 is a defensible default), reference it from every error response, and define the allowed error codes as an enum.
  • Operation IDs are stable. operationId is what code generators use to name functions. Treat them as a public API; renames break SDK consumers.
  • Tags map to bounded contexts. Group operations by domain concept, not by HTTP verb. A users tag is useful; a get tag is not.
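
The "components and $ref everywhere" practice can be checked mechanically. The following is a minimal sketch, assuming the spec is already loaded as a dictionary; `inline_response_schemas` is a hypothetical helper, and it only inspects response bodies, not parameters or request bodies.

```python
def inline_response_schemas(spec: dict) -> list[str]:
    """List locations where a response body schema is inlined instead of a $ref."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path-level keys such as "parameters" are lists, not operations.
            if not isinstance(operation, dict) or "responses" not in operation:
                continue
            for status, response in operation["responses"].items():
                # A response that is itself a $ref has no "content" key here.
                for media_type, media in response.get("content", {}).items():
                    schema = media.get("schema", {})
                    if schema and "$ref" not in schema:
                        problems.append(f"{method.upper()} {path} {status} {media_type}")
    return problems


# Hypothetical spec fragment: the 200 body is inlined, the 404 is shared.
example_spec = {
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "responses": {
                    "200": {"content": {"application/json": {"schema": {"type": "object"}}}},
                    "404": {"$ref": "#/components/responses/NotFound"},
                }
            }
        }
    }
}
inlined = inline_response_schemas(example_spec)
```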

A small, illustrative example:

openapi: 3.1.0
info: { title: Orders API, version: 1.4.0 }
paths:
  /orders/{orderId}:
    get:
      operationId: getOrder
      tags: [orders]
      parameters:
        - { $ref: '#/components/parameters/OrderId' }
      responses:
        '200':
          description: Order found
          content:
            application/json:
              schema: { $ref: '#/components/schemas/Order' }
        '404':
          $ref: '#/components/responses/NotFound'
        default:
          $ref: '#/components/responses/Error'
components:
  parameters:
    OrderId:
      name: orderId
      in: path
      required: true
      schema: { type: string, format: uuid }
  schemas:
    Order: ...
    Problem:
      type: object
      required: [type, title, status]
      properties:
        type: { type: string, format: uri }
        title: { type: string }
        status: { type: integer }
        detail: { type: string }
        instance: { type: string }
  responses:
    NotFound:
      description: Resource not found
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }
    # Referenced by the default response above.
    Error:
      description: Unexpected error
      content:
        application/problem+json:
          schema: { $ref: '#/components/schemas/Problem' }

Code Generation: What Actually Works

The OpenAPI generator ecosystem is large, uneven, and occasionally hostile. A few generators that have held up under production use:

  • openapi-generator (Java). The Swiss army knife. Generates servers and clients for 50+ languages. Quality varies sharply by target language; the Python, Go, and TypeScript generators are reliable; some others are not.
  • oapi-codegen (Go). Generates idiomatic Go server interfaces and clients. The current best-in-class for Go.
  • orval / openapi-typescript-codegen for TypeScript clients with React Query / SWR / Axios bindings.
  • fastapi-code-generator / datamodel-code-generator for Python — generates Pydantic models and FastAPI route stubs.
  • speakeasy / Fern / Stainless — managed SDK generators with stronger output quality at the cost of a vendor dependency.

The right strategy is server-side strict, client-side generous. Generate strongly-typed server interfaces that force handlers to match the spec exactly; generate ergonomic clients that handle edge cases (retries, deserialization tolerance, error mapping) without leaking spec mechanics to consumers.
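
To make "client-side generous" concrete, here is a minimal sketch of a tolerant client-side parser. It assumes a simplified `Order` shape and a problem-details error body; `parse_order` and `ApiError` are hypothetical names, not the output of any particular generator.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Order:
    id: str
    status: str
    total_cents: int


class ApiError(Exception):
    """Raised for any error response, carrying the problem-details fields."""

    def __init__(self, status: int, title: str, detail: str = ""):
        super().__init__(f"{status} {title}")
        self.status, self.title, self.detail = status, title, detail


def parse_order(status_code: int, body: dict) -> Order:
    """Map a raw response to an Order, ignoring fields this client does not know."""
    if status_code >= 400:
        # Error mapping: consumers catch ApiError instead of inspecting raw JSON.
        raise ApiError(
            body.get("status", status_code),
            body.get("title", "Error"),
            body.get("detail", ""),
        )
    known = {f.name for f in fields(Order)}
    # Deserialization tolerance: unknown fields added by the server are dropped.
    return Order(**{k: v for k, v in body.items() if k in known})


order = parse_order(
    200, {"id": "o-1", "status": "shipped", "total_cents": 1200, "gift_wrap": True}
)
```

The server-side counterpart would do the opposite and reject a body containing fields the spec does not allow.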

Server Stubs in FastAPI

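FastAPI itself derives OpenAPI from code, which makes it design-first’s natural enemy. The discipline that makes it work is to write models and routes that mirror the spec exactly (same schema fields, same status codes, same operationId) and to treat FastAPI’s generated spec as something to check against the source, not something to publish: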

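from typing import Literal, Optional
from uuid import UUID

from fastapi import FastAPI, APIRouter
from pydantic import BaseModel, Field

class Order(BaseModel):
    id: str
    status: Literal["pending", "shipped", "delivered"]
    total_cents: int = Field(ge=0)

class Problem(BaseModel):  # mirrors components/schemas/Problem in the spec
    type: str
    title: str
    status: int
    detail: Optional[str] = None
    instance: Optional[str] = None

router = APIRouter(tags=["orders"])

@router.get(
    "/orders/{order_id}",
    response_model=Order,
    responses={404: {"model": Problem}},
    operation_id="getOrder",  # must match operationId in the source spec
)
async def get_order(order_id: UUID) -> Order: ...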

Then in CI, dump the running app’s spec and diff it against the source spec:

python -c "import app, json; print(json.dumps(app.app.openapi()))" > generated.json
openapi-diff source.yaml generated.json --fail-on-incompatible

A semantic diff tool is the right choice here (openapi-diff, used above, or oasdiff): such tools understand semantic compatibility (added optional fields are fine; removed fields are breaking) rather than producing naive textual diffs.
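
The difference between a semantic and a textual diff can be sketched in a few lines. This is a simplified illustration for response object schemas only, not how any real diff tool is implemented; `breaking_changes` is a hypothetical function and it ignores nesting, `$ref` resolution, and request-side rules.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Compare two response object schemas and list changes that break consumers."""
    issues = []
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    for name in old_props:
        if name not in new_props:
            issues.append(f"removed property: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            issues.append(f"changed type: {name}")
    # A field consumers could rely on being present may now be absent.
    for name in sorted(set(old.get("required", [])) - set(new.get("required", []))):
        issues.append(f"no longer required: {name}")
    # Properties that exist only in `new` are additive and therefore not reported.
    return issues


old_schema = {
    "required": ["id"],
    "properties": {"id": {"type": "string"}, "status": {"type": "string"}},
}
additive = {
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "status": {"type": "string"},
        "note": {"type": "string"},
    },
}
removal = {"required": ["id"], "properties": {"id": {"type": "string"}}}
```

Here `breaking_changes(old_schema, additive)` reports nothing, while `breaking_changes(old_schema, removal)` flags the dropped `status` field, even though both revisions differ textually from the original.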

Validation: Where the Spec Earns Its Keep

Once you have a contract, enforce it. Three checkpoints:

Request Validation

Generated server stubs handle this for typed parameters and bodies. For dynamically-typed languages, attach a request validator middleware (express-openapi-validator, FastAPI’s built-in Pydantic, connexion for Flask) that rejects requests violating the schema with a 4xx response (400 Bad Request in most validators; FastAPI defaults to 422) before they reach handlers. Removing input-validation code from handlers is the single largest reduction in handler complexity that design-first delivers.
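
The effect on handler code is easiest to see in a toy version. The sketch below is framework-free and checks only required fields and primitive types, a tiny subset of what a real validator covers; `validated`, `violations`, and `CREATE_ORDER` are hypothetical names.

```python
from functools import wraps

JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}


def violations(body: dict, schema: dict) -> list[str]:
    """Check required fields and primitive types only."""
    errors = [f"missing field: {name}" for name in schema.get("required", []) if name not in body]
    for name, rule in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(rule.get("type"))
        if name in body and expected and not isinstance(body[name], expected):
            errors.append(f"wrong type for {name}: expected {rule['type']}")
    return errors


def validated(schema: dict):
    """Decorator: reject invalid bodies with a 400 problem before the handler runs."""
    def decorate(handler):
        @wraps(handler)
        def wrapper(body: dict):
            errors = violations(body, schema)
            if errors:
                problem = {
                    "type": "about:blank",
                    "title": "Bad Request",
                    "status": 400,
                    "detail": "; ".join(errors),
                }
                return 400, problem
            return handler(body)
        return wrapper
    return decorate


CREATE_ORDER = {"required": ["total_cents"], "properties": {"total_cents": {"type": "integer"}}}


@validated(CREATE_ORDER)
def create_order(body: dict):
    # Business logic only: the body is already known to match the schema.
    return 201, {"id": "o-1", "total_cents": body["total_cents"]}
```

The handler contains no defensive checks at all, which is the reduction in complexity described above.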

Response Validation

Responses should be validated against the spec in non-production environments. A response that drifts from the schema is a contract bug that will silently break clients. In production, validating every response is usually too expensive; instead, run a sampling validator (1–5% of responses) and alert on schema mismatches.
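
A sampling validator of this kind could look roughly like the sketch below. It assumes a `check` callable that returns a list of error strings for a response body; the class name `SamplingValidator` and the logger name are hypothetical, and a real deployment would emit a metric for alerting rather than only a log line.

```python
from __future__ import annotations

import logging
import random

logger = logging.getLogger("contract")


class SamplingValidator:
    """Validate a random fraction of responses and log mismatches without failing them."""

    def __init__(self, check, sample_rate: float = 0.05, rng: random.Random | None = None):
        self.check = check              # callable(body) -> list of error strings
        self.sample_rate = sample_rate  # 0.05 means roughly 5% of responses
        self.rng = rng or random.Random()
        self.sampled = 0
        self.mismatches = 0

    def observe(self, operation_id: str, body: dict) -> None:
        if self.rng.random() >= self.sample_rate:
            return  # not sampled: no validation cost on this response
        self.sampled += 1
        errors = self.check(body)
        if errors:
            self.mismatches += 1
            logger.warning("schema mismatch in %s: %s", operation_id, errors)


def needs_id(body: dict) -> list[str]:
    """Stand-in for a real schema check."""
    return [] if "id" in body else ["missing id"]


# sample_rate=1.0 validates everything, which is what a staging environment would use.
staging = SamplingValidator(needs_id, sample_rate=1.0)
staging.observe("getOrder", {"id": "o-1"})
staging.observe("getOrder", {})
```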

Contract Tests

For service-to-service integrations, consumer-driven contract testing with Pact or schema-level tests with dredd / schemathesis catch the most damaging breakage class: a provider changing the response shape without telling consumers. schemathesis in particular is excellent — it generates property-based tests from the spec and finds edge cases (negative numbers, very long strings, unexpected nulls) that handlers rarely consider.

schemathesis run --checks all --hypothesis-seed=42 http://localhost:8000/openapi.json

Linting and Style Enforcement

A spec that compiles is not necessarily a spec that is good. Spectral (Stoplight) is the de facto linter, with rule sets for OWASP, ergonomics, and consistency.

A baseline ruleset worth enforcing:

  • Every operation has operationId, summary, description, and at least one tag.
  • Every 4xx/5xx response references a shared error schema.
  • Path parameters use one naming convention (kebab-case or camelCase; pick one) and apply it consistently across endpoints.
  • Pagination uses a standard pattern (cursor or limit/offset, not both ad hoc).
  • All 2xx responses define a content type explicitly.
  • Schema property names are consistent in case (camelCase vs snake_case — choose one).

Run Spectral in pre-commit hooks and in CI. Auto-failing on lint violations keeps the spec from rotting.
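
Spectral rules are normally written in its own ruleset format; purely to show what the first two baseline rules check, here is the same logic as a plain Python sketch. The function names are hypothetical, and the spec is assumed to be loaded with status codes as string keys.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def uses_shared_error(response: dict) -> bool:
    """True if the response, or every body schema inside it, is a $ref."""
    if "$ref" in response:
        return True
    schemas = [media.get("schema", {}) for media in response.get("content", {}).values()]
    return bool(schemas) and all("$ref" in schema for schema in schemas)


def lint(spec: dict) -> list[str]:
    """Apply two baseline rules: operation metadata and shared error responses."""
    findings = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue
            where = f"{method.upper()} {path}"
            for field in ("operationId", "summary", "description"):
                if not operation.get(field):
                    findings.append(f"{where}: missing {field}")
            if not operation.get("tags"):
                findings.append(f"{where}: needs at least one tag")
            for status, response in operation.get("responses", {}).items():
                if str(status)[0] in "45" and not uses_shared_error(response):
                    findings.append(f"{where}: {status} does not reference a shared error")
    return findings


# Hypothetical operation with no summary and an ad hoc 404.
draft = {
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "operationId": "getOrder",
                "description": "Fetch one order.",
                "tags": ["orders"],
                "responses": {"200": {"description": "ok"}, "404": {"description": "nope"}},
            }
        }
    }
}
findings = lint(draft)
```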

API Versioning

Versioning is one of the few design decisions that is genuinely irreversible — once a v1 is in use you cannot reshape it. Two patterns survive contact with production:

  • URI versioning. /v1/orders, /v2/orders. Brutally simple, cache-friendly, debuggable. The downside is that v1 and v2 cannot share a code path easily.
  • Header versioning. Accept: application/vnd.example.v2+json. Cleaner URLs, harder to debug, requires more discipline from clients.

Whichever you choose, version the contract, not individual endpoints. Mixed versioning (“/v1/users and /v2/orders”) is a maintenance nightmare. Use additive evolution (add fields, never remove them) within a major version and reserve breaking changes for major version bumps.
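
Both patterns reduce to extracting one integer per request. A minimal sketch, assuming the `vnd.example` vendor type shown above and a `/vN/` path prefix; the function names and the default of 1 are assumptions made for illustration.

```python
from __future__ import annotations

import re

VENDOR_TYPE = re.compile(r"application/vnd\.example\.v(\d+)\+json")
PATH_PREFIX = re.compile(r"^/v(\d+)/")


def version_from_accept(accept: str, default: int = 1) -> int:
    """Header versioning: read the major version from the Accept header."""
    match = VENDOR_TYPE.search(accept)
    return int(match.group(1)) if match else default


def version_from_path(path: str) -> int | None:
    """URI versioning: read the major version from the path prefix, if any."""
    match = PATH_PREFIX.match(path)
    return int(match.group(1)) if match else None
```

Because the version is resolved once per request, every endpoint under the same contract sees the same value, which keeps mixed versioning from creeping in.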

Mocks, SDKs, and Documentation

Three deliverables fall out of a well-maintained spec for free:

  • Mock servers. prism (Stoplight), imposter, and Postman all generate mock servers directly from a spec. Frontend teams can start integrating before the backend exists.
  • SDKs. Generated TypeScript, Python, Go, etc. — automatically. Publish them to internal package registries. SDK consumers never write fetch logic.
  • API documentation. Redoc and Swagger UI render the spec directly. Stripe-quality documentation is mostly a function of well-written examples and descriptions in the spec itself.

These deliverables, taken together, change the economics of cross-team integration. A spec change goes live as a typed SDK release the same day, and the consuming team’s compiler tells them what changed.
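
The reason examples matter so much for mocks is visible in a toy mock lookup. This sketch is not how prism or any other mock server is implemented; it assumes a parsed spec dictionary and a hypothetical `mock_response` helper that returns the first example declared for a response.

```python
def mock_response(spec: dict, method: str, path: str,
                  status: str = "200", media_type: str = "application/json"):
    """Return the first example declared for an operation's response, or None."""
    media = (
        spec.get("paths", {})
        .get(path, {})
        .get(method.lower(), {})
        .get("responses", {})
        .get(status, {})
        .get("content", {})
        .get(media_type, {})
    )
    if "example" in media:  # single inline example
        return media["example"]
    for named in media.get("examples", {}).values():  # map of named Example Objects
        return named.get("value")
    return None


# Hypothetical spec fragment with one named example.
orders_spec = {
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "examples": {
                                    "shipped": {"value": {"id": "o-1", "status": "shipped"}}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
body = mock_response(orders_spec, "GET", "/orders/{orderId}")
```

An operation without examples yields `None` here, which is the toy equivalent of a mock server having nothing realistic to return.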

Failure Modes

Design-first has its own anti-patterns. The ones to watch for:

  • The spec lies. The biggest single failure: spec and implementation drift, no CI gate catches it, consumers trust the spec, and incidents follow. The contract-test gate is the answer; do not skip it.
  • Premature abstraction in schemas. Inheritance chains and oneOf polymorphism in OpenAPI work but generate awkward code in most languages. Prefer flat structures with a discriminator field over deep type hierarchies.
  • Treating the spec as a write-once artifact. Specs need refactoring just like code — components extracted, naming normalized, dead operations removed. Spec rot is real.
  • Generator lock-in. Some generators are abandoned or low-quality. Read their output before adopting. The cost of switching generators mid-project is high.
  • Over-strict validation in production. A request validator that rejects requests carrying a legitimate-but-unspecified header is a self-inflicted incident. Validate strictly in dev; in production, keep validating request bodies and parameters, tolerate unspecified headers, and log (rather than reject) responses that do not match the spec.
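
The "flat structures with a discriminator field" advice tends to produce simple generated code. A minimal sketch of what the consuming side can look like, assuming a hypothetical payment payload whose `kind` field is the discriminator; the class and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CardPayment:
    kind: str
    last4: str


@dataclass
class BankTransfer:
    kind: str
    iban: str


# One lookup table replaces an inheritance chain or nested oneOf matching.
PAYMENT_TYPES = {"card": CardPayment, "bank_transfer": BankTransfer}


def parse_payment(body: dict):
    """Pick the concrete type from the discriminator value, then build it."""
    try:
        model = PAYMENT_TYPES[body["kind"]]
    except KeyError:
        raise ValueError(f"unknown or missing payment kind: {body.get('kind')!r}") from None
    return model(**body)


payment = parse_payment({"kind": "card", "last4": "4242"})
```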

Operational Pay-Off

When design-first is working, you can observe specific operational improvements:

  • Reduced “what does this API do?” tickets because docs are accurate.
  • Faster onboarding for new services because they integrate against generated clients.
  • Fewer regressions on breaking changes because CI catches them at PR time.
  • Cleaner service boundaries because the spec forces explicit consideration of every shared concept.
  • Better incident response because API behavior is documented and reproducible.

The cost is the discipline required to keep the spec authoritative — and that discipline must be enforced by tooling, not by convention.

Closing

Design-first is not a methodology religion; it is the recognition that an API is, before anything else, a contract between services and people. The spec is the artifact of that contract. Code that derives from the spec stays consistent with it; code that the spec is generated from drifts the moment anyone is in a hurry. The investment is upfront: write the YAML, set up the generators, wire the contract tests into CI, lint the spec like you lint code. The return is durable: cross-team integrations stop being a source of incidents, SDKs are typed and current, documentation stays accurate, and the API surface stays coherent as the system grows. The trick is to treat the spec as authoritative and let the tooling enforce that. Anything else is design-first in name only.