How LAP Compresses API Specs 10×
Yesterday we talked about the problem: API specs are bloated, and agents pay the price. Today, the solution.
LAP compresses API specs by an average of 10× across our registry of 1,500+ APIs. Not by lossy summarization or AI-generated shortcuts — by systematic removal of everything an agent doesn’t need.
Here’s how.
Tier 1: Field Pruning
The easiest wins come from removing fields that serve human readers, not machine consumers.
A typical OpenAPI endpoint looks like this:
```yaml
/users/{id}:
  get:
    summary: "Get a user by ID"
    description: "Retrieves a single user object by their unique identifier.
      The response includes all public profile fields. For private fields,
      use the /users/{id}/private endpoint with appropriate scopes.
      Rate limited to 100 requests per minute."
    operationId: getUserById
    tags: ["Users", "Core"]
    externalDocs:
      url: "https://docs.example.com/users"
    x-custom-field: "internal-tracking"
```
After LAP processing:
```
GET /users/{id} → Get a user by ID
```
What got removed:
- `description` — the `summary` says it all. Descriptions often repeat the summary with extra prose
- `tags` — organizational metadata for doc generators
- `externalDocs` — agents can’t browse documentation URLs
- `operationId` — internal identifier, not needed for calling the API
- `x-*` extensions — vendor-specific metadata
This alone typically cuts 30-40% of token count.
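As a rough sketch of what this tier does, the following filters documentation-only keys out of an OpenAPI operation object. The function name and key list are illustrative, not LAP’s actual internals:

```python
# Tier 1 sketch: drop keys that serve human readers, not agents.
DOC_ONLY_KEYS = {"description", "tags", "externalDocs", "operationId"}

def prune_operation(op: dict) -> dict:
    """Keep only machine-relevant keys; drop doc metadata and x-* extensions."""
    return {
        k: v
        for k, v in op.items()
        if k not in DOC_ONLY_KEYS and not k.startswith("x-")
    }

op = {
    "summary": "Get a user by ID",
    "description": "Retrieves a single user object...",
    "operationId": "getUserById",
    "tags": ["Users", "Core"],
    "externalDocs": {"url": "https://docs.example.com/users"},
    "x-custom-field": "internal-tracking",
}
print(prune_operation(op))  # {'summary': 'Get a user by ID'}
```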
Tier 2: Schema Deduplication
OpenAPI specs are full of repeated schema definitions. A User object might appear in GET /users, POST /users, PATCH /users, and GET /users/{id} — four copies of the same structure.
LAP resolves all $ref chains and deduplicates at the structural level. If two schemas are identical or near-identical (differing only in optionality), they’re collapsed into a single definition referenced by name.
The impact is format-dependent:
- OpenAPI specs with deep `$ref` nesting: 40-60% schema reduction
- Flat specs with inline schemas: 10-20% reduction
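The core idea can be sketched in a few lines: hash each resolved schema by its canonical JSON form, and map structural duplicates onto the first name seen. This is an illustration of the technique, not LAP’s implementation (and it only catches exact duplicates, not the near-identical case the tool also handles):

```python
# Tier 2 sketch: collapse structurally identical schemas to one definition.
import json

def dedupe_schemas(schemas: dict) -> tuple[dict, dict]:
    """Return (unique_schemas, alias_map) keyed by structural identity."""
    canonical = {}  # canonical JSON -> first name seen with that structure
    aliases = {}    # duplicate name -> canonical name
    unique = {}
    for name, schema in schemas.items():
        key = json.dumps(schema, sort_keys=True)
        if key in canonical:
            aliases[name] = canonical[key]
        else:
            canonical[key] = name
            unique[name] = schema
    return unique, aliases

user = {"type": "object", "properties": {"id": {"type": "string"}}}
unique, aliases = dedupe_schemas({"UserResponse": user, "UserResult": dict(user)})
# UserResult collapses into UserResponse; only one definition survives.
```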
Tier 3: Enum Sampling
Some APIs define enums with dozens or hundreds of values. Currency codes (180 values), country codes (249), time zones (500+). An agent doesn’t need all of them to understand the parameter type.
LAP samples enums down to a representative subset — typically 3-5 values — with a count indicator:
Before:

```
currency: enum [USD, EUR, GBP, JPY, AUD, CAD, CHF, CNY, ... 180 values]
```

After:

```
currency: enum [USD, EUR, GBP, ...+177]
```
The agent knows it’s a currency code, knows the format, and can infer valid values. The other 177 entries added nothing.
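A minimal sketch of the sampling step, assuming LAP keeps a leading slice plus a count indicator (the function and cutoff are illustrative):

```python
# Tier 3 sketch: keep a representative prefix of an enum plus a "+N" marker.
def sample_enum(values: list, keep: int = 3) -> list:
    """Truncate long enums; short ones pass through unchanged."""
    if len(values) <= keep:
        return list(values)
    return values[:keep] + [f"...+{len(values) - keep}"]

currencies = ["USD", "EUR", "GBP", "JPY", "AUD"]  # imagine the full 180 entries
print(sample_enum(currencies))  # ['USD', 'EUR', 'GBP', '...+2']
```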
Tier 4: Description Truncation
When descriptions survive pruning (because no summary exists), they get truncated to the first meaningful sentence. API descriptions tend to follow a pattern: one sentence of what it does, three paragraphs of edge cases and caveats.
Before:

```
"Creates a new payment intent. Payment intents guide you through
the process of collecting a payment from your customer. They track
the lifecycle of a customer checkout flow and trigger additional
authentication steps when required by regulatory mandates, custom
Radar rules, or redirect-based payment methods. For a list of
supported payment methods..."
```

After:

```
"Creates a new payment intent."
```
Agents that need the edge cases will discover them through API responses, not spec descriptions.
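A naive version of first-sentence truncation can be sketched with a regex: cut at the first period followed by whitespace or end of string. This is an assumption about the approach, not LAP’s code, and a real implementation would need to handle abbreviations like "e.g." more carefully:

```python
# Tier 4 sketch: keep only the first sentence of a description.
import re

def truncate_description(text: str) -> str:
    """Return text up to the first period followed by whitespace or end."""
    match = re.match(r"(.+?\.)(\s|$)", text.strip(), re.DOTALL)
    return match.group(1) if match else text.strip()

desc = ("Creates a new payment intent. Payment intents guide you through "
        "the process of collecting a payment from your customer.")
print(truncate_description(desc))  # Creates a new payment intent.
```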
Tier 5: Response Simplification
Full response schemas are the biggest token sinks. A GET /orders response might include nested objects for line items, shipping, billing, tax breakdowns, refunds, and metadata — hundreds of schema lines for a single endpoint.
LAP reduces response schemas to their top-level structure. The agent knows what fields come back without drowning in nested definitions it doesn’t need upfront.
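To illustrate the idea, the sketch below flattens a JSON Schema response to its top-level field types, discarding nested `properties`. Names and behavior are assumptions for illustration, not LAP’s actual output format:

```python
# Tier 5 sketch: reduce a response schema to its top-level shape.
def top_level_shape(schema: dict) -> dict:
    """Map each top-level property name to its type, dropping nested detail."""
    shape = {}
    for name, sub in schema.get("properties", {}).items():
        shape[name] = sub.get("type", "object")
    return shape

order_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "line_items": {"type": "array", "items": {"type": "object"}},  # nesting elided
        "shipping": {"type": "object", "properties": {"address": {"type": "object"}}},
    },
}
print(top_level_shape(order_schema))
# {'id': 'string', 'line_items': 'array', 'shipping': 'object'}
```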
Format-Specific Handling
LAP isn’t just an OpenAPI tool. It handles five input formats, each with its own compression strategies:
| Format | Avg. Input | Avg. Output | Compression |
|---|---|---|---|
| OpenAPI 3.x | 89K tokens | 12K tokens | 7.4× |
| Swagger 2.0 | 65K tokens | 9K tokens | 7.2× |
| GraphQL | 34K tokens | 8K tokens | 4.3× |
| AsyncAPI | 28K tokens | 7K tokens | 4.0× |
| Protobuf | 22K tokens | 6K tokens | 3.7× |
OpenAPI benefits most because it’s the most verbose format. GraphQL is already relatively compact — its schema language was designed for machines from the start.
What Stays
Compression is only useful if agents can still do their job. LAP preserves:
- Every endpoint — no routes are removed
- Parameter names, types, and constraints — required/optional, min/max, patterns
- Auth requirements — security schemes and where credentials go
- Request body structure — what the API expects
- Top-level response shape — what comes back
- One-line descriptions — enough to choose the right endpoint
The Result
Across 500 benchmark runs with Claude Sonnet, agents using LAP-compressed specs achieved a 0.851 success rate compared to 0.824 on full specs. Better performance with half the tokens.
The compression isn’t magic. It’s the recognition that 90% of an API spec is written for a human audience that’s increasingly not the one reading it.
See it yourself:
```bash
pip install lapsh
lapsh compile petstore.yaml -t lean
```
Then diff the output against the original. You’ll see exactly what was removed — and, we’d argue, that none of it mattered for the agent.
⭐ Star the repo on GitHub if this approach resonates.