How LAP Compresses API Specs 10×
Yesterday we talked about the problem: API specs are bloated, and agents pay the price. Today, the solution.
LAP compresses API specs by an average of 10× across our registry of 1,500+ APIs. Not by lossy summarization or AI-generated shortcuts — by systematic removal of everything an agent doesn’t need.
Here’s how.
Tier 1: Field Pruning
The easiest wins come from removing fields that serve human readers, not machine consumers.
A typical OpenAPI endpoint looks like this:
```yaml
/users/{id}:
  get:
    summary: "Get a user by ID"
    description: "Retrieves a single user object by their unique identifier.
      The response includes all public profile fields. For private fields,
      use the /users/{id}/private endpoint with appropriate scopes.
      Rate limited to 100 requests per minute."
    operationId: getUserById
    tags: ["Users", "Core"]
    externalDocs:
      url: "https://docs.example.com/users"
    x-custom-field: "internal-tracking"
```
After LAP processing:
```
GET /users/{id} → Get a user by ID
```
What got removed:
- `description` — the `summary` says it all. Descriptions often repeat the summary with extra prose
- `tags` — organizational metadata for doc generators
- `externalDocs` — agents can’t browse documentation URLs
- `operationId` — internal identifier, not needed for calling the API
- `x-*` extensions — vendor-specific metadata
This alone typically cuts 30-40% of token count.
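As a rough sketch of what this tier does, the following filters documentation-only keys out of an OpenAPI operation object. The function name and key list are illustrative, not LAP’s actual internals:

```python
# Tier 1 sketch: drop keys that serve human readers, not agents.
DOC_ONLY_KEYS = {"description", "tags", "externalDocs", "operationId"}

def prune_operation(op: dict) -> dict:
    """Keep only machine-relevant keys; drop doc metadata and x-* extensions."""
    return {
        k: v
        for k, v in op.items()
        if k not in DOC_ONLY_KEYS and not k.startswith("x-")
    }

op = {
    "summary": "Get a user by ID",
    "description": "Retrieves a single user object...",
    "operationId": "getUserById",
    "tags": ["Users", "Core"],
    "externalDocs": {"url": "https://docs.example.com/users"},
    "x-custom-field": "internal-tracking",
}
print(prune_operation(op))  # {'summary': 'Get a user by ID'}
```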
Tier 2: Schema Deduplication
OpenAPI specs are full of repeated schema definitions. A User object might appear in GET /users, POST /users, PATCH /users, and GET /users/{id} — four copies of the same structure.
LAP resolves all $ref chains and deduplicates at the structural level. If two schemas are identical or near-identical (differing only in optionality), they’re collapsed into a single definition referenced by name.
The impact is format-dependent:
- OpenAPI specs with deep `$ref` nesting: 40-60% schema reduction
- Flat specs with inline schemas: 10-20% reduction
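The core idea can be sketched in a few lines: hash each resolved schema by its canonical JSON form, and map structural duplicates onto the first name seen. This is an illustration of the technique, not LAP’s implementation (and it only catches exact duplicates, not the near-identical case the tool also handles):

```python
# Tier 2 sketch: collapse structurally identical schemas to one definition.
import json

def dedupe_schemas(schemas: dict) -> tuple[dict, dict]:
    """Return (unique_schemas, alias_map) keyed by structural identity."""
    canonical = {}  # canonical JSON -> first name seen with that structure
    aliases = {}    # duplicate name -> canonical name
    unique = {}
    for name, schema in schemas.items():
        key = json.dumps(schema, sort_keys=True)
        if key in canonical:
            aliases[name] = canonical[key]
        else:
            canonical[key] = name
            unique[name] = schema
    return unique, aliases

user = {"type": "object", "properties": {"id": {"type": "string"}}}
unique, aliases = dedupe_schemas({"UserResponse": user, "UserResult": dict(user)})
# UserResult collapses into UserResponse; only one definition survives.
```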
Tier 3: Enum Sampling
Some APIs define enums with dozens or hundreds of values. Currency codes (180 values), country codes (249), time zones (500+). An agent doesn’t need all of them to understand the parameter type.
LAP samples enums down to a representative subset — typically 3-5 values — with a count indicator:
Before:

```
currency: enum [USD, EUR, GBP, JPY, AUD, CAD, CHF, CNY, ... 180 values]
```

After:

```
currency: enum [USD, EUR, GBP, ...+177]
```
The agent knows it’s a currency code, knows the format, and can infer valid values. The other 177 entries added nothing.
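A minimal sketch of the sampling step, assuming LAP keeps a leading slice plus a count indicator (the function and cutoff are illustrative):

```python
# Tier 3 sketch: keep a representative prefix of an enum plus a "+N" marker.
def sample_enum(values: list, keep: int = 3) -> list:
    """Truncate long enums; short ones pass through unchanged."""
    if len(values) <= keep:
        return list(values)
    return values[:keep] + [f"...+{len(values) - keep}"]

currencies = ["USD", "EUR", "GBP", "JPY", "AUD"]  # imagine the full 180 entries
print(sample_enum(currencies))  # ['USD', 'EUR', 'GBP', '...+2']
```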
Tier 4: Description Truncation
When descriptions survive pruning (because no summary exists), they get truncated to the first meaningful sentence. API descriptions tend to follow a pattern: one sentence of what it does, three paragraphs of edge cases and caveats.
Before:

```
"Creates a new payment intent. Payment intents guide you through
the process of collecting a payment from your customer. They track
the lifecycle of a customer checkout flow and trigger additional
authentication steps when required by regulatory mandates, custom
Radar rules, or redirect-based payment methods. For a list of
supported payment methods..."
```

After:

```
"Creates a new payment intent."
```
Agents that need the edge cases will discover them through API responses, not spec descriptions.
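A naive version of first-sentence truncation can be sketched with a regex: cut at the first period followed by whitespace or end of string. This is an assumption about the approach, not LAP’s code, and a real implementation would need to handle abbreviations like "e.g." more carefully:

```python
# Tier 4 sketch: keep only the first sentence of a description.
import re

def truncate_description(text: str) -> str:
    """Return text up to the first period followed by whitespace or end."""
    match = re.match(r"(.+?\.)(\s|$)", text.strip(), re.DOTALL)
    return match.group(1) if match else text.strip()

desc = ("Creates a new payment intent. Payment intents guide you through "
        "the process of collecting a payment from your customer.")
print(truncate_description(desc))  # Creates a new payment intent.
```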
Tier 5: Response Simplification
Full response schemas are the biggest token sinks. A GET /orders response might include nested objects for line items, shipping, billing, tax breakdowns, refunds, and metadata — hundreds of schema lines for a single endpoint.
LAP reduces response schemas to their top-level structure. The agent knows what fields come back without drowning in nested definitions it doesn’t need upfront.
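To illustrate the idea, the sketch below flattens a JSON Schema response to its top-level field types, discarding nested `properties`. Names and behavior are assumptions for illustration, not LAP’s actual output format:

```python
# Tier 5 sketch: reduce a response schema to its top-level shape.
def top_level_shape(schema: dict) -> dict:
    """Map each top-level property name to its type, dropping nested detail."""
    shape = {}
    for name, sub in schema.get("properties", {}).items():
        shape[name] = sub.get("type", "object")
    return shape

order_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "line_items": {"type": "array", "items": {"type": "object"}},  # nesting elided
        "shipping": {"type": "object", "properties": {"address": {"type": "object"}}},
    },
}
print(top_level_shape(order_schema))
# {'id': 'string', 'line_items': 'array', 'shipping': 'object'}
```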
Format-Specific Handling
LAP isn’t just an OpenAPI tool. It handles five input formats, each with its own compression strategies:
| Format | Avg. Input | Avg. Output | Compression |
|---|---|---|---|
| OpenAPI 3.x | 89K tokens | 12K tokens | 7.4× |
| Swagger 2.0 | 65K tokens | 9K tokens | 7.2× |
| GraphQL | 34K tokens | 8K tokens | 4.3× |
| AsyncAPI | 28K tokens | 7K tokens | 4.0× |
| Protobuf | 22K tokens | 6K tokens | 3.7× |
OpenAPI benefits most because it’s the most verbose format. GraphQL is already relatively compact — its schema language was designed for machines from the start.
What Stays
Compression is only useful if agents can still do their job. LAP preserves:
- Every endpoint — no routes are removed
- Parameter names, types, and constraints — required/optional, min/max, patterns
- Auth requirements — security schemes and where credentials go
- Request body structure — what the API expects
- Top-level response shape — what comes back
- One-line descriptions — enough to choose the right endpoint
The Result
Across 500 benchmark runs with Claude Sonnet, agents using LAP-compressed specs achieved a 0.851 success rate compared to 0.824 on full specs. Better performance with half the tokens.
The compression isn’t magic. It’s the recognition that 90% of an API spec is written for a human audience that’s increasingly not the one reading it.
See it yourself:
```bash
pip install lapsh
lapsh compile petstore.yaml -t lean
```
Then diff the output against the original. You’ll see exactly what was removed — and, we’d argue, that none of it mattered for the agent.
⭐ Star the repo on GitHub if this approach resonates.