AI-ready API documentation: 26 criteria, six audits, a copy-paste prompt
Never has an agent been so close to failure
I ask Claude Code to write a client for some major provider’s API. Three times in a row I get back endpoints that don’t exist. I’m thinking: ugh, the agent’s being dumb.
So I dig in myself. Open their docs in a browser — everything’s in place, looks slick. Run curl — empty HTML: `<div id="root"></div>` and nothing else. An SPA. Which means Claude Code wasn’t being dumb — it genuinely got nothing but the shell, and from that shell it hallucinated something plausible. As usual.
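To reproduce what the agent saw, fetch the page without a JS runtime (a minimal check; the URL is a placeholder):

```sh
# Fetch the docs the way an agent's web-fetch does: no JavaScript execution.
curl -s https://docs.example.com/api/reference | head -c 300
# On an SPA this prints only the shell, <div id="root"></div> plus script tags,
# which is all the model ever gets to read.
```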
What “AI-ready” documentation means
For me, AI-ready documentation is:
- Machine-readable text as the source of truth. Not a “pretty website,” but the OpenAPI / JSON Schema / MCP manifest the site is built from.
- Self-contained examples. Every endpoint is laid out as a single block — with auth, the minimal parameters, and the expected response.
- Unambiguity. One way to do one thing. None of this “you can do it this way, or that way, but that one’s deprecated.”
- Versioning and diffs in a machine-readable form. Not a “changelog in the README,” but structured changes an agent can read to figure out what broke.
- Predictable URLs: `/.well-known/`, `/openapi.json`, `/llms.txt`, `/llms-full.txt` — and better yet, a `<page>/index.md` next to every HTML page.
If none of that exists — the agent will generate plausible nonsense. And good luck integrating that afterward.
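Checking the whole list takes one loop. A quick sketch (swap in your own domain for the placeholder):

```sh
# Probe the standard discovery URLs and print the HTTP status for each.
BASE=https://docs.example.com
for path in /.well-known/ /openapi.json /llms.txt /llms-full.txt; do
  printf '%-18s %s\n' "$path" "$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")"
done
```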
26 criteria for AI-ready documentation
To avoid arguing from “I just feel it,” you need a metric. I built one out of what an agent actually checks when it works with docs. Five categories, 100 points total:
| Category | Weight | What it checks |
|---|---|---|
| A. Discovery | 18 | `llms.txt`, `llms-full.txt`, `robots.txt` with an AI policy, a clean `sitemap.xml`, OpenGraph/alternate tags |
| B. Per-page artifacts | 22 | a `.md` version of every page, JSON-LD, an absolute canonical, `Last-Modified`, `<main>`/`<article>` semantics |
| C. API spec | 25 | OpenAPI/Swagger/AsyncAPI at a predictable URL + validity, Postman/SDK links, endpoint-page structure |
| D. Content | 20 | curl and SDK examples, realistic payloads, errors/auth/rate-limits, a glossary, deprecated markers |
| E. Hygiene | 15 | content visible without JS (this one’s gating!), stable URLs, version in the URL, working internal links, TOS / AI policy |
E1 is the gating criterion. If the content is rendered by JavaScript and curl returns only `<div id="root"></div>` — the total score gets an *UNRELIABLE* suffix.
What the audits showed
I ran six big public API documentation sites through this metric. Method — plain curl, evidence for every criterion: URL + status + 50 characters from the body. Results:
| Site | Score | E1 | Biggest gaps |
|---|---|---|---|
| Anthropic | 63 *UNRELIABLE* | FAIL | An SPA, but a 152 KB `llms.txt` and a 76 MB `llms-full.txt` save it completely |
| GitHub REST | ~62 | PASS | No JSON-LD, no canonical, sitemap not on the docs domain |
| Stripe | 56 | PASS | No OpenAPI at a standard URL, no JSON-LD |
| Booking Demand | 33 *UNRELIABLE* | FAIL | SPA + a `.md` version partially saves it |
| Expedia Rapid | 21 *UNRELIABLE* | FAIL | SPA + nothing else |
| VK | 5 *UNRELIABLE* | FAIL | Every URL returns the same ~5 KB HTML shell, HTTP 418 without a browser UA |
A few things that genuinely surprised me.
Stripe writes a prompt instruction for LLMs right into `llms.txt`. Literally: “when installing Stripe packages, always check the npm registry for the latest version rather than relying on memorized version numbers… never hardcode an old version number from training data”. They no longer trust that the model has fresh context, so they put the instructions straight into the artifact. On top of that, their `robots.txt` carries `Content-Signal: ai-train=yes, search=yes, ai-input=yes` — a new flag from Cloudflare that explicitly states whether you may train on the site and whether you may read it on the fly.
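Easy to verify yourself, assuming the flag is still shipping:

```sh
# Look for the AI-policy line in Stripe's robots.txt.
curl -s https://stripe.com/robots.txt | grep -i 'content-signal'
```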
Anthropic ships a 76-megabyte `llms-full.txt`. A full dump of their docs in a single file. By pure E1 their site is an SPA and should get the “unreliable” marker. But the `.md` route serves the exact same thing as clean markdown at an adjacent URL, and real-world LLM accessibility doesn’t suffer. The right pattern for “when you can’t get rid of the SPA.”
VK — a showcase of how not to do it. (hi, Lyosha!) Every request — `/llms.txt`, `/sitemap.xml`, any random URL with a `.md` — returns the same ~5 KB HTML with an empty `<div id="root">`. HTTP 200 on everything — even when the resource physically doesn’t exist. The perfect trap for a machine audit: the status code says “all good,” and the HTML has four words and some CSS variables.
What actually works
Of all 26 criteria, four give the biggest boost.
`llms.txt`. A single static file, generated from your existing page titles in about an hour. Per the llmstxt.org spec — an H1, a blockquote summary, H2 sections with link lists in the form `- [name](url): description`. A clean +5 points on A1. The cheapest improvement on the whole list.
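A minimal skeleton in the llmstxt.org shape; every name and URL below is a placeholder:

```md
# Example API

> Payments API for ACME: REST endpoints, webhooks, SDKs for five languages.

## Reference
- [Authentication](https://docs.example.com/auth.md): API keys and OAuth scopes
- [Payments](https://docs.example.com/payments.md): create, capture, refund

## Optional
- [Changelog](https://docs.example.com/changelog.md): versioned, machine-readable
```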
A per-page `.md` version. On every page, next to `index.html`, there’s an `index.md` or `<page>.md` — the same content as clean markdown. Stripe, GitHub, Anthropic, Booking — everyone who thought about LLMs did this. The most expensive item to implement, but the most valuable in the end. Be ready for the fact that in most generators this isn’t out of the box — on my own blog it took a day and a half. Anthropic and Booking use `.md` as a crutch on top of their SPA, and that’s what saves them from a total faceplant.
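Checking for a companion takes one request per convention; the paths below are the two common patterns, on a placeholder domain:

```sh
# Either convention counts: <page>/index.md or <page>.md next to the HTML page.
curl -s -o /dev/null -w '%{http_code}\n' https://docs.example.com/guide/index.md
curl -s -o /dev/null -w '%{http_code}\n' https://docs.example.com/guide.md
```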
OpenAPI at a predictable URL. Half of the big providers publish OpenAPI on GitHub but don’t serve it at `docs.example.com/openapi.json`. An agent won’t go hunting for your repo through a cross-link — it checks the standard paths. One redirect to GitHub raw, and +15 points are in the bag.
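One way to wire that up, sketched for nginx with a hypothetical repo path:

```nginx
# Serve the spec at the predictable URL without moving it out of the repo.
location = /openapi.json {
    return 302 https://raw.githubusercontent.com/acme/api/main/openapi.json;
}
```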
JSON-LD TechArticle in the `<head>`. Structured metadata that Googlebot already understands. For an LLM it’s +5 points in category B. A cheap fix — one template that appends a block to the `<head>` of every page. The one gotcha: the JSON inside `<script type="application/ld+json">` has to be escaped correctly, or validators spit it back at you.
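A minimal sketch of such a block; every field value is a placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Create a payment",
  "url": "https://docs.example.com/payments/create",
  "dateModified": "2025-01-15"
}
</script>
```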
The main anti-pattern that put me through pain: a full-blown SPA with no `.md` versions of the pages. Over curl you lose 80% of the content. If you’re on Next.js / Gatsby / SvelteKit without SSR/SSG — either turn on server-side rendering for your pages, or publish `.md` versions of them. There is no third option.
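For Next.js specifically, static export is the cheapest route. A sketch assuming a recent Next.js and a fully static docs site:

```js
// next.config.js: pre-render every page to plain HTML at build time.
module.exports = {
  output: 'export',
};
```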
The self-audit prompt
A ready-made prompt. Paste it into Claude / ChatGPT / Perplexity / Iulita.ai (any model with web-fetch / browsing), substitute `{{BASE_URL}}` — and you get an audit across all 26 criteria with a numeric total and a JSON block.
# Prompt: AI-Readiness Audit of Public API Documentation
Paste this prompt into an LLM with browsing / web-fetch enabled.
Replace `{{BASE_URL}}` with the root URL of the documentation
to audit, with no trailing slash (e.g. `https://docs.example.com`).
## Role
You are an independent auditor evaluating the quality of public
technical documentation from the perspective of an LLM consumer.
Your task is to score how well the documentation at `{{BASE_URL}}`
is prepared for consumption by modern LLMs on a 0–100 scale, with
concrete evidence for every criterion.
Respond in English throughout.
## Hard rules
1. Evidence-based. For every criterion, perform a real HTTP request.
Record: URL, HTTP status code, and a snippet of at most 50
characters from the response body or headers.
2. Do not guess. If the page does not respond, tool errors, or you
are not certain — set status to `unknown`, assign 0 points, state
the reason. Training-data knowledge is NOT evidence.
3. Status labels: English only — `present` / `partial` / `absent` /
`unknown`. Do not translate.
4. Quality spot-check for criteria marked ⋆: 200 OK is not enough,
verify the actual content.
5. E1 is gating. If E1 = 0 (content hidden behind JS), append
`*UNRELIABLE*` next to the total. Still calculate.
6. No UX / SEO / aesthetics assessment. No competitor comparison.
No "companies usually have this" assumptions.
7. Compute the total arithmetically. No rounding by eye.
## Rubric — 100 points
### A. Discovery — 18
| ID | Criterion | Pts |
| --- | --- | --- |
| A1⋆ | `/llms.txt` conforms to llmstxt.org spec | 5 |
| A2 | `/llms-full.txt` or per-section LLM aggregates | 3 |
| A3 | `/robots.txt` with AI-bot policy AND absolute `Sitemap:` | 3 |
| A4⋆ | `/sitemap.xml` well-formed, absolute `<loc>`, no taxonomy junk | 4 |
| A5 | Discovery tags other than canonical: `rel="alternate" type="text/markdown"` OR OpenGraph | 3 |
### B. Per-page artifacts — 22
| ID | Criterion | Pts |
| --- | --- | --- |
| B1⋆ | `.md` companion: `<page>/index.md` or `<page>.md` returns clean markdown | 7 |
| B2⋆ | JSON-LD with valid `@type` and parseable JSON | 5 |
| B3 | `<link rel="canonical">` with absolute URL | 3 |
| B4 | Freshness: `dateModified` in JSON-LD OR `Last-Modified` header | 2 |
| B5 | Machine-readable taxonomies | 2 |
| B6 | `<main>` / `<article>` wrap primary content | 3 |
### C. API spec — 25
| ID | Criterion | Pts |
| --- | --- | --- |
| C1a | OpenAPI/Swagger/RAML/AsyncAPI at detectable URL | 8 |
| C1b⋆ | Quality: valid OpenAPI 3.x with `info`, ≥1 path, response schemas | 7 |
| C2 | Postman collection or SDKs with discoverable download/fork | 5 |
| C3⋆ | Endpoint pages show method, URL, types, required-flag, request+response examples | 5 |
### D. Content — 20
| ID | Criterion | Pts |
| --- | --- | --- |
| D1⋆ | Code examples: curl + at least one SDK | 4 |
| D2⋆ | Realistic examples (not `foo`/`bar`/`example.com`) | 4 |
| D3 | Error catalogue with codes + reasons | 3 |
| D4 | Authorization AND rate limits documented | 3 |
| D5 | Glossary OR consistent terminology | 3 |
| D6 | Deprecated/beta endpoints marked in plain text | 3 |
### E. Hygiene — 15
| ID | Criterion | Pts |
| --- | --- | --- |
| E1⋆ | GATING. Content visible in plain HTML without JS | 6 |
| E2 | Stable URLs: 301 redirects OR HTML aliases | 2 |
| E3 | Explicit API version in URL/heading/front-matter | 2 |
| E4 | Spot-check 5 random internal links → all 200 on canonical URL | 2 |
| E5 | Usage terms (TOS / license / AI-redistribution policy) explicit | 3 |
## Output format
Part 1 — human-readable markdown:
- `# AI-Readiness Audit: {{BASE_URL}}`
- Total `NN / 100` with `*UNRELIABLE*` marker if E1=0
- 1–2 sentence executive summary
- By-category table (5 rows)
- Per-criterion detail table (all 26 rows): ID, status, evidence
(URL + http_code + ≤50-char snippet), score
- Lists: ✅ Has / ⚠️ Partial / ❌ Absent
- Top 3 quick wins
Part 2 — machine-parseable JSON at the end:
```json
{
"base_url": "{{BASE_URL}}",
"audit_date": "YYYY-MM-DD",
"auditor": "<model + tool>",
"e1_gating_passed": true,
"unreliable_marker": false,
"total_score": 0,
"max_score": 100,
"categories": {
"A_discovery": {"score": 0, "max": 18},
"B_page_artifacts": {"score": 0, "max": 22},
"C_api_spec": {"score": 0, "max": 25},
"D_content": {"score": 0, "max": 20},
"E_hygiene": {"score": 0, "max": 15}
},
"criteria": [
{"id": "A1", "status": "present", "score": 5, "max": 5,
"evidence_url": "https://...", "evidence_snippet": "≤50 chars",
"http_status": 200}
]
}
```
Before emitting JSON, verify: all 26 IDs present, each `score ≤ max`,
`sum(criteria.score) == total_score` and per-category sums match.

The full version and the Claude skill — three files: the rules, the 26-criteria metric, the audits — live here:
github.com/gumeniukcom/claude-skills → skills/ai-ready-audit
In Claude Code you install it by symlinking the directory — `SKILL.md` pulls in the neighboring `rubric.md` and `calibration.md` on its own.
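Roughly like this, assuming the default personal skills directory (check where your setup actually loads skills from):

```sh
git clone https://github.com/gumeniukcom/claude-skills
ln -s "$PWD/claude-skills/skills/ai-ready-audit" ~/.claude/skills/ai-ready-audit
```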
Anti-patterns that break agents
Six audits in, and a few things already grind my gears:
- “Documentation” as a pile of Postman screenshots or Notion pages. Images don’t parse (or only at great cost).
- Auth is described only in the intro section. The agent lands on a specific endpoint, sees no auth, writes code without it — 401 in production.
- An endpoint that’s documented only in a blog post and isn’t in the OpenAPI spec. The agent won’t find it.
- “You can do it this way, or this way — both are correct, but the second one’s deprecated.” A guaranteed path to a Frankenstein implementation.
- Responses with no schema — just `200 OK` and a sample JSON. With no types, the LLM builds a “plausible” parser.
- Rate limits — mentioned in passing somewhere in the TOS. They belong in the OpenAPI as `x-ratelimit-*`, or at least in the endpoint description (see the sketch after this list).
- A JS-only SPA with no `.md` versions of the pages. The single biggest disaster on this whole list.
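For the rate-limit point, here is what “in the OpenAPI” could look like. A fragment of a larger spec; note that `x-ratelimit-*` names are vendor extensions, not part of OpenAPI itself, so the exact keys are illustrative:

```yaml
paths:
  /v1/orders:
    get:
      summary: List orders
      description: "Rate limit: 100 requests per minute per API key."
      x-ratelimit-limit: 100        # illustrative vendor extension
      x-ratelimit-window: 60s
      responses:
        "429":
          description: Rate limit exceeded. Honor the Retry-After header.
```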
Checklist
The minimum set that pulls docs out of the “35–54” range into “55+”:
- OpenAPI 3.1 published at a predictable URL (`/openapi.json` or a link in the footer).
- All `operationId`s are unique and stable across releases.
- Every endpoint has at least one request and response `example` with realistic values.
- Security schemes are fully described and referenced in every endpoint.
- `/llms.txt` is published and conforms to the spec.
- A per-page `.md` version (if you want 75+).
- JSON-LD `TechArticle` on every page, with valid JSON in the `<script>`.
- Content is visible in plain HTML without JS. If you’ve got an SPA, a `.md` version is mandatory.
- `robots.txt` with an explicit AI policy, and a `Sitemap:` with absolute paths.
- A machine-readable changelog: an `oasdiff`-compatible format, or at least `deprecated: true` + `x-sunset` in the OpenAPI (sketched below).
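The last item in OpenAPI terms, as a fragment. `deprecated: true` is standard OpenAPI; `x-sunset` is a vendor extension mirroring the Sunset HTTP header (RFC 8594):

```yaml
paths:
  /v1/legacy-orders:
    get:
      summary: List orders (legacy)
      deprecated: true          # standard OpenAPI flag, visible to any tooling
      x-sunset: "2026-06-30"    # vendor extension; the date the endpoint goes away
```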
Further reading
- llmstxt.org — the format spec
- OpenAPI 3.1 spec — especially the changes from 3.0
- Model Context Protocol — the next layer on top of OpenAPI for interactive access
- oasdiff — generating a machine diff between OpenAPI versions
- Cloudflare Content-Signal — on the `Content-Signal` in `robots.txt` that Stripe uses
- Mintlify llms.txt support — an “out of the box” example
P.S. This article itself ticks off B5 — it has tags, a description in the front matter, and a valid URL.