AI-ready API documentation: 26 criteria, six audits, a copy-paste prompt
Never has an agent been so close to failure
I ask Claude Code to write a client for some major provider’s API. Three times in a row I get back endpoints that don’t exist. I’m thinking: ugh, the agent’s being dumb.
So I dig in myself. Open their docs in a browser — everything’s in place, looks slick. Run curl — empty HTML: `<div id="root"></div>` and nothing else. An SPA. Which means Claude Code wasn’t being dumb — it genuinely got nothing but the shell, and from that shell it hallucinated something plausible. As usual.
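To reproduce what the agent saw, fetch the page without a JS runtime (a minimal check; the URL is a placeholder):

```sh
# Fetch the docs the way an agent's web-fetch does: no JavaScript execution.
curl -s https://docs.example.com/api/reference | head -c 300
# On an SPA this prints only the shell, <div id="root"></div> plus script tags,
# which is all the model ever gets to read.
```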
What “AI-ready” documentation means
For me, AI-ready documentation is:
- Machine-readable text as the source of truth. Not a “pretty website,” but the OpenAPI / JSON Schema / MCP manifest the site is built from.
- Self-contained examples. Every endpoint is laid out as a single block — with auth, the minimal parameters, and the expected response.
- Unambiguity. One way to do one thing. None of this “you can do it this way, or that way, but that one’s deprecated.”
- Versioning and diffs in a machine-readable form. Not a “changelog in the README,” but structured changes an agent can read to figure out what broke.
- Predictable URLs: `/.well-known/`, `/openapi.json`, `/llms.txt`, `/llms-full.txt` — and better yet, a `<page>/index.md` next to every HTML page.
If none of that exists — the agent will generate plausible nonsense. And good luck integrating that afterward.
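Checking the whole list takes one loop. A quick sketch (swap in your own domain for the placeholder):

```sh
# Probe the standard discovery URLs and print the HTTP status for each.
BASE=https://docs.example.com
for path in /.well-known/ /openapi.json /llms.txt /llms-full.txt; do
  printf '%-18s %s\n' "$path" "$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")"
done
```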
26 criteria for AI-ready documentation
To avoid arguing from “I just feel it,” you need a metric. I built one out of what an agent actually checks when it works with docs. Five categories, 100 points total:
| Category | Weight | What it checks |
|---|---|---|
| A. Discovery | 18 | `llms.txt`, `llms-full.txt`, `robots.txt` with an AI policy, a clean `sitemap.xml`, OpenGraph/alternate tags |
| B. Per-page artifacts | 22 | a `.md` version of every page, JSON-LD, an absolute canonical, `Last-Modified`, `<main>`/`<article>` semantics |
| C. API spec | 25 | OpenAPI/Swagger/AsyncAPI at a predictable URL + validity, Postman/SDK links, endpoint-page structure |
| D. Content | 20 | curl and SDK examples, realistic payloads, errors/auth/rate-limits, a glossary, deprecated markers |
| E. Hygiene | 15 | content visible without JS (this one’s gating!), stable URLs, version in the URL, working internal links, TOS / AI policy |
E1 is the gating criterion. If the content is rendered by JavaScript and curl returns only `<div id="root"></div>` — the total score gets an *UNRELIABLE* suffix.
What the audits showed
I ran six big public API documentation sites through this metric. Method — plain curl, evidence for every criterion: URL + status + 50 characters from the body. Results:
| Site | Score | E1 | Biggest gaps |
|---|---|---|---|
| Anthropic | 63 *UNRELIABLE* | FAIL | An SPA, but a 152 KB `llms.txt` and a 76 MB `llms-full.txt` save it completely |
| GitHub REST | ~62 | PASS | No JSON-LD, no canonical, sitemap not on the docs domain |
| Stripe | 56 | PASS | No OpenAPI at a standard URL, no JSON-LD |
| Booking Demand | 33 *UNRELIABLE* | FAIL | SPA + a `.md` version partially saves it |
| Expedia Rapid | 21 *UNRELIABLE* | FAIL | SPA + nothing else |
| VK | 5 *UNRELIABLE* | FAIL | Every URL returns the same ~5 KB HTML shell, HTTP 418 without a browser UA |
A few things that genuinely surprised me.
Stripe writes a prompt instruction for LLMs right into `llms.txt`. Literally: “when installing Stripe packages, always check the npm registry for the latest version rather than relying on memorized version numbers… never hardcode an old version number from training data”. They no longer trust that the model has fresh context, so they put the instructions straight into the artifact. On top of that, their `robots.txt` carries `Content-Signal: ai-train=yes, search=yes, ai-input=yes` — a new flag from Cloudflare that explicitly states whether you may train on the site and whether you may read it on the fly.
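Easy to verify yourself, assuming the flag is still shipping:

```sh
# Look for the AI-policy line in Stripe's robots.txt.
curl -s https://stripe.com/robots.txt | grep -i 'content-signal'
```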
Anthropic ships a 76-megabyte `llms-full.txt`. A full dump of their docs in a single file. By pure E1 their site is an SPA and should get the “unreliable” marker. But the `.md` route serves the exact same thing as clean markdown at an adjacent URL, and real-world LLM accessibility doesn’t suffer. The right pattern for “when you can’t get rid of the SPA.”
VK — a showcase of how not to do it. (hi, Lyosha!) Every request — `/llms.txt`, `/sitemap.xml`, any random URL with a `.md` — returns the same ~5 KB HTML with an empty `<div id="root">`. HTTP 200 on everything — even when the resource physically doesn’t exist. The perfect trap for a machine audit: the status code says “all good,” and the HTML has four words and some CSS variables.
What actually works
Of all 26 criteria, four give the biggest boost.
`llms.txt`. A single static file, generated from your existing page titles in about an hour. Per the llmstxt.org spec — an H1, a blockquote summary, H2 sections with link lists in the form `- [name](url): description`. A clean +5 points on A1. The cheapest improvement on the whole list.
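A minimal skeleton in the llmstxt.org shape; every name and URL below is a placeholder:

```md
# Example API

> Payments API for ACME: REST endpoints, webhooks, SDKs for five languages.

## Reference
- [Authentication](https://docs.example.com/auth.md): API keys and OAuth scopes
- [Payments](https://docs.example.com/payments.md): create, capture, refund

## Optional
- [Changelog](https://docs.example.com/changelog.md): versioned, machine-readable
```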
A per-page `.md` version. On every page, next to `index.html`, there’s an `index.md` or `<page>.md` — the same content as clean markdown. Stripe, GitHub, Anthropic, Booking — everyone who thought about LLMs did this. The most expensive item to implement, but the most valuable in the end. Be ready for the fact that in most generators this isn’t out of the box — on my own blog it took a day and a half. Anthropic and Booking use `.md` as a crutch on top of their SPA, and that’s what saves them from a total faceplant.
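Checking for a companion takes one request per convention; the paths below are the two common patterns, on a placeholder domain:

```sh
# Either convention counts: <page>/index.md or <page>.md next to the HTML page.
curl -s -o /dev/null -w '%{http_code}\n' https://docs.example.com/guide/index.md
curl -s -o /dev/null -w '%{http_code}\n' https://docs.example.com/guide.md
```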
OpenAPI at a predictable URL. Half of the big providers publish OpenAPI on GitHub but don’t serve it at `docs.example.com/openapi.json`. An agent won’t go hunting for your repo through a cross-link — it checks the standard paths. One redirect to GitHub raw, and +15 points are in the bag.
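One way to wire that up, sketched for nginx with a hypothetical repo path:

```nginx
# Serve the spec at the predictable URL without moving it out of the repo.
location = /openapi.json {
    return 302 https://raw.githubusercontent.com/acme/api/main/openapi.json;
}
```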
JSON-LD TechArticle in the `<head>`. Structured metadata that Googlebot already understands. For an LLM it’s +5 points in category B. A cheap fix — one template that appends a block to the `<head>` of every page. The one gotcha: the JSON inside `<script type="application/ld+json">` has to be escaped correctly, or validators spit it back at you.
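A minimal sketch of such a block; every field value is a placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Create a payment",
  "url": "https://docs.example.com/payments/create",
  "dateModified": "2025-01-15"
}
</script>
```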
The main anti-pattern that put me through pain: a full-blown SPA with no `.md` versions of the pages. Over curl you lose 80% of the content. If you’re on Next.js / Gatsby / SvelteKit without SSR/SSG — either turn on server-side rendering for your pages, or publish `.md` versions of them. There is no third option.
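For Next.js specifically, static export is the cheapest route. A sketch assuming a recent Next.js and a fully static docs site:

```js
// next.config.js: pre-render every page to plain HTML at build time.
module.exports = {
  output: 'export',
};
```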
The self-audit prompt
A ready-made prompt. Paste it into Claude / ChatGPT / Perplexity / Iulita.ai (any model with web-fetch / browsing), substitute `{{BASE_URL}}` — and you get an audit across all 26 criteria with a numeric total and a JSON block.
# Prompt: AI-Readiness Audit of Public API Documentation
Paste this prompt into an LLM with browsing / web-fetch enabled.
Replace `{{BASE_URL}}` with the root URL of the documentation
to audit, with no trailing slash (e.g. `https://docs.example.com`).
## Role
You are an independent auditor evaluating the quality of public
technical documentation from the perspective of an LLM consumer.
Your task is to score how well the documentation at `{{BASE_URL}}`
is prepared for consumption by modern LLMs on a 0–100 scale, with
concrete evidence for every criterion.
Respond in English throughout.
## Hard rules
1. Evidence-based. For every criterion, perform a real HTTP request.
Record: URL, HTTP status code, and a snippet of at most 50
characters from the response body or headers.
2. Do not guess. If the page does not respond, tool errors, or you
are not certain — set status to `unknown`, assign 0 points, state
the reason. Training-data knowledge is NOT evidence.
3. Status labels: English only — `present` / `partial` / `absent` /
`unknown`. Do not translate.
4. Quality spot-check for criteria marked ⋆: 200 OK is not enough,
verify the actual content.
5. E1 is gating. If E1 = 0 (content hidden behind JS), append
`*UNRELIABLE*` next to the total. Still calculate.
6. No UX / SEO / aesthetics assessment. No competitor comparison.
No "companies usually have this" assumptions.
7. Compute the total arithmetically. No rounding by eye.
## Rubric — 100 points
### A. Discovery — 18
| ID | Criterion | Pts |
| --- | --- | --- |
| A1⋆ | `/llms.txt` conforms to llmstxt.org spec | 5 |
| A2 | `/llms-full.txt` or per-section LLM aggregates | 3 |
| A3 | `/robots.txt` with AI-bot policy AND absolute `Sitemap:` | 3 |
| A4⋆ | `/sitemap.xml` well-formed, absolute `<loc>`, no taxonomy junk | 4 |
| A5 | Discovery tags other than canonical: `rel="alternate" type="text/markdown"` OR OpenGraph | 3 |
### B. Per-page artifacts — 22
| ID | Criterion | Pts |
| --- | --- | --- |
| B1⋆ | `.md` companion: `<page>/index.md` or `<page>.md` returns clean markdown | 7 |
| B2⋆ | JSON-LD with valid `@type` and parseable JSON | 5 |
| B3 | `<link rel="canonical">` with absolute URL | 3 |
| B4 | Freshness: `dateModified` in JSON-LD OR `Last-Modified` header | 2 |
| B5 | Machine-readable taxonomies | 2 |
| B6 | `<main>` / `<article>` wrap primary content | 3 |
### C. API spec — 25
| ID | Criterion | Pts |
| --- | --- | --- |
| C1a | OpenAPI/Swagger/RAML/AsyncAPI at detectable URL | 8 |
| C1b⋆ | Quality: valid OpenAPI 3.x with `info`, ≥1 path, response schemas | 7 |
| C2 | Postman collection or SDKs with discoverable download/fork | 5 |
| C3⋆ | Endpoint pages show method, URL, types, required-flag, request+response examples | 5 |
### D. Content — 20
| ID | Criterion | Pts |
| --- | --- | --- |
| D1⋆ | Code examples: curl + at least one SDK | 4 |
| D2⋆ | Realistic examples (not `foo`/`bar`/`example.com`) | 4 |
| D3 | Error catalogue with codes + reasons | 3 |
| D4 | Authorization AND rate limits documented | 3 |
| D5 | Glossary OR consistent terminology | 3 |
| D6 | Deprecated/beta endpoints marked in plain text | 3 |
### E. Hygiene — 15
| ID | Criterion | Pts |
| --- | --- | --- |
| E1⋆ | GATING. Content visible in plain HTML without JS | 6 |
| E2 | Stable URLs: 301 redirects OR HTML aliases | 2 |
| E3 | Explicit API version in URL/heading/front-matter | 2 |
| E4 | Spot-check 5 random internal links → all 200 on canonical URL | 2 |
| E5 | Usage terms (TOS / license / AI-redistribution policy) explicit | 3 |
## Output format
Part 1 — human-readable markdown:
- `# AI-Readiness Audit: {{BASE_URL}}`
- Total `NN / 100` with `*UNRELIABLE*` marker if E1=0
- 1–2 sentence executive summary
- By-category table (5 rows)
- Per-criterion detail table (all 26 rows): ID, status, evidence
(URL + http_code + ≤50-char snippet), score
- Lists: ✅ Has / ⚠️ Partial / ❌ Absent
- Top 3 quick wins
Part 2 — machine-parseable JSON at the end:
```json
{
"base_url": "{{BASE_URL}}",
"audit_date": "YYYY-MM-DD",
"auditor": "<model + tool>",
"e1_gating_passed": true,
"unreliable_marker": false,
"total_score": 0,
"max_score": 100,
"categories": {
"A_discovery": {"score": 0, "max": 18},
"B_page_artifacts": {"score": 0, "max": 22},
"C_api_spec": {"score": 0, "max": 25},
"D_content": {"score": 0, "max": 20},
"E_hygiene": {"score": 0, "max": 15}
},
"criteria": [
{"id": "A1", "status": "present", "score": 5, "max": 5,
"evidence_url": "https://...", "evidence_snippet": "≤50 chars",
"http_status": 200}
]
}
```
Before emitting JSON, verify: all 26 IDs present, each `score ≤ max`,
`sum(criteria.score) == total_score` and per-category sums match.

The full version and the Claude skill — three files: the rules, the 26-criteria metric, the audits — live here:
github.com/gumeniukcom/claude-skills → skills/ai-ready-audit
In Claude Code you install it by symlinking the directory — `SKILL.md` pulls in the neighboring `rubric.md` and `calibration.md` on its own.
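Roughly like this, assuming the default personal skills directory (check where your setup actually loads skills from):

```sh
git clone https://github.com/gumeniukcom/claude-skills
ln -s "$PWD/claude-skills/skills/ai-ready-audit" ~/.claude/skills/ai-ready-audit
```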
Anti-patterns that break agents
Six audits in, and a few things already grind my gears:
- “Documentation” as a pile of Postman screenshots or Notion pages. Images don’t parse (or only at great cost).
- Auth is described only in the intro section. The agent lands on a specific endpoint, sees no auth, writes code without it — 401 in production.
- An endpoint that’s documented only in a blog post and isn’t in the OpenAPI spec. The agent won’t find it.
- “You can do it this way, or this way — both are correct, but the second one’s deprecated.” A guaranteed path to a Frankenstein implementation.
- Responses with no schema — just `200 OK` and a sample JSON. With no types, the LLM builds a “plausible” parser.
- Rate limits — mentioned in passing somewhere in the TOS. They belong in the OpenAPI as `x-ratelimit-*`, or at least in the endpoint description (see the sketch after this list).
- A JS-only SPA with no `.md` versions of the pages. The single biggest disaster on this whole list.
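For the rate-limit point, here is what “in the OpenAPI” could look like. A fragment of a larger spec; note that `x-ratelimit-*` names are vendor extensions, not part of OpenAPI itself, so the exact keys are illustrative:

```yaml
paths:
  /v1/orders:
    get:
      summary: List orders
      description: "Rate limit: 100 requests per minute per API key."
      x-ratelimit-limit: 100        # illustrative vendor extension
      x-ratelimit-window: 60s
      responses:
        "429":
          description: Rate limit exceeded. Honor the Retry-After header.
```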
Checklist
The minimum set that pulls docs out of the “35–54” range into “55+”:
- OpenAPI 3.1 published at a predictable URL (`/openapi.json` or a link in the footer).
- All `operationId`s are unique and stable across releases.
- Every endpoint has at least one request and response `example` with realistic values.
- Security schemes are fully described and referenced in every endpoint.
- `/llms.txt` is published and conforms to the spec.
- A per-page `.md` version (if you want 75+).
- JSON-LD `TechArticle` on every page, with valid JSON in the `<script>`.
- Content is visible in plain HTML without JS. If you’ve got an SPA, a `.md` version is mandatory.
- `robots.txt` with an explicit AI policy, and a `Sitemap:` with absolute paths.
- A machine-readable changelog: an `oasdiff`-compatible format, or at least `deprecated: true` + `x-sunset` in the OpenAPI (sketched below).
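The last item in OpenAPI terms, as a fragment. `deprecated: true` is standard OpenAPI; `x-sunset` is a vendor extension mirroring the Sunset HTTP header (RFC 8594):

```yaml
paths:
  /v1/legacy-orders:
    get:
      summary: List orders (legacy)
      deprecated: true          # standard OpenAPI flag, visible to any tooling
      x-sunset: "2026-06-30"    # vendor extension; the date the endpoint goes away
```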
Further reading
- llmstxt.org — the format spec
- OpenAPI 3.1 spec — especially the changes from 3.0
- Model Context Protocol — the next layer on top of OpenAPI for interactive access
- oasdiff — generating a machine diff between OpenAPI versions
- Cloudflare Content-Signal — on the `Content-Signal` in `robots.txt` that Stripe uses
- Mintlify llms.txt support — an “out of the box” example
P.S. This article itself ticks off B5 — it has tags, a description in the front matter, and a valid URL.