TL;DR: AI agents can’t parse your HTML effectively. The llms.txt standard (backed by Anthropic, Cloudflare, and others) gives them a discovery file pointing to markdown versions of your content. I implemented it on this blog today—here’s exactly how, with code examples you can copy.
The Problem I Didn’t Know I Had
I was deploying this blog to Cloudflare Pages this morning when a thought hit me: if I’m building an AI-focused technical blog, shouldn’t my blog be readable by AI agents?
Think about it for a moment. You ask Claude to “check out that blog post about context engineering.” Claude fetches the URL. It gets back… HTML. Script tags. Tailwind classes. Navigation elements. Footer links. Cookie banners.
Somewhere in there is your actual content. Good luck finding it.
Here’s the thing: we built the web for humans with browsers. CSS makes it pretty. JavaScript makes it interactive. But AI agents don’t care about pretty. They care about content.
The irony wasn’t lost on me. I’m building an AI-focused technical blog, on a site that AI can barely read. Sure, AI companies are building headless browser solutions—but is that really the long-term vision? Just to read a blog post, you need to spin up Chrome, have an AI agent drive it, parse the rendered DOM, and extract content from a sea of markup. That’s not efficient. That’s not scalable. That’s a workaround, not a solution.
The Numbers Don’t Lie
I tested this on my own site. This very blog post you’re reading:
- Rendered HTML: ~135KB / ~30,000-35,000 tokens
- Raw markdown: ~15KB / ~3,500-4,000 tokens
- llms.txt discovery file: ~800 bytes / ~200 tokens
That’s a 9x reduction from HTML to markdown. When context windows cost money and every token counts, that difference matters.
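If you want to reproduce this comparison on your own pages, a rough script is enough. Here’s a minimal sketch: the URLs are placeholders, and it approximates tokens at about four characters each instead of running a real tokenizer.
// compare-sizes.ts -- rough HTML vs. markdown size check (Node 18+, built-in fetch, run as an ES module)
// NOTE: the URLs below are placeholders; ~4 chars per token is a heuristic, not a real tokenizer.
const htmlUrl = "https://example.com/posts/my-post";
const mdUrl = "https://example.com/posts/my-post.md";

const estimateTokens = (text: string) => Math.round(text.length / 4);

const [html, md] = await Promise.all([
  fetch(htmlUrl).then((r) => r.text()),
  fetch(mdUrl).then((r) => r.text()),
]);

console.log(`HTML:      ${(html.length / 1024).toFixed(1)} KB (~${estimateTokens(html)} tokens)`);
console.log(`Markdown:  ${(md.length / 1024).toFixed(1)} KB (~${estimateTokens(md)} tokens)`);
console.log(`Reduction: ${(html.length / md.length).toFixed(1)}x`);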
What AI Agents Actually Want
When an AI agent visits your site, it wants:
- Discovery - What content exists here? What’s important?
- Navigation - How do I get to specific pieces?
- Content - The actual text, in a format I can parse
HTML gives you none of this cleanly. Sure, agents can scrape it. But they’re parsing your navigation bar, your social links, your “Subscribe to newsletter” popup. Signal buried in noise.
What they actually want: Markdown. Clean, structured, semantic markdown.
# Post Title
The actual content starts here.
## Section One
More content...
No classes. No divs. No scripts. Just content.
Enter llms.txt
The llms.txt standard emerged to solve exactly this problem. It’s simple:
- Create a `/llms.txt` file at your root
- List your content with links to markdown versions
- Update `robots.txt` to point AI agents to it
That’s it. No SSR. No middleware. No infrastructure changes.

Who’s Behind It?
I first came across llms.txt when researching how to make documentation AI-accessible. The standard was created by Jeremy Howard (co-founder of Answer.AI, fast.ai, former Kaggle president) in September 2024. That got me thinking—if someone with that pedigree is proposing this, it’s worth paying attention.
Since then, adoption has accelerated:
- Anthropic publishes llms.txt files for its Claude documentation
- Mintlify rolled it out across thousands of docs sites
- Cursor uses it to improve code completions
- Cloudflare includes llms.txt support in their AI Index
The spec is at llmstxt.org. It’s not theoretical—it’s already in production across thousands of sites.
The Implementation Journey
Here’s exactly what I built today. You can do this in an afternoon.
Step 1: Create /llms.txt
This is your discovery file. It tells AI agents what exists and how to access it.
# vanzan01's Blog
> Context Engineering and AI development.
> Engineering discipline meets AI.
This blog covers AI development, context engineering,
and technical leadership from a practitioner's perspective.
## Markdown Access
All content on this site is available in raw markdown format.
Append `.md` to any URL:
- [Homepage](/index.md)
- [All Posts](/posts.md)
- [About](/about.md)
- [Archives](/archives.md)
## Content
- [Posts Index](/posts.md): Complete list of all blog posts
with dates and links
- [RSS Feed](/rss.xml): Subscribe to updates
## Topics
- Context Engineering
- AI Development
- Claude Code
- Technical Leadership
- Open Source
Put this in /public/llms.txt for static sites. It gets copied to your output directory at build time.
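The static file is what I shipped, and it’s all the spec asks for. If you’d rather have llms.txt list every post and stay in sync automatically, you could generate it from an API route instead of keeping a file in /public (use one or the other, not both). A rough sketch, using the same content collection that the endpoints later in this post rely on:
// src/pages/llms.txt.ts -- optional: build llms.txt from the blog collection at build time
import type { APIRoute } from "astro";
import { getCollection } from "astro:content";

export const GET: APIRoute = async () => {
  const posts = await getCollection("blog");

  const lines = [
    "# vanzan01's Blog",
    "> Context Engineering and AI development.",
    "",
    "## Content",
    "- [Posts Index](/posts.md): Complete list of all blog posts",
    "- [RSS Feed](/rss.xml): Subscribe to updates",
    "",
    "## Posts",
    // One line per post, pointing at the markdown endpoint
    ...posts.map((post) => `- [${post.data.title}](/posts/${post.id}.md)`),
  ];

  return new Response(lines.join("\n") + "\n", {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};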
Step 2: Update robots.txt
Add the LLMs directive so agents know where to look:
// src/pages/robots.txt.ts
import type { APIRoute } from "astro";

const getRobotsTxt = (sitemapURL: URL, siteURL: URL) => `
User-agent: *
Allow: /
Sitemap: ${sitemapURL.href}
# AI Agent Discovery (llms.txt standard)
# See: https://llmstxt.org/
LLMs: ${siteURL.href}llms.txt
`;

export const GET: APIRoute = ({ site }) => {
  // Fall back to a hard-coded origin when `site` isn't set in astro.config
  const siteURL = site
    ? new URL(site)
    : new URL("https://yourdomain.com");
  const sitemapURL = new URL("sitemap-index.xml", siteURL);

  return new Response(getRobotsTxt(sitemapURL, siteURL));
};
The LLMs: line is the key addition. It isn’t an official robots.txt directive, but it does for AI agents what Sitemap: does for crawlers, pointing them at the discovery file.
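With the site URL configured in astro.config, the generated robots.txt comes out roughly like this:
User-agent: *
Allow: /
Sitemap: https://blog.vanzan01.org/sitemap-index.xml
# AI Agent Discovery (llms.txt standard)
# See: https://llmstxt.org/
LLMs: https://blog.vanzan01.org/llms.txt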
Step 3: Create Markdown Endpoints
This is where most people get stuck. You need to serve markdown versions of your pages.
Option A: Simple static files
For pages that don’t change often (about, archive), you can just create .md files:
/public/about.md
/public/archives.md
Option B: Dynamic API routes (recommended)
For content that changes—like your posts index—use API routes:
// src/pages/posts.md.ts
import { getCollection } from "astro:content";
import type { APIRoute } from "astro";
import getSortedPosts from "@/utils/getSortedPosts";

export const GET: APIRoute = async () => {
  const posts = await getCollection("blog");
  const sortedPosts = getSortedPosts(posts);

  let markdownContent = `# All Posts\n\n`;

  // Group posts by year
  const postsByYear = sortedPosts.reduce(
    (acc, post) => {
      const year = post.data.pubDatetime.getFullYear();
      if (!acc[year]) acc[year] = [];
      acc[year].push(post);
      return acc;
    },
    {} as Record<number, typeof sortedPosts>
  );

  // Sort years descending
  const years = Object.keys(postsByYear)
    .sort((a, b) => Number(b) - Number(a));

  for (const year of years) {
    markdownContent += `## ${year}\n\n`;

    for (const post of postsByYear[Number(year)]) {
      const date = post.data.pubDatetime.toLocaleDateString(
        "en-US",
        { month: "short", day: "numeric" }
      );
      markdownContent +=
        `- ${date}: [${post.data.title}](/posts/${post.id}.md)\n`;
    }

    markdownContent += "\n";
  }

  return new Response(markdownContent, {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600",
    },
  });
};
This generates a clean markdown index at build time:
# All Posts
## 2026
- Jan 4: [Making Your Blog AI-Agent Friendly](/posts/llms-txt.md)
## 2025
- Jan 2: [Hello World](/posts/hello-world.md)

Step 4: Individual Post Endpoints
For individual blog posts, you have two options: read the raw markdown file straight from disk, or serve the body that Astro’s content collections already expose (sketched after the example below).
If your source files are plain markdown, reading and serving them directly looks like this:
// src/pages/posts/[slug].md.ts
import type { APIRoute } from "astro";
import { getCollection } from "astro:content";
import fs from "fs";
import path from "path";

export async function getStaticPaths() {
  const posts = await getCollection("blog");
  return posts.map((post) => ({
    params: { slug: post.id },
    props: { post },
  }));
}

export const GET: APIRoute = async ({ props }) => {
  const { post } = props;

  // Read the raw markdown file
  const filePath = path.join(
    process.cwd(),
    "src/content/blog",
    `${post.id}.md`
  );
  const rawContent = fs.readFileSync(filePath, "utf-8");

  return new Response(rawContent, {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600",
    },
  });
};
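The second option avoids the filesystem entirely: Astro’s content collections expose each entry’s raw markdown as post.body, so you can return that directly. A minimal sketch, assuming a markdown-backed collection (one practical difference: body excludes the frontmatter, while the file on disk includes it):
// src/pages/posts/[slug].md.ts -- alternative: serve the collection entry's body
import type { APIRoute } from "astro";
import { getCollection } from "astro:content";

export async function getStaticPaths() {
  const posts = await getCollection("blog");
  return posts.map((post) => ({
    params: { slug: post.id },
    props: { post },
  }));
}

export const GET: APIRoute = async ({ props }) => {
  const { post } = props;

  // `body` is the raw, unrendered markdown for markdown-backed collections
  const rawContent = post.body ?? "";

  return new Response(rawContent, {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600",
    },
  });
};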
The Road Not Taken: SSR and Content Negotiation
I’ll be honest—my first instinct was fancier.
Content negotiation lets the same URL serve different formats based on the Accept header:
# Request HTML
curl https://blog.example.com/posts/my-post
# Request markdown
curl -H "Accept: text/markdown" https://blog.example.com/posts/my-post
Same URL, different content. Elegant, right?
Here’s the thing: it requires SSR.
Static sites (Astro, Next.js static export, Hugo, Jekyll) generate files at build time. No server running means no header inspection means no content negotiation.
I actually built the middleware:
// functions/_middleware.ts (Cloudflare Pages)
export async function onRequest(context) {
  const acceptHeader = context.request.headers.get("accept") || "";

  if (acceptHeader.includes("text/markdown")) {
    const mdUrl = new URL(context.request.url);
    mdUrl.pathname = mdUrl.pathname + ".md";
    return context.env.ASSETS.fetch(mdUrl);
  }

  return context.next();
}
It didn’t work. Cloudflare Pages Functions don’t run for static sites without the SSR adapter.
The trade-off became clear:
| | Static Site | SSR |
|---|---|---|
| Speed | Lightning fast (CDN edge) | Slower (compute required) |
| Cost | Nearly free | Compute costs |
| Complexity | Simple | More infrastructure |
| Content negotiation | No | Yes |
| Cold starts | None | Yes |
For a blog, static wins. The URL convention (.md suffix) is slightly less elegant but works perfectly with zero infrastructure.

Testing the Discovery Flow
Here’s the moment of truth. I tested this by acting as an AI agent would:
Step 1: Check robots.txt
curl https://blog.vanzan01.org/robots.txt
Response includes:
LLMs: https://blog.vanzan01.org/llms.txt
Step 2: Fetch llms.txt
curl https://blog.vanzan01.org/llms.txt
Returns the discovery file with links to /posts.md, /about.md, etc.
Step 3: Navigate to posts index
curl https://blog.vanzan01.org/posts.md
Returns clean markdown listing all posts with links.
Step 4: Fetch a specific post
curl https://blog.vanzan01.org/posts/hello-world.md
Returns the raw markdown content.
It works. An AI agent can now:
- Discover my site structure
- Navigate to content
- Read clean markdown
No HTML parsing. No DOM traversal. No headless browser. Just content.
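If you’d rather script the walk than run curl by hand, the whole chain fits in a few lines. A rough sketch (Node 18+ for built-in fetch; the LLMs: parsing is deliberately naive, since the directive is just a line of text in robots.txt):
// discover.ts -- walk the llms.txt discovery chain end to end
const site = "https://blog.vanzan01.org";

// Step 1: read robots.txt and pull out the LLMs: directive
const robots = await fetch(`${site}/robots.txt`).then((r) => r.text());
const llmsLine = robots
  .split("\n")
  .find((line) => line.trim().toLowerCase().startsWith("llms:"));
if (!llmsLine) throw new Error("No LLMs: directive found in robots.txt");
const llmsUrl = llmsLine.split(/:(.+)/)[1].trim();

// Step 2: fetch the discovery file
const llmsTxt = await fetch(llmsUrl).then((r) => r.text());
console.log(llmsTxt);

// Step 3: follow one of the links it lists, e.g. the posts index
const postsIndex = await fetch(`${site}/posts.md`).then((r) => r.text());
console.log(postsIndex);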
Why This Matters
Here’s the bigger picture.
We’re moving into an era where AI agents will browse the web for us. Not replace human browsing—augment it. “Claude, summarise that article I bookmarked.” “Find recent posts about context engineering.”
Websites that are AI-readable will get cited. AI-hostile sites will get skipped.
Think about it:
SEO was about being discoverable to Google. AI-discoverability is about being useful to agents.
The sites that figure this out early will have an advantage. Not a huge competitive moat—it’s too simple for that. But a head start in thinking about content as both human-readable and machine-readable.

What You Can Do Today
If you run a static site (Astro, Next.js, Hugo, Jekyll, etc.), here’s your action list:
- Create `/llms.txt` - Static file in your public folder
- Update `robots.txt` - Add the `LLMs:` directive
- Create markdown endpoints - API routes or static files
- Test the flow - Curl through the discovery chain
Total time: An afternoon. No SSR. No infrastructure changes. No costs.
The spec is at llmstxt.org. This blog’s implementation is open source on GitHub.
The Bigger Question
I started this morning wanting to deploy a blog. I ended up thinking about how AI agents will browse the web.
But here’s what’s really interesting: the big question isn’t whether AI agents will talk to your website. It’s whether they’ll talk to your agent.
Think about it. Right now we’re building discovery mechanisms so AI can read our static content. But what happens when your personal AI agent needs information from my blog? Does it really need to hit a website at all? Or does your agent talk directly to my agent?
Websites might just be a transition phase.
The llms.txt standard is a bridge—making today’s static web accessible to AI. But the destination might be agent-to-agent communication, where websites are just one rendering of information that flows directly between systems.
The llms.txt standard is simple—almost too simple. But that’s the point. It works with what we have today. And it’s a step toward wherever this is all going.
Make your content AI-readable. The agents are already looking. Where does that leave your site?
Have you implemented llms.txt on your site? I’d love to hear about your approach—especially if you’re thinking about agent-to-agent communication and what comes after websites.