Why this started

I started mlllm.io as a public surface for my work: AI news, extended explanations, projects, blog notes, and a profile that could support an OpenAI open-source grant application.

The first practical decision was simple. I already had a Telegram channel, Look at AI News, and a TG-NEWS pipeline behind it. Instead of keeping the deeper explanations in Telegraph forever, I wanted the site to become the canonical place for the archive: one short news brief and one linked longform article for each story.

Because the project itself is about AI tools, I wanted the site to be readable not only by people and Google, but also by LLM-based agents, AI search systems, and RAG pipelines.

That became a much larger task than "add SEO".

The first result

After a few days of painful iteration, the site had the basic machine-readable layer I wanted:

  • static HTML that works without JavaScript rendering;
  • robots.txt, sitemap.xml, news-sitemap.xml, RSS, and llms.txt;
  • explicit access rules for search and AI crawlers;
  • structured pages for short news, longform articles, topics, projects, community, and author profile;
  • JSON-LD schema for news, articles, breadcrumbs, website, organization, and author signals;
  • a stable internal model: one short brief plus one expanded article, not duplicated content fragments.
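
As an illustration of what "explicit access rules" means in practice, here is a minimal sketch that checks a robots.txt against a list of crawler names using Python's standard library. The robots.txt content, the example.com URLs, and the ExampleBlockedBot name are made up for the example and are not the real rules on mlllm.io; GPTBot and Googlebot are used only as familiar user-agent tokens.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the real file may differ.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ExampleBlockedBot
Disallow: /

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether robots.txt allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


AGENTS = ["GPTBot", "ExampleBlockedBot", "Googlebot"]
story_report = crawler_access(ROBOTS_TXT, AGENTS, "https://example.com/news/some-story/")
admin_report = crawler_access(ROBOTS_TXT, AGENTS, "https://example.com/admin/settings")

for agent, allowed in story_report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A check like this can run after every deploy, so an accidental edit to robots.txt shows up as a failed check rather than as a silent drop in crawler traffic.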

The model-assisted audit score improved significantly. More important than the number was the path to that score: the decisions were not random. Each improvement came from a task, a check, a discussion, and a regression test.

That is where the next idea appeared.

Why one SEO skill was not enough

At first it looked like I needed one skill: "optimize a site for SEO and LLM crawlers".

That is too broad.

Real site growth touches several different disciplines:

  • semantic core and topic architecture;
  • URL structure and internal linking;
  • technical SEO and schema.org;
  • robots.txt, llms.txt, RSS, sitemap, and crawler access;
  • visible UX and onboarding for new readers;
  • content quality gates and editorial residue checks;
  • first-party analytics and bot monitoring;
  • external authority and backlink placement;
  • regression validation against live HTML;
  • coordination of all tasks without breaking the publishing pipeline.

If one prompt tries to own all of that, it becomes vague. It may sound senior, but it will miss details, confuse priorities, or optimize one layer while damaging another.

So I split the work into a cluster.

What the cluster does

The public repository is SEO/LLM Skill Cluster.

The central idea is that a site should be designed for three readers at once:

  • a human reader who needs orientation, trust, and useful navigation;
  • a search engine crawler that needs stable URLs, metadata, schema, and crawlable HTML;
  • an AI agent that needs a clean machine-readable map, source trails, entities, and direct answers without guessing.

The cluster is not one mega-skill. It is a group of narrower skills around a central orchestrator:

  • site growth orchestration;
  • semantic core design;
  • SEO information architecture;
  • internal link graph planning;
  • technical SEO and schema engineering;
  • LLM-friendly site architecture;
  • LLM citation monitoring;
  • server log and crawler analysis;
  • external authority placement scouting;
  • backlink quality validation;
  • UX journey architecture;
  • SEO regression validation.

Each skill has a bounded job. The orchestrator coordinates the sequence, keeps the task plan current, and prevents the work from turning into disconnected advice.

The rule I had to protect

The most important rule came from the site itself:

One story has one short news brief and one expanded longform article.

That sounds obvious, but SEO and LLM optimization can easily push a project in the wrong direction. It is tempting to create extra summaries, FAQ fragments, topic variants, agent pages, Markdown mirrors, and rewritten versions for every surface.

That may look like "more content", but it can also become duplication.

For mlllm.io, the right model is different:

  • the brief stays a real brief;
  • the longform stays a real longform;
  • topic pages explain broader concepts;
  • project pages show the systems behind the work;
  • blog posts carry personal thinking and build notes;
  • llms.txt and schema describe the site, but do not multiply the story itself.

LLM-friendly architecture should strengthen the content model, not create uncontrolled copies of it.
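
The rule can also be written as a small check. This is only a sketch under my own assumptions: the Page record, its field names, and the sample URLs are hypothetical and do not describe how the site actually stores its content.

```python
from collections import Counter
from dataclasses import dataclass

ALLOWED_KINDS = {"brief", "longform"}


@dataclass(frozen=True)
class Page:
    """Hypothetical record for one published page of a story."""
    story_id: str
    kind: str  # expected to be "brief" or "longform"
    url: str


def find_model_violations(pages: list[Page]) -> dict[str, str]:
    """Map story_id to a description for every story that breaks the
    one-brief-plus-one-longform rule."""
    counts = Counter((page.story_id, page.kind) for page in pages)
    violations: dict[str, str] = {}
    for story_id in sorted({page.story_id for page in pages}):
        briefs = counts[(story_id, "brief")]
        longforms = counts[(story_id, "longform")]
        extras = sum(
            n for (sid, kind), n in counts.items()
            if sid == story_id and kind not in ALLOWED_KINDS
        )
        if briefs != 1 or longforms != 1 or extras:
            violations[story_id] = (
                f"{briefs} brief(s), {longforms} longform(s), {extras} extra page(s)"
            )
    return violations


pages = [
    Page("s1", "brief", "/news/s1/"),
    Page("s1", "longform", "/articles/s1/"),
    Page("s2", "brief", "/news/s2/"),
    Page("s2", "longform", "/articles/s2/"),
    Page("s2", "longform", "/articles/s2-rewrite/"),  # duplicate longform
]
violations = find_model_violations(pages)
print(violations)
```

Here story s1 passes and s2 is flagged because it has a second longform. The point is that duplication becomes something a script can catch instead of something noticed months later.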

Why the task plan mattered

The other tool that turned out to be essential was task-plan-v2-dashboard.

Long agent work fails when everything lives only in chat. You keep asking "what is done?", "what is next?", "what changed?", and "did it break the site?"

The task plan solved that by turning the work into visible units:

  • tasks with IDs and statuses;
  • implementation gates;
  • validation steps;
  • notes about risks and regressions;
  • commits tied to completed work.
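
To make those units concrete, here is a rough sketch of what one task record could look like. It is not the actual task-plan-v2-dashboard format; the field names, status values, task IDs, and commit hash are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    DONE = "done"


@dataclass
class Task:
    """Hypothetical task record: ID, status, gates, validation, risks, commit."""
    task_id: str
    title: str
    status: Status = Status.TODO
    gates: list[str] = field(default_factory=list)             # conditions before implementation
    validations: dict[str, bool] = field(default_factory=dict)  # check name -> passed
    risk_notes: list[str] = field(default_factory=list)
    commit: Optional[str] = None                                # commit tied to the finished work

    def can_close(self) -> bool:
        """A task may close only if it has checks, all passed, and a commit."""
        return (
            bool(self.validations)
            and all(self.validations.values())
            and self.commit is not None
        )


def summarize(tasks: list[Task]) -> dict[str, int]:
    """Count tasks per status, which is roughly what a dashboard shows."""
    return dict(Counter(task.status.value for task in tasks))


tasks = [
    Task("SEO-001", "Add news sitemap", Status.DONE,
         validations={"sitemap parses": True}, commit="abc1234"),
    Task("SEO-002", "Add breadcrumb schema", Status.IN_PROGRESS,
         validations={"schema present in live HTML": False},
         risk_notes=["template change may affect article pages"]),
]
summary = summarize(tasks)
print(summary)
```

The can_close rule is the important part of the sketch: a task is not done because the model says so, but because its checks passed and a commit exists.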

That also changes the human workflow. You do not have to stare at the model constantly and pull status out of it every five minutes. The dashboard gives enough structure to step away, make coffee, and come back to a visible state of the work.

That sounds small, but for multi-hour agent work it is a real product feature.

What I published

The repository contains the cluster structure, task plan, validation scripts, example artifacts, and documentation for how the skills should cooperate.

It is not meant to be a magic prompt. It is closer to an operating model:

  • plan the site architecture;
  • make the content machine-readable;
  • validate the live HTML;
  • monitor crawler and LLM visibility;
  • improve UX without breaking SEO;
  • record the work so it can be repeated.
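
As one example of the "validate the live HTML" step, the sketch below checks that a page's static HTML carries the expected JSON-LD types without any JavaScript rendering. It uses only the Python standard library. The sample HTML and the list of required types are invented for illustration and are not the repository's actual validation scripts.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def schema_types(html: str) -> set[str]:
    """Return every top-level @type found in the page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found: set[str] = set()
    for block in parser.blocks:
        for item in block if isinstance(block, list) else [block]:
            value = item.get("@type", [])
            found.update([value] if isinstance(value, str) else value)
    return found


def missing_types(html: str, required: set[str]) -> list[str]:
    """List required schema types that the static HTML does not declare."""
    return sorted(required - schema_types(html))


# Made-up page for the example; a real check would fetch the live URL.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "NewsArticle", "headline": "Example"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "BreadcrumbList", "itemListElement": []}</script>
</head><body><h1>Example</h1></body></html>"""

print(missing_types(SAMPLE_HTML, {"NewsArticle", "BreadcrumbList", "Organization"}))
```

Because the parser reads the raw HTML string, the check only passes when the schema is present in what a crawler without JavaScript would actually receive.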

The project grew out of a real use case, not a theoretical checklist. The use case was mlllm.io itself: a multilingual AI news and builder site fed by TG-NEWS, with short briefs, longform explainers, source trails, project pages, and public audit checks.

What comes next

The next layer is practical validation across more sites.

I want the cluster to support not just "make a page better", but the whole loop:

  1. build a semantic core;
  2. design site structure and internal linking;
  3. publish crawlable pages;
  4. add schema, llms.txt, feeds, and metadata;
  5. monitor search and AI crawler access;
  6. check whether LLM systems cite the site;
  7. identify legitimate external placements;
  8. feed the results back into the task plan.

That is the point of making it a skill cluster. A serious site is not optimized by one instruction. It is improved by a coordinated system of decisions, checks, and feedback loops.

The site was the test case. The skill cluster is the attempt to make the method reusable.