Why Skill Creator V2 needed a deeper taxonomy before it could create reliable skills.
Short version
- A skill cannot be classified by one label without losing important information.
- The same activity, such as research or review, appears across infrastructure, law, design, SEO, browser work, and meta-skill creation.
- Tool names are usually not classes. They are tool surfaces or tags.
- Risk is not just a topic. It changes the evidence, approval, rollback, and testing rules.
- Skill Creator V2 therefore uses a multi-axis model: activity, domain, tool surface, risk, evidence, and workflow shape.

This note is a cleaned-up public companion to the research pass behind The Skill That Builds Skills. The raw research was useful for the architecture, but the public version should not be a dump of internal notes. What matters for readers is the reasoning: why a skill creator needs a taxonomy at all, why flat categories fail, and how classification changes the quality of generated skills.
The Flat Taxonomy Problem
The naive way to classify agent skills is to make a list:
- SEO;
- Figma;
- Browser;
- Security;
- Legal;
- Infrastructure;
- Writing;
- Research.
That looks reasonable until you try to use it.
A security audit skill is not only "security." It is also review, analysis, risk assessment, evidence collection, and often infrastructure work. A Figma implementation skill is not only "Figma." It may be design review, GUI operation, asset export, code handoff, or visual QA. A legal defense skill is not only "writing." It has facts, dates, jurisdiction, citations, deadlines, and human approval gates.
One label cannot carry all of that.
The result is mixed granularity. A tool name such as Figma sits next to a domain such as law, an activity such as research, and a risk label such as security. The model then cannot decide what evidence is required, which tools are safe, or whether the agent should stop and ask for approval.
The Six Axes
The research pass led to a simple design decision: do not ask for one class. Build a classification packet.
Skill Creator V2 uses six axes.
| Axis | Question | Why it matters |
| --- | --- | --- |
| Activity type | What is the agent doing? | Research, analyze, create, implement, review, monitor, remediate, coordinate, and other verbs have different proof. |
| Domain | What professional subject is involved? | Infrastructure, legal defense, design, discoverability, media, knowledge management, browser work, and meta-skill creation have different failure modes. |
| Tool surface | Where does the work happen? | CLI, browser, files, API, database, GUI, version control, or device interface changes the execution and evidence. |
| Risk profile | What can go wrong? | Higher risk requires stronger checks, rollback, human approval, and privacy boundaries. |
| Evidence profile | What proves success? | Logs, diffs, screenshots, tests, citations, browser traces, database rows, or public URLs prove different things. |
| Workflow shape | How is the work organized? | One-pass tasks, pipelines, reviewer loops, orchestrator-worker systems, and monitoring loops need different skill structures. |
This is the core move. A skill is not "SEO" or "Figma." It is a tuple of properties.
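A minimal sketch of that tuple, assuming Python and hypothetical names: `ClassificationPacket` and its field names are illustrative, not Skill Creator V2's actual schema. Each axis is a required field, and an empty axis is rejected, because a packet with only one filled axis is just a flat label again. The instance encodes the infrastructure deployment example from the Examples section.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClassificationPacket:
    """A skill described on six axes instead of one label (hypothetical schema)."""

    activity: tuple[str, ...]      # what the agent is doing
    domain: str                    # what professional subject is involved
    tool_surface: tuple[str, ...]  # where the work happens
    risk: str                      # what can go wrong
    evidence: tuple[str, ...]      # what proves success
    workflow: tuple[str, ...]      # how the work is organized

    def __post_init__(self) -> None:
        # An empty axis means the skill was never really classified on it.
        empty = [f.name for f in fields(self) if not getattr(self, f.name)]
        if empty:
            raise ValueError(f"unclassified axes: {', '.join(empty)}")


infra_deploy = ClassificationPacket(
    activity=("implement", "verify"),
    domain="infrastructure operation",
    tool_surface=("CLI", "filesystem", "server SSH", "version control"),
    risk="production-impacting",
    evidence=("command output", "config diff", "service status",
              "smoke test", "rollback path"),
    workflow=("plan", "apply", "verify", "rollback-ready"),
)
```

Building a packet from a single label, for example `domain="security"` with every other axis empty, raises `ValueError` instead of quietly producing a half-classified skill.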
Examples
An infrastructure deployment skill might classify as:
- activity: implement and verify;
- domain: infrastructure operation;
- tool surface: CLI, filesystem, server SSH, version control;
- risk: production-impacting;
- evidence: command output, config diff, service status, smoke test, rollback path;
- workflow: plan, apply, verify, rollback-ready.
A Figma-to-code skill might classify as:
- activity: inspect, translate, implement, review;
- domain: design and visual creation;
- tool surface: GUI/design canvas plus code;
- risk: public UX regression;
- evidence: node IDs, screenshots, generated files, responsive checks;
- workflow: design read, code change, visual verification.
A legal defense skill might classify as:
- activity: analyze, draft, review, communicate;
- domain: legal reasoning and defense;
- tool surface: documents, citations, possibly court or agency portals;
- risk: high-stakes human decision;
- evidence: dated sources, jurisdiction notes, fact/assumption separation, citation trail;
- workflow: draft with mandatory human review.
These are not cosmetic differences. They change what the skill is allowed to do.
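One way to make "allowed to do" concrete, sketched under assumptions: the risk axis selects the gates a skill must pass around its consequential step. The gate wording paraphrases the three examples above, but the table, the `gates_for` helper, and the fail-closed fallback for unknown risks are hypothetical design choices, not part of Skill Creator V2 as described.

```python
# Hypothetical gate table keyed by the risk axis of the three examples.
GATES_BY_RISK: dict[str, tuple[str, ...]] = {
    "public UX regression": (
        "visual verification against the design before merge",
    ),
    "production-impacting": (
        "user approval before apply",
        "rollback path recorded before apply",
        "smoke test after apply",
    ),
    "high-stakes human decision": (
        "fact/assumption separation checked",
        "mandatory human review before anything leaves draft",
    ),
}

# Fail closed: a risk nobody has mapped gets every known gate, not none.
STRICTEST: tuple[str, ...] = tuple(
    dict.fromkeys(g for gates in GATES_BY_RISK.values() for g in gates)
)


def gates_for(risk: str) -> tuple[str, ...]:
    """Return the gates a skill with this risk profile must pass."""
    return GATES_BY_RISK.get(risk, STRICTEST)


for skill, risk in [
    ("infrastructure deployment", "production-impacting"),
    ("Figma-to-code", "public UX regression"),
    ("legal defense", "high-stakes human decision"),
]:
    print(f"{skill}: {len(gates_for(risk))} gate(s)")
```

With this table the deployment skill gets three gates, the Figma skill one, and the legal skill two. Failing closed is the deliberate choice: an unclassified risk should make the skill slower, not less careful.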
Classes, Subclasses, Tags, and Tool Surfaces
The taxonomy also needs rules for what becomes a real class.
A domain becomes a class only when it has a distinct workflow fingerprint: different activities, evidence, failure modes, and review gates.
A specialization becomes a subclass when it shares the parent workflow but needs different evidence or procedures. For example, tax defense and traffic defense can live under legal defense because they share legal reasoning and document workflows, while differing in procedure, forum, and proof.
A label becomes a tag when it is useful but does not change the workflow. VPS, VPN, DNS, and router are often infrastructure tags. They matter, but they should not explode the top-level taxonomy.
A tool name becomes a tool surface, not a class, when the same tool can appear in many domains. Browser, Figma, Terraform, Playwright, SQLite, and GitHub are usually surfaces or tags. They shape execution, but they do not automatically define the skill's professional purpose.
This prevents the taxonomy from becoming a pile of product names.
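The four placement rules can be read as an ordered decision procedure. The sketch assumes a reviewer has already answered a few yes/no questions about each candidate label; `LabelFacts`, its fields, and the order of the checks are illustrative assumptions rather than a documented interface.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    CLASS = "class"
    SUBCLASS = "subclass"
    TAG = "tag"
    TOOL_SURFACE = "tool surface"


@dataclass(frozen=True)
class LabelFacts:
    """Hypothetical yes/no judgments recorded about one candidate label."""

    is_tool: bool = False                 # names a product or tool
    used_across_domains: bool = False     # the same tool recurs in many domains
    distinct_fingerprint: bool = False    # own activities, evidence, failures, gates
    shares_parent_workflow: bool = False  # reuses a parent class's workflow
    differs_in_evidence_or_procedure: bool = False


def place(facts: LabelFacts) -> Placement:
    # Tools that recur across domains shape execution, not professional purpose.
    if facts.is_tool and facts.used_across_domains:
        return Placement.TOOL_SURFACE
    # Only a distinct workflow fingerprint earns a top-level class.
    if facts.distinct_fingerprint:
        return Placement.CLASS
    # Same workflow, different proof or procedure: a subclass of the parent.
    if facts.shares_parent_workflow and facts.differs_in_evidence_or_procedure:
        return Placement.SUBCLASS
    # Useful to record, but it does not change the workflow.
    return Placement.TAG


playwright = LabelFacts(is_tool=True, used_across_domains=True)
legal_defense = LabelFacts(distinct_fingerprint=True)
tax_defense = LabelFacts(shares_parent_workflow=True,
                         differs_in_evidence_or_procedure=True)
dns = LabelFacts(shares_parent_workflow=True)
```

With these inputs, Playwright becomes a tool surface, legal defense a class, tax defense a subclass of legal defense, and DNS an infrastructure tag. Checking the tool rule first is what keeps product names out of the class list.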
Why Risk Is Separate
Security is a good example of why flat labels fail.
Sometimes security is the domain: a security audit skill is explicitly about security.
Sometimes security is a risk profile: a deployment skill can be infrastructure work with security consequences.
Sometimes security is a review requirement: a public website change may need privacy and header checks even if the page is not "about security."
So security should not always be a class. Often it is a risk dimension that changes the gates.
The same logic applies to legal, financial, production, privacy, and public-reputation risk. The higher the cost of being wrong, the more evidence the skill must require before it says "done."
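The last rule can be sketched as cumulative evidence tiers: each step up in the cost of being wrong adds requirements and removes none. The tier names and evidence items are assumptions loosely drawn from the checks this note mentions (tests, review, rollback, privacy boundaries, human approval), not a fixed Skill Creator V2 list. Security shows up here as a risk tier, not as a domain.

```python
from enum import IntEnum


class CostOfBeingWrong(IntEnum):
    # Hypothetical ordering; a higher value means a mistake costs more.
    LOW = 0
    PUBLIC = 1        # visible to users or readers
    PRODUCTION = 2    # production, security, or privacy consequences
    HIGH_STAKES = 3   # legal, financial, or similar human decisions


# Evidence each tier adds on top of every lower tier (illustrative items).
ADDED_EVIDENCE = {
    CostOfBeingWrong.LOW: {"task-specific check passed"},
    CostOfBeingWrong.PUBLIC: {"independent review"},
    CostOfBeingWrong.PRODUCTION: {"rollback path", "privacy boundary check"},
    CostOfBeingWrong.HIGH_STAKES: {"explicit human approval"},
}


def required_evidence(tier: CostOfBeingWrong) -> set[str]:
    """Union of the evidence added by this tier and every tier below it."""
    required: set[str] = set()
    for level in CostOfBeingWrong:
        if level <= tier:
            required |= ADDED_EVIDENCE[level]
    return required


def may_say_done(tier: CostOfBeingWrong, collected: set[str]) -> bool:
    """A skill may report "done" only when no required evidence is missing."""
    return required_evidence(tier) <= collected
```

A deployment skill in the infrastructure domain with security consequences would sit at `PRODUCTION`. With only a passing check and a review collected, `may_say_done` returns `False` until the rollback path and the privacy check are also in hand.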
What This Changes in Skill Creator V2
This taxonomy changes the generation workflow.
Before writing SKILL.md, Skill Creator V2 should classify the work.
Then it should derive:
- what the skill owns;
- what it explicitly does not own;
- what evidence is mandatory;
- which tools and files are expected;
- which actions require user approval;
- which tests or evals are needed;
- whether this should be one skill or a skill group;
- what a reviewer should challenge before release.
That makes the system more mature than a prompt template.
A prompt template asks: "What should the agent say?"
Skill Creator V2 asks: "What work are we capturing, what can go wrong, and what proves success?"
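The derivation step can be sketched as a function from a classification packet to a spec skeleton that has to exist before SKILL.md is written. Every name here (`Packet`, `SkillSpec`, `derive_spec`, `IRREVERSIBLE_STEPS`) is hypothetical. The sketch derives only what follows mechanically from the axes; explicit non-goals are supplied by a person, and the one-skill-or-group decision is left to a separate judgment.

```python
from dataclasses import dataclass, field

# Hypothetical: workflow steps that change the outside world need approval.
IRREVERSIBLE_STEPS = {"apply", "publish", "send", "file", "delete"}


@dataclass(frozen=True)
class Packet:
    activity: tuple[str, ...]
    domain: str
    tool_surface: tuple[str, ...]
    risk: str
    evidence: tuple[str, ...]
    workflow: tuple[str, ...]


@dataclass
class SkillSpec:
    owns: list[str]
    does_not_own: list[str]
    mandatory_evidence: list[str]
    expected_tools: list[str]
    needs_approval: list[str]
    evals: list[str]
    reviewer_challenges: list[str] = field(default_factory=list)


def derive_spec(p: Packet, out_of_scope: list[str]) -> SkillSpec:
    spec = SkillSpec(
        owns=[f"{verb} ({p.domain})" for verb in p.activity],
        does_not_own=list(out_of_scope),  # boundaries stay a human decision
        mandatory_evidence=list(p.evidence),
        expected_tools=list(p.tool_surface),
        needs_approval=[s for s in p.workflow if s in IRREVERSIBLE_STEPS],
        # One eval per evidence item: evidence that cannot be tested is a claim.
        evals=[f"eval produces: {e}" for e in p.evidence],
    )
    spec.reviewer_challenges = [
        f"Is '{e}' actually checked, or only claimed?" for e in p.evidence
    ] + [f"Are the gates strong enough for risk '{p.risk}'?"]
    if not spec.does_not_own:
        spec.reviewer_challenges.append("No explicit non-goals: scope may be too broad.")
    return spec


infra = Packet(
    activity=("implement", "verify"),
    domain="infrastructure operation",
    tool_surface=("CLI", "filesystem", "server SSH", "version control"),
    risk="production-impacting",
    evidence=("command output", "config diff", "service status",
              "smoke test", "rollback path"),
    workflow=("plan", "apply", "verify", "rollback-ready"),
)
spec = derive_spec(infra, out_of_scope=["application code changes"])
```

For the deployment packet, only `apply` lands in `needs_approval`, and each of the five evidence items gets its own eval and reviewer challenge. The out-of-scope entry is a made-up example; passing an empty list adds a reviewer challenge instead of silently accepting an unbounded skill.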
Why This Matters for Skill Groups
Some work is too broad for one skill.
If a workflow includes strategy, implementation, review, testing, publication, and monitoring, one giant skill becomes brittle. It will trigger too broadly and hide too much responsibility.
The taxonomy helps split the work:
- an orchestrator owns sequence and handoff;
- a specialist owns one domain;
- a reviewer owns gates and quality;
- a tester owns evidence;
- a publisher owns sanitized packaging and release.
The split should happen only when there is a real boundary: different evidence, different risk, different tool surface, or different approval owner. Otherwise the system just creates bureaucracy.
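The boundary rule can be sketched as grouping workflow stages by the four properties it names. Stages that agree on evidence, risk, tool surface, and approval owner stay in one skill, and each new combination becomes a candidate skill in the group. The stage data and the names `Stage` and `split_by_boundary` are illustrative assumptions; a real pipeline would still need to decide which skill acts as orchestrator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    evidence: str      # what proves this stage succeeded
    risk: str          # what can go wrong here
    tool_surface: str  # where the work happens
    approver: str      # who must sign off


def boundary(stage: Stage) -> tuple[str, str, str, str]:
    # Only these four properties justify a split; anything else is bureaucracy.
    return (stage.evidence, stage.risk, stage.tool_surface, stage.approver)


def split_by_boundary(stages: list[Stage]) -> list[list[str]]:
    """Group stage names so that each group shares one boundary signature."""
    groups: dict[tuple[str, str, str, str], list[str]] = defaultdict(list)
    for stage in stages:
        groups[boundary(stage)].append(stage.name)
    return list(groups.values())  # dicts keep first-seen order


pipeline = [
    Stage("draft copy", "review notes", "public", "files", "editor"),
    Stage("revise copy", "review notes", "public", "files", "editor"),
    Stage("run tests", "test report", "public", "CLI", "editor"),
    Stage("publish", "public URL", "public", "browser", "owner"),
]
```

Here drafting and revising differ only in name, so they stay in one skill, while testing and publishing each cross a real boundary and become separate candidates: `[["draft copy", "revise copy"], ["run tests"], ["publish"]]`.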
The Practical Standard
The practical standard is simple: do not call a skill production-ready because the file looks polished.
Call it ready only when the taxonomy, evidence, risk, tests, and boundaries agree with the work it claims to automate.
That is the reason Skill Creator V2 needs a deep taxonomy.
Not for elegance.
For safer autonomy.
For reusable skills that can be inspected, tested, improved, and trusted.