Howardism

Agentic Work Systematization

Published: June 26, 2026
Filed: Concept
Domain: AI Engineering
Tags: Agent Engineering, Harness, Automation, AI Coding Workflow, Engineering Metrics, Empirical, OpenAI
Reading: 7 min
Source: AI-synthesised

OpenAI Codex study's 'systematization' margin: the shift from ad-hoc agent use (describe task → agent does it → done) to reusable workflow infrastructure via skills and plugins; skill use rose 5.4%→26.6% of weekly-active users (Mar→Jun 2026) and is near-universal at OpenAI (96.2%); custom skills (org-specific procedural context) concentrate where shared conventions and team standards exist — the empirical counterpart to loop-engineering's skills primitive



Summary

Systematization is one of three "how" margins that OpenAI's Codex usage study uses to measure whether agentic AI is moving beyond one-off assistance. It is the shift from ad hoc delegation (describe a task, the agent does it, the interaction ends) to reusable workflow infrastructure that lets similar work be delegated repeatedly without re-supplying context each time. In Codex, this happens through skills (a SKILL.md workflow spec) and plugins (an installable bundle of skills and integrations). The paper's framing: without systematization, "users must repeatedly supply task context, procedural guidance, and instructions, limiting the extent to which work can be handed off." Systematization is therefore a precondition for deep delegation, not a power-user nicety. This is the measured, empirical counterpart to Loop Engineering's skills primitive and to the "intent written down outside" idea in Agent Context Files: Osmani's intent debt argument, now with adoption curves.

Evidence note. Empirical: skill-source invocation is measured from Codex logs over weekly windows. OpenAI-internal usage is a frontier preview, not a population estimate (see Conversation-to-Delegation Shift's evidence note).

Skill use is common, rising, and uneven

In the 7-day window ending June 11, 2026:

Population              Share invoking ≥1 skill
Individual users        25.7%
Organizational users    30.4%
OpenAI workers          96.2%

And it is climbing fast: the share of weekly-active Codex users invoking any skill rose from 5.4% (Mar 1, 2026) to 26.6% (Jun 11, 2026), roughly a 5× increase in a little over three months. Within OpenAI, systematization is effectively the default mode of use; among external users it is a minority practice but a growing one.
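
The growth figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the two shares and the two dates quoted above. The weekly compound rate is an illustrative derived quantity, not a figure from the paper, and it assumes smooth exponential growth between the two endpoints.

```python
from datetime import date

# Figures quoted above from the Codex usage study.
START_DATE, START_SHARE = date(2026, 3, 1), 5.4    # % of weekly-active users
END_DATE, END_SHARE = date(2026, 6, 11), 26.6      # % of weekly-active users


def growth_summary(start_share, end_share, start_date, end_date):
    """Return fold change, percentage-point gain, and implied weekly growth."""
    weeks = (end_date - start_date).days / 7
    fold = end_share / start_share
    # Assumption (not from the paper): constant compound growth between endpoints.
    weekly_rate = fold ** (1 / weeks) - 1
    return {
        "weeks": weeks,
        "fold_change": fold,
        "point_gain": end_share - start_share,
        "weekly_rate": weekly_rate,
    }


summary = growth_summary(START_SHARE, END_SHARE, START_DATE, END_DATE)
print(f"{summary['fold_change']:.1f}x over {summary['weeks']:.1f} weeks")
print(f"+{summary['point_gain']:.1f} percentage points")
print(f"implied compound growth: {summary['weekly_rate']:.1%} per week")
```

The fold change comes out just under 5, which is what "roughly 5×" rounds; the percentage-point gain is the more conservative way to state the same change.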

The skill taxonomy, and what custom skills reveal

The paper distinguishes five skill sources, ordered from product-provided to user-authored:

  1. Preinstalled — capabilities bundled with Codex (e.g. image generation).
  2. Curated — standalone OpenAI-distributed skills not tied to a plugin (e.g. PDF handling).
  3. Plugin skills — bundled inside a plugin (e.g. Google Drive document workflows).
  4. Custom plugin skills — associated with a recognized plugin but not matching the curated catalog.
  5. Custom skills — standalone, user/org-authored, not distributed by OpenAI (team data-viz guidelines, a research workflow).
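
One way to read the five sources is as a small decision procedure over three questions: is the skill bundled with Codex, does it belong to a recognized plugin, and does it match OpenAI's curated catalog. The sketch below encodes that reading. The field names (`bundled_with_codex`, `plugin`, `in_curated_catalog`) and the example skill names are hypothetical; they are not taken from the paper or from Codex, and the paper's actual classification rules may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SkillSource(Enum):
    PREINSTALLED = 1
    CURATED = 2
    PLUGIN = 3
    CUSTOM_PLUGIN = 4
    CUSTOM = 5


@dataclass(frozen=True)
class SkillInvocation:
    # All fields are illustrative assumptions, not a real log schema.
    name: str
    bundled_with_codex: bool = False
    plugin: Optional[str] = None       # name of a recognized plugin, if any
    in_curated_catalog: bool = False   # matches an OpenAI-distributed skill


def classify(inv: SkillInvocation) -> SkillSource:
    """Map one invocation to the five-way taxonomy described above."""
    if inv.bundled_with_codex:
        return SkillSource.PREINSTALLED
    if inv.plugin is not None:
        # Tied to a plugin: curated-catalog match decides plugin vs custom plugin.
        if inv.in_curated_catalog:
            return SkillSource.PLUGIN
        return SkillSource.CUSTOM_PLUGIN
    # Standalone: distributed by OpenAI (curated) or authored locally (custom).
    if inv.in_curated_catalog:
        return SkillSource.CURATED
    return SkillSource.CUSTOM


examples = [
    SkillInvocation("image-generation", bundled_with_codex=True),
    SkillInvocation("pdf-handling", in_curated_catalog=True),
    SkillInvocation("drive-docs", plugin="google-drive", in_curated_catalog=True),
    SkillInvocation("drive-team-template", plugin="google-drive"),
    SkillInvocation("team-dataviz-guidelines"),
]
for inv in examples:
    print(f"{inv.name}: {classify(inv).name}")
```

Ordering the checks from product-provided to user-authored mirrors the list: anything that fails every "distributed by OpenAI" test falls through to a custom category.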

Growth comes especially from plugins and custom skills — the two ends of the systematization spectrum. Plugins extend general capability into recurring artifact domains (docs, spreadsheets, slide decks); custom skills encode local procedural context — team writing standards, recurring reports, organization-specific workflows, user preferences. The rise of custom skills is the load-bearing signal: users value not just the model's general capability but the ability to attach persistent procedural context to repeated, too-specialized-to-standardize tasks.

The organizational gradient

Plugin and custom-skill use is highest at OpenAI and substantially higher among organizational than individual users. The paper reads this as variation in the returns to systematization: custom skills pay off most when repeated tasks depend on shared conventions, internal procedures, or team-level standards — conditions far more common in organizational settings than in solo individual use. So systematization concentrates where persistent procedural context reduces coordination cost across repeated tasks. (Appendix detail: 50.9% of OpenAI collaboration conversations invoke a skill — systematization has diffused past code into knowledge work internally.)

This is the same logic as compounding context: the value isn't the one invocation, it's that the encoded workflow is reused and shared across people and time. A skill is the authoring format; a plugin is the distribution unit — the mechanism by which one person's systematization spreads across an org (the same skill≠plugin distinction Loop Engineering draws).
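
The skill/plugin split can be pictured as two small data structures: a skill holds procedural context written once, and a plugin bundles skills with integrations so the same context can be installed by many people. This is a toy model for illustration only. The class names, fields, and the `install` and `delegate` helpers are hypothetical and do not reflect Codex's actual file formats or APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """Authoring format: procedural context written down once."""
    name: str
    instructions: str


@dataclass(frozen=True)
class Plugin:
    """Distribution unit: skills plus integrations, installed as one bundle."""
    name: str
    skills: tuple[Skill, ...]
    integrations: tuple[str, ...] = ()


@dataclass
class Workspace:
    """One user's environment (hypothetical)."""
    owner: str
    skills: dict[str, Skill] = field(default_factory=dict)

    def install(self, plugin: Plugin) -> None:
        # Installing a plugin makes every bundled skill available locally.
        for skill in plugin.skills:
            self.skills[skill.name] = skill

    def delegate(self, skill_name: str, task: str) -> str:
        # The stored instructions are reused; only the task text is new.
        skill = self.skills[skill_name]
        return f"{skill.instructions}\n\nTask: {task}"


report = Skill("weekly-report", "Follow the team report template; cite the dashboard.")
plugin = Plugin("team-reporting", skills=(report,), integrations=("drive",))

team = [Workspace("ana"), Workspace("ben")]
for ws in team:
    ws.install(plugin)

prompts = [ws.delegate("weekly-report", "Summarize week 24") for ws in team]
print(prompts[0] == prompts[1])
```

Both workspaces produce the same prompt from a single authored skill, which is the reuse-across-people property the paragraph above attributes to plugins.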

Skills as a cross-vendor distribution channel for methodology

A step past org-internal systematization: in June 2026 Google shipped its agent-quality evaluation methodology as installable skills (npx skills add … from skills.sh, two packages) designed to be driven by whatever coding agent the customer already uses. The taxonomy above runs product-provided → user-authored within one product; this is a third position — vendor-authored methodology distributed in the skill format across agent products. The skill is becoming not just how orgs encode local procedural context but how vendors ship expertise into other vendors' harnesses.

Connections

  • Loop Engineering — the practitioner-discipline source for the same primitive: skills as the third of five loop primitives, "intent written down on the outside" so the loop doesn't re-derive the project each cycle; this page is its measured, adoption-curve counterpart
  • Agent Context Files — skills/SKILL.md as externalized, reusable project context; systematization is the usage-data evidence that this primitive is being adopted at scale
  • Conversation-to-Delegation Shift — systematization is one of the three "how" margins (with concurrency and runtime) by which that study measures depth of delegation
  • Parallel Agent Orchestration — the sibling margins from the same study; systematization is what makes parallel/repeatable delegation tractable
  • Compounding Data Moat — custom skills are encoded org-specific procedural context that compounds and is shared; the returns-to-systematization gradient is a moat-from-context argument
  • Harness Shrinkage as Models Improve — skills/plugins are harness capability being absorbed into the product as named, shareable primitives rather than hand-maintained scaffolding
  • MCP and Computer Use — plugins bundle MCP/connector integrations alongside skills; the tool-reach half of systematization
  • Agentic Technical Debt — un-systematized agent use is the re-derive-from-zero / intent-debt failure mode; skills are the persistent-context antidote, here shown being adopted
  • Ticket-Driven Agent Orchestration — board/ticket state is the other durable externalization; systematization via skills is the procedure side, tickets are the work-graph side
  • OpenAI — the lab whose Codex telemetry this is
  • Codex — the tool whose skill/plugin system this measures
  • Agent Quality Flywheel — vendor-authored methodology shipped in the skill format across agent products; the cross-vendor extension of the systematization spectrum

Open questions

  • The 5.4%→26.6% curve is three months. Is this a durable behavior change or a novelty spike following a Codex skills-feature push? (Cf. the OpenAI-internal training campaigns the paper notes.)
  • Custom skills encode org-specific context — but who maintains them as the codebase and conventions drift? Systematization could itself become a debt surface (Agentic Technical Debt) if skills rot.
  • Does systematization cause deeper delegation or merely correlate with already-intensive users? The paper shows the association, not the direction.

About this piece

Articles in this journal are synthesised by AI agents from a curated wiki and are refreshed automatically as new concepts arrive. Topics, framing, and editorial direction are curated by Howardism.
