Agentic Work Systematization

Summary

Systematization is one of three "how" margins that OpenAI's Codex usage study uses to measure whether agentic AI is moving beyond one-off assistance. It is the shift from ad hoc delegation (describe a task, the agent does it, the interaction ends) to reusable workflow infrastructure that lets similar work be delegated repeatedly without re-supplying context each time. In Codex, this happens through skills (a SKILL.md workflow spec) and plugins (an installable bundle of skills + integrations). The paper's framing: without systematization, "users must repeatedly supply task context, procedural guidance, and instructions, limiting the extent to which work can be handed off." Systematization is therefore a precondition for deep delegation, not a power-user nicety. This is the measured, empirical counterpart to Loop Engineering's skills primitive and to Agent Context Files' "intent written down outside": Osmani's intent debt argument, now with adoption curves.

Evidence note. Empirical: skill-source invocation is measured from Codex logs over weekly windows. OpenAI-internal usage is a frontier preview, not a population estimate (see Conversation-to-Delegation Shift's evidence note).

Skill use is common, rising, and uneven

In the 7-day window ending June 11, 2026:

Population	Share invoking ≥1 skill
Individual users	25.7%
Organizational users	30.4%
OpenAI workers	96.2%

And it is climbing fast: the share of weekly-active Codex users invoking any skill rose from 5.4% (Mar 1, 2026) → 26.6% (Jun 11, 2026) — roughly a 5× increase in three months. Within OpenAI, systematization is effectively the default mode of use; among external users it is a minority practice but a growing one.
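The "roughly 5×" figure can be checked directly from the two quoted endpoints. The sketch below also derives an implied average weekly growth rate. That rate rests on an assumption not taken from the paper (constant geometric growth between the endpoints), so treat it as an illustration of scale rather than a measured quantity.

```python
from datetime import date

# Figures quoted above: share of weekly-active Codex users invoking any skill.
start_day, start_share = date(2026, 3, 1), 5.4    # percent
end_day, end_share = date(2026, 6, 11), 26.6      # percent

multiple = end_share / start_share                 # overall growth multiple
point_gain = end_share - start_share               # gain in percentage points
weeks = (end_day - start_day).days / 7             # elapsed weekly windows

# ASSUMPTION: constant geometric growth between the two endpoints.
# Only the endpoints are quoted on this page, so this is an illustrative
# average, not a measured weekly rate.
weekly_factor = multiple ** (1 / weeks)

print(f"growth multiple: {multiple:.2f}x")
print(f"percentage-point gain: {point_gain:.1f}")
print(f"elapsed weeks: {weeks:.1f}")
print(f"implied average weekly growth: {weekly_factor - 1:.1%}")
```

The multiple comes out just under 5, and the window is 102 days, so "three months" is a slight understatement of the elapsed time.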

The skill taxonomy, and what custom skills reveal

The paper distinguishes five skill sources, ordered from product-provided to user-authored:

Preinstalled — capabilities bundled with Codex (e.g. image generation).
Curated — standalone OpenAI-distributed skills not tied to a plugin (e.g. PDF handling).
Plugin skills — bundled inside a plugin (e.g. Google Drive document workflows).
Custom plugin skills — associated with a recognized plugin but not matching the curated catalog.
Custom skills — standalone, user/org-authored, not distributed by OpenAI (team data-viz guidelines, a research workflow).
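The five sources above amount to a small decision tree over three questions: is the skill bundled with the product, is it associated with a recognized plugin, and does it match the curated catalog? A minimal sketch of that reading follows. The field names, the example skill names, and the order of the checks are all hypothetical, since the paper's log schema and classification procedure are not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SkillSource(Enum):
    PREINSTALLED = "preinstalled"
    CURATED = "curated"
    PLUGIN = "plugin skill"
    CUSTOM_PLUGIN = "custom plugin skill"
    CUSTOM = "custom skill"


@dataclass(frozen=True)
class SkillInvocation:
    # All fields are hypothetical stand-ins, not the paper's log schema.
    name: str
    bundled_with_codex: bool = False   # ships with the product itself
    in_curated_catalog: bool = False   # matches an OpenAI-distributed skill
    plugin: Optional[str] = None       # recognized plugin it is associated with


def classify(inv: SkillInvocation) -> SkillSource:
    """Map one invocation to one of the five sources, product-provided first."""
    if inv.bundled_with_codex:
        return SkillSource.PREINSTALLED
    if inv.plugin is not None:
        # Plugin-associated: a curated-catalog match decides the label.
        if inv.in_curated_catalog:
            return SkillSource.PLUGIN
        return SkillSource.CUSTOM_PLUGIN
    if inv.in_curated_catalog:
        return SkillSource.CURATED
    # Standalone and not distributed by OpenAI: user- or org-authored.
    return SkillSource.CUSTOM


examples = [
    SkillInvocation("image-generation", bundled_with_codex=True),
    SkillInvocation("pdf-handling", in_curated_catalog=True),
    SkillInvocation("drive-docs", in_curated_catalog=True, plugin="google-drive"),
    SkillInvocation("drive-team-export", plugin="google-drive"),
    SkillInvocation("team-dataviz-guidelines"),
]
for inv in examples:
    print(f"{inv.name}: {classify(inv).value}")
```

The fall-through to `CUSTOM` mirrors the taxonomy's ordering: a skill counts as user-authored only after every product-provided category has been ruled out.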

Growth comes especially from plugins and custom skills — the two ends of the systematization spectrum. Plugins extend general capability into recurring artifact domains (docs, spreadsheets, slide decks); custom skills encode local procedural context — team writing standards, recurring reports, organization-specific workflows, user preferences. The rise of custom skills is the load-bearing signal: users value not just the model's general capability but the ability to attach persistent procedural context to repeated, too-specialized-to-standardize tasks.

The organizational gradient

Plugin and custom-skill use is highest at OpenAI and substantially higher among organizational than individual users. The paper reads this as variation in the returns to systematization: custom skills pay off most when repeated tasks depend on shared conventions, internal procedures, or team-level standards — conditions far more common in organizational settings than in solo individual use. So systematization concentrates where persistent procedural context reduces coordination cost across repeated tasks. (Appendix detail: 50.9% of OpenAI collaboration conversations invoke a skill — systematization has diffused past code into knowledge work internally.)

This is the same logic as compounding context: the value isn't the one invocation, it's that the encoded workflow is reused and shared across people and time. A skill is the authoring format; a plugin is the distribution unit — the mechanism by which one person's systematization spreads across an org (the same skill≠plugin distinction Loop Engineering draws).
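The authoring-format versus distribution-unit distinction can be made concrete with a toy data model. Everything in the sketch below is hypothetical (the class names, the `install` method, the example skill and connector names); it is not Codex's actual plugin mechanism, only an illustration of one authored skill reaching several people through a single bundle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str  # stands in for the body of a SKILL.md workflow spec


@dataclass
class Plugin:
    name: str
    skills: list[Skill] = field(default_factory=list)
    integrations: list[str] = field(default_factory=list)  # e.g. connector names


@dataclass
class Workspace:
    owner: str
    skills: dict[str, Skill] = field(default_factory=dict)

    def install(self, plugin: Plugin) -> None:
        # Installing the bundle makes every skill in it available locally.
        for skill in plugin.skills:
            self.skills[skill.name] = skill


# Authored once, by one person.
report = Skill("weekly-report", "Follow the team template; cite source dashboards.")

# Distributed as a unit, together with the integrations it relies on.
team_plugin = Plugin("team-standards", skills=[report], integrations=["docs-connector"])

team = [Workspace(owner) for owner in ("ana", "ben", "chi")]
for ws in team:
    ws.install(team_plugin)

reach = sum("weekly-report" in ws.skills for ws in team)
print(f"{report.name} authored once, available in {reach} workspaces")
```

The compounding shows up in `reach`: the authoring cost is paid once, while the number of workspaces that benefit grows with every install.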

Skills as a cross-vendor distribution channel for methodology

A step past org-internal systematization: in June 2026 Google shipped its agent-quality evaluation methodology as installable skills (npx skills add … from skills.sh, two packages) designed to be driven by whatever coding agent the customer already uses. The taxonomy above runs product-provided → user-authored within one product; this is a third position — vendor-authored methodology distributed in the skill format across agent products. The skill is becoming not just how orgs encode local procedural context but how vendors ship expertise into other vendors' harnesses.

Connections

Loop Engineering — the practitioner-discipline source for the same primitive: skills as the third of five loop primitives, "intent written down on the outside" so the loop doesn't re-derive the project each cycle; this page is its measured, adoption-curve counterpart
Agent Context Files — skills/SKILL.md as externalized, reusable project context; systematization is the usage-data evidence that this primitive is being adopted at scale
Conversation-to-Delegation Shift — systematization is one of the three "how" margins (with concurrency and runtime) by which that study measures depth of delegation
Parallel Agent Orchestration — the sibling margins from the same study; systematization is what makes parallel/repeatable delegation tractable
Compounding Data Moat — custom skills are encoded org-specific procedural context that compounds and is shared; the returns-to-systematization gradient is a moat-from-context argument
Harness Shrinkage as Models Improve — skills/plugins are harness capability being absorbed into the product as named, shareable primitives rather than hand-maintained scaffolding
MCP and Computer Use — plugins bundle MCP/connector integrations alongside skills; the tool-reach half of systematization
Agentic Technical Debt — un-systematized agent use is the re-derive-from-zero / intent-debt failure mode; skills are the persistent-context antidote, here shown being adopted
Ticket-Driven Agent Orchestration — board/ticket state is the other durable externalization; systematization via skills is the procedure side, tickets are the work-graph side
OpenAI — the lab whose Codex telemetry this is
Codex — the tool whose skill/plugin system this measures
Agent Quality Flywheel — vendor-authored methodology shipped in the skill format across agent products; the cross-vendor extension of the systematization spectrum

Open questions

The 5.4%→26.6% curve spans just over three months. Is this a durable behavior change or a novelty spike following a Codex skills-feature push? (Cf. the OpenAI-internal training campaigns the paper notes.)
Custom skills encode org-specific context — but who maintains them as the codebase and conventions drift? Systematization could itself become a debt surface (Agentic Technical Debt) if skills rot.
Does systematization cause deeper delegation or merely correlate with already-intensive users? The paper shows the association, not the direction.

Sources

The Shift to Agentic AI: Evidence from Codex: §5.3 "Systematization of Agentic Work"; §6 Conclusion; footnote 13 (skill vs plugin definitions); Appendix Figure A8 (skill use by task area)
Driving the Agent Quality Flywheel from Your Coding Agent (Google Developers Blog): Google's eval methodology distributed as skills.sh packages for any coding agent (vendor-claim)