Howardism · Vol. 03 · quiet corner of the web
Plate II · No. 02

Concept, filed.

Pieces: 69 · Section: Concept · Oldest: 10 Apr 2026 · Newest: 23 May 2026

Patterns, frameworks, and abstractions worth a name.

Concept articles, sorted by date, newest first.
| Title | Summary | Date |
|---|---|---|
| Agent-Native Infrastructure | The world is still built for humans and must be rewritten for agents; "what do I copy-paste to my agent?"; sensors/actuators; agent-to-agent representation | |
| Agentic Loops Overtake Bespoke Systems | DeepMind's *basic* Ralph-loop agent matched its bespoke evolutionary+AlphaProof system as the LLM improved; the Bitter Lesson and harness shrinkage confirmed in formal math | |
| AI-Driven Formal Proof Search | LLM generates Lean, compiler verifies every step → eliminates hallucination; DeepMind resolves 9/353 Erdős + 44/492 OEIS open problems; verification as a filter for human review | |
| The AI-Native Safe-Choice Inversion | Buying the legacy incumbent used to be "safe"; post-AI, *being* the incumbent = not AI-native; boards give buyers air cover; a counter-positioning play | |
| Building Is Cheap, Arguing Is Expensive | "In technical debate, code wins": generate three PRs instead of whiteboarding; prototypes over design docs; fewer design docs | |
| Code as Source of Truth | Docs go stale at high coding throughput; check specs/skills into the repo; onboard via Claude; spec-drift verification | |
| Dogfooding as Product Discipline | Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, Glasgow founder-led sales) | |
| Evolutionary Proof Search | The full-featured agent's mechanism: population DB of proof sketches, Elo via Plackett–Luce/Gibbs, P-UCB selection, LLM-critic fitness for binary proof eval | |
| Founder-Led Sales Discipline | Stay founder-led until PMF; don't offload sales to an AE *or* an agent; explicit tension with Founder as Agent Orchestrator | |
| Jagged Intelligence (Ghosts, Not Animals) | "Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the loop, treat as tools | |
| Managers as ICs | Every Claude Code manager starts as an IC; flat org; agentic coding collapsed the onboarding cost that pushed managers out of the codebase | |
| Narrow Wedge into a Legacy Market | Disrupt without being feature-complete: be the best for a narrow customer profile (tech cos outgrowing QuickBooks); Google-Sheets MVP; the wedge-flip lesson | |
| Outsource Your Thinking, Not Your Understanding | "You can outsource your thinking but not your understanding"; understanding as the non-delegable human bottleneck; knowledge bases as understanding-tools | |
| Product Velocity as Moat | Shipping speed as differentiator + trust signal ("you'll scale with us"); a treadmill that must convert into durable lock-in | |
| Software 3.0 | Karpathy's taxonomy: 1.0 code, 2.0 weights, 3.0 prompting; LLM as programmable interpreter; MenuGen "shouldn't exist"; neural-net-as-host-process extrapolation | |
| The Verifiability Thesis | LLMs automate what you can *verify*, just as computers automate what you can *specify*; RL verification rewards → jagged peaks; "verifiable + labs care"; everything eventually verifiable | |
| Verification as the New Bottleneck | Fiona Fung: coding is no longer the bottleneck; verification, review, and maintenance are; shift-left; TDD loses its tax; PR-cycle-time funnel analysis | |
| Vibe Coding vs. Agentic Engineering | Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x and widening"; hire on big projects, not puzzles | |
| Compute Allocator | The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding invested in alignment/communication; abundance mindset | |
| Disposable Micro-Apps | Throwaway custom UIs built per-task to edit a plan ("micro-software on top of micro-software"); copy-back-to-markdown; rational under the abundance mindset | |
| HTML as the New Markdown | Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the *human*; HTML artifacts (visual, interactive) keep humans in the loop. The model-facing harness shrinks while this human-facing harness grows | |
| Living Design System | `design_system.html` extracted from repos as a portable, human- and machine-readable source of truth; component playgrounds; bridges engineering ↔ non-technical stakeholders | |
| Agentic Technical Debt | Debt that *compounds* (not just accumulates) because each agentic-coding session re-derives architectural decisions without persistent CLAUDE.md; surfaces late as a forced rewrite | |
| AI-Native Startup Lifecycle | Anthropic's May 2026 reframing of Idea/MVP/Launch/Scale assuming AI infrastructure: each stage's headcount/capital/skill gates dissolve; lean unicorn as deliberate target | |
| Compounding Data Moat | Anthropic's prescription for Scale-stage defensibility: time-locked behavioral fingerprint + domain-encoded edge cases + workflow lock-in via APIs/integrations beyond what migration agents can port | |
| Evals as Product Spec | Cat Wu's framing of evals as the emerging core PM skill: ten great evals beat a hundred mediocre ones; encode what done looks like for ambiguous AI features; companion to introspection (hypothesis) and vibe-check (direction) | |
| Founder as Agent Orchestrator | Founder role shift: less individual contributor, more orchestrator of specialized AI assistants; non-technical founders unblocked; lean 10-person unicorn structurally enabled | |
| MCP and Computer Use | Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slack/Figma + niche industry systems); computer use as the GUI-driving catchall when no MCP exists; Boris Cherny's "to the model, it's just tokens" | |
| Problem-Solution Fit Discipline | Idea-stage thesis: three defenses against premature building (time, resources, belief friction) all eroded; AI as devil's advocate is the antidote to confirmation-bias-with-research-engine | |
| Zero-Friction Scope Creep | MVP failure mode when agentic coding removes the cost-based forcing function against scope creep; antidote is written scope + evidence-based amendment criteria | |
| Encoder-Free Early Fusion | Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out, all co-trained from scratch in one transformer | |
| Full-Duplex Interaction | Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speech, live translation/commentary, time-aware speech; all are special cases of model behavior | |
| Interaction / Background Model Split | Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency" | |
| Interaction Models | Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via harness; interactivity scales with intelligence only if it's in the model | |
| Interactivity Benchmarks | FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (visual proactivity); TML-Interaction-Small: 0.40s turn-taking latency, dominates interaction quality | |
| The Bitter Lesson | Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolving harnesses into models; caveat: mechanical verification and character may not migrate inward | |
| Time-Aligned Micro-Turns | The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; streaming-sessions inference (upstreamed to SGLang), latency-tuned MoE kernels, bitwise trainer-sampler alignment | |
| Turn-Based Interface Bottleneck | Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out by the interface, not the work; less-intelligent harness (VAD/turn-detection) should dissolve | |
| Agentic Misalignment (AM) | Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD relative to conversational AFT; primary eval surface for Model Spec Midtraining | |
| AI Brain Fry | Kropp et al. (Mar 2026): mental fatigue from excessive AI oversight increases minor errors by 11% and major errors by 39%; cognitive cost surface for both tool and employee framings | |
| AI Employee Framing | Kropp et al. (HBR May 2026, n=1,261): framing AI agents as "employees" rather than "tools" cuts personal accountability by 9pp, increases escalation by 44%, and reduces error catching by 18%, with no adoption gain | |
| Alignment Fine-Tuning (AFT) | Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec Midtraining | |
| Chain-of-Thought Monitorability | Korbak et al. 2025: chain-of-thought traces are a fragile monitor; direct CoT training compromises faithfulness; MSM offers an alternative path | |
| Deliberative Alignment | Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; risks compromising chain-of-thought monitorability | |
| Human-AI Accountability Redesign | HBR five-pillar prescription: span-of-control redesign, role redesign, performance management reset, decision-rights/escalation/consequences, agentic-unit-not-human-role design | |
| Model Spec Midtraining (MSM) | New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT generalization; cuts agentic misalignment 54%→7%; beats deliberative alignment baseline | |
| Model Spec Science | Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > general "be ethical" framing; first concrete examples in Li et al. 2026 | |
| Synthetic Document Finetuning (SDF) | Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Spec Midtraining builds on | |
| Agent Loop Pattern | `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, parallel fan-out, "loops are the future" | |
| AI Native Product Cadence | Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts | |
| Claude Character as Product | Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the harness asset that *doesn't* shrink | |
| Context Window Smart Zone | Smart zone vs dumb zone (Dex Horthy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised context; clear-and-restart > compaction; status-line token counting as essential discipline | |
| Deep Modules for Agents | Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in fresh context; Sandcastle three-agent pattern | |
| Design Concept Grilling | Matt Pocock's `grill-me` skill; reach Brooks "design concept" before any plan; counter to specs-to-code; PRD as destination doc, Kanban as journey doc | |
| Engineer PM Convergence | Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate | |
| Harness Shrinkage as Models Improve | Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from now" claim; mechanical verification stays load-bearing | |
| Model Introspection Feedback | Cat Wu's underrated technique: ask the model why it failed; treat the answer as a harness-debugging signal, not model criticism; caveats around model self-report fidelity | |
| Printing Press Software Democratization | Boris Cherny's analogy: 1400s literacy expansion → AI software-writing expansion; domain knowledge displaces coding skill; 10× more disruption-grade startups predicted | |
| Seven Powers Applied to AI | Helmer/Acquired framework re-evaluated for AI: switching costs and process power erode; network effects, scale, cornered resources persist; counter-positioning amplifies | |
| Vertical Slice Tracer Bullets | Pragmatic-Programmer tracer-bullet pattern applied to agent task decomposition; vertical slices > horizontal layers; Kanban-with-blocking-edges over numbered phase plans | |
| Codex App Server Protocol | JSON-RPC stdio protocol for headless Codex sessions: initialize/initialized/thread-start/turn-start handshake, continuation turns reuse thread_id, dynamic tool calls for token-isolated tool injection | |
| Ticket-Driven Agent Orchestration | The inversion that makes Symphony work: tickets as units of work (not sessions/PRs), DAG dependencies, agent-extensible work graph, "objectives not transitions" | |
| Claude Code Auto Mode | Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground between default and `--dangerously-skip-permissions` | |
| Client-Side Agent Optimization | AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server-side serving; the combo abstraction; 13–32× cost gaps between best/worst combinations | |
| Scale-Dependent Prompt Sensitivity | Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26pp and fully reverse hierarchy on GSM8K/MMLU-STEM | |
| Agent Harness Engineering | Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical architecture enforcement, agent code review | |
| Claude Code Best Practices | Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→code workflow, environment config | |
| LLM-as-Compiler Knowledge Base | Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4-phase ingest→compile→query→lint pipeline | |
| LLM-Driven Vulnerability Research | Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and Anthropic's Project Glasswing response | |