Howardism · Vol. 03 · Plate II · No. 02
Concept, filed.
Pieces: 69 · Section: Concept · Oldest: 10 Apr 2026 · Newest: 23 May 2026
Patterns, frameworks, and abstractions worth a name.
| Title | Summary | Date |
|---|---|---|
| Agent-Native Infrastructure | The world is still built for humans and must be rewritten for agents; "what do I copy-paste to my agent?"; sensors/actuators; agent-to-agent representation | |
| Agentic Loops Overtake Bespoke Systems | DeepMind's *basic* Ralph-loop agent matched its bespoke evolutionary+AlphaProof system as the LLM improved; the bitter lesson / harness-shrinkage confirmed in formal math | |
| AI-Driven Formal Proof Search | LLM generates Lean, compiler verifies every step → eliminates hallucination; DeepMind resolves 9/353 Erdős + 44/492 OEIS open problems; verification as a filter for human review | |
| The AI-Native Safe-Choice Inversion | Buying the legacy incumbent used to be "safe"; post-AI, *being* the incumbent = not AI-native; boards give buyers air cover; a counter-positioning play | |
| Building Is Cheap, Arguing Is Expensive | "In technical debate, code wins": generate three PRs vs whiteboard; prototype over design doc; reduce design docs | |
| Code as Source of Truth | Docs go stale at high coding throughput; check specs/skills into the repo; onboard via Claude; spec-drift verification | |
| Dogfooding as Product Discipline | Product sense is built by relentless first-hand use ("ant food"); Mr. Peanut catch; cross-source (Cat Wu vibe-checks, Glasgow founder-led sales) | |
| Evolutionary Proof Search | The full-featured agent's mechanism: population DB of proof sketches, Elo via Plackett–Luce/Gibbs, P-UCB selection, LLM-critic fitness for binary proof eval | |
| Founder-Led Sales Discipline | Stay founder-led until PMF; don't offload sales to an AE *or* an agent; explicit tension with Founder as Agent Orchestrator | |
| Jagged Intelligence (Ghosts, Not Animals) | "Ghosts not animals": jagged statistical circuits, no intrinsic motivation; car-wash/strawberry failures; stay in the loop, treat as tools | |
| Managers as ICs | Every Claude Code manager starts as an IC; flat org; agentic coding collapsed the onboarding cost that pushed managers out of the codebase | |
| Narrow Wedge into a Legacy Market | Disrupt without being feature-complete: be the best for a narrow customer profile (tech cos outgrowing QuickBooks); Google-Sheets MVP; the wedge-flip lesson | |
| Outsource Your Thinking, Not Your Understanding | "You can outsource your thinking but not your understanding"; understanding as the non-delegable human bottleneck; knowledge bases as understanding-tools | |
| Product Velocity as Moat | Shipping speed as differentiator + trust signal ("you'll scale with us"); a treadmill that must convert into durable lock-in | |
| Software 3.0 | Karpathy's taxonomy: 1.0 code, 2.0 weights, 3.0 prompting; LLM as programmable interpreter; MenuGen "shouldn't exist"; neural-net-as-host-process extrapolation | |
| The Verifiability Thesis | LLMs automate what you can *verify* as computers automate what you can *specify*; RL verification rewards → jagged peaks; "verifiable + labs care"; everything eventually verifiable | |
| Verification as the New Bottleneck | Fiona Fung: coding is no longer the bottleneck — verification, review, maintenance are; shift-left; TDD loses its tax; PR-cycle-time funnel analysis | |
| Vibe Coding vs. Agentic Engineering | Vibe coding raises the floor (anyone builds); agentic engineering preserves the quality bar while going faster; ">10x and widening"; hire on big projects, not puzzles | |
| Compute Allocator | The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, 99% is scaffolding invested in alignment/communication; abundance mindset | |
| Disposable Micro-Apps | Throwaway custom UIs built per-task to edit a plan ("micro-software on top of micro-software"); copy-back-to-markdown; rational under the abundance mindset | |
| HTML as the New Markdown | Thariq Shihipar's thesis: as models improve, thousand-line markdown plans overwhelm the *human*; HTML artifacts (visual, interactive) keep humans in the loop. The model-facing harness shrinks while this human-facing harness grows | |
| Living Design System | `design_system.html` extracted from repos as a portable, human- and machine-readable source of truth; component playgrounds; bridges engineering ↔ non-technical stakeholders | |
| Agentic Technical Debt | Debt that *compounds* (not just accumulates) because each agentic-coding session re-derives architectural decisions without a persistent CLAUDE.md; surfaces late as a forced rewrite | |
| AI-Native Startup Lifecycle | Anthropic's May 2026 reframing of Idea/MVP/Launch/Scale assuming AI infrastructure: each stage's headcount/capital/skill gates dissolve; lean unicorn as deliberate target | |
| Compounding Data Moat | Anthropic's prescription for Scale-stage defensibility: time-locked behavioral fingerprint + domain-encoded edge cases + workflow lock-in via APIs/integrations beyond what migration agents can port | |
| Evals as Product Spec | Cat Wu's framing of evals as the emerging core PM skill: ten great evals beat a hundred mediocre ones; encode what done looks like for ambiguous AI features; companion to introspection (hypothesis) and vibe-check (direction) | |
| Founder as Agent Orchestrator | Founder role shift: less individual contributor, more orchestrator of specialized AI assistants; non-technical founders unblocked; lean 10-person unicorn structurally enabled | |
| MCP and Computer Use | Anthropic's two complementary connector mechanisms: MCP for structured programmatic access (Salesforce/Drive/Gmail/Slack/Figma + niche industry systems); computer use as the GUI-driving catchall when no MCP exists; Boris Cherny's "to the model, it's just tokens" | |
| Problem-Solution Fit Discipline | Idea-stage thesis: three defenses against premature building (time, resources, belief friction) all eroded; AI as devil's advocate is the antidote to confirmation-bias-with-research-engine | |
| Zero-Friction Scope Creep | MVP failure mode when agentic coding removes the cost-based forcing function against scope creep; antidote is written scope + evidence-based amendment criteria | |
| Encoder-Free Early Fusion | Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out, all co-trained from scratch in one transformer | |
| Full-Duplex Interaction | Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speech, live translation/commentary, time-aware speech — all special cases of model behavior | |
| Interaction / Background Model Split | Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency" | |
| Interaction Models | Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via harness; interactivity scales with intelligence only if it's in the model | |
| Interactivity Benchmarks | FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (visual proactivity); TML-Interaction-Small: 0.40s turn-taking latency, dominates interaction quality | |
| The Bitter Lesson | Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolving harnesses into models; caveat — mechanical verification and character may not migrate inward | |
| Time-Aligned Micro-Turns | The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; streaming-sessions inference (upstreamed to SGLang), latency-tuned MoE kernels, bitwise trainer-sampler alignment | |
| Turn-Based Interface Bottleneck | Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out by the interface, not the work; less-intelligent harness (VAD/turn-detection) should dissolve | |
| Agentic Misalignment (AM) | Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD relative to conversational AFT; primary eval surface for Model Spec Midtraining | |
| AI Brain Fry | Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors +11%, major errors +39%; cognitive cost surface for both tool and employee framings | |
| AI Employee Framing | Kropp et al. (HBR May 2026, n=1,261): framing AI agents as "employees" vs "tools" cuts personal accountability −9pp, increases escalation +44%, reduces error catching −18%, no adoption gain | |
| Alignment Fine-Tuning (AFT) | Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec Midtraining | |
| Chain-of-Thought Monitorability | Korbak et al. 2025: chain-of-thought traces are a fragile monitor; direct CoT training compromises faithfulness; MSM offers an alternative path | |
| Deliberative Alignment | Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; risks compromising Chain-of-Thought Monitorability | |
| Human-AI Accountability Redesign | HBR five-pillar prescription: span-of-control redesign, role redesign, performance management reset, decision-rights/escalation/consequences, agentic-unit-not-human-role design | |
| Model Spec Midtraining (MSM) | New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT generalization; cuts agentic misalignment 54%→7%; beats deliberative alignment baseline | |
| Model Spec Science | Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > general "be ethical" framing; first concrete examples in Li et al. 2026 | |
| Synthetic Document Finetuning (SDF) | Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Spec Midtraining builds on | |
| Agent Loop Pattern | `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, parallel fan-out, "loops are the future" | |
| AI Native Product Cadence | Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts | |
| Claude Character as Product | Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the harness asset that *doesn't* shrink | |
| Context Window Smart Zone | Smart zone vs dumb zone (Dex Horthy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised context; clear-and-restart > compaction; status-line token counting as essential discipline | |
| Deep Modules for Agents | Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in fresh context; Sandcastle three-agent pattern | |
| Design Concept Grilling | Matt Pocock's `grill-me` skill; reach Brooks "design concept" before any plan; counter to specs-to-code; PRD as destination doc, Kanban as journey doc | |
| Engineer PM Convergence | Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate | |
| Harness Shrinkage as Models Improve | Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from now" claim; mechanical verification stays load-bearing | |
| Model Introspection Feedback | Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticism; caveats around model self-report fidelity | |
| Printing Press Software Democratization | Boris Cherny's analogy: 1400s literacy expansion → AI software-writing expansion; domain knowledge displaces coding skill; 10× more disruption-grade startups predicted | |
| Seven Powers Applied to AI | Helmer/Acquired framework re-evaluated for AI: switching costs and process power erode; network effects, scale, cornered resources persist; counter-positioning amplifies | |
| Vertical Slice Tracer Bullets | Pragmatic-Programmer tracer-bullet pattern applied to agent task decomposition; vertical slices > horizontal layers; Kanban-with-blocking-edges over numbered phase plans | |
| Codex App Server Protocol | JSON-RPC stdio protocol for headless Codex sessions: initialize/initialized/thread-start/turn-start handshake, continuation turns reuse thread_id, dynamic tool calls for token-isolated tool injection | |
| Ticket-Driven Agent Orchestration | The inversion that makes Symphony work: tickets as units of work (not sessions/PRs), DAG dependencies, agent-extensible work graph, "objectives not transitions" | |
| Claude Code Auto Mode | Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground between default and `--dangerously-skip-permissions` | |
| Client-Side Agent Optimization | AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server-side serving; the combo abstraction; 13–32× cost gaps between best/worst combinations | |
| Scale-Dependent Prompt Sensitivity | Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26pp and fully reverse hierarchy on GSM8K/MMLU-STEM | |
| Agent Harness Engineering | Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical architecture enforcement, agent code review | |
| Claude Code Best Practices | Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→code workflow, environment config | |
| LLM-as-Compiler Knowledge Base | Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4-phase ingest→compile→query→lint pipeline | |
| LLM-Driven Vulnerability Research | Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and Anthropic's Project Glasswing response |