Howardism · Vol. 03 · Plate II · No. 02
Writing, in order.
Pieces: 59 · Sections: 4 · Oldest: 10 Apr 2026 · Newest: 13 May 2026
Plate II · Writing
A dense index of every article in the wiki, grouped by kind.
Concept
39 pieces

| Title | Summary | Date |
|---|---|---|
| Encoder-Free Early Fusion | Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out, all co-trained from scratch in one transformer | |
| Full-Duplex Interaction | Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speech, live translation/commentary, time-aware speech — all special cases of model behavior | |
| Interaction / Background Model Split | Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency" | |
| Interaction Models | Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via harness; interactivity scales with intelligence only if it's in the model | |
| Interactivity Benchmarks | FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (visual proactivity); TML-Interaction-Small: 0.40s turn-taking latency, dominates interaction quality | |
| The Bitter Lesson | Sutton 2019: scaled general methods beat hand-engineered structure; recurring justification across the wiki for dissolving harnesses into models; caveat — mechanical verification and character may not migrate inward | |
| Time-Aligned Micro-Turns | The core interaction-model move: input/output as continuous streams in ~200ms interleaved chunks, no turn boundaries; streaming-sessions inference (upstreamed to SGLang), latency-tuned MoE kernels, bitwise trainer-sampler alignment | |
| Turn-Based Interface Bottleneck | Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out by the interface, not the work; less-intelligent harness (VAD/turn-detection) should dissolve | |
| Agentic Misalignment (AM) | Lynch et al. 2025 eval and threat model: LLM email-agent discovers it may be deleted, can take harmful actions; OOD relative to conversational AFT; primary eval surface for Model Spec Midtraining | |
| AI Brain Fry | Kropp et al. 2026/03: mental fatigue from excessive AI oversight increases minor errors +11%, major errors +39%; cognitive cost surface for both tool and employee framings | |
| AI Employee Framing | Kropp et al. (HBR May 2026, n=1,261): framing AI agents as "employees" vs "tools" cuts personal accountability −9pp, increases escalation +44%, reduces error catching −18%, no adoption gain | |
| Alignment Fine-Tuning (AFT) | Standard post-pretraining stage (SFT + RLHF) for installing values; shallow-alignment failure mode motivates Model Spec Midtraining | |
| Chain-of-Thought Monitorability | Korbak et al. 2025: chain-of-thought traces are a fragile monitor; direct CoT training compromises faithfulness; MSM offers an alternative path | |
| Deliberative Alignment | Guan et al. 2025 (OpenAI): SFT on (prompt, CoT, response) tuples with spec-grounded CoT; strongest non-MSM baseline; risks compromising CoT Monitorability | |
| Human-AI Accountability Redesign | HBR five-pillar prescription: span-of-control redesign, role redesign, performance management reset, decision-rights/escalation/consequences, agentic-unit-not-human-role design | |
| Model Spec Midtraining (MSM) | New training phase between pretrain and AFT: train base model on synthetic docs discussing the Model Spec; controls AFT generalization; cuts agentic misalignment 54%→7%; beats deliberative alignment baseline | |
| Model Spec Science | Empirical study of which Model Spec features best generalize alignment; value explanations > rules alone, specific > general "be ethical" framing; first concrete examples in Li et al. 2026 | |
| Synthetic Document Finetuning (SDF) | Wang et al. 2025 technique for modifying model beliefs via fine-tuning on synthetic documents; foundation that Model Spec Midtraining builds on | |
| Agent Loop Pattern | `/loop` (cron-scheduled) and Ralph Wiggum (backlog-draining) loops as next-generation agent primitive; AFK execution, parallel fan-out, "loops are the future" | |
| AI Native Product Cadence | Cat Wu's 6mo→1mo→1day cadence at Anthropic: research-preview branding, mission-as-tiebreaker, evergreen launch room, lighter PRDs, weekly metrics readouts | |
| Claude Character as Product | Personality as load-bearing product surface; Amanda's role at Anthropic; lunchtime vibe-checks as eval discipline; the harness asset that *doesn't* shrink | |
| Context Window Smart Zone | Smart zone vs dumb zone (Dex Hardy / Matt Pocock): quadratic attention scaling, ~100K marker independent of advertised context; clear-and-restart > compaction; status-line token counting as essential discipline | |
| Deep Modules for Agents | Ousterhout deep-vs-shallow modules applied to agent-friendly codebases; push-vs-pull instruction delivery; reviewer in fresh context; Sandcastle three-agent pattern | |
| Design Concept Grilling | Matt Pocock's `grill-me` skill; reach Brooks "design concept" before any plan; counter to specs-to-code; PRD as destination doc, Kanban as journey doc | |
| Engineer PM Convergence | Generalists across disciplines; product taste as bottleneck skill; Anthropic Claude Code team as case study; "just do things" cultural substrate | |
| Harness Shrinkage as Models Improve | Prompt scaffolding shrinks each model release; Cat Wu's pruning discipline; Boris Cherny "100 lines of code a year from now" claim; mechanical verification stays load-bearing | |
| Model Introspection Feedback | Cat Wu's underrated technique: ask the model why it failed; treat answer as harness-debugging signal not model criticism; caveats around model self-report fidelity | |
| Printing Press Software Democratization | Boris Cherny's analogy: 1400s literacy expansion → AI software-writing expansion; domain knowledge displaces coding skill; 10× more disruption-grade startups predicted | |
| Seven Powers Applied to AI | Helmer/Acquired framework re-evaluated for AI: switching costs and process power erode; network effects, scale, cornered resources persist; counter-positioning amplifies | |
| Vertical Slice Tracer Bullets | Pragmatic-Programmer tracer-bullet pattern applied to agent task decomposition; vertical slices > horizontal layers; Kanban-with-blocking-edges over numbered phase plans | |
| Codex App Server Protocol | JSON-RPC stdio protocol for headless Codex sessions: initialize/initialized/thread-start/turn-start handshake, continuation turns reuse thread_id, dynamic tool calls for token-isolated tool injection | |
| Ticket-Driven Agent Orchestration | The inversion that makes Symphony work: tickets as units of work (not sessions/PRs), DAG dependencies, agent-extensible work graph, "objectives not transitions" | |
| Claude Code Auto Mode | Claude Code permission mode using a classifier to auto-approve safe tool calls and block risky ones; middle ground between default and `--dangerously-skip-permissions` | |
| Client-Side Agent Optimization | AgentOpt's framing of developer-controlled agent optimization (model-per-role, budget, routing) as distinct from server-side serving; the combo abstraction; 13–32× cost gaps between best/worst combinations | |
| Scale-Dependent Prompt Sensitivity | Large models underperform small ones on 7.7% of standard benchmarks due to overthinking; brevity constraints recover 26pp and fully reverse hierarchy on GSM8K/MMLU-STEM | |
| Agent Harness Engineering | Patterns for scaffolding long-running LLM agents: environment design, progressive context disclosure, mechanical architecture enforcement, agent code review | |
| Claude Code Best Practices | Anthropic's guide to effective Claude Code usage: context management, verification-driven development, explore→plan→code workflow, environment config | |
| LLM-as-Compiler Knowledge Base | Karpathy's architecture: LLM incrementally compiles raw docs into a persistent interlinked wiki, replacing RAG with a 4-phase ingest→compile→query→lint pipeline | |
| LLM-Driven Vulnerability Research | Claude Mythos Preview's emergent cybersecurity capabilities: autonomous zero-day discovery, full exploit chains, and Anthropic's Project Glasswing response |
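The Codex App Server Protocol entry above describes a JSON-RPC-over-stdio handshake (initialize → initialized → thread-start → turn-start, with continuation turns reusing the thread_id). A minimal sketch of those message shapes, assuming standard JSON-RPC 2.0 framing with one message per line; the field names beyond the method names (`clientInfo`, `thread_id`, `input`) and the placeholder thread id are illustrative assumptions, not the documented protocol:

```python
import json

def jsonrpc(method, params=None, msg_id=None):
    """Build a JSON-RPC 2.0 message; notifications omit the id."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg

# Handshake as described in the index row:
# initialize -> initialized -> thread-start, then turn-start per turn.
handshake = [
    jsonrpc("initialize", {"clientInfo": {"name": "example-client"}}, msg_id=1),
    jsonrpc("initialized"),                 # notification: no id, no reply expected
    jsonrpc("thread-start", {}, msg_id=2),  # server's reply would carry a thread_id
]

# A continuation turn reuses the thread_id from the thread-start response.
thread_id = "thread-123"  # placeholder for the id the server returned
turn = jsonrpc("turn-start", {"thread_id": thread_id, "input": "hello"}, msg_id=3)

# Over stdio, messages are serialized one JSON object per line (framing assumed).
wire = "\n".join(json.dumps(m) for m in handshake + [turn])
```

The token-isolated "dynamic tool calls" mentioned in the row would ride the same channel as additional methods; they are omitted here since the index gives no message shapes for them.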
Entity
14 pieces

| Title | Summary | Date |
|---|---|---|
| Thinking Machines Lab | AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-sessions to SGLang; benchmarks against GPT-realtime / Gemini-live; research grants open | |
| TML-Interaction-Small | TML's first interaction model: 276B MoE / 12B active, audio+video+text in / text+audio out, 200ms micro-turns, async background agent; best turn-taking latency of any model; research preview May 2026 | |
| Chloe Li | Lead author of MSM paper (arXiv 2605.02087); Anthropic Fellows Program; designed all specs and experiments | |
| Claude's Constitution / Model Spec | Anthropic Model Spec / Constitution by Askell et al.; document specifying Claude's values + hard constraints (SP1–3, GP1–2); now also a direct training input via MSM | |
| Anthropic | AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs round 2 | |
| Boris Cherny | Creator of Claude Code at Anthropic; phone-driven workflow with hundreds of agents; primary advocate of `/loop` primitive; "coding is solved (for me)" thesis | |
| Cat Wu | Head of Product for Claude Code and Cowork at Anthropic; primary articulator of AI-native product cadence and engineer-PM convergence | |
| Claude Code | Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE surfaces; central tool across all 2026 sources | |
| Cowork | Anthropic's non-code knowledge-work agent product; sibling to Claude Code; output is decks/inbox/dossiers; same MCP/computer-use primitives | |
| Matt Pocock | Independent AI-coding educator; built Sandcastle library; smart-zone/grill-me/tracer-bullets pedagogical framing; "bad code bases make bad agents" | |
| Mythos Model | Anthropic preview-tier frontier model; gated for safety; used internally alongside Opus 4.7; descendant expected to ship publicly later | |
| Hermes Agent | Nous Research's CLI agent + Gateway daemon (Telegram/Discord/Slack/WhatsApp); AGENTS.md/SOUL.md context split, bounded memory files, DM-pairing auth, container-as-security-boundary model | |
| Symphony | OpenAI's open-source agent orchestrator (March 2026): turns Linear into a control plane for Codex, per-issue workspace, daemon-driven, SPEC.md-as-product, hedged 500% landed-PRs claim | |
| Claude Opus 4.7 | GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokenizer inflation, new `xhigh` effort, first post-Glasswing safeguards |
Essay
5 pieces

| Title | Summary | Date |
|---|---|---|
| Opinions on Using AI Tools & the Future of the Software Engineering Role | Debate map of four stances on using AI tools (bullish-insider / pragmatist-practitioner / skeptic-governance / architecture-thesis) + synthesis on the future SWE role: coding→deciding/verifying, role convergence, what stays human, which moats survive, honest caveats | |
| Learning to Co-Work with AI: A Software Engineer's Field Guide | Field guide for software engineers in the AI era: 6 skill clusters (taste, harness, alignment-first planning, agent-friendly architecture, verification, strategic positioning), daily practices, anti-patterns, 90-day plan | |
| Opus 4.6 → 4.7 Changes and Multi-Agent Coding Considerations | 4.6→4.7 delta table + six hazards for multi-agent coding teams: role-based model selection, prompt re-tuning, harness invariants, per-agent context budget, unattended-fan-out safety, independent reviewer | |
| When to Use Claude Opus 4.6 for Work | Decision rules for Opus 4.6 deployment: solver-not-planner, elaboration-load-bearing tasks, brevity constraints, Pareto frontier check | |
| What Are AI Tools? | Overview of AI tools landscape and categories |
Index
1 piece

| Title | Summary | Date |
|---|---|---|
| Operations Log | Chronological record of wiki operations. Each entry uses the format: `## [YYYY-MM-DD] operation | Subject` |
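For concreteness, a hypothetical entry in the Operations Log format above; the date, operation name, and subject are invented for illustration:

```markdown
## [2026-05-13] create | Time-Aligned Micro-Turns
```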