Howardism · Vol. 03 · Plate II · No. 02
Interaction, in order.
Notes: 12 · Topic: Interaction · Oldest: 6 May 2026 · Newest: 23 May 2026
Real-time multimodal, full-duplex, and human-AI collaboration.
| Title | Summary | Date |
|---|---|---|
| Orchestration vs Employee Framing: Reconciling the Founder's Playbook with HBR's Accountability Evidence | Reconciles the Founder's Playbook orchestration framings with HBR Kropp et al.'s accountability evidence; "orchestration as workflow design" survives the critique; "orchestration as mental model of agents-as-coworkers" does not; operational checklist for the disciplined founder | |
| Compute Allocator | The human's evolving role: deciding what's worth spending compute on; ~1% of generated tokens ship, the other ~99% is scaffolding spent on alignment/communication; abundance mindset | |
| Disposable Micro-Apps | Throwaway custom UIs built per-task to edit a plan ("micro-software on top of micro-software"); copy-back-to-markdown; rational under the abundance mindset | |
| Living Design System | `design_system.html` extracted from repos as a portable, human- and machine-readable source of truth; component playgrounds; bridges engineering ↔ non-technical stakeholders | |
| Does the Human-Facing Harness (HTML Artifacts) Hit Its Own Bloat Ceiling? | Yes — HTML raises and reshapes the human-attention ceiling but can't remove it; bloat relocates from document-length to artifact-sprawl/rubber-stamping; the ceiling gets *more* binding as models improve (inverse of the shrinking model-facing harness) | |
| Encoder-Free Early Fusion | Multimodal design with minimal pre-processing instead of large standalone encoders: dMel audio embedding, 40×40-patch hMLP for frames, flow head for audio out, all co-trained from scratch in one transformer | |
| Full-Duplex Interaction | Perceive-and-respond simultaneously across modalities; proactive interjection, visual-cue reactions, simultaneous speech, live translation/commentary, time-aware speech — all special cases of model behavior | |
| Interaction / Background Model Split | Dual-model architecture: time-aware interaction model stays present; async background model handles deep reasoning/tools; rich-context-package delegation; "reasoning-model planning at non-thinking latency" | |
| Interaction Models | Thinking Machines Lab (May 2026): models that handle audio/video/text interaction natively in real time instead of via harness; interactivity scales with intelligence only if it's in the model | |
| Interactivity Benchmarks | FD-bench, Audio MultiChallenge + new TimeSpeak/CueSpeak (proactive audio) and RepCount-A/ProactiveVideoQA/Charades (visual proactivity); TML-Interaction-Small: 0.40s turn-taking latency, dominates interaction quality | |
| Turn-Based Interface Bottleneck | Why current AI interfaces limit collaboration: single-thread turn-taking is a bandwidth bottleneck; humans pushed out by the interface, not the work; less-intelligent harness (VAD/turn-detection) should dissolve | |
| Learning to Co-Work with AI: A Software Engineer's Field Guide | Field guide for software engineers in the AI era: 6 skill clusters (taste, harness, alignment-first planning, agent-friendly architecture, verification, strategic positioning), daily practices, anti-patterns, 90-day plan | |
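The Interaction / Background Model Split entry describes a control-flow pattern: a foreground model that always answers promptly, and an asynchronous background model that receives a rich context package and reports back when done. The following is a minimal Python sketch of that flow only, assuming plain `asyncio` with both "models" simulated by stub coroutines. `ContextPackage`, `interaction_model`, `background_model`, and the `deep:` prefix used to mark heavy requests are hypothetical names for illustration; they do not come from the note or from Thinking Machines Lab.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ContextPackage:
    # Hypothetical "rich context package": the task plus a snapshot of the
    # conversation, so the background model can work without follow-ups.
    task: str
    transcript: list[str] = field(default_factory=list)


async def background_model(pkg: ContextPackage) -> str:
    # Stub for slow, deep reasoning / tool use (simulated with a sleep).
    await asyncio.sleep(0.05)
    return f"result for {pkg.task!r} ({len(pkg.transcript)} turns of context)"


async def interaction_model(user_events: list[str]) -> list[str]:
    """Answer every event immediately; delegate heavy work to the background."""
    transcript: list[str] = []
    outputs: list[str] = []
    pending: set[asyncio.Task] = set()

    for event in user_events:
        transcript.append(event)
        if event.startswith("deep:"):
            # Hand off a copy of the context instead of blocking on the work.
            pkg = ContextPackage(task=event.removeprefix("deep:"),
                                 transcript=list(transcript))
            pending.add(asyncio.create_task(background_model(pkg)))
            outputs.append("ack: working on it in the background")
        else:
            outputs.append(f"quick reply to {event!r}")
        # Yield to the event loop so background tasks make progress.
        await asyncio.sleep(0.01)

    # Fold background results back into the conversation as they finish.
    for done in asyncio.as_completed(pending):
        outputs.append(f"update: {await done}")
    return outputs


if __name__ == "__main__":
    for line in asyncio.run(interaction_model(["hi", "deep:plan the refactor", "still there?"])):
        print(line)
```

The design choice the sketch highlights: the foreground loop never awaits the background task inline, so the reply to "still there?" is not delayed by the deep request; the background result arrives later as a separate update.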