Howardism
Howardism · Vol. 03 · Plate II · No. 02

Entities, in order.

Notes: 24 · Domain: Entities · Open Qs: 28 · Newest: 28 May 2026 · Oldest: 17 Apr 2026

Profiles of the people, labs, products, and projects.

Map of Content for all 24 entity pages. See Home for concept domains.

  • AlphaProof Nexus — DeepMind framework for LLM-aided Lean proof generation; four agents (basic→full-featured); proof-sketch + EVOLVE-BLOCK interface; SafeVerify
  • Andrej Karpathy — Co-founder OpenAI, ex-Tesla AI, Eureka Labs; coined "vibe coding," Software 1/2/3.0, "ghosts not animals," "agentic engineering"; originated the LLM-wiki pattern this vault runs on
  • Anthropic — AI safety company / vendor of Claude; mission-as-tiebreaker culture; ~30–40 PMs across teams; Mike Krieger leads Labs round 2
  • Boris Cherny — Creator of Claude Code at Anthropic; phone-driven workflow with hundreds of agents; primary advocate of /loop primitive; "coding is solved (for me)" thesis
  • Campfire — AI-native ERP (YC S23) pulling customers off NetSuite; custom foundation model + agent platform; Series B (Accel/Ribbit); doubling ARR/quarter since Q4 2024
  • Cat Wu — Head of Product for Claude Code and Cowork at Anthropic; primary articulator of AI-native product cadence and engineer-PM convergence
  • Chloe Li — Lead author of MSM paper (arXiv 2605.02087); Anthropic Fellows Program; designed all specs and experiments
  • Claire Vo — Host of the "How I AI" interview series (ChatPRD); interviewed Thariq Shihipar; runs a parallel component-visualization practice for non-technical stakeholders
  • Claude Code — Anthropic's agentic coding product; created by Boris Cherny late 2024; TypeScript/React; CLI/desktop/web/mobile/IDE surfaces; central tool across all 2026 sources
  • Claude's Constitution / Model Spec — Anthropic document by Askell et al. specifying Claude's values + hard constraints (SP1–3, GP1–2); now also a direct training input via MSM
  • Claude Opus 4.7 — GA frontier model from Anthropic; direct upgrade to 4.6 at same price; literal instruction following, 1.0–1.35× tokenizer inflation, new xhigh effort, first post-Glasswing safeguards
  • Cowork — Anthropic's non-code knowledge-work agent product; sibling to Claude Code; output is decks/inbox/dossiers; same MCP/computer-use primitives
  • Fiona Fung — Leads engineering + product for Claude Code and Cowork at Anthropic (ex-Meta/Microsoft); "what served you prior may no longer"; rewrote team norms for the AI-native org
  • Google DeepMind — Google's AI lab; built AlphaProof Nexus; Gemini models, AlphaProof, AlphaEvolve; opens the AI-for-mathematics domain in this wiki
  • Hermes Agent — Nous Research's CLI agent + Gateway daemon (Telegram/Discord/Slack/WhatsApp); AGENTS.md/SOUL.md context split, bounded memory files, DM-pairing auth, container-as-security-boundary model
  • John Glasgow — CEO/founder of Campfire; 10 years in corporate finance; founder-led-sales advocate; long-horizon "last job I'll ever have"
  • Lean — Proof assistant whose compiler mechanically verifies every step; the sorry placeholder enables proof sketches; mathlib maturity gates the reachable frontier
  • Matt Pocock — Independent AI-coding educator; built Sandcastle library; smart-zone/grill-me/tracer-bullets pedagogical framing; "bad code bases make bad agents"
  • Mythos Model — Anthropic preview-tier frontier model; gated for safety; used internally alongside Opus 4.7; descendant expected to ship publicly later
  • OWASP — Open Worldwide Application Security Project; source of the agentic threat taxonomy cited throughout Anthropic's Zero Trust framework; coined the term 'least agency'; maintains the AI-BOM (CycloneDX ML-BOM extension)
  • Symphony — OpenAI's open-source agent orchestrator (March 2026): turns Linear into a control plane for Codex, per-issue workspace, daemon-driven, SPEC.md-as-product, hedged 500% landed-PRs claim
  • Thariq Shihipar — Engineer on the Claude Code team at Anthropic; "HTML is the new markdown" and "compute allocator" framings; three HTML-first workflows
  • Thinking Machines Lab — AI research lab behind interaction models (May 2026); harness-dissolves-into-model thesis; upstreamed streaming-sessions to SGLang; benchmarks against GPT-realtime / Gemini-live; research grants open
  • TML-Interaction-Small — TML's first interaction model: 276B MoE / 12B active, audio+video+text in / text+audio out, 200ms micro-turns, async background agent; best turn-taking latency of any model; research preview May 2026

Open questions (28 open)

  • AlphaProof Nexus
    • The framework's reach is gated by [[lean]]'s mathlib maturity. What's the path to domains needing new theory rather than subgoal decomposition?
    • AlphaProof adds little as a soloist but helps as a tool. As the prover LLM strengthens, does the AlphaProof tool become redundant entirely?
  • Campfire
    • Campfire claims its AI edge comes from "our own foundation model." For an ERP, what does a custom foundation model actually buy over fine-tuning a frontier model — and is it durable as frontier models improve (cf. [[harness-shrinkage-as-models-improve]])?
    • "Never had anyone outgrow Campfire" — does that hold as customers reach true enterprise scale where NetSuite's breadth historically mattered?
  • Claude Opus 4.7
    • Do Hakim's (2026) brevity-constraint findings on Opus 4.6 replicate on Opus 4.7, or does literal instruction following change the elasticity? Specifically: does `<50 words` still yield +13.1pp on GSM8K?
    • Does Opus 4.7 still underperform as a planner in HotpotQA-style combo sweeps, or does improved instruction-following close the gap that AgentOpt (Hua et al., 2026) identified?
    • What is the real-world token-inflation multiplier on typical Claude Code sessions (1.0–1.35× is content-dependent — what's the distribution on code-heavy vs. prose-heavy inputs)?
    • How does xhigh compare to max on coding evals? The migration guidance says "start with high or xhigh" — is max ever worth it for coding?
    • What fraction of existing CLAUDE.md / system-prompt hedges become counterproductive under literal instruction following?
  • Cowork
    • How does Cowork's harness compare to [[claude-code]]'s? Both surface skills, MCP, sub-agents — but the failure modes for non-code output differ (no test suite, no compiler, no diff to review).
    • What's the eval discipline for Cowork-class outputs? Cat Wu says memory benefits a lot from evals; unclear how slide-deck quality is measured.
  • Google DeepMind
    • DeepMind reports its bespoke systems being caught by simple loops. Does the lab's comparative advantage move from *systems* to *models + verifiers + benchmarks* (mathlib, Formal Conjectures)?
    • The paper opens AI-for-math; what's DeepMind's next target domain where a sound verifier exists?
  • Hermes Agent
    • The container backend disabling dangerous-command checks is a defensible design but a meaningful security-model shift. What's the empirical track record? Have lockdown failures in popular images (Daytona, `nikolaik/python-nodejs`) caused incidents?
    • How do bounded memory files (~2,200 chars `MEMORY.md`) hold up over long-term use? Auto-consolidation is mentioned but not specified — what's the consolidation algorithm and how lossy is it?
    • Hermes's DM-pairing flow is a clean security primitive. Why hasn't this pattern been adopted by Claude Code or Cursor for shared/team deployments?
    • The split between `AGENTS.md` (project) and `SOUL.md` (personality) is explicit in Hermes but implicit in Claude Code's `CLAUDE.md`. Does the split materially improve outcomes, or is it a documentation choice without empirical backing?
    • Cron jobs in fresh sessions with no memory — how do teams structure the "context the agent needs" without it bloating every cron prompt? Is there a standard pattern?
  • Lean
    • mathlib maturity gates the reachable frontier. Can AI formal proof search *grow* mathlib (formalize new theory) as a byproduct, expanding its own frontier?
    • Lean is a perfect verifier for math. Which other domains have a comparably sound automatic verifier (vs. only noisy ones like tests or LLM-judge councils)?
  • Mythos Model
    • Public release timeline: not in source.
    • Capability profile beyond cybersecurity: Mythos Preview focused on the safety story; other capability dimensions not well-documented externally.
    • Internal access controls: who at Anthropic actually uses Mythos for daily work, vs. Opus 4.7? Boris implies infrequent, try-it-out use; the source gives no detail.
  • Symphony
    • The **500% landed-PRs claim** is hedged — no baseline definition, "on some teams" only. What does the distribution look like across teams? What happens to PR *quality* and revert rate at that throughput?
    • "Workspaces preserved across runs" is the opposite of typical CI ephemerality. At what point does state pollution from prior runs (stale `node_modules`, leftover branches, build artifacts) start hurting more than warm-cache helps?
    • Symphony doesn't write to the tracker — agents do. This means tracker policy is a *prompt* in `WORKFLOW.md`. How brittle is this in practice when Linear changes its API? How is consistent state-machine behavior enforced when agents have prompt-level discretion?
    • The spec was simplified by being implemented in 6 languages. What's the extension of this technique? Could `compiler-prompt.md` in this vault be similarly cross-fuzzed?
    • Symphony explicitly says agents can self-create tickets. What governance prevents runaway ticket-graph expansion? Is human triage of agent-created tickets the only check?