Howardism

Gemini Enterprise Agent Platform

Published: July 2, 2026 · Filed: Entity · Domain: Entities · Tags: Entity, Platform, Google, Evaluation · Reading: 3 min · Source: AI-synthesised

*Entity.* Google Cloud's agent platform: the GenAI evaluation service with adaptive AutoRaters (built with DeepMind), User Simulator, Automatic Loss Analysis, Online Monitors, OTel tracing, and the ADK/agents-cli toolchain. It ships the quality-flywheel eval skill in two packages.

Summary#

Google Cloud's platform for building, running, and evaluating agents. In this corpus it is the infrastructure underneath the Agent Quality Flywheel. The evaluation stack is the notable part: a GenAI evaluation service whose AutoRaters are adaptive model-based judges, developed with Google DeepMind and, per Google, the same ones used on its own models and first-party agents. For a multi-turn agent, an AutoRater extracts the user's intent from the conversation, generates per-case rubrics, validates the trace against each criterion, and majority-votes across samples.
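The last two steps of that pipeline (check the trace against each rubric criterion, then majority-vote across samples) can be sketched in plain Python. This is an illustration only, not the evaluation service's API: `majority_verdict`, `score_trace`, and `keyword_judge` are hypothetical names, and the stub judge is a deterministic keyword match standing in for a sampled model-based judge.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical judge signature: (criterion, trace) -> did the trace meet it?
Judge = Callable[[str, str], bool]


def majority_verdict(votes: Sequence[bool]) -> bool:
    """True only when strictly more than half of the sampled votes are True."""
    return Counter(votes)[True] > len(votes) / 2


def score_trace(
    trace: str,
    rubric: Sequence[str],
    judge: Judge,
    samples: int = 3,
) -> Dict[str, bool]:
    """Ask the judge `samples` times per criterion and keep the majority vote."""
    return {
        criterion: majority_verdict([judge(criterion, trace) for _ in range(samples)])
        for criterion in rubric
    }


def keyword_judge(criterion: str, trace: str) -> bool:
    """Stand-in judge (assumption): passes if the criterion word appears in the trace.

    A real model-based judge would be stochastic, which is why votes are sampled.
    """
    return criterion.lower() in trace.lower()


if __name__ == "__main__":
    trace = "User asked for a refund; agent issued refund."
    print(score_trace(trace, ["refund", "apology"], keyword_judge))
```

An odd sample count avoids ties; with an even count, this sketch treats a tie as a failure because the comparison is strict.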

Components referenced#

  • GenAI evaluation service — the independent grader in the flywheel's optimizer/evaluator split; predefined multi-turn AutoRaters (multi_turn_task_success, multi_turn_trajectory_quality) plus custom rubric metrics.
  • User Simulator — synthesizes multi-turn scenarios for cold-start evaluation before real traffic exists.
  • Automatic Loss Analysis — clusters failure verdicts when failures number ten or more.
  • Online Monitors — continuously evaluate live production traffic and write quality scores to Cloud Monitoring.
  • OTel tracing — agents emit OpenTelemetry traces (ADK does by default); production traces double as eval datasets.
  • ADK (Agent Development Kit) + agents-cli — Google's agent framework and CLI toolchain; the adk-samples agents are the flywheel's demo subjects.
  • The two skill packages: google-agents-cli-eval (ADK/agents-cli) and agent-platform-eval-flywheel (Evaluation SDK, any framework), installed via npx skills add … from skills.sh.
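The ten-failure gate on Automatic Loss Analysis listed above can be pictured as a small threshold check. The sketch below is an assumption-laden stand-in, not the platform's implementation: it groups failures by a hypothetical `reason` label instead of performing real clustering, and the record keys (`case_id`, `passed`, `reason`) and the function name `group_failures` are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

MIN_FAILURES = 10  # from the text: analysis applies when failures number ten or more


def group_failures(
    verdicts: List[dict],
    min_failures: int = MIN_FAILURES,
) -> Optional[Dict[str, List[int]]]:
    """Group failed cases by reason, or return None when there are too few.

    Each verdict is a dict with hypothetical keys 'case_id', 'passed', 'reason'.
    Grouping by an exact label is a simplification of clustering.
    """
    failures = [v for v in verdicts if not v["passed"]]
    if len(failures) < min_failures:
        return None  # below the threshold: nothing to analyse yet

    groups: Dict[str, List[int]] = defaultdict(list)
    for verdict in failures:
        groups[verdict["reason"]].append(verdict["case_id"])
    return dict(groups)


if __name__ == "__main__":
    sample = [{"case_id": i, "passed": False, "reason": "wrong_tool"} for i in range(10)]
    print(group_failures(sample))
```

Returning `None` below the threshold keeps "not enough data" distinct from "analysed and found no groups".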

Position in the corpus#

The Google-side counterpart to Claude Code's and Codex's agent stacks — but where those entries anchor building with agents, this platform's corpus role is measuring them: it packages evaluation (judges, simulators, monitors) as the product surface. That the delivery mechanism is a skill driven by whatever coding agent you already use is itself evidence for the skills-as-distribution-unit pattern (Agentic Work Systematization).

Connections#

Sources#


