Gemini Enterprise Agent Platform

Sources#

Driving the Agent Quality Flywheel from Your Coding Agent - Google Developers Blog

Summary#

Google Cloud's platform for building, running, and evaluating agents; in this corpus, the infrastructure underneath the Agent Quality Flywheel. Its evaluation stack is the notable part: a GenAI evaluation service whose AutoRaters are adaptive model-based judges. The AutoRaters were developed with Google DeepMind and, per Google, are the same ones used on its own models and first-party agents. For a multi-turn agent they work in four steps: extract user intent from the conversation, generate per-case rubrics, validate the trace against each criterion, and majority-vote across samples.
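
The four AutoRater steps can be pictured with a minimal Python sketch. This is not the evaluation service's API: `rate_conversation`, its parameters, and the three stand-in callables are hypothetical names introduced for illustration, and the sample count of 3 is an assumption. In the real service each step would be a model call; here they are deterministic stubs so the control flow is visible.

```python
from typing import Callable, Dict, List


def majority_vote(votes: List[bool]) -> bool:
    """Return True only when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


def rate_conversation(
    conversation: List[str],
    trace: str,
    extract_intent: Callable[[List[str]], str],   # hypothetical: model call in practice
    generate_rubric: Callable[[str], List[str]],  # hypothetical: model call in practice
    judge: Callable[[str, str], bool],            # hypothetical: one sampled verdict
    samples: int = 3,                             # assumed sample count
) -> Dict[str, bool]:
    """Return one majority-voted verdict per rubric criterion."""
    intent = extract_intent(conversation)          # step 1: user intent
    rubric = generate_rubric(intent)               # step 2: per-case rubric
    return {
        criterion: majority_vote(                  # step 4: vote across samples
            [judge(trace, criterion) for _ in range(samples)]  # step 3: validate
        )
        for criterion in rubric
    }


# Deterministic stand-ins for the three model calls, for illustration only.
def fake_intent(conversation: List[str]) -> str:
    return conversation[0]


def fake_rubric(intent: str) -> List[str]:
    return [f"addresses: {intent}", "no hallucinated tool calls"]


_scripted_votes = iter([True, False, True, False, False, True])


def fake_judge(trace: str, criterion: str) -> bool:
    return next(_scripted_votes)


print(rate_conversation(["book a flight"], "trace-text",
                        fake_intent, fake_rubric, fake_judge))
```

With the scripted votes above, the first criterion receives two of three passing samples and is accepted, while the second receives one of three and is rejected. Sampling the judge several times and voting is what keeps a single noisy verdict from deciding the outcome.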

Components referenced#

GenAI evaluation service — the independent grader in the flywheel's optimizer/evaluator split; predefined multi-turn AutoRaters (multi_turn_task_success, multi_turn_trajectory_quality) plus custom rubric metrics.
User Simulator — synthesizes multi-turn scenarios for cold-start evaluation before real traffic exists.
Automatic Loss Analysis — clusters failure verdicts when failures number ten or more.
Online Monitors — continuously evaluate live production traffic and write quality scores to Cloud Monitoring.
OTel tracing — agents emit OpenTelemetry traces (ADK does by default); production traces double as eval datasets.
ADK (Agent Development Kit) + agents-cli — Google's agent framework and CLI toolchain; the adk-samples agents are the flywheel's demo subjects.
The two skill packages — google-agents-cli-eval (ADK/agents-cli) and agent-platform-eval-flywheel (Evaluation SDK, any framework), installed via npx skills add … from skills.sh.
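
The ten-failure threshold on Automatic Loss Analysis can be sketched as a simple gate. The following is a hedged illustration, not the platform's implementation: the verdict dictionary shape (`case_id`, `passed`, `failed_criterion`) and the function name `group_failures` are assumptions, and exact-match grouping by failed criterion stands in for whatever clustering the service actually performs.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Threshold taken from the component description ("ten or more" failures).
MIN_FAILURES_FOR_CLUSTERING = 10


def group_failures(verdicts: List[dict]) -> Optional[Dict[str, List[str]]]:
    """Group failed case IDs by failed criterion, largest group first.

    Returns None when there are fewer failures than the threshold, since
    too few failures give no meaningful groups to analyze.
    """
    failures = [v for v in verdicts if not v["passed"]]
    if len(failures) < MIN_FAILURES_FOR_CLUSTERING:
        return None

    groups: Dict[str, List[str]] = defaultdict(list)
    for verdict in failures:
        # Hypothetical field: the rubric criterion the judge marked as failed.
        groups[verdict["failed_criterion"]].append(verdict["case_id"])

    # Sort so the most common failure mode is listed first.
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))
```

Below the threshold the sketch returns `None` rather than tiny groups, which mirrors the idea that individual failures are better read one by one until there are enough of them to show a pattern.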

Position in the corpus#

The Google-side counterpart to Claude Code's and Codex's agent stacks — but where those entries anchor building with agents, this platform's corpus role is measuring them: it packages evaluation (judges, simulators, monitors) as the product surface. That the delivery mechanism is a skill driven by whatever coding agent you already use is itself evidence for the skills-as-distribution-unit pattern (Agentic Work Systematization).

Connections#

Agent Quality Flywheel — the methodology this platform ships and executes
Google DeepMind — co-developer of the AutoRaters at the evaluation stack's core
Optimizer–Evaluator Decoupling — the evaluation service is the independent grader the rule requires

Sources#

Driving the Agent Quality Flywheel from Your Coding Agent - Google Developers Blog (Melnyk & Dai, 2026-06-30; vendor-claim)