ViteHub is still experimental. Expect bugs and breaking changes.

Agent Evals

Run repeatable behavior checks against Agent Definitions from the ViteHub CLI.

Agent Evals are repeatable development checks that run an Agent Definition against cases and score the resulting Agent Invocations. Use them when a change to a model, instruction, Capability, or Workspace needs evidence that the agent still behaves as expected.

Define an eval

Keep eval files close to the Agent Definition they protect. The Agent Eval Runner discovers evals and writes an Evalite config with ViteHub defaults.

server/agents/support.eval.ts
import { defineEval, textContains } from '@vite-hub/agent/eval'
import support from './support'

export default defineEval({
  agent: support,
  scenarios: [
    {
      name: 'answers from docs',
      input: { prompt: 'How do I run provisioning?' },
      scorers: [
        textContains('vitehub provision'),
      ],
    },
  ],
})
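
To make the role of a scorer concrete, here is a minimal sketch of what a textContains-style check could compute. It is not ViteHub's implementation: the `Scorer` type, the `containsText` helper, the case-insensitive match, and the 0-to-1 score convention are all assumptions made for illustration.

```typescript
// Hypothetical scorer shape: takes the agent's answer text and returns a score in [0, 1].
// This type is an assumption for illustration, not the ViteHub API.
type Scorer = (output: string) => number

// Illustrative stand-in for a textContains-style scorer:
// returns 1 when the expected phrase appears (case-insensitive), 0 otherwise.
function containsText(expected: string): Scorer {
  const needle = expected.toLowerCase()
  return (output) => (output.toLowerCase().includes(needle) ? 1 : 0)
}

const scorer = containsText('vitehub provision')

// An answer that names the command, and one that does not.
const passing = scorer('Run `pnpm vitehub provision` to provision resources.')
const failing = scorer('I am not sure how provisioning works.')
```

A phrase check like this is cheap and deterministic, which suits the repeatable checks described above, but it only proves the phrase is present, not that the surrounding answer is correct.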

Run the evals

Run the command with no target to run every discovered Agent Eval. Pass a path to run a single Agent Eval Target.

Terminal
pnpm vitehub agent eval
pnpm vitehub agent eval server/agents/support.eval.ts

For CI-style checks, add the threshold and output options.

Terminal
pnpm vitehub agent eval --threshold 0.9 --output .vitehub/evals/support.json --hide-table
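
As a rough sketch of what a threshold gate does in a CI check, the snippet below averages per-case scores and compares the result with a minimum. How ViteHub aggregates scores and what the output file contains are not described on this page, so the `CaseResult` shape and the `averageScore` and `meetsThreshold` helpers are hypothetical.

```typescript
// Hypothetical per-case result; the real output file format may differ.
interface CaseResult {
  name: string
  score: number // assumed to be in the range 0..1
}

// Mean score across cases. An empty run is treated as 0 so it cannot pass by accident.
function averageScore(results: CaseResult[]): number {
  if (results.length === 0) return 0
  const total = results.reduce((sum, r) => sum + r.score, 0)
  return total / results.length
}

// True when the average score reaches the threshold.
function meetsThreshold(results: CaseResult[], threshold: number): boolean {
  return averageScore(results) >= threshold
}

// Example data, invented for illustration.
const results: CaseResult[] = [
  { name: 'answers from docs', score: 1 },
  { name: 'refuses when the Source is silent', score: 0.5 },
]

const gatePassed = meetsThreshold(results, 0.9)
```

With the example data the average falls below 0.9, so a gate like this would fail the check until the second case improves.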

Configure defaults

Agent Eval defaults belong under the Agent Package integration options. Set them when local and CI runs need to behave the same way.

vite.config.ts
import { hubAgent } from '@vite-hub/agent/vite'
import { defineConfig } from 'vite'

export default defineConfig({
  plugins: [
    hubAgent({
      eval: {
        maxConcurrency: 2,
        scoreThreshold: 0.85,
        testTimeout: 60_000,
      },
    }),
  ],
})

What to score

Behavior | Useful assertion
Source-grounded answer | Expected citation, phrase, or refusal when the Source does not answer.
Capability behavior | Tool was used, rejected, or omitted as expected.
Access boundary | Scoped-out Workspace content does not appear in the answer.
Cost or latency | Agent Usage Record stays within the expected budget when telemetry is attached.
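
Two of the rows above can be sketched as plain functions. This is an illustration only: `excludesAll` and `toolUsage` are hypothetical helpers, not ViteHub scorers, and the inputs (answer text and a list of called tool names) are assumptions about what an Agent Invocation exposes.

```typescript
// Access boundary: score 1 when none of the scoped-out markers appear in the answer.
function excludesAll(forbidden: string[]): (output: string) => number {
  return (output) => (forbidden.some((marker) => output.includes(marker)) ? 0 : 1)
}

// Capability behavior: score 1 when the tool's presence in the call list
// matches the expectation (used when shouldUse is true, omitted when false).
function toolUsage(tool: string, shouldUse: boolean): (calledTools: string[]) => number {
  return (calledTools) => (calledTools.includes(tool) === shouldUse ? 1 : 0)
}

// Example values, invented for illustration.
const boundary = excludesAll(['internal-roadmap.md'])
const cleanAnswer = boundary('See docs/provisioning.md for the steps.')
const leakedAnswer = boundary('According to internal-roadmap.md, this ships soon.')

const mustSearch = toolUsage('searchDocs', true)
const searched = mustSearch(['searchDocs'])
const skipped = mustSearch([])
```

Negative checks like `excludesAll` pair well with a positive phrase check on the same case, so a case cannot pass simply because the agent returned an empty answer.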

Next steps

  • Use CLI for command options.
  • Use DevTools for interactive Agent Invocation inspection.
  • Use Runtime events for Agent Usage and Trace Event language.
Copyright © 2026