ViteHub is still experimental. Expect bugs and breaking changes.

Agent Evals

Run repeatable behavior checks against Agent Definitions from the ViteHub CLI.

Agent Evals are repeatable development checks that run an Agent Definition against cases and score the resulting Agent Invocations. Use them when a change to a model, instruction, Capability, or Workspace needs proof that the agent still behaves as expected.

Define an eval

Keep eval files close to the Agent Definition they protect. The Agent Eval Runner discovers evals and writes an Evalite config with ViteHub defaults.

server/agents/support.eval.ts

import { defineEval, textContains } from '@vite-hub/agent/eval'
import support from './support'

export default defineEval({
  agent: support,
  scenarios: [
    {
      name: 'answers from docs',
      input: { prompt: 'How do I run provisioning?' },
      scorers: [
        textContains('vitehub provision'),
      ],
    },
  ],
})

Run the evals

Run the command with no target to run every discovered Agent Eval. Pass a path to run a single Agent Eval Target.

Terminal

pnpm vitehub agent eval
pnpm vitehub agent eval server/agents/support.eval.ts

For CI-style checks, pass the threshold and output options.

Terminal

pnpm vitehub agent eval --threshold 0.9 --output .vitehub/evals/support.json --hide-table
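
A threshold check can be pictured as a simple gate over scenario scores. The sketch below is a standalone illustration, not ViteHub's implementation. It assumes scores fall between 0 and 1 and that a run is judged on their mean, neither of which is confirmed on this page. ScenarioScore and passesThreshold are hypothetical names.

```typescript
// Hypothetical shape for one scored scenario; not a ViteHub type.
interface ScenarioScore {
  name: string
  score: number // assumed to be in the range 0..1
}

// Returns true when the mean score reaches the threshold.
// Assumption: the run is judged on the mean; the real runner may aggregate differently.
function passesThreshold(results: ScenarioScore[], threshold: number): boolean {
  if (results.length === 0) return false // an empty run proves nothing
  const total = results.reduce((sum, r) => sum + r.score, 0)
  return total / results.length >= threshold
}

const results: ScenarioScore[] = [
  { name: 'answers from docs', score: 1 },
  { name: 'refuses when the Source is silent', score: 0.75 },
]

console.log(passesThreshold(results, 0.85)) // true: the mean is 0.875
console.log(passesThreshold(results, 0.9)) // false: 0.875 is below 0.9
```

Under these assumptions, raising the threshold from 0.85 to 0.9 turns the same run from a pass into a failure, which is the behavior a CI gate relies on.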

Configure defaults

Agent Eval defaults belong under the Agent Package integration options. Set them there when the app needs the same eval behavior locally and in CI.

vite.config.ts

import { hubAgent } from '@vite-hub/agent/vite'
import { defineConfig } from 'vite'

export default defineConfig({
  plugins: [
    hubAgent({
      eval: {
        maxConcurrency: 2,
        scoreThreshold: 0.85,
        testTimeout: 60_000,
      },
    }),
  ],
})

What to score

Behavior	Useful assertion
Source-grounded answer	Expected citation, phrase, or refusal when the Source does not answer.
Capability behavior	Tool was used, rejected, or omitted as expected.
Access boundary	Scoped-out Workspace content does not appear in the answer.
Cost or latency	Agent Usage Record stays within the expected budget when telemetry is attached.
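
Two of the rows above reduce to simple string and number checks. The sketch below writes them as plain functions, assuming a check yields 1 for a pass and 0 for a fail. It does not use the ViteHub scorer interface, which this page does not describe, and every name in it is hypothetical.

```typescript
// Access boundary: score 1 only if none of the scoped-out snippets appear in the answer.
// The comparison is case-insensitive so that casing alone cannot hide a leak.
function scoreAccessBoundary(answer: string, scopedOut: string[]): number {
  const text = answer.toLowerCase()
  const leaked = scopedOut.some(snippet => text.includes(snippet.toLowerCase()))
  return leaked ? 0 : 1
}

// Cost or latency: score 1 if the measured value stays within the budget.
function scoreWithinBudget(measured: number, budget: number): number {
  return measured <= budget ? 1 : 0
}

const answer = 'Run vitehub provision from the project root.'

console.log(scoreAccessBoundary(answer, ['internal-billing-key'])) // 1: nothing leaked
console.log(scoreAccessBoundary(answer, ['Project Root'])) // 0: a scoped-out phrase appears
console.log(scoreWithinBudget(1200, 1000)) // 0: over budget
```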

Next steps

Use CLI for command options.
Use DevTools for interactive Agent Invocation inspection.
Use Runtime events for Agent Usage and Trace Event language.