ViteHub is still experimental. Expect bugs and breaking changes.
Agent Evals
Run repeatable behavior checks against Agent Definitions from the ViteHub CLI.
Agent Evals run an Agent Definition against a set of cases and score the resulting Agent Invocations. Use them when a change to a model, instruction, Capability, or Workspace needs proof that the agent still behaves as expected.
Define an eval
Keep eval files close to the Agent Definition they protect. The Agent Eval Runner discovers evals and writes an Evalite config with ViteHub defaults.
server/agents/support.eval.ts
import { defineEval, textContains } from '@vite-hub/agent/eval'
import support from './support'

export default defineEval({
  agent: support,
  scenarios: [
    {
      name: 'answers from docs',
      input: { prompt: 'How do I run provisioning?' },
      scorers: [
        textContains('vitehub provision'),
      ],
    },
  ],
})
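
To make the scoring step concrete, here is a minimal sketch of what a phrase-containment scorer could compute. It is not the `textContains` implementation from `@vite-hub/agent/eval`: the `Scorer` type, the `containsPhrase` helper, the 0-or-1 score range, and the case-insensitive match are all assumptions made for illustration.

```typescript
// Hypothetical scorer shape for illustration; not the ViteHub API.
type Scorer = (output: string) => number

// Builds a scorer that returns 1 when the output contains the expected
// phrase (case-insensitive, an assumption) and 0 otherwise.
function containsPhrase(expected: string): Scorer {
  const needle = expected.toLowerCase()
  return (output) => (output.toLowerCase().includes(needle) ? 1 : 0)
}

const scorer = containsPhrase('vitehub provision')

// An answer that quotes the command scores 1; an unrelated answer scores 0.
const hit = scorer('Run `pnpm vitehub provision` to create resources.')
const miss = scorer('I am not sure how to do that.')
```

A binary scorer like this suits checks where a specific command or citation must appear verbatim in the answer.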
Run the evals
Run the command with no target to run every discovered Agent Eval. Pass a path to run a single Agent Eval Target.
Terminal
pnpm vitehub agent eval
pnpm vitehub agent eval server/agents/support.eval.ts
For CI-style checks, pass the threshold and output options.
Terminal
pnpm vitehub agent eval --threshold 0.9 --output .vitehub/evals/support.json --hide-table
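
As a rough mental model of a threshold gate, the sketch below averages a list of scores and compares the mean to a threshold. This page does not state how ViteHub aggregates scores, so the simple mean, the `passesThreshold` helper, and the rule that an empty run fails are all assumptions.

```typescript
// Hypothetical helper: not part of the ViteHub CLI or its output format.
// Averages scenario scores (each assumed to be in the range 0..1) and
// reports whether the mean reaches the threshold.
function passesThreshold(scores: number[], threshold: number): boolean {
  // Treat an empty run as a failure so a misconfigured target cannot pass.
  if (scores.length === 0) return false
  const mean = scores.reduce((sum, score) => sum + score, 0) / scores.length
  return mean >= threshold
}

// Mean of about 0.93 clears a 0.9 threshold; a mean of 0.75 does not.
const strongRun = passesThreshold([1, 1, 0.8], 0.9)
const weakRun = passesThreshold([1, 0.5], 0.9)
```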
Configure defaults
Agent Eval defaults belong under the Agent Package integration options. Set them there when local and CI runs need to behave the same way.
vite.config.ts
import { hubAgent } from '@vite-hub/agent/vite'
import { defineConfig } from 'vite'

export default defineConfig({
  plugins: [
    hubAgent({
      eval: {
        maxConcurrency: 2,
        scoreThreshold: 0.85,
        testTimeout: 60_000,
      },
    }),
  ],
})
What to score
| Behavior | Useful assertion |
|---|---|
| Source-grounded answer | Expected citation, phrase, or refusal when the Source does not answer. |
| Capability behavior | Tool was used, rejected, or omitted as expected. |
| Access boundary | Scoped-out Workspace content does not appear in the answer. |
| Cost or latency | Agent Usage Record stays within the expected budget when telemetry is attached. |
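
The access-boundary row can be sketched as a scorer that fails whenever scoped-out content leaks into the answer. The `Scorer` type and the `excludesAll` helper are hypothetical and are not exported by `@vite-hub/agent/eval`; plain substring matching is also an assumption and would miss paraphrased leaks.

```typescript
// Hypothetical scorer shape for illustration; not the ViteHub API.
type Scorer = (output: string) => number

// Builds a scorer that returns 0 if any forbidden marker appears in the
// output (case-insensitive) and 1 if none of them do.
function excludesAll(forbidden: string[]): Scorer {
  const markers = forbidden.map((marker) => marker.toLowerCase())
  return (output) => {
    const text = output.toLowerCase()
    return markers.some((marker) => text.includes(marker)) ? 0 : 1
  }
}

// Example markers standing in for content outside the agent's scope.
const boundary = excludesAll(['internal-pricing.md', 'SECRET_TOKEN'])

const clean = boundary('Provisioning is covered in the public docs.')
const leaked = boundary('According to internal-pricing.md, the cost is fixed.')
```

Choosing markers that only exist in the scoped-out content, such as file names or unique phrases, keeps a check like this from failing on unrelated answers.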
Next steps
- See CLI for command options.
- Use DevTools for interactive Agent Invocation inspection.
- See Runtime events for Agent Usage and Trace Event terminology.