Agent Skills evals.json Adapter
Overview
Agent Skills uses evals.json for lightweight skill-scoped datasets: a prompt, optional expected outcome, optional fixture files, and natural-language assertions or expectations.
AgentV treats evals.json as a built-in read adapter input, not as a native core eval format. Detection requires a top-level skill_name string and evals array, so arbitrary .json files are still rejected. You can run a detected Agent Skills file directly:

    agentv eval evals.json --target claude

Or convert it to AgentV EVAL YAML when you want to edit the generated suite:

    agentv convert evals.json --out EVAL.yaml
    agentv eval EVAL.yaml --target claude

This keeps AgentV’s core authoring formats focused on YAML, JSONL, and TypeScript while still making Agent Skills suites easy to onboard. The boundary is the same adapter layer used for external datasets: external schema in, AgentV-native cases at runtime.
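The detection rule described above (a top-level skill_name string plus an evals array) can be sketched in a few lines of Python. This is only an illustration of the stated rule, not AgentV's actual implementation; the function name looks_like_agent_skills_evals is hypothetical, and the real adapter may apply further checks.

```python
import json
from typing import Any


def looks_like_agent_skills_evals(doc: Any) -> bool:
    """Return True when doc has a top-level skill_name string and evals array.

    Hypothetical helper for illustration; AgentV's real detection may check more.
    """
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("skill_name"), str)
        and isinstance(doc.get("evals"), list)
    )


# A minimal Agent Skills document is detected; an arbitrary JSON object is not.
detected = looks_like_agent_skills_evals(
    json.loads('{"skill_name": "csv-analyzer", "evals": []}')
)
rejected = looks_like_agent_skills_evals(json.loads('{"name": "other", "items": []}'))
print(detected, rejected)
```

Checking the types as well as the keys is what keeps an unrelated .json file that merely contains a skill_name key from being picked up.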
Quick Start
Create evals.json:

    {
      "skill_name": "csv-analyzer",
      "evals": [
        {
          "id": 1,
          "prompt": "I have a CSV of monthly sales data in evals/files/sales.csv. Find the top 3 months by revenue.",
          "expected_output": "The top 3 months by revenue are November ($22,500), September ($20,100), and December ($19,400).",
          "files": ["evals/files/sales.csv"],
          "assertions": [
            "Output identifies November as the highest revenue month",
            "Output includes exactly 3 months",
            "Revenue figures are included for each month"
          ]
        }
      ]
    }

Run it directly or convert it first:

    agentv eval evals.json --target claude

    agentv convert evals.json --out EVAL.yaml
    agentv eval EVAL.yaml --target claude

The --target flag selects the agent harness. The agent evaluates itself; skills load through the normal agent runtime.
CLI Surface
Run a detected Agent Skills file directly:

    agentv eval evals.json --target claude --output .agentv/results/csv-analyzer

Import the definition into editable AgentV YAML without running a target:

    agentv convert evals.json --out EVAL.yaml

Prepare one converted case for a human or external agent without running the target provider:

    agentv prepare EVAL.yaml --test-id "1" --target claude --out .agentv/prepared/csv-analyzer-1

agentv import is reserved for agent session transcripts and selected external datasets such as Hugging Face. Agent Skills evals.json uses the eval read adapter for execution and convert for definition import.
Field Mapping
The read adapter promotes evals.json fields into AgentV cases. The converter writes the same mapping to YAML:
| evals.json | EVAL.yaml output | Notes |
|---|---|---|
| prompt | input | Written as prompt text |
| expected_output | criteria + llm-rubric criterion | Agent Skills uses this as expected outcome/rubric context, not as AgentV passive reference-data expected_output |
| assertions[] | llm-rubric criteria | Strings are grouped into one rubric with one criterion per assertion |
| expectations[] | llm-rubric criteria | Same handling as assertions[] |
| files[] | input_files | Resolved relative to the evals.json file |
| skill_name | tags.skill, description | Used for suite grouping |
| id | id | Converted to a string |
The generated llm-rubric assertion emits per-criterion grading rows in AgentV artifacts just like other assertion entries. The converted YAML is the editable source of truth after conversion.
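As a rough illustration of the table above, the following Python sketch maps an evals.json document onto an AgentV-shaped dictionary. It is not the converter's real code: the function name convert_suite is hypothetical, the assertion-N numbering of expectations[] entries is an assumption, and the description field and path resolution are left out for brevity.

```python
from typing import Any


def convert_suite(doc: dict[str, Any]) -> dict[str, Any]:
    """Map an evals.json document to an AgentV-style suite (illustrative only)."""
    tests = []
    for item in doc["evals"]:
        rubric = []
        expected = item.get("expected_output")
        if expected:
            # expected_output becomes rubric context, not passive reference data.
            rubric.append({"id": "expected-outcome", "outcome": expected, "required": True})
        # assertions[] and expectations[] are handled the same way.
        statements = [*item.get("assertions", []), *item.get("expectations", [])]
        for n, text in enumerate(statements, start=1):
            # Assumption: criteria are numbered assertion-1, assertion-2, ...
            rubric.append({"id": f"assertion-{n}", "outcome": text, "required": True})

        test: dict[str, Any] = {"id": str(item["id"]), "input": item["prompt"]}
        if expected:
            test["criteria"] = expected
        if item.get("files"):
            test["input_files"] = list(item["files"])
        if rubric:
            # All criteria are grouped into a single llm-rubric assertion.
            test["assert"] = [
                {"name": "agent-skills-criteria", "type": "llm-rubric", "value": rubric}
            ]
        tests.append(test)
    return {"tags": {"skill": doc["skill_name"]}, "tests": tests}


suite = convert_suite(
    {
        "skill_name": "csv-analyzer",
        "evals": [
            {
                "id": 1,
                "prompt": "Find the top 3 months by revenue.",
                "expected_output": "November, September, and December.",
                "files": ["evals/files/sales.csv"],
                "assertions": ["Output includes exactly 3 months"],
            }
        ],
    }
)
print(suite["tests"][0]["id"], len(suite["tests"][0]["assert"][0]["value"]))
```

Grouping every criterion under one llm-rubric entry mirrors the "one rubric with one criterion per assertion" note in the table.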
evals.json file paths map to AgentV input_files:

    tests:
      - id: "1"
        input_files:
          - evals/files/sales.csv
        input: "Analyze the sales data."

Use workspace.repos when the eval should materialize a repository before those fixture paths are read.
Offline Grading
Grade existing agent sessions offline by importing transcripts and running the adapter input or converted YAML:

    agentv import claude --list
    agentv import claude --session-id <uuid>

    agentv eval evals.json --target copilot-log

If another tool owns the original evals.json, keep that file as the source and run it through the read adapter. Convert only when you need to edit the AgentV-native form.
Converted YAML
The converter writes comments that point to native AgentV features:

    # Converted from Agent Skills evals.json
    # Agent Skills expected_output is treated as expected outcome/rubric context,
    # not as AgentV expected_output reference data.
    # AgentV features you can add:
    # - type: is-json, contains, regex for deterministic graders
    # - type: script for custom scoring scripts
    # - type: llm-rubric value arrays with weights and score ranges for rubrics
    # - Multi-turn conversations via input message arrays
    # - Multiple assertions with weighted scoring
    # - Workspace isolation with repos and hooks

    tags:
      skill: "csv-analyzer"
    metadata:
      source_adapter: "agent-skills-evals-json"

    tests:
      - id: "1"
        criteria: |-
          The top 3 months by revenue are November, September, and December.
        input: "Find the top 3 months by revenue."
        input_files:
          - "evals/files/sales.csv"
        # Promoted from evals.json expected_output, assertions[], and expectations[]
        # Replace with type: is-json, contains, or regex for deterministic checks
        assert:
          - name: agent-skills-criteria
            type: llm-rubric
            value:
              - id: "expected-outcome"
                outcome: "The top 3 months by revenue are November, September, and December."
                required: true
              - id: "assertion-1"
                outcome: "Output identifies November as the highest revenue month"
                required: true

From there you can add deterministic graders, workspace isolation, multi-turn inputs, target-specific configuration, or script graders in normal AgentV YAML.
When to Keep evals.json
Keep evals.json when another Agent Skills tool owns that file or when you are packaging a skill for an ecosystem that expects it. Use AgentV’s read adapter directly:

    agentv eval evals/evals.json --target claude --output .agentv/results/csv-analyzer

Use AgentV YAML directly when AgentV owns the eval lifecycle.
Side-by-side
evals.json

    {
      "skill_name": "support-agent",
      "evals": [
        {
          "id": 1,
          "prompt": "A customer says their order #12345 hasn't arrived after 2 weeks. Help them.",
          "expected_output": "An empathetic response that offers to track the order and provides next steps.",
          "assertions": [
            "Response acknowledges the customer's frustration",
            "Response offers to look up order #12345",
            "Response provides clear next steps"
          ]
        }
      ]
    }

EVAL.yaml

    tests:
      - id: "1"
        input: |
          A customer says their order #12345 hasn't arrived after 2 weeks. Help them.
        criteria: |
          An empathetic response that offers to track the order and provides next steps.
        assert:
          - name: agent-skills-criteria
            type: llm-rubric
            value:
              - id: expected-outcome
                outcome: "An empathetic response that offers to track the order and provides next steps."
                required: true
              - id: assertion-1
                outcome: "Response acknowledges the customer's frustration"
                required: true
              - id: assertion-2
                outcome: "Response offers to look up order #12345"
                required: true
          - name: order-number
            type: contains
            value: "12345"

The YAML version can mix rubric criteria with deterministic checks; the order-number assertion is instant and free.
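To see why a deterministic check such as order-number is cheap, it helps to picture it as a plain substring test. The Python sketch below is an assumption about what a contains-style grader boils down to, not AgentV's grader code; contains_check and the sample reply are hypothetical.

```python
def contains_check(output: str, value: str) -> bool:
    """Pass when the expected substring appears in the agent output."""
    return value in output


# Hypothetical agent reply used only for illustration.
reply = "I'm sorry for the delay. Let me look up order #12345 for you."
print(contains_check(reply, "12345"))
```

No model call is involved, so a check like this adds no latency or token cost, whereas each rubric criterion has to be judged by an LLM.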