Agent Skills evals.json Adapter

Agent Skills uses evals.json for lightweight skill-scoped datasets: a prompt, optional expected outcome, optional fixture files, and natural-language assertions or expectations.

AgentV treats evals.json as a built-in read adapter input, not as a native core eval format. Detection requires a top-level skill_name string and evals array, so arbitrary .json files are still rejected. You can run a detected Agent Skills file directly:

agentv eval evals.json --target claude

Or convert it to AgentV EVAL YAML when you want to edit the generated suite:

agentv convert evals.json --out EVAL.yaml
agentv eval EVAL.yaml --target claude

This keeps AgentV’s core authoring formats focused on YAML, JSONL, and TypeScript while still making Agent Skills suites easy to onboard. The boundary is the same adapter layer used for external datasets: external schema in, AgentV-native cases at runtime.
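The detection rule described above (a top-level skill_name string plus an evals array) can be sketched as a small predicate. This is only an illustration of the stated rule, not AgentV's actual implementation; the function name `looks_like_agent_skills_evals` is hypothetical.

```python
import json
from typing import Any


def looks_like_agent_skills_evals(doc: Any) -> bool:
    """Return True when doc has a top-level skill_name string and evals array.

    Hypothetical helper: it mirrors the rule stated in the text, nothing more.
    """
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("skill_name"), str)
        and isinstance(doc.get("evals"), list)
    )


# A minimal Agent Skills document is detected.
detected = looks_like_agent_skills_evals(
    json.loads('{"skill_name": "csv-analyzer", "evals": []}')
)
# An arbitrary JSON object without those keys is rejected.
rejected = looks_like_agent_skills_evals(json.loads('{"name": "other", "items": []}'))
print(detected, rejected)  # True False
```

Checking the types as well as the keys is what keeps unrelated .json files from being picked up by the adapter.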

Create evals.json:

{
  "skill_name": "csv-analyzer",
  "evals": [
    {
      "id": 1,
      "prompt": "I have a CSV of monthly sales data in evals/files/sales.csv. Find the top 3 months by revenue.",
      "expected_output": "The top 3 months by revenue are November ($22,500), September ($20,100), and December ($19,400).",
      "files": ["evals/files/sales.csv"],
      "assertions": [
        "Output identifies November as the highest revenue month",
        "Output includes exactly 3 months",
        "Revenue figures are included for each month"
      ]
    }
  ]
}

Run it directly or convert it first:

agentv eval evals.json --target claude
agentv convert evals.json --out EVAL.yaml
agentv eval EVAL.yaml --target claude

The --target flag selects the agent harness. The agent evaluates itself; skills load through the normal agent runtime.

Run a detected Agent Skills file directly:

agentv eval evals.json --target claude --output .agentv/results/csv-analyzer

Import the definition into editable AgentV YAML without running a target:

agentv convert evals.json --out EVAL.yaml

Prepare one converted case for a human or external agent without running the target provider:

agentv prepare EVAL.yaml --test-id "1" --target claude --out .agentv/prepared/csv-analyzer-1

agentv import is reserved for agent session transcripts and selected external datasets such as Hugging Face. Agent Skills evals.json uses the eval read adapter for execution and convert for definition import.

The read adapter promotes evals.json fields into AgentV cases. The converter writes the same mapping to YAML:

| evals.json | EVAL.yaml output | Notes |
| --- | --- | --- |
| prompt | input | Written as prompt text |
| expected_output | criteria + llm-rubric criterion | Agent Skills uses this as expected outcome/rubric context, not as AgentV passive reference-data expected_output |
| assertions[] | llm-rubric criteria | Strings are grouped into one rubric with one criterion per assertion |
| expectations[] | llm-rubric criteria | Same handling as assertions[] |
| files[] | input_files | Resolved relative to the evals.json file |
| skill_name | tags.skill, description | Used for suite grouping |
| id | id | Converted to a string |

The generated llm-rubric assertion emits per-criterion grading rows in AgentV artifacts just like other assertion entries. The converted YAML is the editable source of truth after conversion.
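The field mapping above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the converter's real code: `convert_case` and `convert_suite` are hypothetical names, the criterion ids follow the pattern visible in the converted YAML examples, numbering expectations[] with the same counter as assertions[] is a guess, and the description field is left out because its content is not specified here.

```python
from typing import Any


def convert_case(case: dict[str, Any]) -> dict[str, Any]:
    """Map one evals.json case to an AgentV-style test dict (hypothetical sketch)."""
    criteria: list[dict[str, Any]] = []
    expected = case.get("expected_output")
    if expected:
        # expected_output becomes rubric context, not passive reference data.
        criteria.append({"id": "expected-outcome", "outcome": expected, "required": True})

    # assertions[] and expectations[] get the same handling. Sharing one
    # counter across both lists is an assumption of this sketch.
    statements = list(case.get("assertions", [])) + list(case.get("expectations", []))
    for n, text in enumerate(statements, start=1):
        criteria.append({"id": f"assertion-{n}", "outcome": text, "required": True})

    test: dict[str, Any] = {"id": str(case["id"]), "input": case["prompt"]}
    if expected:
        test["criteria"] = expected
    if case.get("files"):
        test["input_files"] = list(case["files"])
    if criteria:
        # All criteria are grouped into a single llm-rubric assertion.
        test["assert"] = [
            {"name": "agent-skills-criteria", "type": "llm-rubric", "value": criteria}
        ]
    return test


def convert_suite(doc: dict[str, Any]) -> dict[str, Any]:
    """Map a whole evals.json document; skill_name becomes tags.skill."""
    return {
        "tags": {"skill": doc["skill_name"]},
        "tests": [convert_case(case) for case in doc["evals"]],
    }


suite = convert_suite(
    {
        "skill_name": "csv-analyzer",
        "evals": [
            {
                "id": 1,
                "prompt": "Find the top 3 months by revenue.",
                "expected_output": "The top 3 months by revenue are November, September, and December.",
                "files": ["evals/files/sales.csv"],
                "assertions": ["Output identifies November as the highest revenue month"],
            }
        ],
    }
)
first = suite["tests"][0]
print(first["id"], [c["id"] for c in first["assert"][0]["value"]])
# 1 ['expected-outcome', 'assertion-1']
```

Note that the numeric id 1 comes out as the string "1", which is why later commands refer to the case with --test-id "1".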

evals.json file paths map to AgentV input_files:

tests:
  - id: "1"
    input_files:
      - evals/files/sales.csv
    input: "Analyze the sales data."

Use workspace.repos when the eval should materialize a repository before those fixture paths are read.
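The mapping notes that files[] entries are resolved relative to the evals.json file. A minimal sketch of that rule, assuming plain path joining against the file's parent directory, might look like the following; `resolve_fixture_paths` and the `skills/csv-analyzer` directory are hypothetical, and AgentV's real resolution may differ (for example, in whether it produces absolute paths).

```python
from pathlib import Path


def resolve_fixture_paths(evals_json: str, files: list[str]) -> list[Path]:
    """Join each files[] entry onto the directory that contains evals.json."""
    base = Path(evals_json).parent
    return [base / name for name in files]


# Hypothetical layout: the skill lives under skills/csv-analyzer/.
paths = resolve_fixture_paths(
    "skills/csv-analyzer/evals.json", ["evals/files/sales.csv"]
)
print(paths[0].as_posix())  # skills/csv-analyzer/evals/files/sales.csv
```

Under this reading, moving evals.json to another directory also moves where its fixture paths point, so fixtures are best kept alongside the file that references them.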

Grade existing agent sessions offline by importing transcripts and running the adapter input or converted YAML:

agentv import claude --list
agentv import claude --session-id <uuid>
agentv eval evals.json --target copilot-log

If another tool owns the original evals.json, keep that file as the source and run it through the read adapter. Convert only when you need to edit the AgentV-native form.

The converter writes comments that point to native AgentV features:

# Converted from Agent Skills evals.json
# Agent Skills expected_output is treated as expected outcome/rubric context,
# not as AgentV expected_output reference data.
# AgentV features you can add:
#   - type: is-json, contains, regex for deterministic graders
#   - type: script for custom scoring scripts
#   - type: llm-rubric value arrays with weights and score ranges for rubrics
#   - Multi-turn conversations via input message arrays
#   - Multiple assertions with weighted scoring
#   - Workspace isolation with repos and hooks
tags:
  skill: "csv-analyzer"
metadata:
  source_adapter: "agent-skills-evals-json"
tests:
  - id: "1"
    criteria: |-
      The top 3 months by revenue are November, September, and December.
    input: "Find the top 3 months by revenue."
    input_files:
      - "evals/files/sales.csv"
    # Promoted from evals.json expected_output, assertions[], and expectations[]
    # Replace with type: is-json, contains, or regex for deterministic checks
    assert:
      - name: agent-skills-criteria
        type: llm-rubric
        value:
          - id: "expected-outcome"
            outcome: "The top 3 months by revenue are November, September, and December."
            required: true
          - id: "assertion-1"
            outcome: "Output identifies November as the highest revenue month"
            required: true

From there you can add deterministic graders, workspace isolation, multi-turn inputs, target-specific configuration, or script graders in normal AgentV YAML.

Keep evals.json when another Agent Skills tool owns that file or when you are packaging a skill for an ecosystem that expects it. Use AgentV’s read adapter directly:

agentv eval evals/evals.json --target claude --output .agentv/results/csv-analyzer

Use AgentV YAML directly when AgentV owns the eval lifecycle.

evals.json:

{
  "skill_name": "support-agent",
  "evals": [
    {
      "id": 1,
      "prompt": "A customer says their order #12345 hasn't arrived after 2 weeks. Help them.",
      "expected_output": "An empathetic response that offers to track the order and provides next steps.",
      "assertions": [
        "Response acknowledges the customer's frustration",
        "Response offers to look up order #12345",
        "Response provides clear next steps"
      ]
    }
  ]
}

EVAL.yaml:

tests:
  - id: "1"
    input: |
      A customer says their order #12345 hasn't arrived after 2 weeks. Help them.
    criteria: |
      An empathetic response that offers to track the order and provides next steps.
    assert:
      - name: agent-skills-criteria
        type: llm-rubric
        value:
          - id: expected-outcome
            outcome: "An empathetic response that offers to track the order and provides next steps."
            required: true
          - id: assertion-1
            outcome: "Response acknowledges the customer's frustration"
            required: true
          - id: assertion-2
            outcome: "Response offers to look up order #12345"
            required: true
      - name: order-number
        type: contains
        value: "12345"

The YAML version can mix rubric criteria with deterministic checks. The order-number assertion is a plain contains match, so it runs instantly and needs no LLM grader call.
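To see why a contains check is cheap next to a rubric criterion, here is a minimal sketch of what such a grader boils down to. It assumes a case-sensitive substring match; `contains_check` is a hypothetical name, and AgentV's actual matching options are not described here.

```python
def contains_check(output: str, value: str) -> bool:
    """Deterministic grader sketch: pass when value appears in the output."""
    return value in output


# Sample agent replies, invented for illustration.
reply_with_order = "I'm sorry for the delay. Let me look up order #12345 for you."
reply_without_order = "I'm sorry for the delay. Could you share your order number?"

print(contains_check(reply_with_order, "12345"))     # True
print(contains_check(reply_without_order, "12345"))  # False
```

Because the result depends only on the two strings, the same output always grades the same way, which makes checks like this a useful complement to rubric criteria judged by a model.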