Classification and Entity Extraction Evaluators
Agenta offers several evaluators to assess model performance in classification and entity extraction tasks.
Exact Match
The Exact Match evaluator determines if the model's output precisely matches the expected answer.
How It Works
This evaluator compares the generated output with the correct answer stored in the test set. It returns a boolean value: true for an exact match, and false otherwise.
Configuration
| Parameter | Type | Description |
|---|---|---|
| correct_answer_key | String | The column name in the test set containing the correct answer |
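As a minimal sketch of the behavior described above, the comparison can be thought of as a strict equality check. The function name `exact_match` and the test case layout are hypothetical, and this sketch assumes no trimming or case normalization is applied; the actual evaluator may differ.

```python
def exact_match(output: str, testcase: dict, correct_answer_key: str = "correct_answer") -> bool:
    """Return True only when the output is identical to the expected answer."""
    # Assumption: plain string equality, no whitespace or case normalization.
    return output == testcase[correct_answer_key]


# Hypothetical test set row with the expected answer in the "correct_answer" column.
testcase = {"input": "Is this message spam?", "correct_answer": "spam"}

print(exact_match("spam", testcase))  # True
print(exact_match("Spam", testcase))  # False (case differs in this sketch)
```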
Contains JSON
The Contains JSON evaluator checks if the model's output contains a valid JSON structure.
How It Works
This evaluator attempts to parse the output as JSON. It returns true if a valid JSON structure is found within the output, and false otherwise.
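The page does not specify how the JSON structure is located inside the output. One plausible approach, shown here purely as an illustrative sketch, is to try decoding from every `{` or `[` character using Python's standard `json` module. The function name `contains_json` is hypothetical.

```python
import json


def contains_json(output: str) -> bool:
    """Return True if some substring of `output` parses as a JSON object or array."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(output):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            decoder.raw_decode(output, start)
            return True
        except json.JSONDecodeError:
            continue
    return False


print(contains_json('Here is the result: {"label": "spam"} Done.'))  # True
print(contains_json("The label is spam."))                           # False
```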
JSON Field Match (Deprecated)
The JSON Field Match evaluator has been replaced by JSON Multi-Field Match. The new evaluator supports multiple fields and nested paths, and provides per-field scoring. Existing configurations will continue to work, but we recommend migrating to the new evaluator.
JSON Multi-Field Match
The JSON Multi-Field Match evaluator compares multiple fields between two JSON objects and reports a score for each field. This evaluator is ideal for entity extraction tasks where you need to validate that specific fields (like name, email, or address) match the expected values.
How It Works
The evaluator parses both the model output and the ground truth as JSON. It then compares each configured field path and produces:
- A score for each field (1 if matched, 0 if not matched)
- An aggregate score showing the fraction of fields that matched (a value between 0 and 1)
For example, if you configure fields ["name", "email", "phone"] and the model gets name and email correct but phone wrong, you will see:
name: 1.0
email: 1.0
phone: 0.0
aggregate_score: 0.67
Path Formats
You can specify field paths in three formats:
| Format | Example | Description |
|---|---|---|
| Dot notation | user.address.city | Simple nested access. Use numeric indices for arrays: items.0.name |
| JSON Path | $.user.address.city | Standard JSON Path syntax. Supports array indexing: $.items[0].name |
| JSON Pointer | /user/address/city | RFC 6901 standard. Use numeric segments for arrays: /items/0/name |
Dot notation is recommended for most cases. JSON Path and JSON Pointer are useful when you need compatibility with other tools.
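To illustrate how the three path formats can address the same value, here is a rough sketch of a resolver. The function `resolve_path` is hypothetical and handles only the simple cases from the table (no JSON Path wildcards or filters); it is not the evaluator's actual implementation.

```python
from typing import Any


def resolve_path(data: Any, path: str) -> Any:
    """Resolve a path given in dot notation, simple JSON Path, or JSON Pointer."""
    if path.startswith("/"):
        # JSON Pointer (RFC 6901): "~1" encodes "/" and "~0" encodes "~".
        segments = [s.replace("~1", "/").replace("~0", "~") for s in path[1:].split("/")]
    else:
        if path.startswith("$"):
            # Simple JSON Path: drop the root marker "$" and its following dot.
            path = path[1:].lstrip(".")
        # Turn bracket indices like items[0] into dotted segments like items.0.
        path = path.replace("[", ".").replace("]", "")
        segments = [s for s in path.split(".") if s]

    current = data
    for segment in segments:
        if isinstance(current, list):
            current = current[int(segment)]  # numeric segment indexes into an array
        else:
            current = current[segment]
    return current


data = {"user": {"address": {"city": "Paris"}}, "items": [{"name": "pen"}]}

print(resolve_path(data, "user.address.city"))    # Paris
print(resolve_path(data, "$.user.address.city"))  # Paris
print(resolve_path(data, "/user/address/city"))   # Paris
print(resolve_path(data, "$.items[0].name"))      # pen
```

A missing key or index raises `KeyError` or `IndexError` in this sketch; a caller would need to decide how to score such cases.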
Configuration
| Parameter | Type | Description |
|---|---|---|
| fields | String[] | List of field paths to compare (e.g., ["name", "user.email"]) |
| correct_answer_key | String | The column name in the test set containing the expected JSON |
Example
Ground truth (in the correct_answer column):
{
  "name": "John Doe",
  "email": "john@example.com",
  "address": {
    "city": "New York",
    "zip": "10001"
  }
}
Model output:
{
  "name": "John Doe",
  "email": "jane@example.com",
  "address": {
    "city": "New York",
    "zip": "10002"
  }
}
Configured fields: ["name", "email", "address.city", "address.zip"]
Results:
| Field | Score |
|---|---|
| name | 1.0 |
| email | 0.0 |
| address.city | 1.0 |
| address.zip | 0.0 |
| aggregate_score | 0.5 |
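The scoring in this example can be reproduced with a short sketch. The helper names `get_by_dot_path` and `multi_field_match` are hypothetical, only dot notation is handled, and treating a missing path as a mismatch is an assumption rather than documented behavior.

```python
import json


def get_by_dot_path(data, path):
    """Follow a dot-notation path such as "address.city" or "items.0.name"."""
    current = data
    for segment in path.split("."):
        current = current[int(segment)] if isinstance(current, list) else current[segment]
    return current


def multi_field_match(output: str, expected: str, fields: list) -> dict:
    """Score each configured field 1.0 or 0.0 and average the results."""
    out, exp = json.loads(output), json.loads(expected)
    scores = {}
    for field in fields:
        try:
            matched = get_by_dot_path(out, field) == get_by_dot_path(exp, field)
        except (KeyError, IndexError, ValueError, TypeError):
            matched = False  # assumption: an unresolvable path counts as a mismatch
        scores[field] = 1.0 if matched else 0.0
    scores["aggregate_score"] = sum(scores.values()) / len(fields) if fields else 0.0
    return scores


ground_truth = json.dumps({
    "name": "John Doe",
    "email": "john@example.com",
    "address": {"city": "New York", "zip": "10001"},
})
model_output = json.dumps({
    "name": "John Doe",
    "email": "jane@example.com",
    "address": {"city": "New York", "zip": "10002"},
})

result = multi_field_match(
    model_output, ground_truth, ["name", "email", "address.city", "address.zip"]
)
print(result)
# {'name': 1.0, 'email': 0.0, 'address.city': 1.0, 'address.zip': 0.0, 'aggregate_score': 0.5}
```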
JSON Diff Match
How It Works
This evaluator compares the output JSON with the correct answer JSON. Here's a detailed breakdown of the process:
- JSON Flattening: Both the output and correct answer JSONs are flattened into single-level dictionaries.
  For example, a JSON like:
  {
    "name": "John",
    "address": {
      "city": "New York",
      "zip": "10001"
    },
    "hobbies": ["reading", "swimming"]
  }
  would be flattened to:
  {
    "name": "John",
    "address.city": "New York",
    "address.zip": "10001",
    "hobbies.0": "reading",
    "hobbies.1": "swimming"
  }
- Comparison: The evaluator compares these flattened structures, checking each key-value pair while considering configuration options (like case sensitivity and schema-only comparison).
- Scoring: For each matching key-value pair, a score of 1 is assigned. The final score is the average of all comparisons, resulting in a value between 0 and 1.
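The flattening step described above can be sketched as a small recursive function. The name `flatten` is hypothetical, and details such as how empty objects or arrays are handled are assumptions (this sketch simply drops them).

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict with dotted keys."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        # Array elements are keyed by their index: hobbies.0, hobbies.1, ...
        items = ((str(index), element) for index, element in enumerate(value))
    else:
        # Leaf value: store it under the path accumulated so far.
        return {prefix: value}

    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix))
    return flat


nested = {
    "name": "John",
    "address": {"city": "New York", "zip": "10001"},
    "hobbies": ["reading", "swimming"],
}

print(flatten(nested))
# {'name': 'John', 'address.city': 'New York', 'address.zip': '10001',
#  'hobbies.0': 'reading', 'hobbies.1': 'swimming'}
```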
Configuration
| Parameter | Type | Description |
|---|---|---|
| compare_schema_only | Boolean | If true, only compares key names and types, ignoring values |
| predict_keys | Boolean | If true, only considers keys present in the ground truth |
| case_insensitive_keys | Boolean | If true, treats keys as case-insensitive |
| correct_answer_key | String | The column name in the test set containing the correct answer JSON |
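To make the effect of these options concrete, here is a hedged sketch of the comparison and scoring steps operating on already-flattened dictionaries. The function `diff_score` is hypothetical. It follows the parameter descriptions in the table above, and it assumes that when `predict_keys` is false, keys that appear only in the output are also counted (as mismatches); the real evaluator may treat these cases differently.

```python
def diff_score(flat_output, flat_expected, compare_schema_only=False,
               predict_keys=False, case_insensitive_keys=False):
    """Average of per-key comparisons between two flattened JSON dicts."""
    if case_insensitive_keys:
        flat_output = {k.lower(): v for k, v in flat_output.items()}
        flat_expected = {k.lower(): v for k, v in flat_expected.items()}

    keys = set(flat_expected)
    if not predict_keys:
        # Assumption: extra keys in the output are part of the comparison.
        keys |= set(flat_output)
    if not keys:
        return 0.0  # assumption: nothing to compare scores 0

    missing = object()  # sentinel so that a stored None is not mistaken for "absent"
    matches = 0
    for key in keys:
        out_val = flat_output.get(key, missing)
        exp_val = flat_expected.get(key, missing)
        if out_val is missing or exp_val is missing:
            continue  # a key absent on either side scores 0
        if compare_schema_only:
            matches += type(out_val) is type(exp_val)
        else:
            matches += out_val == exp_val
    return matches / len(keys)


expected = {"name": "John", "address.city": "New York", "address.zip": "10001"}
output = {"Name": "John", "address.city": "Boston", "address.zip": "10001", "extra": "x"}

print(diff_score(output, expected))                      # 0.2
print(diff_score(output, expected, predict_keys=True,
                 case_insensitive_keys=True,
                 compare_schema_only=True))              # 1.0
```

In the first call only `address.zip` matches out of five distinct keys. In the second call the three ground-truth keys all exist in the output with string values, so a schema-only comparison scores 1.0 even though the city differs.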