Evaluator
Assess content quality using customizable evaluation metrics
The Evaluator block allows you to assess the quality of content using customizable evaluation metrics. This is particularly useful for evaluating AI-generated text, ensuring outputs meet specific criteria, and building quality-control mechanisms into your workflows.
Overview
The Evaluator block uses LLMs to evaluate content objectively against custom metrics you define. This is especially useful for:
- Assessing the quality of AI-generated content
- Evaluating responses against specific criteria
- Creating scoring frameworks for different types of content
- Building objective feedback loops in your workflows
Configuration Options
Evaluation Metrics
Define custom metrics to evaluate content against. Each metric includes:
- Name: A short identifier for the metric
- Description: A detailed explanation of what the metric measures
- Range: The numeric range for scoring (e.g., 1-5, 0-10)
Example metrics:
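The exact configuration format depends on your setup; the TypeScript sketch below only illustrates the Name/Description/Range structure described above, and the metric names shown are hypothetical.

```typescript
// Hypothetical metric definitions -- the actual block configuration format may differ.
interface EvaluationMetric {
  name: string;                         // short identifier for the metric
  description: string;                  // what the metric measures
  range: { min: number; max: number };  // numeric scoring range
}

const exampleMetrics: EvaluationMetric[] = [
  {
    name: "accuracy",
    description: "How factually correct and precise the content is",
    range: { min: 1, max: 5 },
  },
  {
    name: "clarity",
    description: "How clear and easy to understand the writing is",
    range: { min: 1, max: 5 },
  },
  {
    name: "relevance",
    description: "How well the content addresses the original request",
    range: { min: 1, max: 5 },
  },
];
```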
Content
The content to be evaluated. This can be:
- Directly provided in the block configuration
- Connected from another block's output (typically an Agent block)
- Dynamically generated during workflow execution
Model Selection
Choose an LLM provider to perform the evaluation:
- OpenAI (GPT-4o, o1, o3-mini)
- Anthropic (Claude 3.7 Sonnet)
- Google (Gemini 2.5 Pro, Gemini 2.0 Flash)
- Groq, Cerebras
- Ollama (local models)
- And more
The chosen model should have strong reasoning capabilities to provide accurate evaluations.
API Key
Your API key for the selected LLM provider. This is securely stored and used for authentication.
How It Works
1. The Evaluator block takes the provided content and your custom metrics
2. It generates a specialized prompt that instructs the LLM to evaluate the content (see the sketch after this list)
3. The prompt includes clear guidelines on how to score each metric
4. The LLM evaluates the content and returns a numeric score for each metric
5. The Evaluator block formats these scores as structured output for use in your workflow
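The block's internal implementation isn't shown here; the TypeScript sketch below is a rough, hypothetical illustration of how a metric list can be turned into an evaluation prompt and how an LLM's JSON reply might be parsed into scores. The function names and prompt wording are assumptions, not the block's actual API.

```typescript
// Hypothetical sketch of the evaluation flow; not the block's actual implementation.
interface EvaluationMetric {
  name: string;
  description: string;
  range: { min: number; max: number };
}

// Build a prompt that asks the model to score each metric and reply with JSON only.
function buildEvaluationPrompt(content: string, metrics: EvaluationMetric[]): string {
  const metricLines = metrics
    .map(m => `- ${m.name} (${m.range.min}-${m.range.max}): ${m.description}`)
    .join("\n");
  return [
    "Evaluate the following content against each metric.",
    "Respond with a JSON object mapping metric names to numeric scores within the given ranges.",
    "",
    "Metrics:",
    metricLines,
    "",
    "Content:",
    content,
  ].join("\n");
}

// Parse the model's JSON reply into numeric scores, clamping each value to its metric's range.
function parseScores(reply: string, metrics: EvaluationMetric[]): Record<string, number> {
  const raw = JSON.parse(reply) as Record<string, number>;
  const scores: Record<string, number> = {};
  for (const m of metrics) {
    const value = Number(raw[m.name]);
    scores[m.name] = Math.min(m.range.max, Math.max(m.range.min, isNaN(value) ? m.range.min : value));
  }
  return scores;
}
```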
Inputs and Outputs
Inputs
- Content: The text or structured data to evaluate
- Metrics: Custom evaluation criteria with scoring ranges
- Model Settings: LLM provider and parameters
Outputs
- Content: A summary of the evaluation
- Model: The model used for evaluation
- Tokens: Usage statistics
- Metric Scores: Numeric scores for each defined metric
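To show how these outputs might be consumed downstream, the sketch below models the output shape as a TypeScript type and branches on a score threshold. The field names and token structure are assumptions based on the list above, not the block's exact schema.

```typescript
// Hypothetical shape of the Evaluator block's output; field names are assumptions.
interface EvaluatorOutput {
  content: string;                                                // summary of the evaluation
  model: string;                                                  // model used for evaluation
  tokens: { prompt: number; completion: number; total: number };  // usage statistics
  metricScores: Record<string, number>;                           // numeric score per defined metric
}

// Example: a downstream step could branch on a score threshold to build a feedback loop.
function needsRevision(output: EvaluatorOutput, metric: string, threshold: number): boolean {
  return (output.metricScores[metric] ?? 0) < threshold;
}
```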
Example Usage
Here's an example of how an Evaluator block might be configured for assessing customer service responses:
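The sketch below is one possible configuration for this use case; the metric names, ranges, placeholder content reference, and model choice are illustrative assumptions, not a prescribed setup.

```typescript
// Illustrative configuration for evaluating customer service responses.
// Metric names, ranges, and the model choice are assumptions for this example.
const customerServiceEvaluator = {
  model: "gpt-4o",
  content: "<output of the upstream Agent block>", // typically connected from an Agent block
  metrics: [
    { name: "helpfulness", description: "Does the response fully resolve the customer's issue?", range: { min: 1, max: 5 } },
    { name: "tone",        description: "Is the response polite, empathetic, and professional?",  range: { min: 1, max: 5 } },
    { name: "accuracy",    description: "Are policy details and product facts stated correctly?", range: { min: 1, max: 5 } },
  ],
};
```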
Best Practices
- Use specific metric descriptions: Clearly define what each metric measures to get more accurate evaluations
- Choose appropriate ranges: Select scoring ranges that provide enough granularity without being overly complex
- Connect with Agent blocks: Use Evaluator blocks to assess Agent block outputs and create feedback loops
- Use consistent metrics: For comparative analysis, maintain consistent metrics across similar evaluations
- Combine multiple metrics: Use several metrics to get a comprehensive evaluation