Starter Template

Copy this template to start your first evaluation config.

Basic Template

Save as promptfooconfig.yaml:

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: "My first evaluation - [DESCRIBE WHAT YOU'RE TESTING]"

# The prompt template - {{question}} will be replaced with actual questions
prompts:
  - "{{question}}"

# The LLM(s) to test (see the provider output sketch after the template)
providers:
  - id: file://providers/owui.js
    label: "DEV - [MODEL NAME]"
    config:
      apiEndpointEnvironmentVariable: "DEV_OWUI_ENDPOINT"
      apiKeyEnvironmentVariable: "DEV_OWUI_API_KEY"
      model: "[MODEL-NAME]"
      outputSources: true

# Settings that apply to ALL tests
defaultTest:
  options:
    # Provider for assertions (LLM-as-judge)
    # The standard grading prompt can be found at https://github.com/promptfoo/promptfoo/blob/main/src/prompts/grading.ts
    # Simple rubric prompt inspired by the multilingual rubric eval prompt: https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/llm-rubric/#non-english-evaluation
    rubricPrompt: |
      [
        {
          "role": "system",
          "content": "Du vurderer om givne svar er fyldestgørende og korrekte i forhold til ideelle ekspert svar. Svar JSON format: {\"reason\": \"string\", \"pass\": boolean, \"score\": number}. ALLE begrundelser skal være på dansk."
        },
        {
          "role": "user", 
          "content": "<Givet svar>\n{{ output }}\n</Givet svar>\n\n<Ideelt svar>\n{{ rubric }}\n</Ideelt svar>"
        }
      ]
    provider:
      id: file://providers/owui.js
      config:
        apiEndpointEnvironmentVariable: "DEV_OWUI_ENDPOINT"
        apiKeyEnvironmentVariable: "DEV_OWUI_API_KEY"
        model: "AarhusAI-default"
    
    # Python environment for custom assertions
    config:
      pythonExecutable: .venv/bin/python

# Reusable assertion templates
assertionTemplates:
  # Example: Check for contact email
  contactEmail:
    type: regex
    value: '.*databeskyttelse@mbu\.aarhus\.dk.*'
    transform: "output.text"

  # Refusal check referenced by Test 5 below - replace the rubric text with your own criteria
  refuseToAnswer:
    type: llm-rubric
    value: "[DESCRIBE WHAT AN ACCEPTABLE REFUSAL LOOKS LIKE]"
    transform: "output.text"

# Your test cases
tests:
  # Test 1: Simple containment
  - description: "[DESCRIBE WHAT THIS TEST CHECKS]"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: contains
        value: "[EXPECTED TEXT]"
    metadata:
      category: "[CATEGORY]"
      origin: "[real|synthetic]"

  # Test 2: Multiple acceptable answers
  - description: "Test answer contains correct date"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: icontains-any
        value:
          - "option 1"
          - "option 2"
        metric: answer
    metadata:
      category: "answer"
      origin: "synthetic"

  # Test 3: Document retrieval check
  - description: "Verify correct document retrieved"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: icontains
        value: "[DOCUMENT-NAME.md]"
        transform: 'output.sources.map(o => o.reference).join(" ")'
        metric: docRetrieval
    metadata:
      category: "docRetrieval"
      origin: "real"

  # Test 4: Content retrieval check
  - description: "Verify correct content retrieved"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: icontains-any
        value:
          - "key phrase 1"
          - "key phrase 2"
        transform: 'output.sources.map(o => o.content).join("\n\n")'
        metric: contentRetrieval
    metadata:
      category: "contentRetrieval"
      origin: "real"

  # Test 5: Refusal check (use template)
  - description: "Test that inappropriate question is refused"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - $ref: "#/assertionTemplates/refuseToAnswer"
    metadata:
      category: "refusal"
      origin: "synthetic"
      refusal: true

  # Test 6: LLM rubric (quality check)
  - description: "Test answer quality"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: llm-rubric
        value: |
          [WRITE YOUR IDEAL ANSWER OR QUALITY CRITERIA HERE]
          This should be detailed and specific.
        transform: "output.text"
        weight: 0.1
        metric: answer
    metadata:
      category: "answer"
      origin: "real"

  # Test 7: Using custom Python assertion
  - description: "Test with custom assertion"
    vars:
      question: "[YOUR QUESTION HERE]"
    assert:
      - type: python
        value: file:///app/assertions/ja_nej_assertion.py
        transform: "output.text"
        config:
          expectedAnswerCategory: Bekræftende
        metric: answer
    metadata:
      category: "answer"
      origin: "synthetic"

Running Your First Eval

  1. Save the file as promptfooconfig.yaml
  2. Open terminal in VSCode
  3. Navigate to your config directory:
    cd eval-configs/my-use-case
    
  4. Run the evaluation:
    promptfoo eval --config promptfooconfig.yaml
    
  5. View the results in your browser:
    promptfoo view

Customization Checklist

Before running, make sure to update:

  • description: - What you’re testing
  • providers: - Your model configuration
  • defaultTest.provider: - Judge model
  • assertionTemplates: - Your reusable patterns
  • tests: - Your actual test cases
  • All [PLACEHOLDERS] replaced with real values

Common Assertion Types Quick Reference

# Contains (case-sensitive)
- type: contains
  value: "exact text"

# Contains (case-insensitive)
- type: icontains
  value: "text"

# Any of these
- type: contains-any
  value:
    - "option1"
    - "option2"

# All of these
- type: contains-all
  value:
    - "required1"
    - "required2"

# Regex pattern
- type: regex
  value: '.*pattern.*'

# LLM quality check
- type: llm-rubric
  value: "Quality description"

# Reference template
- $ref: "#/assertionTemplates/name"

# Custom Python
- type: python
  value: file:///app/assertions/script.py
  config:
    param: "value"
