Skip to content

Objective Format Reference

~3 min read

Crucis reads YAML objective files with strict keys. Legacy keys are rejected at parse time.

Top-level keys

Key Required Type Default Description
name yes str Objective name (used as task name for single-task objectives)
description yes str What the function/module should do
signature no str Callable signature hint (e.g. add(a: int, b: int) -> int)
train_evals no list [] Visible evaluation cases shown to generation and critic agents
holdout_evals no list [] Hidden evaluation cases used only in final verification
tests_constraint_profile no str default Profile name from constraints/profiles.yaml for generated tests (constraints auto-classified as required or advisory)
implementation_constraint_profile no str default Profile name for implementation code (constraints auto-classified as required or advisory)
behaviors no list[str] [] Expected behavioral properties (e.g. "idempotent", "thread-safe")
target_files no list[str] [] Files the implementation agent should write
context_files no list[str] [] Existing files injected into generation and evaluation prompts
existing_tests no list[str] [] Test files run as a regression gate during evaluation
tasks no list [] Multi-task entries (see below)
verification_granularity no str task task or objective

Task keys

Each entry in tasks supports:

Key Required Type Default Description
name yes str Task name (must be a valid Python identifier)
description no str "" Task description (falls back to top-level)
signature no str Task signature (falls back to top-level)
behaviors no list[str] [] Task-specific behavioral properties
train_evals no list falls back Task-specific train evals
holdout_evals no list falls back Task-specific holdout evals
tests_constraint_profile no str falls back Task-specific test constraint profile
implementation_constraint_profile no str falls back Task-specific implementation constraint profile
target_files no list[str] falls back Task-specific target files
context_files no list[str] [] Task-specific context files (merged with top-level)
existing_tests no list[str] [] Task-specific regression tests (merged with top-level)

Inheritance: Task-level fields override top-level fields when set. context_files and existing_tests are merged (both levels are included).

Eval schema

Both train_evals and holdout_evals use the same format:

train_evals:
  - input: "(1, 2)"
    output: "3"
  • input — Python expression representing call arguments as a tuple (e.g. "(1, 2)" means func(1, 2))
  • output — Python expression representing the expected return value

Auto-holdout: If you only provide examples (or train_evals) without a holdout_evals key, Crucis automatically splits the last ~20% as holdout evals. To provide explicit holdout evals, add a holdout: key. To opt out of auto-holdout, set holdout: [].

Holdout evals have stricter requirements:

  • Both input and output are required
  • Values must be strings
  • Expressions must parse as valid Python eval expressions
  • The raw key is not allowed

Verification granularity

Controls how Crucis evaluates implementation correctness:

  • task (default) — each task's tests run independently; all must pass
  • objective — all tests run as a single batch
verification_granularity: task

Use task for multi-task objectives to get per-task failure feedback. Use objective when tasks are tightly coupled.

Minimal example

name: add
description: Add two integers.
signature: "add(a: int, b: int) -> int"
target_files:
  - "src/solution.py"
train_evals:
  - input: "(1, 2)"
    output: "3"
holdout_evals:
  - input: "(100, 23)"
    output: "123"

Multi-task example

name: math_utils
description: Basic math utilities.
target_files:
  - "src/math_utils.py"
tests_constraint_profile: recommended
implementation_constraint_profile: recommended
verification_granularity: task

tasks:
  - name: add
    description: Return the sum of two integers.
    signature: "add(a: int, b: int) -> int"
    train_evals:
      - input: "(1, 2)"
        output: "3"
  - name: multiply
    description: Return the product of two integers.
    signature: "multiply(a: int, b: int) -> int"
    train_evals:
      - input: "(3, 4)"
        output: "12"