Background Optimizer¶

~3 min read

Experimental: The optimizer is disabled by default. Enable it by adding optimizer: enabled: true to .crucis/settings.yaml. See Enabling the optimizer below.

Crucis includes a background policy optimizer powered by GEPA that improves prompt steering over time. After each fit or evaluate run, Crucis queues an optimization job that scores candidate policies against a baseline using black-box evaluation.

How It Works¶

After crucis run completes, an optimization job is queued.
A detached worker process picks up the job.
The worker builds examples from the objective and checkpoint.
GEPA generates candidate policies (prompt directives for generation, adversary, and evaluation agents).
Each candidate is scored by running the evaluation step in an isolated workspace.
If the candidate beats the baseline by min_score_delta, it's marked ready for promotion.

Policy Structure¶

An optimizer policy steers Crucis prompts at three stages:

# .crucis/optimizer/active_policy.yaml
repository_skill: "Context about the repository and coding patterns..."
generation_directives: "When generating train suites, focus on..."
adversary_directives: "When reviewing tests adversarially, look for..."
evaluation_directives: "When implementing code, prefer..."

Each field is injected into the corresponding prompt builder. Fields are capped at 20,000 characters.

Directory Layout¶

.crucis/
  settings.yaml              # Optimizer settings (see below)
  optimizer/
    active_policy.yaml       # Current active policy
    status.json              # Optimizer state and metrics
    worker.lock              # Prevents concurrent workers
    queue/
      <job_id>.json          # Pending optimization jobs
    runs/
      <run_id>/
        candidate_policy.yaml  # Candidate from this run
        result.json            # Outcome metrics
        report.md              # Human-readable summary

Job Lifecycle¶

direction: down

start: fit/evaluate completes

queue: Queue job

worker: Spawn detached worker

lock: Acquire worker.lock

examples: Build examples {
  tooltip: From objective + checkpoint
}

split: Split train/validation

gepa: Run GEPA optimize_anything()

score: Score baseline vs candidate

gate: "candidate >= baseline + min_score_delta?" {
  shape: diamond
}

ready: candidate_ready = true
skip: Skip (no improvement)

promote: Promote or wait {
  tooltip: "Manual: crucis promote | Auto: immediate"
}

start -> queue -> worker -> lock -> examples -> split -> gepa -> score -> gate
gate -> ready: yes
gate -> skip: no
ready -> promote

Scoring¶

Each example is scored by running the evaluation step in a temporary isolated workspace:

final_score = pass_weight * correctness + speed_weight * speed

correctness: 1.0 if all tests pass, 0.0 otherwise
speed: max(0.0, 1.0 - duration_sec / timeout_sec)
Default weights: pass_weight=0.9, speed_weight=0.1

Promotion requires: - candidate_mean_score >= baseline_mean_score + min_score_delta - candidate_mean_correctness >= baseline_mean_correctness (no regression)

Promotion¶

Manual mode (default)¶

After the optimizer marks a candidate as ready:

# Check status
crucis status

# Promote the winning candidate
crucis promote --run-id <run_id>

The status command (alias: summary) shows optimizer status including whether a candidate is ready and its run ID.

Auto mode¶

Set promotion_mode: auto in .crucis/settings.yaml. Winning candidates are promoted immediately without user intervention.

Automation-Friendly Commands¶

Crucis exposes machine-readable outputs for optimizer workflows:

# One-shot foreground worker drain
crucis optimizer-worker --workspace . --json

# Continuous worker loop
crucis optimizer-worker --workspace . --loop

# Scriptable optimizer + checkpoint status
crucis status --json

Settings¶

All optimizer settings live in .crucis/settings.yaml under the optimizer key:

Setting	Default	Description
`enabled`	`false`	Enable/disable background optimization
`max_metric_calls`	`24`	Max scoring evaluations per run
`reflection_lm`	`openai/gpt-5.2`	LLM used for GEPA reflection
`train_split_ratio`	`0.7`	Train/validation split for examples
`max_examples_per_run`	`24`	Max examples per optimization run
`evaluator_timeout_sec`	`180`	Timeout for each evaluation attempt
`pass_weight`	`0.9`	Weight for correctness in scoring
`speed_weight`	`0.1`	Weight for speed in scoring
`min_score_delta`	`0.01`	Minimum improvement to promote
`promotion_mode`	`manual`	`manual` or `auto`
`queue_max_jobs`	`64`	Max jobs in queue
`capture_stdio`	`true`	Capture agent stdout/stderr

Example .crucis/settings.yaml:

schema_version: 1
optimizer:
  enabled: true
  promotion_mode: manual
  pass_weight: 0.9
  speed_weight: 0.1
  min_score_delta: 0.01
  evaluator_timeout_sec: 180

Enabling the Optimizer¶

The optimizer is experimental and disabled by default. To enable it, add the optimizer block to .crucis/settings.yaml:

schema_version: 1
optimizer:
  enabled: true

You can also scaffold the settings file with crucis init --with-settings and then add the optimizer block manually.

Disabling the Optimizer¶

Set enabled: false in .crucis/settings.yaml, or set the environment variable to skip optimization entirely:

CRUCIS_DISABLE_BACKGROUND_OPTIMIZER=1 crucis run

Policy Override¶

For testing, you can override the active policy with an environment variable:

export CRUCIS_POLICY_OVERRIDE_JSON='{"repository_skill":"...","generation_directives":"...","adversary_directives":"...","evaluation_directives":"..."}'

This takes precedence over active_policy.yaml.