Glossary¶
Term definitions for Crucis concepts.
Concepts¶
Scaffold¶
The workspace initialization created by crucis init. By default generates only objective.yaml and src/solution.py (for new-project scaffolds). Use --with-profiles to also create constraints/profiles.yaml and --with-settings to also create .crucis/settings.yaml. Existing codebases are auto-detected or forced via --existing-codebase.
Plan¶
A structured generation plan created by crucis run --plan. Written to plan.md in the workspace root. Guides the test generation agent with per-task file structure, test categories, and constraint compliance instructions.
Objective¶
A YAML file defining what Crucis should build. Contains a name, description, train evals, holdout evals, constraints, target files, and optional behaviors. Can be single-task or multi-task. See Objective Format Reference.
Task¶
One unit of work within an objective. Each task has its own name, signature, evals, constraint profile, and optional behaviors. Multi-task objectives list tasks under the tasks key.
Behaviors¶
An optional list[str] on objectives and tasks describing expected behavioral properties (e.g. "idempotent", "thread-safe", "deterministic"). Used to guide test generation and adversarial review.
Train Evals¶
Visible input/output pairs used during test generation and adversarial review. Agents see these evals and use them to build pytest suites.
Holdout Evals¶
Hidden input/output pairs never shown to any agent. Used only during final verification to ensure the implementation generalizes beyond known inputs. Failures are reported as counts only -- no payloads are leaked.
Auto-Holdout¶
When only examples (or train_evals) are provided without a holdout_evals key, Crucis automatically splits the last ~20% as holdout evals. To provide explicit holdout evals, add a holdout: key. To opt out of auto-holdout, set holdout: [].
Train Suite¶
A generated pytest file containing tests for a task. Built by the generation agent from the objective, constraints, and train evals. Must pass syntax validation and constraint checks before approval.
Constraint Profile¶
A named set of constraints defined in constraints/profiles.yaml. Constraints are listed flat and auto-classified into required (blocking) or advisory based on the field type. The old nested primary:/secondary: format still works. Referenced by name in the objective YAML via tests_constraint_profile and implementation_constraint_profile. See Constraints Reference.
Required Constraints¶
Hard gates -- if violated, the train suite is rejected and regenerated. Violations are fed back to the generation prompt. Most constraints are required by default.
Advisory Constraints¶
Soft gates -- violations are reported but don't block approval. Checked only after required constraints pass. Advisory fields: require_docstrings, no_print_statements, no_debugger_statements, no_global_state, require_type_annotations, no_nested_imports, no_star_imports, max_local_variables.
Adversarial Review¶
The critic agent analyzes a train suite for weaknesses. Returns a JSON report with attack vectors, generalization gaps, and suggested probe tests.
Cheating Probe¶
A deliberately cheating implementation generated to test whether the train suite can be passed by hardcoding, fingerprinting inputs, or using lookup tables. If the probe passes, the tests have gaps.
Checkpoint¶
A JSON file (.checkpoint.json) that persists progress across runs. Tracks each task's state, approved train suite source, adversarial report, and evaluation_passed status. Allows resuming interrupted runs. Use crucis run --reset to clear the entire checkpoint or --reset-task <name> to clear specific tasks.
Curriculum¶
A markdown file generated during evaluation containing the objective metadata, target files, test paths, and per-task details. This is the primary context sent to the implementation agent.
Sandbox¶
Docker-isolated pytest execution. Tests run inside a container to prevent generated code from affecting the host. Falls back to host pytest if Docker is unavailable.
Verification Granularity¶
Controls how tests are verified: task (default) runs each task's tests independently; objective runs all tasks' tests together in a single pytest invocation.
Background Optimizer¶
An experimental GEPA-powered system that improves prompt steering over time. Disabled by default; enable with optimizer: enabled: true in .crucis/settings.yaml. After successful evaluation, a background worker scores candidate policies and promotes winners. See Background Optimizer.
Policy¶
An optimizer policy that steers Crucis prompts. Contains four fields: repository_skill, generation_directives, adversary_directives, evaluation_directives. Each is injected into the corresponding prompt builder.
Promotion¶
Replacing the active optimizer policy with a winning candidate. Can be manual (crucis promote) or automatic (promotion_mode: auto).
Task States¶
Each task progresses through a state machine:
| State | Description |
|---|---|
pending |
Task not yet started |
train_suite_generated |
Tests generated by LLM, not yet reviewed |
train_suite_approved |
Tests approved by user or auto-approved |
adversarially_reviewed |
Adversary attacked, probe ran |
complete |
Ready for evaluation |
Agents¶
| Role | Default Agent | Default Model | Purpose |
|---|---|---|---|
| Generation | claude |
claude-opus-4-6 |
Generate pytest train suites |
| Critic | claude |
claude-opus-4-6 |
Adversarial review of tests |
| Implementation | codex |
gpt-5.3-codex |
Write code to pass tests |
File Structure¶
| Path | Purpose |
|---|---|
objective.yaml |
Objective definition (always created by crucis init) |
src/solution.py |
Implementation target (created by crucis init for new projects) |
plan.md |
Structured generation plan |
.checkpoint.json |
Task progress and train suite sources |
curriculum.md |
Generated evaluation guide |
constraints/profiles.yaml |
Constraint profile definitions (created with --with-profiles) |
tests/test_<task>.py |
Generated train suites |
src/<target>.py |
Implementation targets |
.crucis/settings.yaml |
Runtime settings (created with --with-settings) |
.crucis/optimizer/ |
Optimizer state, policies, queue, and runs (when optimizer enabled) |
.crucis/logs/ |
Structured JSONL run logs |