Observability into what's being built — beyond the prep guide.
A terminal for the systems built for this campaign — what each is, how it works,
and where it stands. First system: the synthetic Data Factory.
Data Factory
design · no data yet
A general engine that manufactures high-quality synthetic data points,
one methodically designed point per "lever pull", and routes each into a human-review
queue. Each data type gets its own synthetic flow, and all flows share one common pipeline
shape. First product: an industry benchmark (type TBD).
Status: Design
Recipes: 2 defined
Next phase: P0 · GTFA lever
Generator: Claude Code /loop
Pipeline — one shape, per-type middle
Input Spec: governs one lever pull; defines the variety axes
→
Lever: type-specific gated loop, the only part that differs per type
→
Candidate + receipts: the artifact plus its full trace
→
Review Queue: human verdicts feed reflection
→
Approved Dataset: export
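The pipeline shape above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names InputSpec, Candidate, pull, and export_approved are hypothetical and not taken from the actual system, and the review queue is modeled as a plain list. The point it shows is that the Lever is the single pluggable piece, while everything around it stays the same for every data type.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class InputSpec:
    """Governs one lever pull (hypothetical shape)."""
    data_type: str
    coordinate: dict[str, str]  # one point on the variety axes


@dataclass
class Candidate:
    """The artifact plus its trace ("receipts")."""
    artifact: str
    receipts: list[str] = field(default_factory=list)


# A Lever is the only type-specific part: it turns a spec into a candidate.
Lever = Callable[[InputSpec], Candidate]


def pull(spec: InputSpec, lever: Lever, review_queue: list[Candidate]) -> Candidate:
    """Shared pipeline step: run the lever, record the spec, enqueue for review."""
    candidate = lever(spec)
    candidate.receipts.append(
        f"spec={spec.data_type}:{sorted(spec.coordinate.items())}"
    )
    review_queue.append(candidate)
    return candidate


def export_approved(review_queue: list[Candidate], verdicts: dict[int, bool]) -> list[str]:
    """Keep only artifacts whose queue index received an approving human verdict."""
    return [c.artifact for i, c in enumerate(review_queue) if verdicts.get(i)]
```

A new data type would then supply only its own lever function; pull and export_approved would be reused unchanged.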
Everything outside the Lever is shared and built once. Variety is enforced
structurally: variety axes → a fresh coordinate per pull → a novelty gate against the corpus
(so the LLM can't repeat itself).
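One way to picture the variety mechanism is the sketch below. It is a minimal illustration, not the system's implementation: the function names are hypothetical, the coordinate is drawn from the Cartesian product of the axes, and the novelty gate uses word-trigram Jaccard similarity with an arbitrary 0.5 threshold as a stand-in for whatever similarity measure the real gate uses.

```python
import itertools
import random


def fresh_coordinate(axes, used, rng):
    """Pick one unused combination of axis values, or None if all are used.

    axes: dict mapping axis name -> list of allowed values
    used: set of value tuples already consumed (mutated in place)
    """
    names = sorted(axes)
    unused = [
        combo
        for combo in itertools.product(*(axes[n] for n in names))
        if combo not in used
    ]
    if not unused:
        return None
    combo = rng.choice(unused)
    used.add(combo)
    return dict(zip(names, combo))


def _shingles(text, n=3):
    """Set of word n-grams; short texts collapse to a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def passes_novelty_gate(candidate, corpus, max_similarity=0.5):
    """Reject a candidate that is too similar to any text already in the corpus."""
    cand = _shingles(candidate)
    for existing in corpus:
        other = _shingles(existing)
        union = cand | other
        if union and len(cand & other) / len(union) > max_similarity:
            return False
    return True
```

Because each coordinate is consumed once and every candidate is checked against the corpus, repetition is blocked by the structure of the loop rather than by prompt wording alone.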
Recipes (per-type flows)
prompt → ground-truth answer · P0
Verifiable answer, graded programmatically. Hard problem: is the reference correct?
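As a rough sketch of what programmatic grading could look like, the Python below grades by normalized exact match and adds one possible check on the hard problem: how often independently derived answers agree with the reference. Both functions and their names are assumptions for illustration; the source does not say how the recipe grades answers or validates references.

```python
def normalize(answer: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(answer.strip().lower().split())


def grade(response: str, reference: str) -> bool:
    """Programmatic grade: normalized exact match against the reference."""
    return normalize(response) == normalize(reference)


def reference_agreement(reference: str, independent_answers: list[str]) -> float:
    """Fraction of independently derived answers that match the reference.

    A low value is a signal that the reference itself may be wrong and
    should get extra scrutiny in human review.
    """
    if not independent_answers:
        return 0.0
    matches = sum(grade(a, reference) for a in independent_answers)
    return matches / len(independent_answers)
```

Agreement does not prove the reference correct, since independent derivations can share the same mistake, but a low score is a cheap flag to attach to the candidate's receipts.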