AI policies don't tell you who can think. The Constraint does.

AI-in-bounds assessment. Same caps. Exportable evidence.

Role pack · Forward Deployed Engineer

Run a Pilot · Watch a sample run

[01] Sample run

// What you review

See a candidate in the arena.

One module, fixed caps—then the evidence a hiring manager actually reads.

[02] Role pack

// AI-native hiring

Forward Deployed Engineer

ai-forward-engineer · AI Native / FDE

TypeScript/JavaScript, React Product UI, agent-assisted delivery, and debugging under constraints—a full assessment for AI-native roles.

~2–3h total · sequential modules · same constraints per candidate

MODEL_LOCK · TOKEN_BUDGET · CONTEXT_CAP · GOALS_VERIFY
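
A minimal sketch of how the four caps could be expressed and checked per run. The field names, values, and semantics below are assumptions for illustration only; they are not the product's actual schema.

```typescript
// Hypothetical shape: each field mirrors one of the four caps named above.
// What each cap means in detail is an assumption made for this sketch.
interface Constraints {
  modelLock: string;    // MODEL_LOCK: the one model every candidate uses
  tokenBudget: number;  // TOKEN_BUDGET: total tokens allowed per module
  contextCap: number;   // CONTEXT_CAP: max tokens in any single request
  goalsVerify: boolean; // GOALS_VERIFY: output must be checked against goals
}

// Hypothetical summary of what one candidate run actually consumed.
interface Usage {
  model: string;
  tokensUsed: number;
  largestRequestTokens: number;
  goalsVerified: boolean;
}

// Returns the caps a run violated; an empty list means the run stayed in bounds.
function violations(c: Constraints, u: Usage): string[] {
  const out: string[] = [];
  if (u.model !== c.modelLock) out.push("MODEL_LOCK");
  if (u.tokensUsed > c.tokenBudget) out.push("TOKEN_BUDGET");
  if (u.largestRequestTokens > c.contextCap) out.push("CONTEXT_CAP");
  if (c.goalsVerify && !u.goalsVerified) out.push("GOALS_VERIFY");
  return out;
}

// Illustrative values only.
const caps: Constraints = {
  modelLock: "example-model",
  tokenBudget: 50000,
  contextCap: 8000,
  goalsVerify: true,
};

const overBudget = violations(caps, {
  model: "example-model",
  tokensUsed: 62000,
  largestRequestTokens: 4000,
  goalsVerified: true,
});
console.log(overBudget); // [ 'TOKEN_BUDGET' ]
```

Because every candidate is checked against the same `caps` object, two runs can be compared on judgment rather than on who had more room.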

00 · Live

TS/JS Core Logic

code-workbench · ~60m

Objective

Prove they can plan, implement, and verify TypeScript/JavaScript under fixed model and token caps—not just paste until it compiles.

01 · Live

React Product UI

ui-workbench · ~55m

Objective

Ship a product UI with AI in bounds: streaming interfaces, agent controls, and recovery when the first pass fails.

02 · Planned

Agent workflow

web_skill

Objective

Orchestrate an agent-assisted workflow end to end—scope, tool use, and a verifiable outcome.

03 · Planned

Debugging

code-workbench

Objective

Diagnose and fix a failing system under the same constraints: isolate the fault, recover, and leave evidence.

Clone a pack → send candidate links → review traces and output. One bar per opening.

This is the Forward Deployed Engineer pack—ready to run. Pilots can use it as-is or map another role as the catalog grows.

[03] Data annotation

// labeling · RLHF · eval

Throughput doesn't hire judgment.

Annotation companies hire at volume. Speed tests and rubric quizzes still miss who can think under model pressure and who leaves a trail your QA leads can actually review.

Reviewers · AI trainers · eval ops · technical annotators

Same bar, every candidate

Fixed caps and the same arena for every hire—so you compare judgment, not who got a longer prompt or a better tool tip.

Evidence that maps to QA

Exportable traces: how they interpret instructions, how they recover from bad first passes, and whether the output would survive your review ladder.

Built for data-quality roles

Not another coding screen. Assess the people who decide label quality, preference ranking, and eval pass/fail—under real AI constraints.

Pilots scoped to your ladder · packs adapted to your domains.

Talk annotation pilot →

[04] The pilot

// How it runs

Live in days, not quarters.

Limited pilot slots. Scoped with your eng team.

01 · Intro call & req scoping
02 · Pick a pack (e.g. FDE) + adjust
03 · Candidate links & trace review

[05] FAQ

// Before the pilot

Questions teams ask first.

Yes—that's the point. AI is in-bounds under fixed constraints. You measure how they plan, prompt, recover, and ship—not whether they hid a tab.

Turn off the blockers. Turn on the arena.

Run a Pilot