AI policies don't tell you who can think. The Constraint does.

A technical skills benchmark for the AI era. Score how people think with AI—not whether they hid a tab.

Shortlists backed by evidence, not vibes.

[01] The shift // Why constraints reveal skill

You can't prevent AI cheating. You can design it out.

In an unconstrained environment, AI compensates for gaps—signal becomes noise. Under fixed limits, only candidates who plan, prioritize, and recover actually win.

  • 01

    Unlimited compute hides gaps

    When anyone can brute-force prompts, you can't tell who understands the problem. Caps force tradeoffs—and tradeoffs reveal judgment.

  • 02

    Measurement beats prevention

    Proctoring measures evasion. The arena measures output and process: what they shipped, how they got there, and what broke along the way.

  • 03

    Same constraints, fair compare

    Every candidate gets the same model, token budget, and context window. You're ranking skill—not who had a better cheat sheet.

[02] How it works // Four steps to a defensible shortlist

Put every candidate in the same arena. Review the run.

You set the role challenge once. They build with AI inside fixed limits. You get a dossier you can share with engineering and the client.

  1. 01 Scope

    Pick the challenge for the req

    We help you map the role to one arena task: an agent workflow, a fix under a token cap, or a feature shipped on a pinned model. One bar per opening.

  2. 02 Run

    Candidates work with AI—on your terms

    No blockers. Same context window, token budget, and model for everyone. What varies is skill: planning, prompting, recovery, and what they ship.

  3. 03 Review

    You read the trace, not a screenshot

    Hiring managers and recruiters get prompts, tool calls, failures, retries, and final output. See who thinks under constraints—not who gamed a test.

  4. 04 Place

    Shortlist with evidence

    Rank side-by-side, forward traces to the client, and move to onsite only when the run backs the hire. Fewer false positives. Faster debriefs.

[03] Sample dossier // Deliverables per candidate

// Sample dossier excerpt · arena-run · candidate-0042

> challenge: agent-workflow · token_cap: 64k · model: pinned

> step 03: tool_call failed · recovery in 2 retries

> step 07: output shipped · rubric: recovery ★★★★☆

> export: ready · share_with: eng-lead, client-ta

Trace

Full arena replay

Share a step-by-step record with engineering leads and clients: what they asked the model, what broke, and how they fixed it.

Measure

Skill under constraints

Score token discipline, tool use, and recovery—not whether they alt-tabbed. AI is in-bounds; sloppy work isn't.

Compare

Ranked shortlist

Same task, same caps. Stack five finalists and pick on evidence—recruiters can defend the slate in one email.
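
For a rough sense of how a ranked shortlist could fall out of the traces, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RunSummary` fields, the 50/50 weights, the `score` formula, and the extra candidate IDs are hypothetical, not the rubric used in the arena. Only the 64k token cap and `candidate-0042` come from the sample excerpt above.

```python
from dataclasses import dataclass

TOKEN_CAP = 64_000  # same cap for every candidate, as in the sample excerpt


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-candidate summary distilled from an arena trace."""
    candidate: str
    tokens_used: int
    retries: int
    shipped: bool


def score(run: RunSummary) -> float:
    """Illustrative score in [0, 1]; the weights are assumptions, not the real rubric."""
    if not run.shipped:
        return 0.0  # nothing shipped, nothing to rank
    # Fraction of the token budget left unspent (more left = more disciplined).
    token_discipline = 1.0 - min(run.tokens_used, TOKEN_CAP) / TOKEN_CAP
    # Fewer retries after a failure means a cleaner recovery.
    recovery = 1.0 / (1 + run.retries)
    return round(0.5 * token_discipline + 0.5 * recovery, 3)


def rank(runs: list[RunSummary]) -> list[RunSummary]:
    """Order candidates from highest to lowest score."""
    return sorted(runs, key=score, reverse=True)


runs = [
    RunSummary("candidate-0042", tokens_used=32_000, retries=2, shipped=True),
    RunSummary("candidate-0043", tokens_used=60_000, retries=0, shipped=True),   # hypothetical
    RunSummary("candidate-0044", tokens_used=20_000, retries=1, shipped=False),  # hypothetical
]

for run in rank(runs):
    print(run.candidate, score(run))
```

Because every run shares the same cap, the scores are directly comparable; a real rubric would likely weigh output quality as well.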

Request full sample on pilot call
[04] Why not the usual stack // Measure vs. prevent

Prevention is an arms race. Measurement is a standard.

Help your team argue for evidence-based assessment—not another blocker subscription.

Criterion                              Proctoring / AI detection  Live coding (no AI)  Generic take-home  The Constraint
Measures real skill with AI in bounds  No                         Partial              No                 Yes
Comparable across candidates           No                         Yes                  Partial            Yes
Client-ready evidence                  No                         No                   Partial            Yes
Scales async                           Partial                    No                   Yes                Yes
Arms race with AI cheating             Yes                        N/A                  Yes                No

[05] Who it's for // Hiring, trust & FAQ

If you own the req or the shortlist, this is for you.

Replace blocker tools and anti-cheat add-ons with one assessment your team and your clients can read in minutes.

Hiring Managers

Compare five finalists on the same task—with traces, not gut feel.

Technical Recruiters

Defend the shortlist to the client in one email with artifacts.

Staffing & RPO

Back every placement with a dossier the client can audit.

// Trust

Fairness

Same arena, same caps

Identical model profile, token budget, and context window per challenge. What differs is the candidate—not the playing field.

Pilot

Q2 cohort

Limited pilot slots. Scoped challenge in days. Direct support for challenge design and rubric alignment with your eng team.

Questions teams ask before the pilot.

Can candidates use AI during the challenge?

Yes, that's the point. AI is in-bounds under fixed constraints. You measure how they plan, prompt, recover, and ship, not whether they hid a tab.

Turn off the blockers. Turn on the arena.

  1. 01 Intro call & req scoping
  2. 02 Challenge + rubric in days
  3. 03 Candidate links & trace review

Run a Pilot