API
Evaluate prompts, gate regressions in CI, and serve production prompt versions over HTTP. The Eval API is available on every plan; the Prompts (serving) API requires Pro or Team.
Authentication
Every request needs an API key in the Authorization header.
```
Authorization: Bearer pe_live_your_key_here
```
Plans & quota
The Eval API is available on every plan. Each plan includes a monthly managed quota of evals that run on our infrastructure. BYOK evals run on your own Anthropic key and do not consume quota.
| Plan | Managed quota/mo | Modes |
|---|---|---|
| free | 10 | lint |
| basic | 30 | lint |
| pro | 75 | lint + full |
| team | 250 | lint + full |
| BYOK | unlimited (your key) | lint + full on any plan |
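The table boils down to two rules: full mode needs Pro/Team or BYOK, and only managed (non-BYOK) calls count toward the quota. The Python sketch below predicts which documented error a request would hit before you send it. `precheck` is a hypothetical helper, `calls_month` would come from a previous response's `meta.calls_month`, and the order of the two checks is an assumption.

```python
# Monthly managed quota per plan, copied from the table above.
MANAGED_QUOTA = {"free": 10, "basic": 30, "pro": 75, "team": 250}
# Plans whose managed quota includes "full" mode.
FULL_MODE_PLANS = {"pro", "team"}


def precheck(plan, mode, byok, calls_month):
    """Predict whether a request would be rejected (hypothetical helper).

    Returns None if the request should go through, otherwise the HTTP
    status listed under Errors for that case (403 or 402).
    The order of the checks is an assumption.
    """
    if mode == "full" and not (byok or plan in FULL_MODE_PLANS):
        return 403  # full mode needs Pro/Team or BYOK
    if not byok and calls_month >= MANAGED_QUOTA[plan]:
        return 402  # managed quota used up; BYOK calls are exempt
    return None
```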
Endpoint
```
POST https://prompt-eval.com/api/v1/eval
```
Request body
```json
{
  "prompt": "Your prompt text here",
  "mode": "lint",
  "language": "en"
}
```
Parameters
| Parameter | Type | Req. | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | The prompt to evaluate (10–20,000 chars) |
| mode | string | No | "lint" | "lint" or "full" (full needs Pro/Team or BYOK) |
| language | string | No | "en" | "en" or "pt" — analysis language |
| min_score | integer | No | your account threshold, else 75 | Score threshold used to compute the pass field |
| fail_on_conflict | boolean | No | false | Fail when a contradiction is detected |
| baseline_slug | string | No | — | Library slug whose production version is the regression baseline (Pro+) |
| max_drop | number | No | 0 | Maximum tolerated score drop vs the baseline (0 = any drop fails) |
| provider_key | string | No | — | BYOK — Anthropic key (prefer the X-Provider-Key header) |
Constraints
prompt minimum: 10 characters (shorter returns 400) · prompt maximum: 20,000 characters (longer returns 413)
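A minimal Python sketch of building this request with only the standard library. It mirrors the body, headers, and constraints documented above; the helper name `build_eval_request` and its client-side checks are illustrative assumptions, not an official SDK. It returns an unsent `urllib.request.Request`, so you can inspect it before sending.

```python
import json
import urllib.request

EVAL_URL = "https://prompt-eval.com/api/v1/eval"


def build_eval_request(api_key, prompt, mode="lint", language="en",
                       provider_key=None, **options):
    """Build (but do not send) a POST request for the Eval API.

    `options` carries the optional body fields, e.g. min_score,
    fail_on_conflict, baseline_slug, max_drop.
    """
    # Client-side checks mirroring the documented constraints;
    # the server still validates (400 / 413).
    if not 10 <= len(prompt) <= 20_000:
        raise ValueError("prompt must be 10-20,000 characters")
    if mode not in ("lint", "full"):
        raise ValueError('mode must be "lint" or "full"')
    if language not in ("en", "pt"):
        raise ValueError('language must be "en" or "pt"')

    body = {"prompt": prompt, "mode": mode, "language": language, **options}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider_key:
        # BYOK: the header is preferred over the provider_key body field.
        headers["X-Provider-Key"] = provider_key

    return urllib.request.Request(
        EVAL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(req, timeout=60) as res: result = json.load(res)` (the 60-second timeout is arbitrary). `urlopen` raises `urllib.error.HTTPError` for the 4xx/5xx statuses listed under Errors.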
Response
mode "lint" (default)
```json
{
  "mode": "lint",
  "score": 74,
  "dimensions": {
    "clarity": 68,
    "specificity": 81,
    "structure": 77,
    "robustness": 70
  },
  "conflicts": [
    { "a": "Be concise", "b": "Explain in full detail", "type": "contradiction" }
  ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "meta": {
    "prompt_chars": 1240,
    "evaluated_at": "2026-06-04T14:32:00Z",
    "plan": "pro",
    "byok": false,
    "calls_month": 12,
    "quota_month": 75,
    "quota_remaining": 63
  }
}
```
Response fields
| Field | Type | Description |
|---|---|---|
| mode | string | "lint" or "full" (echoes the mode used) |
| score | integer | Overall quality score, 0–100 |
| dimensions.clarity | integer | How unambiguous the instruction is |
| dimensions.specificity | integer | How precisely the output is defined |
| dimensions.structure | integer | How logically the prompt is organized |
| dimensions.robustness | integer | How well it handles edge cases |
| conflicts | object[] | Detected conflicts as { a, b, type }, where type is contradiction, tension, or redundancy |
| benchmark | object or null | { top_percent, total }: percentile rank vs all evals; null until there is enough data |
| pass | boolean | CI gate verdict (see CI gate below) |
| min_score | integer | The threshold actually used |
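| meta.prompt_chars | integer | Length of the evaluated prompt, in characters |
| meta.evaluated_at | string | ISO 8601 timestamp of the evaluation |
| meta.plan | string | The key's plan |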
| meta.byok | boolean | Whether it ran on your own key |
| meta.calls_month | integer | Managed calls this month |
| meta.quota_month | integer | Plan managed quota |
| meta.quota_remaining | integer | Managed quota left |
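For logs or PR comments, a compact summary can be built from the fields above. A Python sketch; `summarize` is a hypothetical helper, and it works for both modes because it reads only fields that lint and full responses share.

```python
def summarize(result):
    """One-line summary of an eval response (hypothetical helper)."""
    dims = result["dimensions"]
    weakest = min(dims, key=dims.get)  # lowest-scoring dimension
    n_contra = sum(1 for c in result["conflicts"] if c["type"] == "contradiction")

    meta = result["meta"]
    if meta["byok"]:
        quota = "BYOK (no managed quota used)"
    else:
        quota = f'{meta["quota_remaining"]}/{meta["quota_month"]} managed evals left'

    bench = result.get("benchmark")  # null until there is enough data
    rank = f'top {bench["top_percent"]}%' if bench else "no benchmark yet"

    verdict = "PASS" if result["pass"] else "FAIL"
    return (f'{verdict}: score {result["score"]} (min {result["min_score"]}), '
            f'weakest {weakest} ({dims[weakest]}), '
            f'{n_contra} contradiction(s), {rank}, {quota}')
```

For the lint example above it returns `FAIL: score 74 (min 75), weakest clarity (68), 1 contradiction(s), top 22%, 63/75 managed evals left`.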
mode "full": adds graph, problems, recommendations, and improved_prompt
```json
{
  "mode": "full",
  "score": 74,
  "dimensions": { "clarity": 68, "specificity": 81, "structure": 77, "robustness": 70 },
  "conflicts": [ /* ... */ ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "graph": { "nodes": [ /* ... */ ], "edges": [ /* ... */ ] },
  "problems": {
    "critical": ["Output format is underspecified."],
    "warnings": ["Persona may conflict with the tone constraint."]
  },
  "recommendations": [
    "Add an explicit output-format section with an example.",
    "Reconcile the tone constraint with the persona."
  ],
  "improved_prompt": "You are a customer support agent...",
  "meta": { "plan": "pro", "byok": false, "calls_month": 13, "quota_month": 75, "quota_remaining": 62 }
}
```
| Field | Type | Description |
|---|---|---|
| graph | object | { nodes, edges } — the prompt instruction graph |
| problems.critical | string[] | Critical problems |
| problems.warnings | string[] | Warnings |
| recommendations | string[] | Actionable improvements |
| improved_prompt | string | Rewritten version of the prompt |
| regression | object | Present only when baseline_slug is passed (see Regression vs production) |
CI gate (the pass field)
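The pass field is the boolean verdict to use in CI. It is true only when all of the following hold: the score meets the min_score bar; if fail_on_conflict is true, no conflict of type contradiction was detected; and if baseline_slug is set, the score did not drop by more than max_drop (regression.regressed is false).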
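A client-side sketch of this verdict, built from the parameters that feed it (min_score, fail_on_conflict, and the regression check described under Regression vs production). It is an assumption, not the service's code: it treats min_score as inclusive (score >= min_score), counts only conflicts of type contradiction, and marks a regression when the drop exceeds max_drop. `build_regression` and `gate_passes` are hypothetical names; `build_regression(82, 74)` reproduces the example regression object.

```python
def build_regression(baseline_score, current_score, max_drop=0):
    """Rebuild the regression fields; a drop larger than max_drop regresses."""
    delta = current_score - baseline_score
    return {
        "baseline_score": baseline_score,
        "current_score": current_score,
        "delta": delta,
        "max_drop": max_drop,
        "regressed": -delta > max_drop,  # max_drop=0: any drop fails
    }


def gate_passes(score, min_score=75, conflicts=(), fail_on_conflict=False,
                regression=None):
    """Approximate the pass verdict locally (assumed semantics, see above)."""
    if score < min_score:
        return False
    if fail_on_conflict and any(c["type"] == "contradiction" for c in conflicts):
        return False
    if regression is not None and regression["regressed"]:
        return False
    return True
```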
Regression vs production
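Pass baseline_slug to compare the current score against the score of that slug's production version. This requires a Pro/Team plan and a library prompt with that slug that has a production version (otherwise 404; if that version has not been scored yet, 422). max_drop sets the tolerated drop (0 = any drop fails). The response then includes a regression object: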
```json
"regression": {
  "baseline": { "slug": "support-agent" },
  "baseline_score": 82,
  "current_score": 74,
  "delta": -8,
  "max_drop": 0,
  "regressed": true
}
```
Errors
| Status | Name | When it happens |
|---|---|---|
| 400 | Bad Request | prompt missing/short, invalid mode/language, malformed provider_key |
| 401 | Unauthorized | Missing/revoked API key, or BYOK key rejected |
| 402 | Quota Exhausted | Monthly managed quota used up (response carries an upgrade CTA) |
| 403 | Forbidden | mode "full" without Pro/Team and without BYOK |
| 404 | Not Found | baseline_slug has no prompt or no production version |
| 413 | Payload Too Large | Prompt exceeds 20,000 characters |
| 422 | Unprocessable | Baseline production version has no score yet |
| 429 | Too Many Requests | 5 req/min rate limit reached |
| 500 | Server Error | Eval failed — retry |
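Of the statuses above, only 429 and 500 describe transient conditions; the rest need a change on your side (key, body, plan, or quota). A minimal retry sketch, assuming those two are the only retryable codes. `with_retries` and its backoff delays are illustrative, and `send` stands for your own function that performs one request and returns `(status, body)`. Given the 5 req/min limit, delays of seconds are more useful than milliseconds.

```python
import time

RETRYABLE = {429, 500}  # rate limited, or the eval failed: safe to retry


def with_retries(send, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying retryable statuses with backoff.

    Returns the last (status, body) pair. `sleep` is injectable for testing.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == attempts - 1:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```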
All errors return a JSON body:
```json
{ "error": "machine_code", "message": "Human-readable description." }
```
BYOK (bring your own key)
Send the X-Provider-Key header with an Anthropic API key (sk-ant-...). Inference then runs on your key: it consumes no managed quota, costs us nothing, and unlocks full mode on any plan, including free. The provider_key body field also works, but the header is preferred.
```bash
# BYOK: run on your own Anthropic key.
# No managed quota is consumed, and it unlocks "full" on any plan.
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "X-Provider-Key: sk-ant-your-anthropic-key" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "You are a customer support agent...", "mode": "full" }'
```
Examples
```bash
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a customer support agent. Answer clearly and concisely. Always end by asking if there is anything else you can help with.",
    "mode": "lint"
  }'
```
CI/CD integration
The recommended approach is the official GitHub Action: it builds the request, fails the build on a low score, a conflict, or a regression, and renders a job summary.
```yaml
# .github/workflows/prompt-check.yml
name: Prompt Check

on:
  pull_request:
    paths: ['prompts/**']  # adjust to your prompts directory

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: FranciscoFerreiraff/prompteval-action@v1
        with:
          api_key: ${{ secrets.PROMPTEVAL_API_KEY }}
          prompt_file: prompts/support-agent.md
          min_score: 75
          fail_on_conflict: true
          baseline_slug: support-agent  # gate regression vs production
```
Prompts API (serving)
Fetch the production version of a prompt by slug. This makes no AI call (it is a plain library read), so it is safe to call on every request. Requires a Pro or Team plan.
```
GET https://prompt-eval.com/api/v1/prompts/{slug}
```
Examples
```bash
curl https://prompt-eval.com/api/v1/prompts/customer-support \
  -H "Authorization: Bearer pe_live_your_key_here"
```
Risk mitigation
PromptEval depends on Supabase, whose 99.9% uptime SLA still allows up to about 8.8 hours of downtime per year (0.1% of 8,760 hours). Don't let your app go down with it: keep a local fallback prompt.
No fallback: stops working if the API goes down
```python
# ❌ Don't do this: production depends 100% on PromptEval
# (get_from_prompteval and call_llm stand for your own helpers)
def handle_request(user_input):
    prompt = get_from_prompteval("customer-support")  # fails if the API goes down
    return call_llm(prompt, user_input)
```
With fallback: keeps working
```python
# ✅ Do this: a local fallback keeps you up
FALLBACK_PROMPT = "You are a helpful assistant..."  # hardcoded emergency version

def handle_request(user_input):
    try:
        prompt = get_from_prompteval("customer-support")
    except Exception:
        prompt = FALLBACK_PROMPT  # keeps working when the API is down
    return call_llm(prompt, user_input)
```

```javascript
// ✅ JavaScript: local fallback (Node 18+, where fetch and AbortSignal.timeout are built in)
const FALLBACK_PROMPT = "You are a helpful assistant...";

async function getPrompt(slug) {
  try {
    const res = await fetch(
      `https://prompt-eval.com/api/v1/prompts/${encodeURIComponent(slug)}`,
      {
        headers: { Authorization: `Bearer ${process.env.PROMPTEVAL_API_KEY}` },
        signal: AbortSignal.timeout(3000), // give up after 3 s instead of hanging
      }
    );
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    return data.content;
  } catch {
    return FALLBACK_PROMPT; // keeps working when the API is down
  }
}
```
Response — 200
```json
{
  "slug": "customer-support",
  "name": "Customer Support Agent",
  "version": 3,
  "content": "You are a customer support agent...",
  "score": 82,
  "production_set_at": "2026-05-14T10:00:00Z"
}
```
| Field | Type | Description |
|---|---|---|
| slug | string | Slug set in the library |
| name | string | Prompt name in the library |
| version | integer | Production version number |
| content | string | Prompt content (use as your system prompt) |
| score | integer or null | Version score (0–100), or null if not evaluated |
| production_set_at | string | ISO 8601 — when it was marked as production |
Errors
| Status | When it happens |
|---|---|
| 400 | Invalid slug format in URL |
| 401 | Missing, malformed, or revoked API key |
| 403 | Valid key, but account is not on the Pro/Team plan |
| 404 | Slug not found, or prompt has no production version |
| 405 | Method not allowed (use GET) |
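The fallback pattern under Risk mitigation can go one step further: keep the last successfully fetched version in memory, so an outage falls back to the most recent production prompt rather than the hardcoded one. A Python sketch using only the standard library; `fetch_prompt`, `PromptCache`, the 60-second TTL, and the 3-second timeout are illustrative assumptions, and the API key is assumed to live in the `PROMPTEVAL_API_KEY` environment variable, as in the JavaScript example.

```python
import json
import os
import time
import urllib.request

PROMPTS_URL = "https://prompt-eval.com/api/v1/prompts/{slug}"


def fetch_prompt(slug, timeout=3.0):
    """GET the production version of `slug`; raises on network or HTTP errors."""
    req = urllib.request.Request(
        PROMPTS_URL.format(slug=slug),
        headers={"Authorization": f"Bearer {os.environ['PROMPTEVAL_API_KEY']}"},
    )
    # urlopen raises HTTPError for 4xx/5xx (401, 403, 404, ...).
    with urllib.request.urlopen(req, timeout=timeout) as res:
        return json.load(res)["content"]


class PromptCache:
    """Serve prompts from memory, refresh every `ttl` seconds, never raise."""

    def __init__(self, fallback, fetch=fetch_prompt, ttl=60.0, clock=time.monotonic):
        self.fallback = fallback
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._cache = {}  # slug -> (fetched_at, content)

    def get(self, slug):
        now = self.clock()
        hit = self._cache.get(slug)
        if hit and now - hit[0] < self.ttl:
            return hit[1]  # fresh enough: no HTTP call
        try:
            content = self.fetch(slug)
        except Exception:
            if hit:
                # Serve the stale copy and retry only after another ttl.
                self._cache[slug] = (now, hit[1])
                return hit[1]
            return self.fallback  # never fetched successfully: hardcoded version
        self._cache[slug] = (now, content)
        return content
```

Usage: create one `PromptCache(FALLBACK_PROMPT)` at startup and call `cache.get("customer-support")` per request.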
Need a higher quota? Contact us at francisco@prompt-eval.com; we offer custom plans for high-volume teams.