PromptEval/ docs

API

Evaluate prompts, gate regressions in CI, and serve production prompt versions over HTTP. The API is open on every plan.

Authentication

Every request needs an API key in the Authorization header.

Authorization: Bearer pe_live_your_key_here
Generate and revoke keys in the dashboard, under API. Keys are available on every plan; only the managed quota differs per plan (see below).

Plans & quota

The Eval API is open on every plan. Each plan has a monthly managed quota (evals run on our infra). BYOK does not consume quota.

| Plan | Managed quota/mo | Modes |
| --- | --- | --- |
| free | 10 | lint |
| basic | 30 | lint |
| pro | 75 | lint + full |
| team | 250 | lint + full |
| BYOK | unlimited (your key) | lint + full on any plan |

lint = score + 4 dimensions + conflicts + benchmark + pass (cheap).

full = lint + graph + problems + recommendations + improved prompt (needs Pro/Team or BYOK).

Score cache does not consume quota (same prompt = same result, free).

Rate limit: 5 req/min on all plans.
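The plan rules above can be mirrored in a small client-side pre-check. This is only a sketch: the `PLANS` dict and the `can_run` helper are hypothetical names, the numbers are copied from the table, and the server remains the authority (for example, this sketch ignores the score cache, which does not consume quota).

```python
# Managed monthly quota and allowed modes per plan, copied from the plans table.
PLANS = {
    "free": {"quota": 10, "modes": {"lint"}},
    "basic": {"quota": 30, "modes": {"lint"}},
    "pro": {"quota": 75, "modes": {"lint", "full"}},
    "team": {"quota": 250, "modes": {"lint", "full"}},
}


def can_run(plan: str, mode: str, byok: bool = False, calls_month: int = 0) -> bool:
    """Guess whether a request in `mode` would be accepted for `plan`."""
    limits = PLANS.get(plan)
    if limits is None or mode not in ("lint", "full"):
        return False
    if byok:
        # BYOK unlocks both modes on any plan and consumes no managed quota.
        return True
    return mode in limits["modes"] and calls_month < limits["quota"]
```

A check like this can skip a request that would come back as 402 or 403, but it should never replace handling those statuses.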

Endpoint

POST https://prompt-eval.com/api/v1/eval

Request body

```json
{
  "prompt": "Your prompt text here",
  "mode": "lint",
  "language": "en"
}
```

Parameters

| Parameter | Type | Req. | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | | The prompt to evaluate (10–20,000 chars) |
| mode | string | No | "lint" | "lint" or "full" (full needs Pro/Team or BYOK) |
| language | string | No | "en" | "en" or "pt": analysis language |
| min_score | integer | No | your threshold / 75 | Bar for the pass field |
| fail_on_conflict | boolean | No | false | Fail when a contradiction is detected |
| baseline_slug | string | No | | Gate regression vs the slug's production version (Pro+) |
| max_drop | number | No | 0 | Tolerated score drop vs baseline |
| provider_key | string | No | | BYOK: Anthropic key (prefer the X-Provider-Key header) |

Constraints

prompt minimum: 10 characters · prompt maximum: 20,000 characters
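Validating the body locally avoids spending a rate-limited request on a 400 or 413. The helper below is a minimal sketch, not part of any official SDK: `build_eval_body` is a hypothetical name, and it assumes the character limits count Python string length, which may differ from how the server counts. It leaves out provider_key on purpose, since the header is the preferred way to send it.

```python
def build_eval_body(prompt, mode="lint", language="en", **options):
    """Check inputs against the documented constraints and return the JSON body as a dict."""
    if not isinstance(prompt, str) or not 10 <= len(prompt) <= 20_000:
        raise ValueError("prompt must be a string of 10 to 20,000 characters")
    if mode not in ("lint", "full"):
        raise ValueError('mode must be "lint" or "full"')
    if language not in ("en", "pt"):
        raise ValueError('language must be "en" or "pt"')
    # Optional gate parameters from the parameters table.
    allowed = {"min_score", "fail_on_conflict", "baseline_slug", "max_drop"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"prompt": prompt, "mode": mode, "language": language, **options}
```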

Response

mode "lint" (default)

```json
{
  "mode": "lint",
  "score": 74,
  "dimensions": {
    "clarity": 68,
    "specificity": 81,
    "structure": 77,
    "robustness": 70
  },
  "conflicts": [
    { "a": "Be concise", "b": "Explain in full detail", "type": "contradiction" }
  ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "meta": {
    "prompt_chars": 1240,
    "evaluated_at": "2026-06-04T14:32:00Z",
    "plan": "pro",
    "byok": false,
    "calls_month": 12,
    "quota_month": 75,
    "quota_remaining": 63
  }
}
```

Response fields

| Field | Type | Description |
| --- | --- | --- |
| mode | string | "lint" or "full" (echoes the mode used) |
| score | integer | Overall quality score, 0–100 |
| dimensions.clarity | integer | How unambiguous the instruction is |
| dimensions.specificity | integer | How precisely the output is defined |
| dimensions.structure | integer | How logically the prompt is organized |
| dimensions.robustness | integer | How well it handles edge cases |
| conflicts | object[] | Detected conflicts: { a, b, type }. type is one of contradiction, tension, redundancy |
| benchmark | object or null | { top_percent, total }: percentile vs all evals; null until enough data |
| pass | boolean | CI gate verdict (see the CI gate section) |
| min_score | integer | The threshold actually used |
| meta.plan | string | The key's plan |
| meta.byok | boolean | Whether it ran on your own key |
| meta.calls_month | integer | Managed calls this month |
| meta.quota_month | integer | Plan managed quota |
| meta.quota_remaining | integer | Managed quota left |
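As an illustration of reading these fields, the sketch below condenses a lint response into the few values a log line or PR comment usually needs. `summarize_lint` is a hypothetical helper, and it assumes the response has already been parsed into a dict with the fields documented above.

```python
def summarize_lint(response: dict) -> dict:
    """Pick out the headline values from a parsed lint response."""
    dims = response["dimensions"]
    # The lowest-scoring dimension is usually the first thing to fix.
    weakest = min(dims, key=dims.get)
    contradictions = [c for c in response["conflicts"] if c["type"] == "contradiction"]
    return {
        "score": response["score"],
        "passed": response["pass"],
        "weakest_dimension": weakest,
        "contradictions": len(contradictions),
        "quota_remaining": response["meta"]["quota_remaining"],
    }
```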

mode "full" — adds graph, problems, recommendations and improved prompt

```json
{
  "mode": "full",
  "score": 74,
  "dimensions": { "clarity": 68, "specificity": 81, "structure": 77, "robustness": 70 },
  "conflicts": [ /* ... */ ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "graph": { "nodes": [ /* ... */ ], "edges": [ /* ... */ ] },
  "problems": {
    "critical": ["Output format is underspecified."],
    "warnings": ["Persona may conflict with the tone constraint."]
  },
  "recommendations": [
    "Add an explicit output-format section with an example.",
    "Reconcile the tone constraint with the persona."
  ],
  "improved_prompt": "You are a customer support agent...",
  "meta": { "plan": "pro", "byok": false, "calls_month": 13, "quota_month": 75, "quota_remaining": 62 }
}
```

| Field | Type | Description |
| --- | --- | --- |
| graph | object | { nodes, edges }: the prompt instruction graph |
| problems.critical | string[] | Critical problems |
| problems.warnings | string[] | Warnings |
| recommendations | string[] | Actionable improvements |
| improved_prompt | string | Rewritten version of the prompt |
| regression | object | Present only when baseline_slug is passed (see the CI gate section) |

CI gate (the pass field)

The pass field is the boolean verdict to use in CI. It is true when ALL of the conditions below hold:

1. score ≥ min_score. The threshold is resolved in this order: min_score in the body, then your saved production threshold, then 75.
2. No contradiction was detected, or fail_on_conflict is off. Only contradictions fail the gate; tensions and redundancies do not.
3. No regression. If baseline_slug was passed and the score drop exceeds max_drop, regressed is true and pass is false.
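The three conditions can be restated as a few lines of Python. This is a sketch of the documented rules for reasoning about them locally, not the server's implementation; `resolve_min_score` and `gate_passes` are hypothetical names.

```python
from typing import Iterable, Optional

DEFAULT_MIN_SCORE = 75


def resolve_min_score(body_min_score: Optional[int], saved_threshold: Optional[int]) -> int:
    """Apply the precedence: body min_score, then saved production threshold, then 75."""
    if body_min_score is not None:
        return body_min_score
    if saved_threshold is not None:
        return saved_threshold
    return DEFAULT_MIN_SCORE


def gate_passes(
    score: int,
    min_score: int,
    conflict_types: Iterable[str],
    fail_on_conflict: bool = False,
    regressed: bool = False,
) -> bool:
    """Return True only when all three gate conditions hold."""
    score_ok = score >= min_score
    # Only contradictions fail the gate, and only when fail_on_conflict is on.
    conflict_ok = not (fail_on_conflict and "contradiction" in conflict_types)
    return score_ok and conflict_ok and not regressed
```

With the lint example from the Response section (score 74, threshold 75), `gate_passes(74, 75, ["contradiction"])` is false because of the score alone; the contradiction would only matter with fail_on_conflict on.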

Regression vs production

Pass baseline_slug to compare the current score against that slug's production version (Pro+; the prompt must have a slug and a production version). max_drop sets the tolerated drop (0 = any drop fails). The response then includes a regression object:

```json
"regression": {
  "baseline": { "slug": "support-agent" },
  "baseline_score": 82,
  "current_score": 74,
  "delta": -8,
  "max_drop": 0,
  "regressed": true
}
```
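The arithmetic behind this object is small enough to check by hand. The sketch below reproduces it under the reading that delta is the current score minus the baseline score, and that a regression means the drop is strictly greater than max_drop; `regression_report` is a hypothetical name, and the `baseline` sub-object is left out.

```python
def regression_report(baseline_score: float, current_score: float, max_drop: float = 0) -> dict:
    """Compute delta and the regressed flag from two scores."""
    delta = current_score - baseline_score
    drop = max(0, -delta)  # an improvement counts as a drop of 0
    return {
        "baseline_score": baseline_score,
        "current_score": current_score,
        "delta": delta,
        "max_drop": max_drop,
        "regressed": drop > max_drop,
    }
```

For the example above, `regression_report(82, 74)` gives a delta of -8 and regressed true; raising max_drop to 8 would tolerate the same drop.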

Errors

| Status | Code | When it happens |
| --- | --- | --- |
| 400 | Bad Request | prompt missing/short, invalid mode/language, malformed provider_key |
| 401 | Unauthorized | Missing/revoked API key, or BYOK key rejected |
| 402 | Quota Exhausted | Monthly managed quota used up (response carries an upgrade CTA) |
| 403 | Forbidden | mode "full" without Pro/Team and without BYOK |
| 404 | Not Found | baseline_slug has no prompt or no production version |
| 413 | Payload Too Large | Prompt exceeds 20,000 characters |
| 422 | Unprocessable | Baseline production version has no score yet |
| 429 | Too Many Requests | 5 req/min rate limit reached |
| 500 | Server Error | Eval failed; retry |

All errors return a JSON body:

```json
{ "error": "machine_code", "message": "Human-readable description." }
```
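A client typically needs to decide what to do next from the status code. The mapping below is one possible policy inferred from the error table, not documented behaviour: the `next_step` labels and the `describe_error` helper are hypothetical.

```python
def next_step(status: int) -> str:
    """Suggest a follow-up action for an error status from the table."""
    if status == 500:
        return "retry"              # the table says: eval failed, retry
    if status == 429:
        return "wait_then_retry"    # 5 req/min rate limit reached
    if status == 402:
        return "upgrade_or_byok"    # managed quota used up; BYOK consumes none
    return "fix_request"            # 400, 401, 403, 404, 413, 422 need a change


def describe_error(status: int, body: dict) -> dict:
    """Combine the JSON error body with the suggested action."""
    return {
        "status": status,
        "code": body.get("error"),
        "message": body.get("message"),
        "action": next_step(status),
    }
```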

BYOK — bring your own key

Send the X-Provider-Key header with an Anthropic key (sk-ant-...). Inference runs on your key: it consumes no managed quota, costs us nothing, and unlocks full mode on any plan (including free).

```bash
# BYOK: run on your own Anthropic key.
# No managed quota is consumed, and it unlocks "full" on any plan.
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "X-Provider-Key: sk-ant-your-anthropic-key" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "You are a customer support agent...", "mode": "full" }'
```

The BYOK key is never stored; it is used only for that request. Prefer the X-Provider-Key header over the provider_key body field.

Examples

```bash
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a customer support agent. Answer clearly and concisely. Always end by asking if there is anything else you can help with.",
    "mode": "lint"
  }'
```
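The same call can be made from Python with only the standard library. This is a minimal sketch rather than an official client: `make_eval_request` and `evaluate` are hypothetical names, and the 30-second timeout is an assumption, not a documented value.

```python
import json
import urllib.request
from typing import Optional

EVAL_URL = "https://prompt-eval.com/api/v1/eval"


def make_eval_request(api_key: str, body: dict, provider_key: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request; pass provider_key to run on your own Anthropic key (BYOK)."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider_key:
        headers["X-Provider-Key"] = provider_key
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(EVAL_URL, data=data, headers=headers, method="POST")


def evaluate(api_key: str, body: dict, provider_key: Optional[str] = None, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx statuses.
    """
    request = make_eval_request(api_key, body, provider_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the header logic testable without touching the network.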

CI/CD integration

The recommended way is the official GitHub Action — it builds the request, fails the build on score, conflict or regression, and renders a job summary.

```yaml
# .github/workflows/prompt-check.yml
name: Prompt Check
on:
  pull_request:
    paths: ['prompts/**']   # adjust to your prompts directory
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: FranciscoFerreiraff/prompteval-action@v1
        with:
          api_key: ${{ secrets.PROMPTEVAL_API_KEY }}
          prompt_file: prompts/support-agent.md
          min_score: 75
          fail_on_conflict: true
          baseline_slug: support-agent   # gate regression vs production
```

Store your API key as a GitHub Actions secret named PROMPTEVAL_API_KEY. Never commit keys to your repository.

Prompts API (serving)

Fetch the production version of a prompt by slug. No AI call — pure library read. Safe to call on every request. Pro+.

GET https://prompt-eval.com/api/v1/prompts/{slug}

How to use this endpoint

1. Open the Library
2. Set a slug on your prompt (e.g. "customer-support")
3. Mark a version as production (score ≥ your threshold)
4. Call the endpoint with your API key (Pro or Team)

Responses are cached at the edge for 60 seconds. Teams can call this endpoint on every request because the CDN absorbs the load. This endpoint consumes no quota.

Examples

```bash
curl https://prompt-eval.com/api/v1/prompts/customer-support \
  -H "Authorization: Bearer pe_live_your_key_here"
```

Risk mitigation

Supabase has a 99.9% SLA, which still allows up to ~8.8 hours of downtime per year (0.1% of 8,760 hours). Don't let your app go down because of it.

Never depend 100% on an external API in production code. Always keep a local fallback.

No fallback — stops if the API goes down

```python
# ❌ Don't do this: production depends 100% on PromptEval
# (get_from_prompteval and call_llm stand in for your own helpers)
def handle_request(user_input):
    prompt = get_from_prompteval("customer-support")  # fails if the API goes down
    return call_llm(prompt, user_input)
```

With fallback — never stops working

```python
# ✅ Do this: a local fallback keeps you up
# (get_from_prompteval and call_llm stand in for your own helpers)
FALLBACK_PROMPT = "You are a helpful assistant..."  # hardcoded emergency version

def handle_request(user_input):
    try:
        prompt = get_from_prompteval("customer-support")
    except Exception:
        prompt = FALLBACK_PROMPT  # never stops working
    return call_llm(prompt, user_input)
```

```javascript
// ✅ JavaScript: local fallback
const FALLBACK_PROMPT = "You are a helpful assistant...";

async function getPrompt(slug) {
  try {
    const res = await fetch(
      `https://prompt-eval.com/api/v1/prompts/${slug}`,
      { headers: { Authorization: `Bearer ${process.env.PROMPTEVAL_API_KEY}` } }
    );
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    return data.content;
  } catch {
    return FALLBACK_PROMPT; // never stops working
  }
}
```

The fallback doesn't need to be perfect; it just needs to exist. Hardcode the last stable version of your prompt and update it occasionally.
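One way to go a step further is to remember the last prompt that was fetched successfully and fall back to the hardcoded version only when nothing has been fetched yet. The sketch below assumes that design; `fetch_prompt` and `PromptSource` are hypothetical names, the 2-second timeout is an assumption, and the `content` field is the one returned by the Prompts API.

```python
import json
import urllib.request
from typing import Callable

PROMPTS_URL = "https://prompt-eval.com/api/v1/prompts/{slug}"
FALLBACK_PROMPT = "You are a helpful assistant..."  # hardcoded emergency version


def fetch_prompt(slug: str, api_key: str, timeout: float = 2.0) -> str:
    """Fetch the production prompt content for `slug`; raises on any HTTP or network error."""
    request = urllib.request.Request(
        PROMPTS_URL.format(slug=slug),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["content"]


class PromptSource:
    """Serve the live prompt, else the last good copy, else the hardcoded fallback."""

    def __init__(self, fetch: Callable[[str], str], fallback: str = FALLBACK_PROMPT):
        self._fetch = fetch
        self._fallback = fallback
        self._last_good = {}  # slug -> most recent successfully fetched content

    def get(self, slug: str) -> str:
        try:
            content = self._fetch(slug)
        except Exception:
            return self._last_good.get(slug, self._fallback)
        self._last_good[slug] = content
        return content
```

In production this might be wired up as `PromptSource(lambda slug: fetch_prompt(slug, api_key))`, where `api_key` comes from your environment. Taking the fetch function as a parameter also lets tests simulate an outage without network access.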

Response — 200

```json
{
  "slug": "customer-support",
  "name": "Customer Support Agent",
  "version": 3,
  "content": "You are a customer support agent...",
  "score": 82,
  "production_set_at": "2026-05-14T10:00:00Z"
}
```

| Field | Type | Description |
| --- | --- | --- |
| slug | string | Slug set in the library |
| name | string | Prompt name in the library |
| version | integer | Production version number |
| content | string | Prompt content (use as your system prompt) |
| score | integer or null | Version score (0–100), or null if not evaluated |
| production_set_at | string | ISO 8601 timestamp of when it was marked as production |

Errors

| Status | When it happens |
| --- | --- |
| 400 | Invalid slug format in URL |
| 401 | Missing, malformed, or revoked API key |
| 403 | Valid key, but account is not on the Pro/Team plan |
| 404 | Slug not found, or prompt has no production version |
| 405 | Method not allowed (use GET) |

Need higher quota? We offer custom plans for high-volume teams. Contact us at francisco@prompt-eval.com.
