
API

Evaluate prompts, gate regressions in CI, and serve production versions over HTTP. Available on every plan.


Authentication

Every request needs an API key in the Authorization header.

Authorization: Bearer pe_live_your_key_here

ℹ Generate and revoke keys in the dashboard under API. Keys are available on every plan; only the managed quota differs per plan (see Plans & quota).

Plans & quota

The Eval API is open on every plan. Each plan includes a monthly managed quota of evals that run on our infrastructure. BYOK evals do not consume quota.

Plan	Managed quota/mo	Modes
free	10	lint
basic	30	lint
pro	75	lint + full
team	250	lint + full
BYOK	unlimited (your key)	lint + full on any plan

ℹ lint returns the score, the 4 dimensions, conflicts, benchmark, and pass (cheap). full adds the graph, problems, recommendations, and an improved prompt (requires Pro/Team or BYOK). Cached scores do not consume quota: evaluating the same prompt again returns the same result for free. Rate limit: 5 req/min on all plans.
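The plan rules above can be restated as a few client-side checks, for example to decide which mode to request before calling the API. This is a minimal sketch: allowed_modes, consumes_quota and quota_remaining are hypothetical helpers, not part of an official SDK, and the server's own answer (including a 402 or 403) always wins.

```python
# Hypothetical client-side helpers that restate the plan table above.
# They are not part of an official SDK; the server remains the source of truth.

MANAGED_QUOTA = {"free": 10, "basic": 30, "pro": 75, "team": 250}  # evals per month
FULL_MODE_PLANS = {"pro", "team"}


def allowed_modes(plan: str, byok: bool) -> set:
    """full is available on Pro/Team, or on any plan when BYOK is used."""
    if byok or plan in FULL_MODE_PLANS:
        return {"lint", "full"}
    return {"lint"}


def consumes_quota(byok: bool, cache_hit: bool) -> bool:
    """Only uncached evals running on managed infrastructure count against quota."""
    return not byok and not cache_hit


def quota_remaining(plan: str, calls_month: int) -> int:
    """Managed evals left this month (mirrors meta.quota_remaining)."""
    return max(0, MANAGED_QUOTA[plan] - calls_month)
```

For example, quota_remaining("pro", 12) gives 63, the same value as meta.quota_remaining in the sample lint response.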

Endpoint

POST https://prompt-eval.com/api/v1/eval

Request body

```json
{
  "prompt": "Your prompt text here",
  "mode": "lint",
  "language": "en"
}
```

Parameters

Parameter	Type	Req.	Default	Description
prompt	string	Yes	—	The prompt to evaluate (10–20,000 chars)
mode	string	No	"lint"	"lint" or "full" (full needs Pro/Team or BYOK)
language	string	No	"en"	"en" or "pt" — analysis language
min_score	integer	No	saved threshold, else 75	Minimum score for the pass field
fail_on_conflict	boolean	No	false	Fail when a contradiction is detected
baseline_slug	string	No	—	Gate regression vs the slug's production version (Pro+)
max_drop	number	No	0	Tolerated score drop vs baseline
provider_key	string	No	—	BYOK — Anthropic key (prefer the X-Provider-Key header)

Constraints

prompt minimum: 10 characters · prompt maximum: 20,000 characters
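Checking these constraints before sending avoids spending one of your 5 requests per minute on a call that can only return 400 or 413. A minimal sketch; build_eval_body is a hypothetical helper, and it assumes the server counts characters the same way Python's len() does (Unicode code points), which the docs do not state.

```python
# Hypothetical pre-flight check mirroring the documented parameters and limits.
MIN_PROMPT_CHARS = 10
MAX_PROMPT_CHARS = 20_000
VALID_MODES = {"lint", "full"}
VALID_LANGUAGES = {"en", "pt"}
OPTIONAL_PARAMS = {"min_score", "fail_on_conflict", "baseline_slug", "max_drop"}


def build_eval_body(prompt, mode="lint", language="en", **optional):
    """Validate inputs and return a JSON-ready body for POST /api/v1/eval."""
    # Assumption: len() matches the server's character count.
    if not isinstance(prompt, str) or len(prompt) < MIN_PROMPT_CHARS:
        raise ValueError("prompt must be a string of at least 10 characters (API: 400)")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 20,000 characters (API: 413)")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid mode {mode!r}")
    if language not in VALID_LANGUAGES:
        raise ValueError(f"invalid language {language!r}")
    body = {"prompt": prompt, "mode": mode, "language": language}
    for key, value in optional.items():
        if key not in OPTIONAL_PARAMS:
            raise ValueError(f"unknown parameter {key!r}")
        # Skip unset values so the server-side defaults apply.
        if value is not None:
            body[key] = value
    return body
```

provider_key is deliberately left out of the body builder, since the X-Provider-Key header is the preferred way to send it.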

Response

mode "lint" (default)

```json
{
  "mode": "lint",
  "score": 74,
  "dimensions": {
    "clarity": 68,
    "specificity": 81,
    "structure": 77,
    "robustness": 70
  },
  "conflicts": [
    { "a": "Be concise", "b": "Explain in full detail", "type": "contradiction" }
  ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "meta": {
    "prompt_chars": 1240,
    "evaluated_at": "2026-06-04T14:32:00Z",
    "plan": "pro",
    "byok": false,
    "calls_month": 12,
    "quota_month": 75,
    "quota_remaining": 63
  }
}
```

Response fields

Field	Type	Description
mode	string	"lint" or "full" (echoes the mode used)
score	integer	Overall quality score, 0–100
dimensions.clarity	integer	How unambiguous the instruction is
dimensions.specificity	integer	How precisely the output is defined
dimensions.structure	integer	How logically the prompt is organized
dimensions.robustness	integer	How well it handles edge cases
conflicts	object[]	Detected conflicts: { a, b, type }. type: contradiction | tension | redundancy
benchmark	object | null	{ top_percent, total }: percentile vs all evals; null until there is enough data
pass	boolean	CI gate verdict (see CI gate)
min_score	integer	The threshold actually used
meta.plan	string	The key's plan
meta.byok	boolean	Whether it ran on your own key
meta.calls_month	integer	Managed calls this month
meta.quota_month	integer	Plan managed quota
meta.quota_remaining	integer	Managed quota left
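To show how these fields fit together, the sketch below turns a lint response into a one-line summary, handling the case where benchmark is still null. It embeds a trimmed copy of the sample response above; summarize is a hypothetical helper, and flagging the lowest dimension is just one illustrative choice.

```python
import json

# Trimmed copy of the sample lint response above.
SAMPLE = """{
  "mode": "lint", "score": 74,
  "dimensions": {"clarity": 68, "specificity": 81, "structure": 77, "robustness": 70},
  "conflicts": [{"a": "Be concise", "b": "Explain in full detail", "type": "contradiction"}],
  "benchmark": {"top_percent": 22, "total": 1840},
  "pass": false, "min_score": 75,
  "meta": {"plan": "pro", "byok": false, "calls_month": 12, "quota_month": 75, "quota_remaining": 63}
}"""


def summarize(response: dict) -> str:
    """Build a one-line summary from a lint or full response."""
    dims = response["dimensions"]
    weakest = min(dims, key=dims.get)  # dimension with the lowest score
    contradictions = [c for c in response["conflicts"] if c["type"] == "contradiction"]
    bench = response.get("benchmark")  # null until enough data exists
    rank = f"top {bench['top_percent']}%" if bench else "no benchmark yet"
    verdict = "PASS" if response["pass"] else "FAIL"
    return (f"{verdict} score={response['score']} min={response['min_score']} "
            f"weakest={weakest} contradictions={len(contradictions)} {rank} "
            f"quota_left={response['meta']['quota_remaining']}")


print(summarize(json.loads(SAMPLE)))
# FAIL score=74 min=75 weakest=clarity contradictions=1 top 22% quota_left=63
```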

mode "full" — adds graph, problems, recommendations and improved prompt

```json
{
  "mode": "full",
  "score": 74,
  "dimensions": { "clarity": 68, "specificity": 81, "structure": 77, "robustness": 70 },
  "conflicts": [ /* ... */ ],
  "benchmark": { "top_percent": 22, "total": 1840 },
  "pass": false,
  "min_score": 75,
  "graph": { "nodes": [ /* ... */ ], "edges": [ /* ... */ ] },
  "problems": {
    "critical": ["Output format is underspecified."],
    "warnings": ["Persona may conflict with the tone constraint."]
  },
  "recommendations": [
    "Add an explicit output-format section with an example.",
    "Reconcile the tone constraint with the persona."
  ],
  "improved_prompt": "You are a customer support agent...",
  "meta": { "plan": "pro", "byok": false, "calls_month": 13, "quota_month": 75, "quota_remaining": 62 }
}
```

Field	Type	Description
graph	object	{ nodes, edges } — the prompt instruction graph
problems.critical	string[]	Critical problems
problems.warnings	string[]	Warnings
recommendations	string[]	Actionable improvements
improved_prompt	string	Rewritten version of the prompt
regression	object	Present only when baseline_slug is passed (see gate)

CI gate (the pass field)

The pass field is the boolean verdict to use in CI. It is true only when all of the following conditions hold:

1. score ≥ min_score. The threshold is resolved in this order: min_score in the request body, then your saved production threshold, then 75.

2. No contradiction was detected, or fail_on_conflict is off. Only contradictions can fail the gate; tensions and redundancies never do.

3. No regression. If baseline_slug was passed and the score drop exceeds max_drop, regressed is true and pass is false.
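The three conditions can be restated as a small function, which is handy for reasoning about or unit-testing your own gate settings. This is a hypothetical client-side re-derivation, not the server's code; the server's pass field remains authoritative. Parameter names mirror the request fields, while saved_threshold and baseline_score stand in for values the server looks up itself.

```python
from typing import Optional


def gate_passes(score: int,
                conflicts: list,
                min_score: Optional[int] = None,
                saved_threshold: Optional[int] = None,
                fail_on_conflict: bool = False,
                baseline_score: Optional[int] = None,
                max_drop: float = 0) -> bool:
    """Re-derive the pass verdict from the three documented conditions."""
    # 1. Threshold precedence: request body > saved production threshold > 75.
    if min_score is not None:
        threshold = min_score
    elif saved_threshold is not None:
        threshold = saved_threshold
    else:
        threshold = 75
    if score < threshold:
        return False
    # 2. Only contradictions fail, and only when fail_on_conflict is on.
    if fail_on_conflict and any(c["type"] == "contradiction" for c in conflicts):
        return False
    # 3. Regression: fail when the drop below the baseline exceeds max_drop.
    if baseline_score is not None and (baseline_score - score) > max_drop:
        return False
    return True
```

With the regression example that follows (baseline 82, current 74, max_drop 0), the drop of 8 exceeds 0, so the gate fails even for a score that clears min_score.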

Regression vs production

Pass baseline_slug to compare the current score against the production version of that slug (Pro+; the slug must exist and have a production version). max_drop sets the tolerated score drop (0 means any drop fails). The response then includes a regression object:

```json
"regression": {
  "baseline": { "slug": "support-agent" },
  "baseline_score": 82,
  "current_score": 74,
  "delta": -8,
  "max_drop": 0,
  "regressed": true
}
```

Errors

Status	Meaning	When it happens
400	Bad Request	prompt missing or shorter than 10 characters, invalid mode or language, malformed provider_key
401	Unauthorized	Missing/revoked API key, or BYOK key rejected
402	Quota Exhausted	Monthly managed quota used up (response carries an upgrade CTA)
403	Forbidden	mode "full" without Pro/Team and without BYOK
404	Not Found	baseline_slug has no prompt or no production version
413	Payload Too Large	Prompt exceeds 20,000 characters
422	Unprocessable	Baseline production version has no score yet
429	Too Many Requests	5 req/min rate limit reached
500	Server Error	Eval failed — retry

All errors return a JSON body:

```json
{ "error": "machine_code", "message": "Human-readable description." }
```
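The table implies a simple retry policy: 429 and 500 are transient, while every other status needs a change (key, plan, quota, or request) before retrying. A minimal sketch, assuming exponential backoff that starts at one rate-limit slot (60 s / 5 requests = 12 s); ApiError, parse_error and backoff_seconds are hypothetical names, and the machine codes in the usage below are illustrative, not documented values.

```python
import json

# Per the error table, only 429 (rate limit) and 500 (eval failed) are transient.
RETRYABLE_STATUSES = {429, 500}


class ApiError(Exception):
    """An error response: HTTP status plus the { error, message } body."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, raw_body: bytes) -> ApiError:
    """Decode the JSON error body, tolerating empty or non-JSON bodies."""
    try:
        body = json.loads(raw_body)
    except ValueError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    return ApiError(status, body.get("error", "unknown"), body.get("message", ""))


def backoff_seconds(attempt: int, per_minute: int = 5, cap: float = 120.0) -> float:
    """Exponential backoff starting at one rate-limit slot (60 / 5 = 12 s)."""
    return min(cap, (60 / per_minute) * 2 ** attempt)
```

For instance, parse_error(429, b'{"error": "rate_limited", "message": "Slow down."}').retryable is true, and the first three waits would be 12, 24 and 48 seconds.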

BYOK — bring your own key

Send the X-Provider-Key header with an Anthropic key (sk-ant-...). Inference runs on your key: it consumes no managed quota, costs us nothing, and unlocks full mode on any plan (including free).

```bash
# BYOK: run on your own Anthropic key.
# No managed quota is consumed, and it unlocks "full" on any plan.
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "X-Provider-Key: sk-ant-your-anthropic-key" \
  -H "Content-Type: application/json" \
  -d '{ "prompt": "You are a customer support agent...", "mode": "full" }'
```

ℹ The BYOK key is never stored — it is used only for that request. Prefer the X-Provider-Key header over the provider_key body field.

Examples

```bash
curl -X POST https://prompt-eval.com/api/v1/eval \
  -H "Authorization: Bearer pe_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "You are a customer support agent. Answer clearly and concisely. Always end by asking if there is anything else you can help with.",
    "mode": "lint"
  }'
```
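The same request can be made from Python with only the standard library. A minimal sketch, assuming the key is passed in or read from a PROMPTEVAL_API_KEY environment variable (the secret name used in the CI section); build_eval_request and run_eval are hypothetical helpers. The optional provider_key is sent as the X-Provider-Key header, as recommended for BYOK.

```python
import json
import os
import urllib.request
from typing import Optional

EVAL_URL = "https://prompt-eval.com/api/v1/eval"


def build_eval_request(prompt: str, mode: str = "lint", api_key: Optional[str] = None,
                       provider_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl example above."""
    api_key = api_key or os.environ["PROMPTEVAL_API_KEY"]
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if provider_key:  # BYOK: sent as a header, never in the body
        headers["X-Provider-Key"] = provider_key
    data = json.dumps({"prompt": prompt, "mode": mode}).encode("utf-8")
    return urllib.request.Request(EVAL_URL, data=data, headers=headers, method="POST")


def run_eval(prompt: str, **kwargs) -> dict:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(build_eval_request(prompt, **kwargs), timeout=60) as resp:
        return json.load(resp)
```

Note that urllib normalizes header names when storing them, so the BYOK header reads back as X-provider-key via Request.get_header; the value sent is unchanged.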

CI/CD integration

The recommended approach is the official GitHub Action. It builds the request, fails the build on a low score, a conflict, or a regression, and renders a job summary.

```yaml
# .github/workflows/prompt-check.yml
name: Prompt Check
on:
  pull_request:
    paths: ['prompts/**']   # adjust to your prompts directory
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: FranciscoFerreiraff/prompteval-action@v1
        with:
          api_key: ${{ secrets.PROMPTEVAL_API_KEY }}
          prompt_file: prompts/support-agent.md
          min_score: 75
          fail_on_conflict: true
          baseline_slug: support-agent   # gate regression vs production
```

⚠ Store your API key as a GitHub Actions secret named PROMPTEVAL_API_KEY. Never commit keys to your repository.
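Outside GitHub Actions, you can reproduce the gate in any CI by calling the endpoint yourself and failing the job when pass is false. A minimal sketch, assuming the response was first saved with something like curl -sS -o result.json (plus the flags from the Examples section) and that this script is saved under a hypothetical name such as check_eval.py. Because curl saves error bodies too, the script treats { error, message } responses as failures.

```python
import json
import sys


def check(response: dict) -> tuple:
    """Return (exit_code, summary); the server's pass field is the verdict."""
    if "error" in response:  # error bodies are { "error", "message" }
        return 1, f"API error {response['error']}: {response.get('message', '')}"
    summary = f"score {response['score']} (min {response['min_score']})"
    regression = response.get("regression")
    if regression:
        summary += f", delta {regression['delta']:+} vs {regression['baseline']['slug']}"
    return (0 if response["pass"] else 1), summary


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as fh:
        code, summary = check(json.load(fh))
    print(("PASS: " if code == 0 else "FAIL: ") + summary)
    sys.exit(code)  # a non-zero exit fails the CI job
```

Usage: python check_eval.py result.json. The script only reports pass; it does not re-derive it, so the thresholds stay configured in the request.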

Prompts API (serving)

Fetch the production version of a prompt by slug. This is a pure library read with no AI call, so it is safe to call on every request. Requires Pro or Team.

GET https://prompt-eval.com/api/v1/prompts/{slug}

How to use this endpoint

1. Open the Library

2. Set a slug on your prompt (e.g. "customer-support")

3. Mark a version as production (score ≥ your threshold)

4. Call the endpoint with your API key (Pro or Team)

ℹ Responses are cached at the edge for 60 seconds, so you can call this endpoint on every request; the CDN absorbs the load. This endpoint consumes no quota.
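Since the edge already caches responses, an extra client-side cache is optional; it only saves the network round trip. If you want one, a minimal sketch is a small in-process TTL cache whose 60-second lifetime matches the edge cache. PromptCache is hypothetical, and fetch can be any callable that maps a slug to prompt text (for example, a wrapper around GET /api/v1/prompts/{slug}).

```python
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """Tiny in-process TTL cache in front of a slug -> prompt fetcher."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, slug: str) -> str:
        now = self._clock()
        hit = self._entries.get(slug)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the network
        content = self._fetch(slug)
        self._entries[slug] = (now, content)
        return content
```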

Examples

```bash
curl https://prompt-eval.com/api/v1/prompts/customer-support \
  -H "Authorization: Bearer pe_live_your_key_here"
```

Risk mitigation

Supabase, which PromptEval depends on, has a 99.9% SLA. That still allows up to about 8.8 hours of downtime per year (0.1% of 8,760 hours). Don't let your app go down because of it.

⚠ Never depend 100% on an external API in production code. Always keep a local fallback.

No fallback — stops if the API goes down

```python
# ❌ DON'T do this: you depend 100% on PromptEval in production
def handle_request(user_input):
    prompt = get_from_prompteval("customer-support")  # fails if the API goes down
    return call_llm(prompt, user_input)
```

With fallback — never stops working

```python
# ✅ DO this: a local fallback guarantees uptime
# get_from_prompteval and call_llm stand in for your own fetch and LLM helpers.
FALLBACK_PROMPT = "You are a helpful assistant..."  # hardcoded emergency version

def handle_request(user_input):
    try:
        prompt = get_from_prompteval("customer-support")
    except Exception:
        prompt = FALLBACK_PROMPT  # never stops working
    return call_llm(prompt, user_input)
```

```javascript
// ✅ JavaScript: local fallback
const FALLBACK_PROMPT = "You are a helpful assistant...";

async function getPrompt(slug) {
  try {
    const res = await fetch(
      `https://prompt-eval.com/api/v1/prompts/${encodeURIComponent(slug)}`,
      {
        headers: { Authorization: `Bearer ${process.env.PROMPTEVAL_API_KEY}` },
        signal: AbortSignal.timeout(2000), // give up after 2 s instead of hanging
      }
    );
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const data = await res.json();
    return data.content;
  } catch {
    return FALLBACK_PROMPT; // never stops working
  }
}
```

ℹ The fallback doesn't need to be perfect — it just needs to exist. Hardcode the last stable version of your prompt and update it occasionally.
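The Python examples above leave get_from_prompteval undefined. One possible implementation uses only the standard library, assuming a PROMPTEVAL_API_KEY environment variable and a short 2-second timeout (an arbitrary choice) so a slow API cannot stall your request path. get_prompt is a hypothetical wrapper that never raises; it catches specific exceptions instead of a bare Exception.

```python
import json
import os
import urllib.parse
import urllib.request

PROMPTS_URL = "https://prompt-eval.com/api/v1/prompts/"
FALLBACK_PROMPT = "You are a helpful assistant..."  # last stable version, hardcoded


def get_from_prompteval(slug: str, timeout: float = 2.0) -> str:
    """Fetch the production prompt content for `slug`; raise on any failure."""
    req = urllib.request.Request(
        PROMPTS_URL + urllib.parse.quote(slug, safe=""),
        headers={"Authorization": f"Bearer {os.environ['PROMPTEVAL_API_KEY']}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:  # HTTPError on 4xx/5xx
        return json.load(resp)["content"]


def get_prompt(slug: str) -> str:
    """Return the production prompt, or FALLBACK_PROMPT if anything goes wrong."""
    try:
        return get_from_prompteval(slug)
    # URLError, HTTPError and timeouts are OSError subclasses; KeyError covers a
    # missing env var or "content" field; ValueError covers malformed JSON.
    except (OSError, KeyError, ValueError):
        return FALLBACK_PROMPT
```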

Response — 200

```json
{
  "slug": "customer-support",
  "name": "Customer Support Agent",
  "version": 3,
  "content": "You are a customer support agent...",
  "score": 82,
  "production_set_at": "2026-05-14T10:00:00Z"
}
```

Field	Type	Description
slug	string	Slug set in the library
name	string	Prompt name in the library
version	integer	Production version number
content	string	Prompt content (use as your system prompt)
score	integer | null	Version score (0–100), or null if not evaluated
production_set_at	string	ISO 8601 — when it was marked as production

Errors

Status	When it happens
400	Invalid slug format in URL
401	Missing, malformed, or revoked API key
403	Valid key, but account is not on the Pro/Team plan
404	Slug not found, or prompt has no production version
405	Method not allowed (use GET)

Need a higher quota? Contact us at francisco@prompt-eval.com; we offer custom plans for high-volume teams.

Missing something in the docs? Open an issue or send feedback.