About

What is PromptEval

PromptEval is a technical prompt evaluation platform for LLMs. It works like a linter for prompts: paste your prompt, get a 0–100 score with a 4-dimension diagnostic, a list of critical issues, and surgical recommendations — in seconds, no setup required.

Why PromptEval exists

Teams running LLMs in production break agents because of poorly written prompts — and have no objective way to diagnose the problem. Most prompt engineering guides say "be specific", but none tell you exactly what is wrong with your current prompt.

PromptEval solves this with a repeatable score based on structural principles — not automatic rewriting. The goal is to give you understanding, not just a new version of the prompt.

What the product does

Score 0–100

Objective, repeatable evaluation with a 4-dimension breakdown

Dimension diagnostics

Clarity, specificity, structure, and robustness with subscores

Critical issues

Prioritized list of failures with direct impact on output

Surgical recommendations

Minimal numbered edits — no full rewrites

Token optimizer

Compresses the prompt while preserving intent, shows % reduction

Production iterator

Generates fixes based on expected vs observed behavior

Versioned library

Score history per version with diffs (Pro)

Public leaderboard

Ranking of top prompts by category
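As a rough illustration of two of the numbers mentioned above (the overall score and the optimizer's % reduction), here is a minimal Python sketch. It is not PromptEval's implementation: the `Diagnostic` class, the `percent_reduction` helper, and the unweighted mean used to combine the four subscores are all assumptions made for the example. Only the dimension names and the 0–100 scale come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical container for the four subscores, each on a 0-100 scale."""
    clarity: float
    specificity: float
    structure: float
    robustness: float

    def overall(self) -> int:
        # ASSUMPTION: an unweighted mean of the four dimensions.
        # The actual aggregation used by PromptEval is not described here.
        parts = (self.clarity, self.specificity, self.structure, self.robustness)
        for value in parts:
            if not 0 <= value <= 100:
                raise ValueError("subscores must be in [0, 100]")
        return round(sum(parts) / len(parts))


def percent_reduction(tokens_before: int, tokens_after: int) -> float:
    """Share of tokens removed by compression, as a percentage."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100.0 * (tokens_before - tokens_after) / tokens_before


if __name__ == "__main__":
    report = Diagnostic(clarity=80, specificity=60, structure=70, robustness=90)
    print(report.overall())              # 75
    print(percent_reduction(400, 300))   # 25.0
```

A weighted mean, or a cap tied to the number of critical issues, would be an equally plausible aggregation. The point is only that the overall score should be reproducible from the published subscores.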

Who built this

Francisco Ferreira

AI Engineer · Founder of PromptEval

francisco@prompt-eval.com

Seven years in data and machine learning, including work at financial-sector companies on LLM agent architecture, RAG pipelines, and production evaluation systems.

PromptEval was born from a real need: teams running LLMs in production were breaking agents due to poorly written prompts, with no objective way to diagnose the problem.

The idea was to formalize the industry's technical criteria for reviewing production prompts (clarity, specificity, structure, robustness) into an automated, auditable, and repeatable score for any team working with LLMs.

LLM Agents · RAG · LLM-as-Judge · Prompt Engineering · Data Engineering · ML Engineering · Python · TypeScript

Tech stack

Next.js (App Router) · Supabase · Anthropic SDK · Stripe · Vercel · Resend · Sentry · GA4

Try the diagnostic

3 free evaluations per month · no credit card required

Get started → · Read the blog