What is PromptEval

PromptEval is a technical prompt evaluation platform for LLMs. It works like a linter for prompts: paste your prompt, get a 0–100 score with a 4-dimension diagnostic, a list of critical issues, and surgical recommendations — in seconds, no setup required.

Why PromptEval exists

Teams running LLMs in production break agents because of poorly written prompts — and have no objective way to diagnose the problem. Most prompt engineering guides say "be specific", but none tell you exactly what is wrong with your current prompt.

PromptEval solves this with a repeatable score based on structural principles — not automatic rewriting. The goal is to give you understanding, not just a new version of the prompt.

What the product does

Score 0–100: Objective, repeatable evaluation with a 4-dimension breakdown
Dimension diagnostics: Clarity, specificity, structure, and robustness with subscores
Critical issues: Prioritized list of failures with direct impact on output
Surgical recommendations: Minimal numbered edits, no full rewrites
Token optimizer: Compresses the prompt while preserving intent, shows % reduction
Production iterator: Generates fixes based on expected vs observed behavior
Versioned library: Score history per version with diffs (Pro)
Public leaderboard: Ranking of top prompts by category
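
To make the score breakdown and the "% reduction" figure concrete, here is a minimal sketch of how such numbers could be computed. It is an illustration only: the equal weighting, the class name DimensionScores, and the functions overall_score and token_reduction_percent are assumptions made for this example, not PromptEval's actual scoring method or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical subscores for the four dimensions, each on a 0-100 scale."""
    clarity: float
    specificity: float
    structure: float
    robustness: float


def overall_score(scores: DimensionScores) -> int:
    """Combine the four subscores into a single 0-100 score.

    Assumption: all dimensions are weighted equally. The real weighting
    used by the product is not documented on this page.
    """
    values = [scores.clarity, scores.specificity, scores.structure, scores.robustness]
    for value in values:
        if not 0 <= value <= 100:
            raise ValueError(f"subscore out of range: {value}")
    return round(sum(values) / len(values))


def token_reduction_percent(original_tokens: int, optimized_tokens: int) -> float:
    """Return the percentage of tokens removed by an optimization pass."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    saved = original_tokens - optimized_tokens
    return round(100 * saved / original_tokens, 1)


if __name__ == "__main__":
    example = DimensionScores(clarity=80, specificity=60, structure=70, robustness=90)
    print(overall_score(example))             # 75
    print(token_reduction_percent(400, 300))  # 25.0
```

Under these assumptions, a prompt with subscores of 80, 60, 70, and 90 gets an overall score of 75, and compressing a 400-token prompt to 300 tokens is reported as a 25.0% reduction. A weighted average would be a natural variation if some dimensions matter more for a given use case.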

Who built this

Francisco Ferreira
AI Engineer · Founder of PromptEval
francisco@prompt-eval.com

7 years in data and machine learning, with experience at financial sector companies working on LLM agent architecture, RAG pipelines, and production evaluation systems.

PromptEval was born from a real need: teams running LLMs in production were breaking agents due to poorly written prompts, with no objective way to diagnose the problem.

The idea was to formalize the industry's technical criteria for reviewing production prompts (clarity, specificity, structure, robustness) into an automated, auditable, and repeatable score for any team working with LLMs.

LLM Agents · RAG · LLM-as-Judge · Prompt Engineering · Data Engineering · ML Engineering · Python · TypeScript

Tech stack

Next.js (App Router) · Supabase · Anthropic SDK · Stripe · Vercel · Resend · Sentry · GA4

Try the diagnostic

3 free evaluations per month · no credit card required
