Prompting Guide by Model
System and user prompt templates, best practices and real examples for developers running LLMs in production.
Claude
Anthropic · Opus 4.7 · Opus 4.6 · Sonnet 4.6 · Haiku 4.5 — updated for latest models
Production best practices
Based on the official Anthropic prompt engineering guide ↗
Treat Claude like a brilliant new employee — it doesn't know your conventions. Be specific about format, length and expected behavior. Vague instructions produce vague outputs.
Before: Create an analytics dashboard
After: Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics for a fully-featured implementation.

XML tags help Claude parse complex prompts unambiguously, especially when mixing instructions, context, and examples.
Before:
You are a sales assistant. Customer query: {{query}}. History: {{history}}. Customer plan: {{plan}}. Answer professionally and suggest upgrade if relevant.

After:
<context>
Customer plan: {{plan}}
History: {{history}}
</context>
<query>
{{query}}
</query>
<task>
Answer the question. Suggest upgrade only if plan is Free and query involves a Pro feature.
</task>
<format>
Max 3 paragraphs. Direct tone.
</format>
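A minimal sketch of filling and sending this template through the Python SDK — the model ID and sample values are placeholders, not part of the guide:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical values standing in for the {{...}} template variables.
plan, history, query = "Free", "No previous tickets", "Can I export reports to CSV?"

user_prompt = f"""<context>
Customer plan: {plan}
History: {history}
</context>
<query>
{query}
</query>
<task>
Answer the question. Suggest upgrade only if plan is Free and query involves a Pro feature.
</task>
<format>
Max 3 paragraphs. Direct tone.
</format>"""

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID; check the current models list
    max_tokens=1024,
    messages=[{"role": "user", "content": user_prompt}],
)
print(response.content[0].text)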
system="You are a support assistant specializing in fintech. Prioritize clarity over completeness — our users are non-technical."Examples are the most reliable way to guide format, tone and structure. Wrap them in <example> tags so Claude distinguishes them from instructions. Use 3-5 diverse examples that cover edge cases.
"Be polite" is aspirational and useless. "DO NOT invent information. DO NOT make unauthorized discount promises. If uncertain, escalate to a human." — that's operational and works.
Claude calibrates length by perceived task complexity. If your product depends on a fixed output level, specify it explicitly.
Be concise. Skip non-essential context. Keep examples minimal. Respond in 3 paragraphs maximum.

Base template (copy-ready)
Template applying Anthropic's best practices: clear separation of responsibilities, XML tags for structure, and explicit output format.
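A sketch of what such a template might look like, assembled from the practices above (tag names and placeholders are illustrative, not the official template):

system: You are a {role} specializing in {domain}. {One sentence on priorities and audience.}

user:
<context>
{background and variables the model needs}
</context>
<task>
{what to do, plus what NOT to do}
</task>
<examples>
<example>{sample input → expected output}</example>
</examples>
<format>
{length, tone, structure}
</format>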
Message structure
Claude uses three distinct roles in the API. system defines persistent behavior (persona, tone, constraints). user contains the task or question. assistant is the generated response. API reference ↗
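In SDK terms, a minimal sketch (the model ID is a placeholder; the trailing assistant turn is an optional prefill that steers the start of the reply):

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model ID
    max_tokens=512,
    system="You are a support assistant specializing in fintech.",  # persistent behavior
    messages=[
        {"role": "user", "content": "Summarize my last invoice."},  # the task
        {"role": "assistant", "content": "Summary:"},               # optional prefill
    ],
)
print(response.content[0].text)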
Effort levels (Opus 4.7 / Sonnet 4.6)
The effort parameter controls reasoning depth vs. token cost. Adjust based on your use case; a call sketch follows the list below. Extended thinking docs ↗
Maximum performance. May overthink on simple tasks.
Best for coding and agents. Best cost-benefit on complex tasks.
Balanced. Minimum recommended for reasoning-intensive tasks.
Cost-sensitive. Trades intelligence for speed.
Short tasks and critical latency. Avoid for complex reasoning.
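A sketch of setting it per call. The exact spelling of the effort argument below is an assumption, as is the model ID; the extended thinking docs linked above are authoritative:

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-7",  # assumed model ID
    max_tokens=2048,
    effort="medium",          # assumed parameter name for reasoning depth
    messages=[{"role": "user", "content": "Plan the database migration in steps."}],
)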
Adaptive thinking
Claude Opus 4.7, Opus 4.6 and Sonnet 4.6 use adaptive thinking — the model decides when and how much to reason based on query complexity and effort level. Use for multi-step agents, coding, and long-horizon tasks.
"Thinking adds latency and should only be used when it meaningfully improves answer quality. When in doubt, respond directly.""This task involves multi-step reasoning. Think carefully through the problem before responding."Full example — support agent
Production prompt applying all best practices: clear role, operational constraints, defined format, edge cases covered, proper system/user separation.
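A condensed sketch of how such a prompt might read, assembled from the practices above (wording illustrative):

system: You are a support assistant for a fintech product. Prioritize clarity over completeness; our users are non-technical.

user:
<context>
Customer plan: {{plan}}
History: {{history}}
</context>
<query>{{query}}</query>
<task>
Answer the question. DO NOT invent information. DO NOT make unauthorized discount promises. If uncertain, escalate to a human. Suggest an upgrade only if plan is Free and the query involves a Pro feature.
</task>
<format>Max 3 paragraphs. Direct tone.</format>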
GPT-5.5
OpenAI · Responses API + Chat Completions — outcome-first prompting and structured workflows
Production best practices
Based on OpenAI's official GPT-5.5 prompt engineering guide ↗
Start with what success looks like, not the steps to get there. GPT-5.5 reasons better when it understands the goal before reading constraints.
Before: Read the document. Extract the key points. Check if they're relevant. Summarize.
After: Goal: produce a 3-bullet executive summary. Each bullet must be actionable and mention a concrete number or deadline. Source: the document below.

GPT-5.5 follows explicit personality descriptions very closely. Define communication style, vocabulary level and what to actively avoid.
## Personality
Direct and technical. No filler phrases like "Of course!" or "Great question!". Use industry jargon when the audience knows it. Max 3 paragraphs per response.

When providing long documents, tell the model how deeply to read them. Without this, GPT-5.5 may skim rather than analyze.
Before: Here's the contract. Does it have any issues?
After: Read the full contract carefully, including footnotes and appendices. List all clauses that could create financial obligations above $10k.

For high-stakes tasks like code generation or data analysis, add a self-review step explicitly in the prompt.
After generating the SQL query, review it for: (1) performance issues on large tables, (2) missing edge cases for null values, (3) correctness against the expected schema.

Match effort to the task. Over-reasoning on simple queries wastes latency and tokens. Under-reasoning on complex tasks produces errors.
reasoning_effort="high" on every call (including simple lookups)"low" for routing/classification, "medium" for generation, "high" for reasoning/analysis/codeBase template (copy-ready)
Template following OpenAI's official GPT-5.5 best practices: outcome-first instructions, personality block, explicit format, and validation loop.
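A sketch of such a template (section names illustrative):

Goal: {what success looks like, with measurable criteria}.

## Personality
{style, vocabulary level, phrases to avoid}

## Context
{documents or data, plus how deeply to read them}

## Validation
Before answering, review your output for: {task-specific checklist}.

## Format
{structure and length limits}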
Message structure
GPT-5.5 works with two APIs. The Responses API is recommended for agents and multi-step workflows — it manages state and supports built-in tools. Chat Completions keeps the classic stateless interface, familiar to anyone already using the API. Responses API reference ↗
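Side by side in the Python SDK (the model ID is a placeholder; instructions plays the role the system message plays in Chat Completions):

from openai import OpenAI

client = OpenAI()

# Responses API: stateful, recommended for agents
r = client.responses.create(
    model="gpt-5.5",  # assumed model ID
    instructions="Direct and technical. Max 3 paragraphs per response.",
    input="Summarize the attached contract.",
)
print(r.output_text)

# Chat Completions: classic stateless interface
c = client.chat.completions.create(
    model="gpt-5.5",  # assumed model ID
    messages=[
        {"role": "system", "content": "Direct and technical. Max 3 paragraphs per response."},
        {"role": "user", "content": "Summarize the attached contract."},
    ],
)
print(c.choices[0].message.content)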
Effort levels (reasoning_effort)
The reasoning_effort parameter controls how deeply the model thinks before responding. Match it to the task complexity; a routing sketch follows the list below. Reasoning docs ↗
high: Multi-step reasoning, complex code, data analysis, legal/financial review. Expect higher latency.
medium: Content generation, summarization, Q&A. Best cost/quality balance for most use cases.
low: Routing, classification, simple lookups. Fast and cheap — avoid for tasks needing inference.
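A sketch of routing effort by task type, following the mapping above (model ID assumed):

from openai import OpenAI

client = OpenAI()

# Map task types to effort, per the guidance above.
EFFORT_BY_TASK = {
    "routing": "low", "classification": "low",
    "generation": "medium", "summarization": "medium",
    "analysis": "high", "code": "high",
}

def ask(task_type: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-5.5",  # assumed model ID
        reasoning_effort=EFFORT_BY_TASK[task_type],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content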
Full example — document analysis agent
Production prompt applying all best practices: outcome-first goal, defined personality, validation loop, explicit effort level, and proper context structure.
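A condensed sketch of how that prompt might read (wording illustrative):

Goal: produce a 3-bullet executive summary of the document below. Each bullet must be actionable and mention a concrete number or deadline.

## Personality
Direct and technical. No filler phrases. Max 3 paragraphs per response.

## Instructions
Read the full document carefully, including footnotes and appendices; do not skim.

## Validation
Before answering, verify each bullet cites a number or deadline that actually appears in the source. If none exists, say so rather than inventing one.

reasoning_effort="high"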
Does your prompt follow these best practices?
PromptEval does a technical review of your prompt — score 0-100, critical issues, wasted tokens and improvement suggestions.
3 free evaluations per month · no credit card
Gemini 3 · Llama 3 · Mistral — guides coming soon