templates · best practices · real examples

Prompting Guide by Model

System and user prompt templates, best practices and real examples for developers running LLMs in production.

Claude · GPT-5.5 · Gemini 3 (coming soon) · Llama 3 (coming soon) · Mistral (coming soon)

Claude

Anthropic

Opus 4.7 · Opus 4.6 · Sonnet 4.6 · Haiku 4.5 — updated for latest models


Production best practices

Based on the official Anthropic prompt engineering guide ↗

01
Be clear and direct

Treat Claude like a brilliant new employee — it doesn't know your conventions. Be specific about format, length and expected behavior. Vague instructions produce vague outputs.

less effective
Create an analytics dashboard
more effective
Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics for a fully-featured implementation.
02
Use XML tags for structure

XML tags help Claude parse complex prompts unambiguously, especially when mixing instructions, context, and examples.

less effective
You are a sales assistant. Customer query: {{query}}. History: {{history}}. Customer plan: {{plan}}. Answer professionally and suggest upgrade if relevant.
more effective
<context>
Customer plan: {{plan}}
History: {{history}}
</context>

<query>
{{query}}
</query>

<task>
Answer the question. Suggest upgrade only if plan is Free and query involves a Pro feature.
</task>

<format>
Max 3 paragraphs. Direct tone.
</format>
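The structured version can be assembled programmatically. A minimal sketch, assuming template variables arrive as plain strings; `build_prompt` is an illustrative helper, not part of any SDK:

```python
# Sketch: assembling the XML-structured prompt from template variables.
# Tag names mirror the example above; build_prompt is hypothetical.

def build_prompt(plan: str, history: str, query: str) -> str:
    return (
        "<context>\n"
        f"Customer plan: {plan}\n"
        f"History: {history}\n"
        "</context>\n\n"
        "<query>\n"
        f"{query}\n"
        "</query>\n\n"
        "<task>\n"
        "Answer the question. Suggest upgrade only if plan is Free "
        "and query involves a Pro feature.\n"
        "</task>\n\n"
        "<format>\n"
        "Max 3 paragraphs. Direct tone.\n"
        "</format>"
    )

prompt = build_prompt("Free", "Asked about exports last week", "Can I schedule reports?")
```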
03
Give Claude a role and context

Even a single sentence establishing a role in the system prompt makes a difference. Explaining WHY an instruction exists helps Claude generalize it correctly.

system="You are a support assistant specializing in fintech. Prioritize clarity over completeness — our users are non-technical."
04
Use examples (few-shot)

Examples are the most reliable way to guide format, tone and structure. Wrap them in <example> tags so Claude distinguishes them from instructions. Use 3-5 diverse examples that cover edge cases.
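One way to keep few-shot examples separate from instructions is to wrap each pair in `<example>` tags before interpolating them into the prompt. A minimal sketch; `format_examples` is an illustrative helper:

```python
# Sketch: wrapping few-shot pairs in <example> tags so Claude can
# distinguish them from instructions. The helper name is hypothetical.

def format_examples(pairs: list[tuple[str, str]]) -> str:
    blocks = []
    for inp, out in pairs:
        blocks.append(
            f"<example>\nInput: {inp}\nExpected output: {out}\n</example>"
        )
    return "<examples>\n" + "\n".join(blocks) + "\n</examples>"

few_shot = format_examples([
    ("refund request, order 12345678", "Escalate to a human agent."),
    ("where is my package?", "Ask for the 8-digit order number."),
])
```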

05
Operational constraints, not aspirational

"Be polite" is aspirational and useless. "DO NOT invent information. DO NOT make unauthorized discount promises. If uncertain, escalate to a human." — that's operational and works.

06
Control verbosity explicitly

Claude calibrates length by perceived task complexity. If your product depends on a fixed output level, specify it explicitly.

Be concise. Skip non-essential context. Keep examples minimal. Respond in 3 paragraphs maximum.

Base template (copy-ready)

Template applying Anthropic's best practices: clear separation of responsibilities, XML tags for structure, and explicit output format.

system prompt
You are a [role] specialized in [domain].

Your tone is [tone: direct/consultative/technical/friendly].

<constraints>
- Never [what NOT to do — be operational, not aspirational]
- Always [mandatory behavior]
- If [edge case], then [specific action]
</constraints>

<output_format>
Respond in [format: prose/JSON/list]. [Length: X paragraphs / max Y words].
Expected structure: [describe structure]
</output_format>
user prompt
<context>
[Background information relevant to this specific task]
</context>

<task>
[What you want Claude to do — be specific and imperative]
</task>

<examples>
<example>
Input: [sample input]
Expected output: [ideal sample output]
</example>
</examples>
Tip: Place long documents and data before instructions — this can improve quality by up to 30% in long-context prompts (20k+ tokens).
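The tip above amounts to a simple ordering rule when building the user message. A minimal sketch, with illustrative names; the document goes first, the task last:

```python
# Sketch: long document placed BEFORE the instructions, per the tip above.
# Variable and function names are illustrative.

def long_context_prompt(document: str, task: str) -> str:
    return (
        "<document>\n" + document + "\n</document>\n\n"
        "<task>\n" + task + "\n</task>"
    )

user_content = long_context_prompt(
    document="[20k+ tokens of contract text]",
    task="List all clauses with financial obligations above $10k.",
)
```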

Message structure

The Claude Messages API separates three inputs. system (a top-level parameter) defines persistent behavior (persona, tone, constraints). user messages carry the task or question. assistant messages hold the generated responses. API reference ↗

API structure (Python)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are an expert assistant in [domain]. [Tone]. [Constraints].",
    messages=[
        {"role": "user", "content": "Your task or question here."}
    ],
)
Tip: Put persona, tone and global constraints in the system prompt. Leave the specific task in the user prompt. This reduces tokens per turn in long conversations.

Effort levels (Opus 4.7 / Sonnet 4.6)

The effort parameter controls reasoning depth vs. token cost. Adjust based on your use case. Extended thinking docs ↗

max

Maximum performance. May overthink on simple tasks.

xhigh

Best for coding and agents. Best cost-benefit on complex tasks.

high

Balanced. Minimum recommended for reasoning-intensive tasks.

medium

Cost-sensitive. Trades intelligence for speed.

low

Short tasks and critical latency. Avoid for complex reasoning.

usage example
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},  # recommended for coding and agents
    messages=[{"role": "user", "content": "..."}],
)

Adaptive thinking

Claude Opus 4.7, Opus 4.6 and Sonnet 4.6 use adaptive thinking — the model decides when and how much to reason based on query complexity and effort level. Use for multi-step agents, coding, and long-horizon tasks.

thinking too much?
"Thinking adds latency and should only be used when it meaningfully improves answer quality. When in doubt, respond directly."
not thinking enough?
"This task involves multi-step reasoning. Think carefully through the problem before responding."
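Both steering sentences are plain system-prompt text, so they can be appended conditionally. A minimal sketch; the two strings are the ones quoted above, and the helper is illustrative:

```python
# Sketch: steering adaptive thinking by appending one sentence to the
# system prompt. The steering strings match the guidance above; the
# steer_thinking helper is hypothetical.

THINK_LESS = (
    "Thinking adds latency and should only be used when it meaningfully "
    "improves answer quality. When in doubt, respond directly."
)
THINK_MORE = (
    "This task involves multi-step reasoning. Think carefully through the "
    "problem before responding."
)

def steer_thinking(system_prompt: str, direction: str) -> str:
    hint = THINK_MORE if direction == "more" else THINK_LESS
    return f"{system_prompt}\n\n{hint}"

system = steer_thinking("You are a support assistant for Acme Corp.", "more")
```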

Full example — support agent

Production prompt applying all best practices: clear role, operational constraints, defined format, edge cases covered, proper system/user separation.

system prompt (production)
You are a support assistant for Acme Corp, specialized in resolving questions about payments and deliveries.

<tone>
Direct and professional. No emojis. Respond in 1-3 paragraphs, conversational tone, with a clear action at the end.
</tone>

<constraints>
DO NOT invent information about deadlines, policies or products.
DO NOT make unauthorized discount promises or exceptions.
DO NOT ask for sensitive data (SSN, password, full card number).
If uncertain, escalate — never speculate.
</constraints>

<scenarios>
PAYMENT: If customer reports a problem, ask for order number (format: 8 digits) and confirm investigation.
DELIVERY: Standard deadline is 5-10 business days from payment confirmation.
IMMEDIATE ESCALATION: aggressive customer, refund/cancellation request, out-of-scope question, request to speak with manager.
</scenarios>
user prompt
<conversation_history>
[previous turns if any]
</conversation_history>

Customer message: {{customer_message}}
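Wiring this example into an API call follows the structure shown earlier on this page. A minimal sketch: `SYSTEM_PROMPT` stands in for the full system prompt above, the model name follows the earlier examples, and `build_request` is an illustrative helper:

```python
# Sketch: assembling the support-agent request payload. SYSTEM_PROMPT
# abbreviates the full production system prompt shown above.

SYSTEM_PROMPT = "You are a support assistant for Acme Corp, ..."  # full prompt above

def build_request(history: str, customer_message: str) -> dict:
    user_content = (
        "<conversation_history>\n" + history + "\n</conversation_history>\n\n"
        f"Customer message: {customer_message}"
    )
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "system": SYSTEM_PROMPT,
        "messages": [{"role": "user", "content": user_content}],
    }

request = build_request("[no previous turns]", "My payment failed twice.")
# response = client.messages.create(**request)
```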

GPT-5.5

OpenAI

Responses API + Chat Completions — outcome-first prompting and structured workflows


Production best practices

Based on OpenAI's official GPT-5.5 prompt engineering guide ↗

01
Outcome-first prompting

Start with what success looks like, not the steps to get there. GPT-5.5 reasons better when it understands the goal before reading constraints.

less effective
Read the document. Extract the key points. Check if they're relevant. Summarize.
more effective
Goal: produce a 3-bullet executive summary. Each bullet must be actionable and mention a concrete number or deadline. Source: the document below.
02
Personality blocks improve consistency

GPT-5.5 follows explicit personality descriptions very closely. Define communication style, vocabulary level and what to actively avoid.

## Personality
Direct and technical. No filler phrases like "Of course!" or "Great question!".
Use industry jargon when the audience knows it. Max 3 paragraphs per response.
03
Retrieval budgets for long context

When providing long documents, tell the model how deeply to read them. Without this, GPT-5.5 may skim rather than analyze.

less effective
Here's the contract. Does it have any issues?
more effective
Read the full contract carefully, including footnotes and appendices. List all clauses that could create financial obligations above $10k.
04
Validation loops for critical outputs

For high-stakes tasks like code generation or data analysis, add a self-review step explicitly in the prompt.

After generating the SQL query, review it for: (1) performance issues on large tables, (2) missing edge cases for null values, (3) correctness against the expected schema.
05
reasoning_effort by use case

Match effort to the task. Over-reasoning on simple queries wastes latency and tokens. Under-reasoning on complex tasks produces errors.

less effective
reasoning_effort="high" on every call (including simple lookups)
more effective
"low" for routing/classification, "medium" for generation, "high" for reasoning/analysis/code
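The mapping above is easy to encode as a lookup so every call site picks the right effort automatically. A minimal sketch; the task-type labels are illustrative:

```python
# Sketch: choosing reasoning_effort by task type, per the mapping above.
# Task-type labels are illustrative; unknown types fall back to "medium".

EFFORT_BY_TASK = {
    "routing": "low",
    "classification": "low",
    "generation": "medium",
    "summarization": "medium",
    "reasoning": "high",
    "analysis": "high",
    "code": "high",
}

def effort_for(task_type: str) -> str:
    return EFFORT_BY_TASK.get(task_type, "medium")

effort_for("classification")  # → "low"
```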

Base template (copy-ready)

Template following OpenAI's official GPT-5.5 best practices: outcome-first instructions, personality block, explicit format, and validation loop.

instructions / system prompt
You are a [role] at [context].

## Goal
[What the model should accomplish — be outcome-focused, not process-focused]

## Personality
[Tone, communication style, what to avoid]

## Constraints
- DO NOT [hard restriction 1]
- DO NOT [hard restriction 2]
- If [edge case], then [specific action]

## Output format
[format: prose / JSON / list / markdown]
[length: max X paragraphs or Y words]
[structure: describe if needed]
user message
<context>
[Background relevant to this specific request]
</context>

<task>
[What you want GPT-5.5 to do — be specific and imperative]
</task>
Tip: Put the goal before the constraints. GPT-5.5 is trained to optimize for stated outcomes — if the goal is vague, constraints alone won't compensate.

Message structure

GPT-5.5 works with two APIs. The Responses API is recommended for agents and multi-step workflows — it manages state and supports built-in tools. Chat Completions keeps the classic stateless interface, familiar to anyone already using the API. Responses API reference ↗

Responses API (recommended for agents)
from openai import OpenAI
client = OpenAI()

response = client.responses.create(
    model="gpt-5.5",
    instructions="You are a [role] specialized in [domain]. [Tone]. [Constraints].",
    input="Task or question here.",
    reasoning={"effort": "high"},       # high | medium | low
)
Chat Completions API (classic)
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "You are a [role] specialized in [domain]."},
        {"role": "user",   "content": "Task or question here."}
    ],
    reasoning_effort="high",            # high | medium | low
)
Tip: Use the Responses API for production agents — it handles conversation state, supports tool calls, and produces cleaner code for multi-step workflows.

Effort levels (reasoning_effort)

The reasoning_effort parameter controls how deeply the model thinks before responding. Match it to the task complexity. Reasoning docs ↗

high

Multi-step reasoning, complex code, data analysis, legal/financial review. Expect higher latency.

medium

Content generation, summarization, Q&A. Best cost/quality balance for most use cases.

low

Routing, classification, simple lookups. Fast and cheap — avoid for tasks that require multi-step reasoning.

usage example (Responses API)
response = client.responses.create(
    model="gpt-5.5",
    instructions="You are a senior code reviewer. Be thorough.",
    input=f"Review this Python function for bugs and edge cases:\n\n{code}",
    reasoning={"effort": "high"},
)

Full example — document analysis agent

Production prompt applying all best practices: outcome-first goal, defined personality, validation loop, explicit effort level, and proper context structure.

instructions (Responses API)
You are a contract analysis assistant for a legal team.

## Goal
Identify all clauses that could create unexpected financial, operational, or legal obligations for our company. Prioritize findings by risk level (high / medium / low).

## Personality
Precise and conservative. Use legal terminology when appropriate. Never interpret ambiguity in the client's favor — flag it as a risk.

## Constraints
DO NOT summarize clauses that have no risk implications.
DO NOT speculate about intent — flag ambiguity as risk and quote the exact text.
If a clause references external documents not provided, mark as "external dependency — review required".

## Output format
JSON array. Each item: { "clause": "...", "risk": "high|medium|low", "reason": "...", "quote": "..." }
After the JSON, add a brief plain-language executive summary (max 3 sentences).
user message
<context>
Contract type: SaaS services agreement
Counterparty: enterprise client
Our company role: vendor
</context>

<task>
Analyze the contract below and identify all risk clauses.
Read the full document including appendices before responding.
After generating the output, review it to ensure no clause above medium risk was missed.
</task>

<document>
{{contract_text}}
</document>
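Because the output format is a JSON array followed by a short plain-language summary, the response can be split into structured findings plus summary text. A minimal sketch, assuming the model follows the format; production code should handle parse failures:

```python
# Sketch: splitting the expected "JSON array + short summary" output.
# Assumes the model follows the declared format; rindex finds the close
# of the JSON array (fragile if a quote itself contains "]").

import json

def parse_analysis(output: str) -> tuple[list[dict], str]:
    end = output.rindex("]") + 1          # end of the JSON array
    findings = json.loads(output[:end])
    summary = output[end:].strip()
    return findings, summary

sample = '[{"clause": "9.2", "risk": "high", "reason": "auto-renewal", "quote": "..."}]\nOne high-risk clause found.'
findings, summary = parse_analysis(sample)
```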
Evaluate your prompt

Does your prompt follow these best practices?

PromptEval does a technical review of your prompt — score 0-100, critical issues, wasted tokens and improvement suggestions.

Evaluate my prompt → · See example report

3 free evaluations per month · no credit card

Gemini 3 · Llama 3 · Mistral — guides coming soon