2026-06-03·Francisco Ferreira·12 min read

How to Specify Output Format in AI Prompts

Learn exactly how to specify JSON, Markdown, CSV, and table outputs in AI prompts — with a format decision matrix and real before/after examples.

Quick Answer

To specify output format in AI prompts: (1) Choose the right format for your use case — JSON for machine parsing, Markdown for display, CSV for spreadsheets, tables for comparisons. (2) Define the exact structure: field names, types, column headers. (3) Place the instruction before your task and repeat it at the end. For >90% reliability in production, use the provider's native structured output API instead of prompt wording alone.

You asked for a product list. The model returned three paragraphs with the data scattered across sentences. You ran it again — this time a numbered list. Third run: bullet points. Same prompt, three different formats, none of them what you needed.

The fix isn't rewriting the prompt from scratch. It's adding explicit format instructions — the right ones, in the right place. Output format is a specificity problem: when you don't define the shape of the output, the model fills that silence with whatever format appeared most often in its training data for that type of task.

This guide is written by the PromptEval team. PromptEval is our product and is referenced throughout. The format techniques below work regardless of which tool you use to evaluate them.

At PromptEval, specificity is one of the four dimensions we score — clarity, specificity, structure, and robustness. Prompts that score below 40 on specificity almost always have the same failure mode: they describe what they want but not how they want it shaped. Across submissions we evaluate, prompts with no format instruction average around 38 on specificity; the same prompt with an explicit format schema definition averages around 57. That gap is the fastest single fix available in prompt engineering.

Why output format is a specificity problem

"Specificity" in prompting means reducing the number of decisions the model has to make on its own. Every degree of freedom you leave open is a chance for the model to substitute its judgment for yours — and to substitute differently on different runs.

When you write "give me a summary," you've left open: length, format (paragraph, bullets, numbered list, table), level of detail, what to prioritize, whether to include headers. The model resolves all of those. It resolves them differently depending on the input, the model version, and small variations in context length.

Specifying output format eliminates the format-choice dimension entirely. The model no longer needs to decide. For the full mechanics of prompt specificity — format is one of four dimensions, and typically the fastest to fix.

The Format Decision Matrix

Before writing a format instruction, the first question is: which format actually serves the use case? Most format failures happen because the instruction is added reflexively — "return as JSON" because it sounds structured — rather than matched to how the output will be consumed.

Two axes decide it: who reads the output (human or machine) and what happens to it next.

USE CASE READER BEST FORMAT PROMPT INSTRUCTION SYNTAX
UI display, documentation Human (rendered) Markdown "Respond in Markdown. Use ## for headers and - for bullets."
Spreadsheet import, data export Human or machine CSV "Output as CSV with headers: Name, Value, Date. No other text."
Comparison report, analysis Human (read directly) Markdown table "Markdown table: columns [X], [Y], [Z]. No intro paragraph."
API response, downstream parsing Machine (code) JSON "Return JSON: { name: string, score: number }. No preamble."
LLM-to-LLM pipeline Machine (next model) JSON (GPT/Gemini) or XML (Claude) Match format to the receiving model's training distribution
Direct chat, no copy-paste Human (chat) Plain text (default) (No instruction needed — models default here)

The instruction syntax in the right column isn't decorative. Copy it verbatim, swap in your field or column names, and append it to the end of your prompt. That's the minimal viable format instruction. More complex cases — specifically JSON and multi-turn conversations — need more, which the sections below cover.

PromptEval scores the specificity dimension of your prompt free. Paste the prompt you just formatted and see exactly where the model sees ambiguity — 3 evaluations/month, no credit card. Evaluate at prompt-eval.com/en →

JSON output: schema-first prompting

JSON is the right format when code needs to parse the output. A dashboard reading model responses, an automation processing extracted data, a pipeline where one model feeds another — all need predictable field names and types across every run.

The common mistake: asking for "JSON output" without defining the schema. The model produces valid JSON, but field names, nesting structure, and types vary between runs. Run "extract product details and return JSON" ten times and you'll see name, product_name, item, and title for the same field.

Schema-vague — field names vary across runs

Extract the product details from this text and return JSON.

Model invents field names, types, and nesting on every run. Downstream parsers break unpredictably.

Schema-first — consistent structure every run

Extract product details from the text below and return a JSON object with this exact structure:

{"name": string, "price_usd": number, "features": string[], "in_stock": boolean}

Return only the JSON. No explanations, no code fences.

Defines field names, types, and nesting. "No preamble" stops the model from wrapping JSON in a sentence.

Model differences matter. Claude handles JSON reliably via prompt instructions but performs better with XML natively — Claude's training included heavy XML-tagged data, which makes XML-delimited structures more format-stable when you're in a prompt-only environment. GPT-4o has a native response_format: { type: "json_object" } API parameter. Gemini supports response_mime_type: "application/json". If you're using Claude in a multi-agent pipeline where you need reliable structured data, XML tags often outperform raw JSON prompt instructions.

When JSON output breaks in long contexts, the json-repair Python library catches and fixes malformed output before it reaches your parser. Parse first, repair only if parse fails, throw only if repair fails — that three-step fallback handles most production edge cases.

Token cost by format. Verbose JSON with long descriptive field names costs more per API call than compact JSON with short keys. XML is the most expensive common format — every field requires an opening and closing tag. For pipelines running thousands of daily calls, switching from verbose JSON to compact JSON typically reduces output token count 20–40%. Pick the cheapest format that meets your parsing requirements.

Markdown, tables, and plain text

Markdown is for human-readable output that gets rendered: documentation, reports, article drafts, chat UI responses. The mistake is asking for "Markdown" without specifying which elements to use. Without element constraints, the model mixes heading levels, alternates between dashes and asterisks, and invents structure it thinks is helpful.

Specify elements directly:

Respond in Markdown. Use ## for the main section header, ### for subsections, and - for bullet lists. Bold key terms with **term**. Start directly with the first ## header — no introduction paragraph.

Tables need column definitions. Without them the model invents columns based on what it thinks matters — sometimes useful, never reliable for recurring tasks:

Present the comparison as a Markdown table. Columns: Product | Price | Key Feature | Rating. One row per product. No summary paragraph before or after the table.

Plain text is the right choice when output goes straight into a chat conversation and won't be processed, rendered, or copy-pasted anywhere. No instruction needed — it's the default. Adding format instructions when you don't need them adds tokens to your prompt and introduces potential for format drift.

Where to put format instructions

Placement changes compliance rate. A format instruction buried at the end of a 600-word prompt competes with the content that appears immediately before the model generates output. In long contexts, instructions far from the generation point carry less weight — the last few hundred tokens before output have disproportionate influence.

The triple-placement method cuts format drift in long prompts:

  1. System prompt — Establish the rule: "You always respond in JSON with this schema: {...}". Sets default behavior for every turn.
  2. User prompt, before the task — Restate before the task description: "Return JSON only. Task: extract the following..."
  3. User prompt, after the task — Compress and repeat at the end: "JSON only. No explanations."

For short prompts under ~400 tokens, placement matters less. For longer prompts — system prompts that have been edited across months, agent instructions with conditional logic, multi-section templates — triple placement is what separates consistent from drifting format compliance.

Format drift is when a model starts a response in the correct format and then slips: returning JSON for three fields and then appending a natural-language explanation, or starting a table and switching to bullets halfway through. Three causes: long context pushing early instructions far from the output generation point; competing instructions where a "conversational" system prompt fights a "JSON only" user instruction; and complex output structure where the model can't maintain the schema while reasoning through content simultaneously.

Fixes: triple placement, temperature 0 or 0.1 for strict format tasks, and chunking for complex outputs (request one JSON object at a time rather than an array of 20). How a prompt is structured overall directly affects how reliably any single instruction within it is followed — format instructions work best when the prompt groups them in a dedicated section rather than scattering them across the text.

When prompt instructions aren't enough

There's a threshold where prompting stops being the right tool: when you need format compliance above roughly 90% in a production system that runs without human review.

Below that threshold, good format instructions plus triple placement get compliance high enough for most practical uses — internal tools, content workflows, draft generation, anything where a human reviews before the output is consumed.

Above it — API responses your code parses automatically, data pipelines with no fallback, anything that fails silently if the format breaks — use the provider's native structured output mechanism. These don't rely on instruction following. They constrain the model's token generation using grammar-based enforcement at the decoding layer.

PROVIDER MECHANISM API PARAMETER COMPLIANCE
OpenAI JSON Schema enforcement (strict mode) response_format: { type: "json_schema", strict: true } Near 100% — CFG-enforced
Anthropic (Claude) Tool use with input schema tools: [{ input_schema: {...} }] + tool_choice forced Near 100% — schema enforced
Google (Gemini) MIME type + JSON schema response_mime_type: "application/json" + response_schema High — schema constrained

The API parameters are not better prompting — they're a different mechanism. Prompt-based instructions ask the model to comply. API-level enforcement prevents non-compliance at the generation level. Start with prompt-based format instructions. Switch to the API parameter when reliability becomes a correctness requirement, not a quality preference.

Common format mistakes

Mistake 1: "Return JSON" without a schema. The model picks field names from context. Run it across different inputs and the names drift. Fix: define every field with its type — { name: string, price: number, tags: string[] } leaves no field-naming decisions to the model.

Mistake 2: Conflicting format rules. A system prompt that says "always be conversational and explain your reasoning" directly competes with a user prompt that says "return JSON only." The model has to choose — and conversational instructions usually win. Fix: scope the conflict explicitly: "be conversational for open-ended questions; for data extraction tasks, return JSON only with no explanation, regardless of conversational defaults."

Mistake 3: Single format instruction in a multi-turn conversation. You specified the format in turn 1. By turn 7, after a stretch of unstructured back-and-forth, the format instruction is diluted by context. Fix: include a brief format reminder in any turn where structured output is required.

If you're unsure which dimension of your prompt is causing quality issues, how to evaluate AI prompt quality covers the full diagnostic across all four dimensions — format specification is the fastest part to fix once you've identified it as the gap.


You just built a more specific prompt. See the exact score it gets — PromptEval evaluates free with 3 credits. The specificity dimension score shows exactly how much format ambiguity remains before you ship. For a hands-on practice loop, the PromptEval Daily Challenge runs format-constrained prompt tasks every day — write a prompt that produces the required output in the specified format, score it, and see where it ranks on the leaderboard.

Frequently Asked Questions

Q How do I make an AI always return JSON?

For prompt-based control: include the JSON schema in your prompt, state "return JSON only, no explanations" at both the start and end of your instructions, and set temperature to 0. For near-100% reliability in production, use the provider's native parameter: response_format with json_schema on OpenAI (strict mode), tool use on Claude, or response_mime_type on Gemini.

Q Why does the AI keep ignoring my format instructions?

Three causes cover most cases: the format instruction is placed too far from where the output starts; a competing instruction contradicts it (a "conversational" system prompt fighting "return JSON only"); or the prompt is long enough that early instructions fade in context. The triple-placement method — format instruction in the system prompt, before the task, and compressed at the end — fixes the placement issue. Competing instructions need explicit scoping.

Q What's the difference between JSON mode and prompting for JSON?

Prompting for JSON is an instruction the model follows by convention — it will try to return JSON but nothing prevents non-JSON output if competing signals are strong. JSON mode (OpenAI's response_format with strict: true) constrains token generation at the decoding level using a context-free grammar. Non-JSON tokens cannot be generated. It's enforced at the architecture level, not requested through text.

Q Which output format uses the fewest tokens?

Cheapest to most expensive per unit of data: plain text → compact JSON (short keys, minimal nesting) → Markdown → verbose JSON (long descriptive field names) → XML. XML is the most expensive because every field requires both an opening and closing tag. Switching from verbose to compact JSON typically reduces output token count 20–40% for the same data. Pick the cheapest format that meets your parsing requirements.

Q Does specifying output format affect prompt quality scores?

Yes. On PromptEval, format specification feeds directly into the specificity dimension score. A prompt that names the format and defines the schema typically scores 15–25 points higher on specificity than the same prompt without format instructions. Specificity is one of four dimensions evaluated: clarity, specificity, structure, and robustness.

Apply what you just learned — evaluate your prompt free.

Try PromptEval →