AI Prompt Engineering Workflow
Build prompts that work the first time — and keep working. This phased workflow covers every step from precise task definition and few-shot examples to hallucination prevention and long-term prompt library maintenance.
Checklist Items
Write a single, clear sentence stating exactly what output you want — the task definition.
Assign a role or persona at the start of the prompt to frame domain knowledge and tone.
Provide all necessary background context — audience, purpose, brand voice, technical level, and relevant constraints.
Write instructions using action verbs and explicit step-by-step structure for multi-part tasks.
Decide which components belong in the system prompt versus the user prompt for API-based integrations.
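A common split for the system-versus-user decision: stable components (role, voice, hard constraints) go in the system prompt, and per-request components (the document, the immediate task) go in the user prompt. A minimal sketch, assuming an OpenAI-style chat message format; the product name and prompt wording are hypothetical illustrations, and other chat APIs use the same split with different field names:

```python
# Sketch: splitting prompt components between system and user messages.
# "Acme Analytics" and the constraint wording are hypothetical examples.

def build_messages(document: str, task: str) -> list[dict]:
    # System prompt: stable components — role, voice, hard constraints.
    system = (
        "You are a support engineer for Acme Analytics.\n"
        "Answer in plain English for a non-technical audience.\n"
        "Never promise refunds or policy exceptions."
    )
    # User prompt: per-request components — the input and the task.
    user = f"Customer message:\n{document}\n\nTask: {task}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages("My export is failing.", "Draft a reply under 120 words.")
```

The payoff of this split: the system prompt can be versioned and reused across every request, while only the user message changes per call.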
The six layers every effective prompt needs
Most prompts fail not because the model lacks capability, but because the prompt omits one of six functional layers. Each layer resolves a different ambiguity. A missing layer forces the model to guess — and it will, confidently and plausibly.
Which technique should you reach for first?
Not every prompt needs every technique. This decision guide matches common failure symptoms to their most likely causes — so you fix the right thing instead of iterating blindly.
| If the output is… | Likely cause | Reach for… |
|---|---|---|
| Generic or surface-level | Vague task or absent role | Sharper task definition + specific role |
| Wrong structure or format | Format not specified | Explicit format spec + example output |
| Shallow or inconsistent reasoning | No reasoning path forced | Chain-of-thought instruction |
| Plausible but invented facts | Model generating missing data | Factual constraint + provide source data in context |
| Right content, wrong tone | Audience or voice not specified | Context block with audience details + few-shot examples |
| Inconsistent across repeated runs | Ambiguous instructions or high temperature | Resolve the ambiguity + lower temperature toward 0 |
| Instructions partially ignored | Critical instructions buried in the middle | Move key instructions to the top and repeat at the end |
⚠️ Where you place instructions changes how reliably they're followed
Research on large language models documents a consistent pattern called the "lost in the middle" effect: models attend most reliably to content at the beginning and end of a long prompt. Instructions positioned in the middle of a long context — particularly when large documents are included — receive less reliable attention than the same instructions placed at the edges.
In practice: put your task definition, format requirements, and critical constraints at the very top of the prompt, before any background documents or examples. If you're providing a long document for analysis, repeat the core instruction after the document as well as before it. For any constraint that must not be violated — a factual restriction, a word limit, a scope boundary — position it prominently rather than burying it in a paragraph of context.
This effect matters more as context windows grow. A 128K-token or 200K-token context window does not mean every sentence in a long document is equally weighted. For document-heavy prompts, targeted excerpts often produce more reliable outputs than feeding the full text — the model focuses on what's there rather than averaging across too much.
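The "instruction sandwich" described above is mechanical enough to automate in a prompt-assembly helper. A minimal sketch (the delimiter strings and instruction wording are illustrative, not a fixed convention):

```python
# Sketch: "instruction sandwich" assembly — core instruction at the top,
# long document in the middle, instruction repeated at the end, so the
# instruction sits at both edges where attention is most reliable.

CORE_INSTRUCTION = (
    "Summarize the document below in exactly 3 bullet points. "
    "Base every point only on the document; do not add outside facts."
)

def sandwich_prompt(document: str) -> str:
    return "\n\n".join([
        CORE_INSTRUCTION,                 # edge position: start
        "--- DOCUMENT START ---",
        document,
        "--- DOCUMENT END ---",
        "Reminder: " + CORE_INSTRUCTION,  # edge position: end
    ])

prompt = sandwich_prompt("Q3 revenue rose 12% on subscription growth...")
```

The explicit start/end delimiters also help the model distinguish the document being analyzed from the instructions about it.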
📖 What a three-sentence prompt costs in production
A fintech startup built an automated customer email responder using an LLM. Their prompt was three sentences: task, tone, length limit. It worked fine in internal testing. In production, it started citing non-existent policy documents — professional, plausible, completely fabricated. Customers forwarded these responses to support agents expecting follow-through on promises that had never been made.
The fix was two prompt changes and an afternoon of work. Rebuilding the customer trust took months. The hallucination problem was entirely predictable and preventable — but only if engineered for before deployment, not after.
🧮 The one-change-at-a-time rule — why it matters
Prompt engineering is empirical. When a prompt produces the wrong output, the temptation is to fix everything at once: rewrite the task, add examples, adjust the constraints, and change the format spec in a single edit. The result is a prompt that may work better — but you have no idea which change caused the improvement.
This matters because the next time the prompt behaves unexpectedly, you'll have no basis for diagnosis. Treat each refinement as an experiment: one change, one re-test, one recorded observation. It's slower in the short term and dramatically faster over the life of a prompt used in production.
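The experiment discipline above amounts to a log with one change and one observation per entry. A minimal sketch of such a refinement log (the field names and example entries are illustrative):

```python
# Sketch: a minimal prompt-refinement log enforcing one change per trial.
from dataclasses import dataclass

@dataclass
class Trial:
    version: int
    change: str        # exactly one change per trial
    observation: str   # what the re-test showed

log: list[Trial] = []

def record(change: str, observation: str) -> Trial:
    trial = Trial(version=len(log) + 1, change=change, observation=observation)
    log.append(trial)
    return trial

record("Added explicit word limit (150 words)", "Length now consistent; tone unchanged")
record("Moved format spec to top of prompt", "Format followed in 10/10 test runs")
```

Even a spreadsheet works; what matters is that every version of the prompt maps to exactly one change and one recorded result.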
Prompt behavior varies across models — what to know before migrating
A prompt tuned for GPT-4o may produce meaningfully different output on Claude Sonnet or Gemini 1.5 Pro — not because one model is better, but because each has different training data, fine-tuning, and default behaviors for ambiguous instructions. Key practical differences to account for:
- Instruction following: Claude models tend to follow explicit constraints more literally. GPT models sometimes interpret instructions more liberally when they conflict with "helpful" defaults. If you migrate prompts, test constraint adherence explicitly.
- JSON reliability: No model guarantees valid JSON from prose instructions alone on complex outputs. Use structured output features (JSON mode or structured outputs in the OpenAI API; tool use or response prefilling in the Anthropic API, which has no equivalent JSON-mode flag) for any pipeline where parseable output is a hard requirement, and validate before parsing.
- Default verbosity: Models have different default output lengths for identical prompts. If length is a hard constraint, always specify it explicitly rather than assuming parity across models.
- Context window limits: Claude Sonnet 4.6 supports 200K tokens; GPT-4o supports 128K. For use cases involving very long documents, model selection is a practical infrastructure decision, not only a quality preference.
When migrating a prompt library from one model to another: treat it as a test-and-refine cycle, not a direct port. The structure of well-engineered prompts transfers reliably; the exact behavior often does not.
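For the JSON-reliability point above, a sketch of the request shape and the defensive parsing that should accompany it. The payload follows OpenAI's chat completions JSON mode (`response_format={"type": "json_object"}`, which also requires the word "JSON" to appear in the prompt); no network call is made here, and the schema in the system message is a hypothetical example:

```python
import json

# Sketch: an OpenAI-style JSON-mode request payload, built locally.
payload = {
    "model": "gpt-4o",
    "response_format": {"type": "json_object"},
    "messages": [
        # JSON mode requires the word "JSON" somewhere in the messages.
        {
            "role": "system",
            "content": 'Reply with a JSON object: {"sentiment": "...", "confidence": 0.0}',
        },
        {"role": "user", "content": "Great product, shipping was slow."},
    ],
}

def parse_or_none(raw: str):
    # Even with JSON mode, validate before trusting output downstream.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

On the Anthropic API the same guarantee is typically obtained differently, via tool use (the model fills a typed tool-input schema) or by prefilling the assistant turn with an opening brace; the defensive parse applies either way.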
💡 Retrieval-augmented generation — when to reach for it
Retrieval-augmented generation (RAG) is a pattern where, instead of asking the model to recall facts from its training data, you retrieve the relevant source documents and inject them directly into the prompt context. The model then generates a response grounded in those documents rather than in its parametric memory.
RAG is not always necessary — but it's the right tool when: (1) the task requires accurate, up-to-date, or proprietary factual information that the model's training data doesn't reliably contain; (2) hallucination risk on specific claims is unacceptable; or (3) the source of truth needs to be auditable. For customer-facing content about your products, policies, or pricing — RAG transforms the hallucination problem from a prompt engineering challenge into a retrieval quality challenge, which is significantly easier to control.
For teams not yet using RAG: the simplest starting point is manually pasting the relevant source document into the prompt context and adding a constraint — "Base your response only on the document provided below. Do not add information from outside this document." This captures most of the hallucination-reduction benefit before investing in a full retrieval pipeline.
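The manual starting point above can be sketched as a small prompt template: paste the source document in and pin the model to it with the grounding constraint. The delimiters and example strings are illustrative:

```python
# Sketch of manual grounding: inject the source document into the prompt
# and constrain the model to answer only from it.

GROUNDING = (
    "Base your response only on the document provided below. "
    "Do not add information from outside this document. "
    "If the document does not answer the question, say so."
)

def grounded_prompt(document: str, question: str) -> str:
    return (
        f"{GROUNDING}\n\n"
        f"--- DOCUMENT ---\n{document}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )

prompt = grounded_prompt(
    document="Refunds are available within 30 days of purchase.",
    question="Can a customer get a refund after 45 days?",
)
```

The final sentence of the constraint matters most: giving the model an explicit "out" (say the document doesn't answer it) is what converts would-be hallucinations into honest refusals.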
Phase 1 — Define the Task and Context
Phase 2 — Provide Examples, Constraints, and Parameters
Phase 3 — Test and Refine
Phase 4 — Finalize and Maintain
Additional Notes
Use this space for follow-ups, reminders, and key references.
