Last month a teammate showed me a prompt he'd been fighting with for three days. He was trying to get Claude to generate SQL migrations from a schema diff, and the output was inconsistent enough to be useless. His conclusion was that AI tools "just can't do this kind of structured work reliably."
I took his prompt, made four changes, and it worked perfectly on the first try.
I'm not saying this to be smug about it. I'm saying it because I've been there. I blamed the models constantly in my first six months of working with LLMs. It was almost never the model. Here's what was actually going wrong.
Mistake 1: You're Asking a Question When You Should Be Giving Instructions
There's a massive difference between "Can you write a SQL migration for this?" and "Write a SQL migration for PostgreSQL 15 using this exact schema diff. Output only the SQL, no explanation."
The first is a question. The model will answer it conversationally and include a lot of explanation you didn't want. The second is a directive with explicit output requirements. The model treats these differently, and the output reflects it.
The fix: write prompts like you're writing a spec for a junior dev, not like you're asking Google a question.
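To make that concrete, here's a minimal sketch of the directive version sent through the Anthropic Python SDK's Messages API. The model name, the schema diff, and the prompt wording are illustrative placeholders, not my teammate's actual prompt.

```python
# Sketch: question-style vs directive prompt for the migration task.
import anthropic

schema_diff = "ALTER TABLE users ADD COLUMN last_login timestamptz;"  # placeholder

# Question-style: invites a conversational answer with explanation you didn't ask for.
question = f"Can you write a SQL migration for this?\n\n{schema_diff}"

# Directive: names the target, the input, and the exact output format.
directive = (
    "Write a SQL migration for PostgreSQL 15 using this exact schema diff. "
    "Output only the SQL, no explanation.\n\n"
    f"Schema diff:\n{schema_diff}"
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content": directive}],
)
print(response.content[0].text)
```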
Mistake 2: No Examples Means No Pattern to Follow
The fastest way to get consistent output is to show the model exactly what "correct" looks like. If you want function documentation in a specific format, show it one example of that format before asking it to write the docs. One-shot prompting (providing one example) dramatically improves output consistency on structured tasks.
My teammate's SQL migration prompt had zero examples of what a valid migration looked like. No wonder the output varied — the model had no pattern to anchor to.
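Here's roughly what fixing that looks like: a one-shot version of the same prompt, with one complete diff-to-migration pair shown before the real request. The example diff and migration here are invented for illustration.

```python
# Sketch: a one-shot prompt -- one worked example, then the actual task.
example_diff = "ALTER TABLE orders ADD COLUMN shipped_at timestamptz;"
example_migration = """\
-- migration: add shipped_at to orders
BEGIN;
ALTER TABLE orders ADD COLUMN shipped_at timestamptz;
COMMIT;"""

new_diff = "ALTER TABLE users ADD COLUMN last_login timestamptz;"

prompt = f"""Write a SQL migration for PostgreSQL 15 from a schema diff.
Output only the SQL, in the same format as the example.

Example diff:
{example_diff}

Example migration:
{example_migration}

Now write the migration for this diff:
{new_diff}"""

print(prompt)
```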
Mistake 3: Too Many Requirements in One Request
I see this constantly. A single prompt that asks the AI to: read the code, identify problems, suggest fixes, write the fixed version, add documentation, and then explain the changes. That's six different tasks.
LLMs generate their output one token at a time, in order, and when you pile on requirements they start making trade-offs, usually giving careful treatment to the first few things you mentioned and a thinner pass to the rest. Split complex tasks into separate prompts. Chain them, feeding the output of one step into the next. The total output quality is significantly better.
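A rough sketch of what that chaining looks like, using the same Messages API as above. run(), the model name, and handler.py are illustrative stand-ins; the point is that each call has exactly one job and feeds the next.

```python
# Sketch: three focused prompts chained together instead of one six-task prompt.
import anthropic

client = anthropic.Anthropic()

def run(prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

code = open("handler.py").read()  # placeholder: the code under review

# Step 1: identify problems, nothing else.
problems = run(f"List the concrete problems in this code, one per line:\n\n{code}")

# Step 2: fix exactly those problems.
fixed = run(f"Rewrite the code to fix exactly these problems:\n{problems}\n\nCode:\n{code}")

# Step 3: document the fixed version.
documented = run(f"Add docstrings to this code. Output only the code:\n\n{fixed}")
```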
Mistake 4: You're Not Giving the Model a Role
Starting your prompt with "You are a senior backend engineer specialising in PostgreSQL performance optimisation" genuinely changes the output. Not because the model doesn't know about PostgreSQL without the prompt — it does — but because the role anchors the model's approach, vocabulary level, and the kinds of tradeoffs it considers. It's not magic, but it's consistently useful for technical tasks.
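If you're calling the API rather than typing into a chat window, that role usually goes in the system prompt instead of the first line of the user message. A minimal sketch, with the model name and input file as placeholders:

```python
# Sketch: anchoring the model with a role via the system prompt.
import anthropic

client = anthropic.Anthropic()
query_plan = open("explain_analyze.txt").read()  # placeholder input

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    system=(
        "You are a senior backend engineer specialising in "
        "PostgreSQL performance optimisation."
    ),
    messages=[{
        "role": "user",
        "content": f"Review this query plan and suggest index changes:\n\n{query_plan}",
    }],
)
print(response.content[0].text)
```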
Mistake 5: Vague Success Criteria
"Write a good explanation of this function." Good by whose standard? For what audience? In how many words? Vague success criteria produce vague output. If you're not specific about what you want, don't be surprised when you get something that doesn't quite fit.
Instead: "Write a two-paragraph explanation of this function for a developer who understands JavaScript but is new to async/await. Keep it under 150 words."
The One Question That Changed Everything for Me
Whenever a prompt produces bad output, I now ask myself: "If I handed this prompt to a very capable new developer, would they know exactly what I wanted?" Usually the answer is no. Fix the prompt for the human, and it'll work better for the model too.
The model isn't the bottleneck in most cases. The way we communicate with it is.