Most prompt engineering content is useless. "Be specific" and "provide context" don't help when you need 99.9% reliability at 2 million requests per month. This is what actually works in production.
These are real prompts running in production systems: the patterns that survived A/B testing, edge cases, and scale.
Pattern 1: The Production System Prompt
A good system prompt does three things: defines the role, sets constraints, and specifies output format. Here's our lead qualification prompt:
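The exact production prompt isn't reproduced here, but a minimal sketch of the shape (role, constraints, output format) might look like this. The wording is illustrative, not the real prompt:

```typescript
// Hypothetical lead-qualification system prompt following the
// role / constraints / output-format structure described above.
export const SYSTEM_PROMPT = `You are a lead qualification engine for a real estate company.

Constraints:
- Output JSON only. No prose, no markdown fences.
- Never invent data. Use null for any field not present in the input.
- If the input is unparseable, output {"error":"unparseable","raw":"<input>"}.

Output format:
{"name": string|null, "phone": string|null, "email": string|null,
 "property_address": string|null,
 "motivation": "high"|"medium"|"low"|"unknown",
 "timeline": string|null, "confidence": number}`;
```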
Pattern 2: Few-Shot Examples
Examples are worth 1000 words of instructions. Include 2-3 examples covering common cases and edge cases:
const fewShotExamples = [
  {
    role: "user",
    content: "hi im john smith at 123 main st need to sell fast my number is 5551234567"
  },
  {
    role: "assistant",
    content: JSON.stringify({
      name: "John Smith",
      phone: "5551234567",
      email: null,
      property_address: "123 Main St",
      motivation: "high",
      timeline: "immediate",
      confidence: 0.85
    })
  },
  {
    role: "user",
    content: "asdf keyboard smash 12345"
  },
  {
    role: "assistant",
    content: JSON.stringify({
      error: "unparseable",
      raw: "asdf keyboard smash 12345"
    })
  }
];
Critical: Always Include Edge Cases
Your few-shot examples should include at least one "bad input" example. Without it, the model will try to extract data from garbage, leading to hallucinated outputs that pollute your database.
Pattern 3: Structured Output Enforcement
JSON mode isn't enough. Validate and coerce outputs to your schema:
import { z } from 'zod';

const LeadSchema = z.object({
  name: z.string().nullable(),
  phone: z.string().regex(/^\d{10}$/).nullable(),
  email: z.string().email().nullable(),
  property_address: z.string().nullable(),
  motivation: z.enum(['high', 'medium', 'low', 'unknown']),
  timeline: z.string().nullable(),
  confidence: z.number().min(0).max(1)
});
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

type Lead = z.infer<typeof LeadSchema>;

async function qualifyLead(input: string): Promise<Lead> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 500,
    system: SYSTEM_PROMPT,
    messages: [...fewShotExamples, { role: 'user', content: input }]
  });
  // Content blocks are typed; guard before reading .text.
  const block = response.content[0];
  const text = block.type === 'text' ? block.text : '';
  try {
    const parsed = JSON.parse(text);
    return LeadSchema.parse(parsed);
  } catch (e) {
    // Log the raw failure so the prompt can be iterated on later.
    await logPromptFailure(input, text, e);
    throw new Error('Output validation failed');
  }
}
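One failure mode worth handling before the `JSON.parse` step: models sometimes wrap their output in markdown code fences even when told not to. A small defensive stripper (a hypothetical helper, not part of the original code):

```typescript
// Strip an optional ```json fence that models sometimes emit around output,
// so JSON.parse sees only the payload. Fences are matched as `{3}.
function extractJson(text: string): string {
  return text
    .replace(/^\s*`{3}(?:json)?/, '')  // leading fence, with optional language tag
    .replace(/`{3}\s*$/, '')           // trailing fence
    .trim();
}
```

Calling `extractJson(text)` before `JSON.parse(text)` is cheap insurance; plain JSON passes through unchanged.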
Pattern 4: Token Optimization
Tokens cost money. At scale, every word matters:
| Technique | Token Savings | Trade-off |
|---|---|---|
| Abbreviate instructions | 20-30% | Slight accuracy drop |
| Remove verbose examples | 40-50% | Weaker edge-case handling |
| Use shorter field names | 10-15% | Reduced readability |
| Compress system prompt | 25-35% | Harder to maintain |
Before: 847 tokens
You are a helpful assistant that analyzes real estate leads and extracts relevant information from them. Please carefully read the input and identify the following fields if they are present...
After: 312 tokens
Extract lead data. Output JSON only.
Fields: name, phone (10 digits), email, address, motivation (high/med/low/unknown), timeline, confidence (0-1).
Unknown = null. Bad input = {"error":"unparseable"}.
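To sanity-check savings while editing a prompt, a rough character-based estimate is often enough. This is a sketch: ~4 characters per token is a heuristic for English text, not the real tokenizer, so use the usage counts the API reports for exact numbers:

```typescript
// Rough heuristic: ~4 characters per token for English prose.
// For exact counts, rely on the provider's reported token usage.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const before = 'You are a helpful assistant that analyzes real estate leads...';
const after = 'Extract lead data. Output JSON only.';
console.log(estimateTokens(before), estimateTokens(after));
```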
Pattern 5: Prompt Versioning
Prompts are code. Version them like code:
export const LEAD_QUALIFIER_PROMPT = {
  version: '2.3',
  model: 'claude-sonnet-4-20250514',
  system: `Extract lead data. Output JSON only...`,
  examples: [...],
  changelog: [
    '2.3: Added confidence score',
    '2.2: Fixed phone validation edge case',
    '2.1: Reduced tokens by 40%',
    '2.0: Complete rewrite for Claude 3'
  ],
  testConfig: {
    enabled: true,
    variants: ['v2.2', 'v2.3'],
    metric: 'conversion_rate'
  }
};
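The testConfig above implies routing traffic between variants. One way to do that is deterministic bucketing, so the same lead always sees the same prompt version and conversion metrics stay clean. A sketch (the hashing scheme is a hypothetical choice, not from the original):

```typescript
// Deterministically assign a lead to a prompt variant: hash the lead ID
// and bucket it, so repeated calls for the same lead pick the same version.
function pickVariant(leadId: string, variants: string[]): string {
  let hash = 0;
  for (const ch of leadId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return variants[hash % variants.length];
}
```

Usage: `pickVariant(lead.id, LEAD_QUALIFIER_PROMPT.testConfig.variants)` at request time, logging the chosen version alongside the outcome metric.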
Pattern 6: Graceful Degradation
When the model fails, have a fallback:
async function qualifyWithFallback(input: string) {
  try {
    return await qualifyLead(input);
  } catch (e) {
    console.log('AI extraction failed, trying regex');
  }
  // Deterministic fallback: pull what we can with regex, mark it low-confidence.
  const phone = input.match(/\d{10}/)?.[0] || null;
  const email = input.match(/[\w.-]+@[\w.-]+/)?.[0] || null; // loose match; validate downstream
  return {
    name: null,
    phone,
    email,
    property_address: null,
    motivation: 'unknown',
    timeline: null,
    confidence: 0.3,
    extraction_method: 'regex_fallback'
  };
}
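Before dropping to regex, it can also be worth retrying the AI path once, since many failures are transient. A sketch of a retry wrapper; the attempt count is an assumption, not from the original, and a real version would add backoff:

```typescript
// Retry an async operation a fixed number of times before giving up,
// so the caller's regex fallback only triggers on persistent failures.
async function withRetries<T>(fn: () => Promise<T>, attempts = 2): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn(); // success on any attempt wins
    } catch (e) {
      lastErr = e; // remember the failure and try again
    }
  }
  throw lastErr; // out of attempts: let the caller fall back
}
```

In qualifyWithFallback, the try block would become `return await withRetries(() => qualifyLead(input))`.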
Production Checklist
- System prompt defines role, constraints, and output format
- Few-shot examples cover common cases AND edge cases
- Zod or similar validates all AI outputs
- Prompts are versioned with changelogs
- Token usage is monitored and optimized
- Fallback extraction exists for failures
- Failed extractions are logged for iteration
- A/B testing infrastructure for prompt variants
The difference between a demo and production is error handling. Your prompt will fail. Plan for it.