Back to Insights

Well-Architected AI: The "Go-Live" Playbook for Production Copilots

Master Class Series
Part 1: Architecture RAG Pattern & Azure Doc Intelligence Part 2: Implementation Step-by-Step Production Guide
Part 3: Go-Live Well-Architected Playbook
Production survives an adversary.

A demo just survives a presentation. This is the playbook for crossing the gap. We stop guessing and start engineering: dual-lane isolation, evidence-based refusal contracts, and Day-2 operations that keep you out of the headlines.

Get the Production Starter Kit

Don't start from a blank slate. Download the full UKLifeLabs Wave 1 Pack including:

  • Architecture Diagrams (Visio/Mermaid)
  • APIM Policy Pack (12+ policies)
  • Bicep/Terraform Modules (AKS, Search, AI)
  • Security Audit Checklist (Excel)
Download Kit (ZIP) View Source Repo

*Includes the official APIM Bicep Module (PremiumV2).

Architecting Production Ready AI Systems

The Cast

Upendra
Upendra
Lead Architect
Trinity
Trinity
Cloud Engineer
Morpheus
Morpheus
Security Architect
Project Manager
Project Manager

Decisions, RACI, milestones.


Scene 1: The Question that Matters

Project Manager: “We go live next week. Are we production-ready?”

Trinity: “The chat works. The ingestion pipeline is moving documents. It looks great in the playground.”

Morpheus: “That’s not production, Trinity. That’s just a working endpoint. A production system survives an adversary; a demo just survives a presentation.”

Upendra: “Morpheus is right. In this new era, 'production' is a high bar. It means being safe, reliable, cost-controlled, testable, and operable. Anything else is just compute.”

1) Why AI Architecture is Different

Traditional software fails in predictable ways. AI applications fail in surprising ways:

To counter this, we don’t just code. We architect guardrails.

2) WAF Pillars → Production Guardrails

Upendra turns “principles” into controls. This is what makes the design review easy.

WAF Pillar What it means for AI workloads Wave 1 guardrails to implement
Reliability Survive throttling + timeouts + backlogs Bulkheads (2 lanes), queue-based ingestion, retries/backoff, circuit breaker
Security Prevent data leakage + bypass routes APIM boundary, private endpoints, security trimming, content safety
Cost Optimization Stop silent spend creep Token budgets at gateway, per-product quotas, caching, PTU strategy
Operational Excellence Run it daily without heroics Trace IDs, dashboards, runbooks, incident playbooks, versioning
Performance Efficiency Keep p95 stable under load Retrieval time budget, topK caps, hybrid retrieval, graceful refusal paths

3) Reference Architecture: Two Lanes, One Gateway

To prevent downstream chaos, we separate the workload into two isolated lanes with one boundary.

Two Lanes AI Architecture Pattern

Why this works: bulkhead isolation. Ingestion can be on fire and chat still holds p95.

4) The Five-Layer Model

Upendra stops the “patchwork architecture” by layering the workload.

Five Layer AI Architecture
Layer Responsibility Azure component
1. Client UI only, keep it thin Teams / Web App / Open WebUI
2. Intelligence Orchestrator where rules live Orchestrator API
3. Inferencing Model calls + deployment versioning Azure OpenAI / Foundry
4. Knowledge Retrieval + grounding Azure AI Search + ADLS
5. Tools Actions + business APIs Internal APIs via APIM

Rule: the client is a view. The orchestrator is the brain.

5) The AI Workload Design Loop

Trinity: “Once deployed, we are done, right?”
Upendra: “In AI, you are never done. You ship, then you correct drift.”

Wave 1 needs this loop:

  1. Build: prompt, retrieval, policies, limits
  2. Measure: citation coverage, refusal rate, p95 latency, token/request
  3. Adapt: tuning chunking/retrieval, policy updates, index rebuilds
  4. Control change: version prompts, indexes, policies, model deployments

6) Platform Decisions (Wave 1 Tips)

AKS vs ACA

Training vs Inference Compute

7) The Three Golden Production Contracts

Upendra insists most copilots fail because they lack clear contracts.

Protecting APIs with APIM

Contract 1: Evidence-or-Refusal

If grounding exists → answer + citations.
If grounding is missing → refuse.

Enforce it server-side using a validator. Not just prompt wording.

Contract 2: The Audit Trail

A “200 OK” is not an audit. Log what influenced the answer associated with the `requestId`: chunk IDs, filter logic, model deployment, and index version.

Contract 3: Permission-Aware Retrieval

The model is not your security boundary. Retrieval is. Security trimming must be enforced using validated identity claims so users only see authorised content.

8) Grounding Data Design

Morpheus: “Show me the mechanism that prevents cross-team leaks.”
Upendra: “It’s not the model. It’s the metadata and filters.”

Minimum Chunk Metadata (Wave 1)

9) Data Platform Strategy

Don’t build an extra data platform “because AI”. Start simple with ADLS for raw/extracted content and AI Search for vectors. Add complexity (Fabric/OneLake) only when specific aggregation or governance needs demand it.

10) Operations: Monitor Signals, Not Hopes

11) Day-2 Operating Model

This avoids “who owns it?” fights during incidents.

AI Governance Structure
Area Primary owner What they own
APIM + identity boundary Platform team Auth, rate limits, token budgets, routing
Orchestrator API App team Prompting, validation, contracts, fallback logic
Search + Retrieval Data team Index schema, chunking, filters, freshness
Security + Compliance Security team Content safety, audit reviews, policy sign-off

12) Testing and Gates

Credibility comes from a Golden Test Set of 30–50 questions including "must refuse" scenarios and prompt injection attempts. Block release if citation coverage drops or cross-group retrieval is detected.

13) GenAIOps Change Control

GenAIOps Lifecycle

Version everything that changes behavior: `promptVersion`, `indexVersion`, `policyVersion`. Your release gates must verify security trimming and injection resistance before every deployment.

14) Responsible AI (Minimum Controls)

Responsible AI Controls

Responsible AI is not a slide. It’s enforcement. Ensure citations are transparent, content safety filters are active (input/output), and PII usage is explicitly governed.


Scene 2: The Definition of Done

Client Sponsor: “So, what did we really ship?”

Morpheus: “A Copilot that doesn’t leak data, doesn’t collapse under load, and leaves an evidence trail for every answer.”

Upendra: “That is Well-Architected AI. The rest is just compute.”

Ready to operationalize your AI?

I help organizations turn stalled pilots into governed production engines.

Contact Me Download Playbook
Back to Insights