Document Intelligence Copilot on Azure: The Production-Ready Blueprint

Master Class Series

  • Part 1: Architecture (RAG Pattern & Azure Doc Intelligence)
  • Part 2: Implementation (Step-by-Step Production Guide)
  • Part 3: Go-Live (Well-Architected Playbook)

Stop building toy copilots.

Marketing wants to chat with contracts. HR wants to chat with policies. But if you just "turn on" a model, you risk leaking data across teams. This is your production blueprint for a regulated Document Intelligence Copilot: zero leakage, full audit trails, and strict cost controls using Azure API Management and Application Gateway for Containers.

Get the Production Starter Kit

Don't start from a blank slate. Download the full UKLifeLabs Wave 1 Pack, including:

  • Architecture Diagrams (Visio/Mermaid)
  • APIM Policy Pack (12+ policies)
  • Bicep/Terraform Modules (AKS, Search, AI)
  • Security Audit Checklist (Excel)

The Cast

  • Upendra: Lead Architect
  • Trinity: Cloud Engineer
  • Morpheus: Security Architect
  • Project Manager: Plans, Progress, Risks & Issues

The Story

Scene 1: The "Chat with PDF" Demand

Project Manager: "Marketing wants a tool to 'Chat with Contracts'. HR wants 'Chat with Policies'. Can we just turn on a Copilot for everyone?"

Trinity: "If we just 'turn it on', Marketing might see HR's salary data. A generic chatbot has no idea who is allowed to see what."

Upendra: "Exactly. We need to build an airport, not just a library. APIM is the customs and security checkpoint. Nothing reaches the model without a boarding pass."

Morpheus: "And citations are non-negotiable. If the model can't prove where it found the answer, it refuses to speak. Zero hallucinations allowed."

Upendra: "Here is the blueprint for a regulated, audit-ready Document Intelligence Copilot."


1) TL;DR (30 seconds)

UKLifeLabs needs employees to upload large document packs and ask questions.

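But answers must be restricted to what each user is allowed to see, backed by citations, fully audited, and bounded in cost.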

This blog gives you a buildable reference architecture:

UI → APIM (AI Gateway) → RAG Orchestrator → AI Search → Azure OpenAI + Document Intelligence → Citations + Audit

🏁 Wave 1 Scope: What is IN and OUT?

2) Design principles (the Microsoft review-board version)

  1. APIM-first boundary: no direct calls from UI to Search, models, or ingestion.
  2. Two-lane architecture: chat stays fast, ingestion stays heavy and async.
  3. Evidence-or-refusal: answers must include citations (or the system refuses).
  4. Least privilege by default: retrieval filters enforce the user's access.
  5. Cost is a control plane concern: token budgets and rate limits are enforced at APIM.
  6. Operationally boring: everything is observable, alertable, and repeatable as code.

3) The Airport + Research Library analogy

We use the "Airport" analogy to explain the security model to stakeholders, but here is the strict mapping to technical components.

| ✈️ The Analogy | ⚙️ The Technical Solution | 🛑 The Production Reality |
|---|---|---|
| The Boarding Pass | Entra ID + APIM `<validate-jwt>` | The model endpoint knows nothing about your users or their permissions. If a client hits it directly, it bypasses your security controls. APIM forces the gate. |
| Airport Security | APIM Token Limits & Throttling | One department running a bulk job can starve the entire company. Quotas prevent "noisy neighbors". |
| Baggage Scanner | Document Intelligence (Layout Model) | PDFs are binary blobs. You cannot "search" pixels. You must extract structure first. |
| Restricted Lounge | AI Search Security Trimming | If you search for "Salary", the index must return only hits whose `allowedGroups` include one of your groups. That identity context arrives only through APIM. |
| The Expert Panel | Azure OpenAI (GPT-4o) | Models are reasoning engines, not databases. They should answer only from what the "Lounge" (Search) provided. |

Rule: Nobody reaches the experts without passing security.

  • Target Throughput: 1.1M TPM
  • Latency Target (P95): < 4.0s
  • Rec. SKU (Search): Standard S1
  • Rec. SKU (APIM): Standard v2

3.5) Architecture Evolution: The Path to 2026+

⚠️ Critical Update: Kubernetes Ingress Landscape is Changing

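In November 2025, the Kubernetes project announced the retirement of the ingress-nginx controller: maintenance continues on a best-effort basis until March 2026, after which there will be no further releases, bug fixes, or security updates. The controller is widely deployed, so many production clusters are affected.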

What This Means for You:

  • Community NGINX Ingress: ❌ No updates after March 2026
  • AKS Application Routing Add-on: ⚠️ Supported until November 2026
  • Application Gateway for Containers: ✅ Long-term Microsoft support

Microsoft's Strategic Direction:

  1. Application Gateway for Containers (AGC) - Available now
    • Azure-native Layer 7 load balancer
    • Kubernetes Gateway API support
    • Built-in WAF and security features
    • Direct pod communication (better performance)
  2. Gateway API with Istio - Coming H1 2026
    • Advanced traffic management
    • Service mesh capabilities

This Architecture Uses: Application Gateway for Containers (Future-Proof)

Want the complete implementation guide with setup steps, cost analysis, and migration strategies? Read the Complete Implementation Guide →

4) The architecture (practical reference stack)

UI layer (pick ONE)

UI is replaceable. The boundary is not.

APIM is the AI Gateway (non-negotiable)

APIM is the platform control plane. It enforces authentication (JWT validation), rate limits and token budgets, and audit logging.

🛡️ The Audit Log Schema (Compliance Gold)

Don't just log "200 OK". Regulators need to know what was sent. Your APIM log-to-eventhub policy must capture this structure:

{
  "timestamp": "2026-01-18T10:00:00Z",
  "user_id": "upendra@company.com",
  "department": "IT_Architecture",
  "request_id": "req-guid-123",
  "action": "rag_query",
  "input_tokens": 150,
  "output_tokens": 300,
  "retrieved_documents": [
    { 
      "doc_id": "HR_Policy.pdf", 
      "index_version": "v1.2", 
      "classification": "Internal" 
    }
  ],
  "model_deployment": "gpt-4-turbo-0125"
}
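A minimal sketch of how an orchestrator might assemble this record before handing it to the logging pipeline. The function name `build_audit_record` and its parameters are hypothetical; only the field names come from the schema above.

```python
import json
import uuid
from datetime import datetime, timezone

# Field names mirror the audit schema above; everything else is illustrative.
REQUIRED_FIELDS = {
    "timestamp", "user_id", "department", "request_id", "action",
    "input_tokens", "output_tokens", "retrieved_documents", "model_deployment",
}
REQUIRED_DOC_FIELDS = {"doc_id", "index_version", "classification"}


def build_audit_record(user_id, department, action, input_tokens,
                       output_tokens, retrieved_documents, model_deployment):
    """Return one audit event as a dict, failing fast on incomplete documents."""
    for doc in retrieved_documents:
        missing = REQUIRED_DOC_FIELDS - doc.keys()
        if missing:
            raise ValueError(f"retrieved document missing fields: {sorted(missing)}")
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "user_id": user_id,
        "department": department,
        "request_id": f"req-{uuid.uuid4()}",
        "action": action,
        "input_tokens": int(input_tokens),
        "output_tokens": int(output_tokens),
        "retrieved_documents": list(retrieved_documents),
        "model_deployment": model_deployment,
    }


record = build_audit_record(
    user_id="upendra@company.com",
    department="IT_Architecture",
    action="rag_query",
    input_tokens=150,
    output_tokens=300,
    retrieved_documents=[{"doc_id": "HR_Policy.pdf", "index_version": "v1.2",
                          "classification": "Internal"}],
    model_deployment="gpt-4-turbo-0125",
)
print(json.dumps(record, indent=2))
```

Rejecting incomplete document entries at build time keeps gaps out of the audit trail, where they would be much harder to explain later.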

Runtime layer (AKS vs ACA)

Recommendation: use AKS for production. It fits regulated enterprise patterns (network segmentation, control). Use ACA only for pilots.

Deep Dive: Unsure which to pick? Read AI Hosting Decision Tree: AKS vs. ACA vs. Web Apps.

Data + RAG layer

5) The two flows you must separate

Keeping these flows separate stops heavy ingestion work from degrading chat latency.

Flow A: Chat (user-facing, low latency)

  1. UI → APIM `/v1/private/chat`
  2. APIM validates JWT + limits
  3. Orchestrator queries AI Search with access filters
  4. Orchestrator calls Azure OpenAI with retrieved chunks
  5. Response returns with citations
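Steps 3 to 5 can be sketched as a single orchestrator function. This is an illustration under stated assumptions, not the author's implementation: `handle_chat`, `fake_search`, and `fake_generate` are hypothetical names, and the search and model calls are injected as plain callables so the sketch runs without any Azure resources.

```python
from typing import Callable


def handle_chat(question: str, user_groups: list[str],
                search: Callable[[str, list[str]], list[dict]],
                generate: Callable[[str, list[dict]], str]) -> dict:
    """Steps 3-5 of Flow A: filtered retrieval, grounded generation, citations."""
    chunks = search(question, user_groups)  # step 3: access-filtered retrieval
    if not chunks:
        # Evidence-or-refusal: no retrievable evidence means no answer.
        return {"answer": None, "citations": [], "refusal_reason": "no_evidence"}
    answer = generate(question, chunks)  # step 4: the model sees only retrieved chunks
    citations = [{"id": c["id"], "filename": c["filename"],
                  "page_number": c["page_number"]} for c in chunks]
    return {"answer": answer, "citations": citations, "refusal_reason": None}  # step 5


# Stand-in dependencies (assumptions, not real SDK calls).
def fake_search(question, groups):
    corpus = [{"id": "doc-123", "filename": "HR_Policies_2025.pdf", "page_number": 5,
               "text": "employees accrue 1.25 days per month", "allowedGroups": ["hr"]}]
    return [c for c in corpus if set(c["allowedGroups"]) & set(groups)]


def fake_generate(question, chunks):
    return f"Based on {chunks[0]['filename']}: {chunks[0]['text']}."


print(handle_chat("How much PTO do I accrue?", ["hr"], fake_search, fake_generate))
print(handle_chat("How much PTO do I accrue?", ["marketing"], fake_search, fake_generate))
```

The second call returns a refusal because the marketing group matches no chunk, so the model is never invoked.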

📜 The Data Contract: How we prove it

Your generic chatbot fails because it returns text. A proper RAG system returns Evidence. The frontend must enforce this schema:

{
  "answer": "The standard policy allows 15 days of PTO...",
  "citations": [
    {
      "id": "doc-123",
      "filename": "HR_Policies_2025.pdf",
      "page_number": 5,
      "text_snippet": "...employees accrue 1.25 days per month...",
      "relevance_score": 0.89
    }
  ],
  "refusal_reason": null
}
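One way to enforce the contract is a small validator that rejects any payload carrying neither evidence nor an explicit refusal. This is a sketch: the function name `validate_response` and the exact rules (for example, that a refusal must not also carry an answer) are assumptions layered on the schema above.

```python
REQUIRED_CITATION_FIELDS = {"id", "filename", "page_number", "text_snippet", "relevance_score"}


def validate_response(payload: dict) -> dict:
    """Accept a payload only if it carries evidence or an explicit refusal."""
    citations = payload.get("citations") or []
    if payload.get("refusal_reason") is not None:
        # A refusal must not smuggle in an answer.
        if payload.get("answer"):
            raise ValueError("refusal must not include an answer")
        return payload
    if not payload.get("answer"):
        raise ValueError("answer is empty and no refusal_reason was given")
    if not citations:
        raise ValueError("answer has no citations")
    for citation in citations:
        missing = REQUIRED_CITATION_FIELDS - citation.keys()
        if missing:
            raise ValueError(f"citation missing fields: {sorted(missing)}")
    return payload


good = {
    "answer": "The standard policy allows 15 days of PTO...",
    "citations": [{
        "id": "doc-123",
        "filename": "HR_Policies_2025.pdf",
        "page_number": 5,
        "text_snippet": "...employees accrue 1.25 days per month...",
        "relevance_score": 0.89,
    }],
    "refusal_reason": None,
}
validate_response(good)  # passes silently

try:
    validate_response({"answer": "Trust me.", "citations": [], "refusal_reason": None})
except ValueError as exc:
    print(f"rejected: {exc}")
```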

Flow B: Ingestion (async, heavy workload)

  1. Document lands in storage
  2. Event triggers ingestion worker
  3. Document Intelligence extracts text
  4. Chunking + embeddings → Index into AI Search
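The chunking half of step 4 can be sketched in plain Python. Assumptions: the extraction step has already produced one text string per page, chunks are cut by character count with a fixed overlap, and the names `chunk_pages`, `size`, and `overlap` are hypothetical. Embedding and indexing are left out because they depend on the deployed services.

```python
def chunk_pages(pages, doc_id, allowed_groups, size=200, overlap=50):
    """Split extracted page text into overlapping chunks that keep page and ACL metadata."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + size]
            if not piece.strip():
                continue  # skip empty or whitespace-only pieces
            chunks.append({
                "id": f"{doc_id}-p{page_number}-c{start}",
                "page_number": page_number,
                "content": piece,
                "allowedGroups": list(allowed_groups),
            })
            if start + size >= len(text):
                break  # this piece already reached the end of the page
    return chunks


sample_page = "".join(chr(97 + i % 26) for i in range(450))
chunks = chunk_pages([sample_page, ""], doc_id="doc-123", allowed_groups=["hr"])
for c in chunks:
    print(c["id"], len(c["content"]))
```

Carrying `page_number` and `allowedGroups` on every chunk is what later makes citations and security trimming possible at query time.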

6) Security trimming (do not skip this)

RAG must enforce permissions at retrieval time. Every chunk should carry `department`, `classification`, and `allowedGroups`.

Every query should filter by the user's Entra groups. If you skip this, you will leak data across teams.
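A sketch of building that filter as an OData expression for Azure AI Search, using the `search.in` function inside an `any` lambda over the `allowedGroups` collection. The helper name `build_group_filter` is hypothetical, and the group identifiers are assumed to contain no commas or quotes.

```python
def build_group_filter(user_groups, field="allowedGroups"):
    """Build an OData filter matching chunks shared with at least one of the user's groups."""
    cleaned = [g.strip() for g in user_groups if g and g.strip()]
    if not cleaned:
        # Fail closed: never run an unfiltered query for a user with no groups.
        raise PermissionError("user has no groups; refusing to query")
    for group in cleaned:
        if "," in group or "'" in group:
            raise ValueError(f"unsupported character in group id: {group!r}")
    joined = ",".join(cleaned)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"


print(build_group_filter(["grp-hr", "grp-all-staff"]))
# allowedGroups/any(g: search.in(g, 'grp-hr,grp-all-staff', ','))
```

The resulting string would be passed as the filter on every search request, with the group list taken from the validated token rather than from anything the client sends in the request body.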

7) Quota planning (TPM/RPM)

Quota is not a mystery. It's a budget. Think of Azure OpenAI quota like a family data plan: one shared allowance that every department draws from.

🏗️ Design Decision: Why 550k TPM?

We don't plan by gut feel. We calculate target throughput based on specific load testing:

(140 req/min peak) × (3,000 tokens/req) = 420k TPM

We added 30% headroom for spikes, landing at ~550k TPM. We deliberately chose Standard (pay-as-you-go) over PTU (provisioned throughput units) because our traffic is bursty, not constant. PTU suits stable, 24/7 base loads.
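The same arithmetic as a small helper, so the inputs can be swapped for your own load-test numbers. The function name `required_tpm` is hypothetical; the figures are the ones quoted above.

```python
def required_tpm(peak_requests_per_min, tokens_per_request, headroom_pct):
    """Return (base, with_headroom) tokens per minute, using integer arithmetic."""
    base = peak_requests_per_min * tokens_per_request
    return base, base * (100 + headroom_pct) // 100


base, target = required_tpm(140, 3_000, 30)
print(base)    # 420000
print(target)  # 546000, which the text rounds to ~550k
```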

8) The Architecture Board

Use these prompts to generate your official documentation diagrams. Standard notation matters.

[Architecture diagram. Subscriptions: Shared Hub, Prod AI Sub. Components: User, Cloudflare Tunnel, App Gateway (WAF v2), Azure Firewall (IDPS), APIM Gateway, AKS Cluster (Orchestrator), OpenAI. Connection labels: HTTPS, TLS INSP, mTLS/JWT.]
⚠️ The "Silent Killer": Private DNS

If your code works locally but fails inside AKS with 403 Forbidden or time-outs, it is almost always DNS. Ensure your Private Endpoints are registered in the privatelink.openai.azure.com and privatelink.search.windows.net zones and linked to the AKS VNET.
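A quick diagnostic sketch you could run from a pod to confirm that the endpoints resolve to private addresses. It uses only the Python standard library; the function names and the example hostnames are hypothetical placeholders for your own resources.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """True when the address falls in a private range (for example 10.0.0.0/8)."""
    return ipaddress.ip_address(ip).is_private


def check_private_resolution(hostname: str) -> dict:
    """Resolve a hostname and report whether every returned address is private."""
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return {"hostname": hostname, "addresses": [], "private": False, "error": str(exc)}
    addresses = sorted({info[4][0] for info in infos})
    return {"hostname": hostname, "addresses": addresses,
            "private": all(is_private_address(a) for a in addresses), "error": None}


# Example usage (hypothetical resource names; run inside the cluster):
# for host in ("my-openai.openai.azure.com", "my-search.search.windows.net"):
#     print(check_private_resolution(host))
```

If `private` comes back `False`, the pod is resolving the public endpoint, which usually means the private DNS zone is not linked to the VNet the cluster uses.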

9) The Build Room (Copy-Paste)

Real production means infrastructure as code. Here are the core modules you need.

A) APIM Policy: The "Airport Security" Check

Don't rely on code for security. Enforce it at the gateway.


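<!-- Validate the JWT before anything else; keep the parsed token for later policies -->
<validate-jwt header-name="Authorization" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized" output-token-variable-name="jwt">
    <openid-config url="https://login.microsoftonline.com/{tenantId}/v2.0/.well-known/openid-configuration" />
    <required-claims>
        <claim name="aud">
            <value>{audience}</value>
        </claim>
    </required-claims>
</validate-jwt>

<!-- Throttle per caller. Keyed on the JWT subject because requests arriving through App Gateway can share one source IP -->
<rate-limit-by-key calls="500" renewal-period="60" counter-key='@(((Jwt)context.Variables["jwt"]).Subject)' />

<!-- Enforce Token Budget (Cost Control) per APIM subscription -->
<azure-openai-token-limit tokens-per-minute="10000" counter-key="@(context.Subscription.Id)" estimate-prompt-tokens="false" />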
        

B) Repo Structure (The "Standard")


/src
  /orchestrator (Python/FastAPI)
  /ingestion (Dotnet Worker)
/infra (Bicep)
  /modules
    /apim
    /ai-search
    /openai
  main.bicep
/policies (APIM XML)
  /fragments
    /security.xml
    /audit.xml
/tests
  /load-testing (Locust)
        

C) The Pipeline (GitHub Actions Snippet)

Friends don't let friends deploy broken XML. Validate it before the merge.


name: Validate APIM Policies
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # xmllint ships in libxml2-utils; install it explicitly rather than rely on the runner image.
      - name: Install xmllint
        run: sudo apt-get update && sudo apt-get install -y libxml2-utils

      # Don't deploy. Just check if the XML is valid.
      - name: Check XML Syntax
        run: |
          find ./policies -name "*.xml" -print0 | xargs -0 -r -n 1 xmllint --noout

      - name: Run OPA Policy Check
        run: ./scripts/check-compliance.sh # e.g., ensure no "star" allows
        

10) Go-live checklist


Essential Resources

Watch these sessions to understand the "Security Sandwich" architecture in depth.

  • Retrieval Augmented Generation with Azure AI Search (Azure Friday: Intro to RAG with Azure OpenAI)
  • Build your own Copilot with Azure OpenAI Service (Microsoft Mechanics: Copilot extensibility & Plugins)
  • Vector Search in Azure AI Search (Microsoft Mechanics: RAG at Scale & Vector Search)


Ready to operationalize your Azure journey?

I help organizations turn stalled cloud initiatives into execution engines.
