IMPLEMENTATION GUIDE

Document Intelligence Copilot:
Complete Implementation Guide

Production-grade setup with Application Gateway for Containers, cost analysis, migration strategies, and operational excellence

January 18, 2026 | 25 min read | Azure AI, AKS, AGC

Master Class Series

Part 1: Architecture RAG Pattern & Azure Doc Intelligence

Part 2: Implementation Step-by-Step Production Guide

Part 3: Go-Live Well-Architected Playbook

The Kubernetes Ingress landscape is failing.

The community ingress-nginx controller is retiring. The AKS Application Routing Add-on has an expiration date. If you are building for 2026 on legacy ingress controllers, you are building technical debt. It is time to standardize on Application Gateway for Containers (AGC).

Prerequisites: This guide assumes you've read the architecture overview. This is the deep-dive implementation guide with complete setup steps, cost breakdowns, and migration strategies.

Architecture Evolution: Why Application Gateway for Containers
Complete AGC Setup Guide (6 Steps)
Cost Analysis: $1K to $19K/Month Scenarios
7-Week Migration Guide (NGINX → AGC)
Monitoring & Observability
Troubleshooting Runbooks
Testing Strategies
Complete CI/CD Pipeline
YouTube Videos & Microsoft Learn References

1. Architecture Evolution: Why Application Gateway for Containers

In November 2025, the Kubernetes community announced that the ingress-nginx controller will be retired in March 2026. After that date it receives no further releases, bug fixes, or security patches, which affects a very large number of production deployments worldwide.

Timeline

2024 ────────── 2025 ────────── 2026 ────────── 2027+
  │                │                │                │
  ├─ AGC GA        ├─ NGINX         ├─ NGINX         ├─ Full
  │  (Available)   │  Retirement    │  Retirement    │  Migration
  │                │  Announced     │  (March)       │  Complete
  │                │                ├─ Gateway API   │
  │                │                │  (Istio, H1)   │

Microsoft's Strategic Direction

Microsoft is investing in two future-proof solutions:

1. Application Gateway for Containers (AGC) - Available Now

Azure-native Layer 7 load balancer
Kubernetes Gateway API support
Built-in WAF and security features
Direct pod communication (better performance)
Managed by Microsoft (no manual updates)

2. Gateway API with Istio - Coming H1 2026

Kubernetes Gateway API implementation
Powered by Istio control plane
Advanced traffic management
Service mesh capabilities

This guide uses Application Gateway for Containers (AGC) as the long-term, production-ready solution.

2. Complete AGC Setup Guide

Follow these six steps to deploy Application Gateway for Containers, and the search index and ingestion pipeline behind it, for your RAG Copilot.

Prerequisites

# Azure CLI version 2.50.0 or later
az version

# AKS cluster with managed identity
az aks show -g myResourceGroup -n myAKSCluster --query identity

# Required permissions
az role assignment list --assignee <your-identity> --scope <aks-resource-id>

Required Roles:

Contributor on the AKS cluster
Network Contributor on the VNET
Managed Identity Operator on the AKS managed identity

Step 1: Enable the AGC Add-on

# Register the feature (if not already registered)
az feature register \
  --namespace Microsoft.ContainerService \
  --name AKS-ExtensionManager

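# Wait for registration: repeat until the state reads "Registered"
az feature show \
  --namespace Microsoft.ContainerService \
  --name AKS-ExtensionManager \
  --query properties.state -o tsv

# Propagate the registration to the resource provider
az provider register --namespace Microsoft.ContainerService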

# Enable the add-on
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons azure-application-gateway-for-containers

# Verify installation
kubectl get pods -n azure-alb-system

Expected Output:

NAME                                  READY   STATUS    RESTARTS   AGE
alb-controller-7d4b8c9f5d-x7k2m      1/1     Running   0          2m
alb-controller-bootstrap-xyz123      0/1     Completed 0          3m

Step 2: Deploy the Traffic Controller (Bicep)

Create the AGC infrastructure using Bicep:

@description('Name of the Application Gateway for Containers')
param agcName string

@description('Location for all resources')
param location string = resourceGroup().location

@description('Subnet ID for AGC')
param subnetId string

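// Application Gateway for Containers (Traffic Controller)
resource trafficController 'Microsoft.ServiceNetworking/trafficControllers@2023-11-01' = {
  name: agcName
  location: location
  properties: {}
}

// Subnet association: a child resource, not a property of the traffic controller
resource association 'Microsoft.ServiceNetworking/trafficControllers/associations@2023-11-01' = {
  parent: trafficController
  name: '${agcName}-association'
  location: location
  properties: {
    associationType: 'subnets'
    subnet: {
      id: subnetId
    }
  }
}

// Frontend configuration. The FQDN is generated by Azure and is read-only;
// point api.mycompany.com at it with a DNS CNAME record.
resource frontend 'Microsoft.ServiceNetworking/trafficControllers/frontends@2023-11-01' = {
  parent: trafficController
  name: '${agcName}-frontend'
  location: location
  properties: {}
}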

output trafficControllerId string = trafficController.id
output frontendFqdn string = frontend.properties.fqdn

Deploy:

az deployment group create \
  --resource-group myResourceGroup \
  --template-file infra/modules/agc.bicep \
  --parameters agcName=agc-rag-copilot \
               subnetId=/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/agc-subnet

Step 3: Configure Gateway API Resources

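apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: rag-gateway
  namespace: ai-workloads
  annotations:
    # Resource ID of the traffic controller (the trafficControllerId output from Step 2)
    alb.networking.azure.io/alb-id: <traffic-controller-resource-id>
spec:
  gatewayClassName: azure-alb-external
  listeners:
  - name: https-listener
    protocol: HTTPS
    port: 443
    hostname: "api.mycompany.com"
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: tls-cert-secret
  addresses:
  # Binds this Gateway to the frontend created in Step 2
  - type: alb.networking.azure.io/alb-frontend
    value: agc-rag-copilot-frontend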

Step 4: Configure HTTPRoute for Path-Based Routing

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: chat-route
  namespace: ai-workloads
spec:
  parentRefs:
  - name: rag-gateway
  hostnames:
  - "api.mycompany.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/chat
      method: POST
    backendRefs:
    - name: rag-orchestrator-service
      port: 8000
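
Gateway API evaluates PathPrefix matches per path element, not per character, so this route accepts /v1/chat and /v1/chat/stream but not /v1/chatbot. The following is a minimal Python sketch of that rule for the route above. The helper names `path_prefix_matches` and `route_matches` are hypothetical, and the sketch ignores query strings and hostname matching.

```python
def path_prefix_matches(prefix: str, request_path: str) -> bool:
    """Compare path elements one by one; trailing slashes are ignored."""
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in request_path.split("/") if part]
    return path_parts[:len(prefix_parts)] == prefix_parts


def route_matches(method: str, request_path: str) -> bool:
    """Mirror the chat-route rule: POST requests under /v1/chat."""
    return method == "POST" and path_prefix_matches("/v1/chat", request_path)


for method, path in [
    ("POST", "/v1/chat"),
    ("POST", "/v1/chat/stream"),
    ("POST", "/v1/chatbot"),
    ("GET", "/v1/chat"),
]:
    print(method, path, route_matches(method, path))
```

Requests that fail the match are not forwarded to rag-orchestrator-service, so any non-POST endpoint under the same prefix needs its own rule.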

📋 The Index Schema (Minimum Viable)

You cannot implement security trimming without the right fields. Here is the JSON definition for your Azure AI Search index:

{
  "name": "rag-index-v1",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "embedding", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 1536, "vectorSearchProfile": "my-profile" },
    { "name": "metadata_storage_path", "type": "Edm.String" },
    { "name": "page_number", "type": "Edm.Int32" },
    { "name": "group_ids", "type": "Collection(Edm.String)", "filterable": true },
    { "name": "classification", "type": "Edm.String", "filterable": true }
  ]
}

🔒 Step 4b: The Critical Security Filter

This single line is the difference between a prototype and production. Your RAG orchestrator MUST inject this filter into every AI Search query:

// The filter every query MUST have
{
  "search": "user question",
  "filter": "group_ids/any(g: search.in(g, 'user_department_id'))"
}
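
In the filter above, 'user_department_id' stands in for the caller's real group IDs. One way an orchestrator might build the filter string is sketched below. The function name `build_security_filter` and the sample group IDs are hypothetical, and the sketch assumes the IDs arrive as plain strings from the caller's identity token.

```python
def build_security_filter(group_ids: list[str]) -> str:
    """Build an OData filter matching documents visible to any of the caller's groups."""
    if not group_ids:
        # Fail closed: a caller with no groups must see nothing, not everything.
        raise ValueError("caller has no group memberships; refusing to build a filter")
    for gid in group_ids:
        if "," in gid or " " in gid:
            # search.in splits its value list on commas and spaces by default.
            raise ValueError(f"group id {gid!r} contains a delimiter character")
    # OData escapes a single quote inside a string literal by doubling it.
    joined = ",".join(gid.replace("'", "''") for gid in group_ids)
    return f"group_ids/any(g: search.in(g, '{joined}'))"


query = {
    "search": "user question",
    "filter": build_security_filter(["dept-finance", "dept-legal"]),
}
print(query["filter"])
# group_ids/any(g: search.in(g, 'dept-finance,dept-legal'))
```

Raising on an empty group list is a deliberate choice here: silently dropping the filter would return every document in the index.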

Step 5: The Ingestion Pipeline (The Missing Link)

Most guides skip this, but it's 50% of the work. Networking is plumbing; Data is gold. Here is the robust path from PDF to Index:

Extract (Layout Model): Do not use simple OCR. Use the prebuilt-layout model in Document Intelligence to identify paragraphs, tables, and section headings. Keep the page_number metadata for every span!
Chunk Strategy: Split by "Semantic Paragraph". Do not break sentences. Overlap by 50 tokens.
Embed: Pass chunks to text-embedding-ada-002 to get the 1536-dimensional vector.
Index: Push the JSON payload (Content + Vector + group_ids) to Azure AI Search.

Dev Tip: Run this pipeline efficiently using an Azure Function with an Event Grid trigger on the ADLS `landing/` container.
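
The chunking step can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: tokens are approximated by whitespace-separated words (a real pipeline would count with the embedding model's tokenizer), sentence boundaries come from a simple regex, and the function names and ID format are hypothetical. The embedding call is left out, so the resulting documents still need an embedding field before indexing.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def count_tokens(text: str) -> int:
    """Rough token count (assumption: one token per whitespace-separated word)."""
    return len(text.split())


def chunk_sentences(sentences: list[str], max_tokens: int = 200, overlap_tokens: int = 50) -> list[str]:
    """Pack whole sentences into chunks; repeat trailing sentences as overlap."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            # Carry over whole trailing sentences, up to overlap_tokens in total.
            carried: list[str] = []
            carried_len = 0
            for prev in reversed(current):
                m = count_tokens(prev)
                if carried_len + m > overlap_tokens:
                    break
                carried.insert(0, prev)
                carried_len += m
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def to_index_docs(doc_id: str, page_number: int, text: str, group_ids: list[str], **chunk_args) -> list[dict]:
    """Shape chunks like the index schema (without the embedding field)."""
    chunks = chunk_sentences(split_sentences(text), **chunk_args)
    return [
        {"id": f"{doc_id}-p{page_number}-c{i}", "content": chunk,
         "page_number": page_number, "group_ids": group_ids}
        for i, chunk in enumerate(chunks)
    ]


page_text = "Refunds are issued within 30 days. Items must be unused. Contact support to start a return."
for doc in to_index_docs("doc1", 5, page_text, ["dept-finance"], max_tokens=10, overlap_tokens=4):
    print(doc["id"], "|", doc["content"])
```

With the tiny limits used in the example, the middle sentence appears in both chunks, which is the overlap doing its job. A single sentence longer than max_tokens is kept whole rather than split.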

Step 6: Enable WAF Protection

apiVersion: alb.networking.azure.io/v1
kind: ApplicationLoadBalancerPolicy
metadata:
  name: waf-policy
  namespace: ai-workloads
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: rag-gateway
  waf:
    enabled: true
    mode: Prevention
    ruleSetType: OWASP
    ruleSetVersion: "3.2"

3. Cost Analysis: What Will This Actually Cost?

*Estimates based on East US pricing (Jan 2026). Costs may vary by region.

Let's break down the total cost of ownership with real numbers across three deployment scenarios.

Scenario 1: Small Deployment (100 users, 10k queries/month)

Component	Cost/Month
AGC	$92
AKS (2 nodes)	$385
APIM Basic	$150
OpenAI (GPT-3.5)	$150
AI Search Basic	$75
Doc Intelligence	$15
Networking	$100
Storage & Logs	$100
TOTAL	$1,067/month

Scenario 2: Medium Deployment (1,000 users, 100k queries/month)

Component	Cost/Month
AGC	$150
AKS (5 nodes)	$770
APIM Standard	$750
OpenAI (GPT-4)	$1,400
AI Search S1	$250
Doc Intelligence	$150
Networking	$900
Storage & Logs	$350
TOTAL	$4,720/month

Scenario 3: Enterprise Deployment (10,000 users, 1M queries/month)

Component	Cost/Month
AGC	$500
AKS (15 nodes)	$2,310
APIM Premium	$3,000
OpenAI (GPT-4 + PTU)	$8,000
AI Search S2	$1,000
Doc Intelligence	$1,500
Networking	$1,500
Storage & Logs	$1,000
TOTAL	$18,810/month
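
To compare the scenarios on equal footing, it helps to divide each total by users and by queries. The short script below does that arithmetic using only the figures from the three tables above; the variable and function names are illustrative.

```python
# Monthly component costs in table order:
# AGC, AKS, APIM, OpenAI, AI Search, Doc Intelligence, Networking, Storage & Logs
SCENARIOS = {
    "Small": {"users": 100, "queries": 10_000,
              "monthly_costs": [92, 385, 150, 150, 75, 15, 100, 100]},
    "Medium": {"users": 1_000, "queries": 100_000,
               "monthly_costs": [150, 770, 750, 1400, 250, 150, 900, 350]},
    "Enterprise": {"users": 10_000, "queries": 1_000_000,
                   "monthly_costs": [500, 2310, 3000, 8000, 1000, 1500, 1500, 1000]},
}


def unit_costs(name: str) -> tuple[int, float, float]:
    """Return (monthly total, cost per user, cost per query) for a scenario."""
    scenario = SCENARIOS[name]
    total = sum(scenario["monthly_costs"])
    return total, total / scenario["users"], total / scenario["queries"]


for name in SCENARIOS:
    total, per_user, per_query = unit_costs(name)
    print(f"{name}: ${total:,}/month, ${per_user:.2f}/user, ${per_query:.4f}/query")
# Small: $1,067/month, $10.67/user, $0.1067/query
# Medium: $4,720/month, $4.72/user, $0.0472/query
# Enterprise: $18,810/month, $1.88/user, $0.0188/query
```

The total grows about 18x from the small to the enterprise scenario while query volume grows 100x, so the cost per query falls from roughly 11 cents to roughly 2 cents.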

💡 Cost Optimization Strategies

Reserved Capacity: Save 30-50% on Azure OpenAI PTU for predictable workloads
APIM Semantic Caching: Reduce OpenAI calls by 40-60%
Spot VMs: Save 70-90% on compute costs for batch workloads
Log Analytics Commitment Tiers: Save 15-30% on logging costs

4. 7-Week Migration Guide (NGINX → AGC)

This parallel deployment strategy ensures zero downtime during migration.

Week 1: Preparation

Audit current NGINX configuration
Map Ingress resources to HTTPRoute
Create conversion scripts
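
A conversion script for the Week 1 mapping might start from something like the sketch below. It is an assumption-laden starting point rather than a complete converter: it handles only host, path, pathType, and numeric service ports, it ignores TLS blocks and NGINX annotations (rewrites, rate limits, and similar need manual review), and the function name `ingress_to_httproutes` is hypothetical. Manifests are passed as already-parsed dictionaries.

```python
# Ingress pathType -> HTTPRoute path match type
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str) -> list[dict]:
    """Convert each Ingress rule into one HTTPRoute attached to gateway_name."""
    meta = ingress["metadata"]
    routes = []
    for index, rule in enumerate(ingress["spec"].get("rules", [])):
        http_rules = []
        for path in rule.get("http", {}).get("paths", []):
            path_type = path.get("pathType", "ImplementationSpecific")
            if path_type not in PATH_TYPES:
                raise ValueError(f"pathType {path_type!r} needs manual conversion")
            service = path["backend"]["service"]
            if "number" not in service["port"]:
                raise ValueError("named service ports need manual resolution")
            http_rules.append({
                "matches": [{"path": {"type": PATH_TYPES[path_type],
                                      "value": path.get("path", "/")}}],
                "backendRefs": [{"name": service["name"],
                                 "port": service["port"]["number"]}],
            })
        route = {
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {"name": f"{meta['name']}-{index}",
                         "namespace": meta.get("namespace", "default")},
            "spec": {"parentRefs": [{"name": gateway_name}], "rules": http_rules},
        }
        if "host" in rule:
            route["spec"]["hostnames"] = [rule["host"]]
        routes.append(route)
    return routes


legacy_ingress = {
    "metadata": {"name": "chat", "namespace": "ai-workloads"},
    "spec": {"rules": [{
        "host": "api.mycompany.com",
        "http": {"paths": [{
            "path": "/v1/chat", "pathType": "Prefix",
            "backend": {"service": {"name": "rag-orchestrator-service",
                                    "port": {"number": 8000}}},
        }]},
    }]},
}
route = ingress_to_httproutes(legacy_ingress, "rag-gateway")[0]
print(route["metadata"]["name"], route["spec"]["rules"][0]["matches"][0]["path"])
```

Failing loudly on anything the script does not understand keeps the audit honest: every ValueError is a rule that needs a human decision before cutover. The kubernetes-sigs ingress2gateway project covers many more cases and is worth evaluating before writing much custom code.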

Week 2: Deploy AGC

Deploy AGC infrastructure (Bicep)
Deploy Gateway resources in test namespace
Run smoke tests

Week 3: Canary Deployment

Deploy production Gateway
Configure traffic splitting
Route 10% of traffic to AGC

Weeks 4-6: Gradual Migration

Week 4: 25% traffic to AGC
Week 5: 50% traffic to AGC
Week 6: 100% traffic to AGC

Week 7: Decommission NGINX

Verify zero traffic on NGINX
Backup and remove NGINX
Clean up DNS records

🔧 Rollback Procedure

If issues occur, immediately switch DNS back to NGINX. Wait 5-15 minutes for DNS propagation and monitor traffic shift.

5. Monitoring & Observability

Key Metrics to Monitor

// AGC Capacity Utilization
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICENETWORKING"
| where MetricName == "CapacityUnits"
| summarize avg(Average), max(Maximum) by bin(TimeGenerated, 5m)

// OpenAI Token Usage
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| extend tokens = toint(parse_json(properties_s).usage.total_tokens)
| summarize sum(tokens) by bin(TimeGenerated, 1h)

Critical Alerts

High Error Rate: >5% for 5 minutes
AGC Capacity: >85% utilization
OpenAI Quota: 429 errors detected
Low Citation Rate: <80% of responses
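
The "High Error Rate" rule can be read as: fire only when every one of the last five one-minute samples exceeds 5% errors. That reading is an assumption (an alert on the five-minute average would behave differently), and the function below is an illustrative sketch of the logic, not an Azure Monitor API.

```python
def error_rate_alert(samples: list[tuple[int, int]], threshold: float = 0.05, window: int = 5) -> bool:
    """samples holds (total_requests, error_requests) per minute, oldest first.

    Returns True only if each of the last `window` minutes had traffic
    and an error rate strictly above `threshold`.
    """
    if len(samples) < window:
        return False  # not enough history to call the condition sustained
    recent = samples[-window:]
    return all(total > 0 and errors / total > threshold for total, errors in recent)


sustained = [(1000, 80), (1000, 70), (1000, 65), (1000, 90), (1000, 60)]
one_spike = [(1000, 10), (1000, 10), (1000, 300), (1000, 10), (1000, 10)]
print(error_rate_alert(sustained))  # True
print(error_rate_alert(one_spike))  # False
```

Requiring the whole window to breach the threshold keeps a single bad minute from paging anyone, at the cost of reacting up to five minutes late.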

6. Troubleshooting Runbooks

Runbook 1: 403 Forbidden Errors (Private DNS Issue)

Root Cause: Private DNS zones not linked to AKS VNET

Solution:

az network private-dns link vnet create \
  --resource-group myResourceGroup \
  --zone-name privatelink.openai.azure.com \
  --name aks-vnet-link \
  --virtual-network /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworks/aks-vnet \
  --registration-enabled false

Runbook 2: 429 Too Many Requests (Quota Exhaustion)

Immediate Fix: Increase TPM quota if available

Long-term Solution: Implement APIM retry policy and semantic caching
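
The runbook places the retry in an APIM policy. The same idea on the client side, shown here in Python only to make the logic concrete, is to honor the Retry-After header when present and otherwise back off exponentially with jitter. The function name `call_with_retry` and the `send` callable are hypothetical; `send` is assumed to return a (status, headers, body) tuple, and only the delta-seconds form of Retry-After is handled.

```python
import random
import time


def call_with_retry(send, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the 429 back to the caller
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # the service told us how long to wait
        else:
            # Exponential backoff plus jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    return status, body


# Simulated service: throttled once, then successful.
fake_responses = iter([(429, {"Retry-After": "2"}, None), (200, {}, "answer")])
waits = []
print(call_with_retry(lambda: next(fake_responses), sleep=waits.append), waits)
# (200, 'answer') [2.0]
```

Passing `sleep` as a parameter keeps the function testable without real delays; production code would leave it at the default.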

Runbook 3: Slow Query Performance

Solution: Reduce AI Search top results from 50 to 10, implement parallel processing

7. Testing Strategies

Unit Testing

def test_citation_extraction(orchestrator):
    mock_response = {
        "choices": [{
            "message": {
                "content": "The refund policy is... [doc1.pdf, page 5]"
            }
        }]
    }
    citations = orchestrator.extract_citations(mock_response)
    assert len(citations) == 1
    assert citations[0].document == "doc1.pdf"
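
The test above assumes an `orchestrator` fixture whose `extract_citations` method is not shown in this guide. A minimal stand-in, written as a module-level function and assuming citations always follow the "[file, page N]" format from the mock response, could look like this. The `Citation` class and the regex are illustrative guesses, not the orchestrator's actual implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str
    page: int


# Matches "[doc1.pdf, page 5]": a file name, a comma, then "page <number>".
CITATION_PATTERN = re.compile(r"\[([^\[\],]+?),\s*page\s+(\d+)\]")


def extract_citations(response: dict) -> list[Citation]:
    """Pull every [document, page N] marker out of the first choice's content."""
    content = response["choices"][0]["message"]["content"]
    return [
        Citation(document=doc.strip(), page=int(page))
        for doc, page in CITATION_PATTERN.findall(content)
    ]


mock_response = {
    "choices": [{"message": {"content": "The refund policy is... [doc1.pdf, page 5]"}}]
}
print(extract_citations(mock_response))
# [Citation(document='doc1.pdf', page=5)]
```

A response with no markers yields an empty list, which is the signal the "Low Citation Rate" alert would count.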

Load Testing

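# Test with 100 concurrent users for 10 minutes
# (--run-time only takes effect together with --headless or --autostart)
locust -f tests/load/locustfile.py \
  --headless \
  --host https://api.mycompany.com \
  --users 100 \
  --spawn-rate 10 \
  --run-time 10m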

8. Complete CI/CD Pipeline

7-stage GitHub Actions workflow for production deployment:

Validate Infrastructure: Bicep validation and what-if analysis
Security Scanning: Trivy, Checkov, APIM policy validation
Build and Test: Unit tests, Docker build, image scanning
Deploy Infrastructure: Bicep deployment to Azure
Deploy Application: Push to ACR, deploy to AKS
Smoke Tests: Health checks, chat endpoint validation
Rollback: Automatic rollback on failure

9. YouTube Videos & Microsoft Learn References

Essential YouTube Videos

Microsoft Learn References

Azure Architecture Center

Application Gateway for Containers

Azure AI Services

Summary

This implementation guide provides everything you need to deploy a production-grade Document Intelligence Copilot using Application Gateway for Containers. You now have:

✅ Complete AGC setup guide (6 steps)
✅ Detailed cost analysis (3 scenarios)
✅ 7-week migration strategy
✅ Monitoring and troubleshooting runbooks
✅ Testing strategies and CI/CD pipeline
✅ Curated resources for deep learning

