The community ingress-nginx controller is being retired, and the NGINX-based AKS application routing add-on has an expiration date. If you are building for 2026 on legacy ingress controllers, you are building technical debt. It is time to standardize on Application Gateway for Containers (AGC).
Prerequisites: This guide assumes you've read the architecture overview. This is the deep-dive implementation guide with complete setup steps, cost breakdowns, and migration strategies.
Table of Contents
- Architecture Evolution: Why Application Gateway for Containers
- Complete AGC Setup Guide (5 Steps)
- Cost Analysis: $1K to $19K/Month Scenarios
- 7-Week Migration Guide (NGINX → AGC)
- Monitoring & Observability
- Troubleshooting Runbooks
- Testing Strategies
- Complete CI/CD Pipeline
- YouTube Videos & Microsoft Learn References
1. Architecture Evolution: Why Application Gateway for Containers
In November 2025, the Kubernetes project announced that the community ingress-nginx controller will be retired in March 2026. After that date it receives no further releases, bug fixes, or security patches. This affects a large share of production Kubernetes clusters.
Timeline
- 2024: AGC generally available
- March 2026: ingress-nginx retirement
- 2026: Gateway API with Istio available
- 2027+: full migration complete
Microsoft's Strategic Direction
Microsoft is investing in two future-proof solutions:
1. Application Gateway for Containers (AGC) - Available Now
- Azure-native Layer 7 load balancer
- Kubernetes Gateway API support
- Built-in WAF and security features
- Direct pod communication (better performance)
- Managed by Microsoft (no manual updates)
2. Gateway API with Istio - Coming H1 2026
- Kubernetes Gateway API implementation
- Powered by Istio control plane
- Advanced traffic management
- Service mesh capabilities
This guide uses Application Gateway for Containers (AGC) as the long-term, production-ready solution.
2. Complete AGC Setup Guide
Follow these steps to deploy Application Gateway for Containers for your RAG Copilot, together with the AI Search index and ingestion pipeline behind it.
Prerequisites
```bash
# Azure CLI version 2.50.0 or later
az version

# AKS cluster with managed identity
az aks show -g myResourceGroup -n myAKSCluster --query identity

# Required permissions
az role assignment list --assignee <your-identity> --scope <aks-resource-id>
```
Required Roles:
- Contributor on the AKS cluster
- Network Contributor on the VNET
- Managed Identity Operator on the AKS managed identity
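The CLI version check is easy to script in CI. A minimal sketch, assuming the JSON shape `az version` prints (an `"azure-cli"` key holding the version string); the 2.50.0 floor comes from the prerequisites above, and the function names are hypothetical.

```python
import json

MIN_AZ_CLI = (2, 50, 0)  # minimum version from the prerequisites


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.61.0' into (2, 61, 0); pre-release suffixes like 'b1' are dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def az_cli_is_supported(az_version_json: str) -> bool:
    """Compare the output of `az version` against MIN_AZ_CLI."""
    data = json.loads(az_version_json)
    return parse_version(data["azure-cli"]) >= MIN_AZ_CLI


if __name__ == "__main__":
    sample = '{"azure-cli": "2.61.0", "azure-cli-core": "2.61.0", "extensions": {}}'
    print(az_cli_is_supported(sample))  # True
```

In a pipeline you would feed it the real output, for example `subprocess.run(["az", "version"], capture_output=True, text=True).stdout`.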
Step 1: Enable the AGC Add-on
```bash
# Register the feature (if not already registered)
az feature register \
  --namespace Microsoft.ContainerService \
  --name AKS-ExtensionManager

# Wait for registration (check status)
az feature show \
  --namespace Microsoft.ContainerService \
  --name AKS-ExtensionManager

# Enable the add-on
az aks enable-addons \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --addons azure-application-gateway-for-containers

# Verify installation
kubectl get pods -n azure-alb-system
```
Expected Output:
```text
NAME                               READY   STATUS      RESTARTS   AGE
alb-controller-7d4b8c9f5d-x7k2m    1/1     Running     0          2m
alb-controller-bootstrap-xyz123    0/1     Completed   0          3m
```
Step 2: Deploy the Traffic Controller (Bicep)
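Create the AGC infrastructure using Bicep. The subnet passed as `subnetId` must be delegated to `Microsoft.ServiceNetworking/trafficControllers`: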
```bicep
@description('Name of the Application Gateway for Containers')
param agcName string

@description('Location for all resources')
param location string = resourceGroup().location

@description('Subnet ID for AGC (delegated to Microsoft.ServiceNetworking/trafficControllers)')
param subnetId string

// Application Gateway for Containers (Traffic Controller)
resource trafficController 'Microsoft.ServiceNetworking/trafficControllers@2023-11-01' = {
  name: agcName
  location: location
  properties: {}
}

// Subnet association: a child resource, not a property of the traffic controller
resource association 'Microsoft.ServiceNetworking/trafficControllers/associations@2023-11-01' = {
  parent: trafficController
  name: '${agcName}-association'
  location: location
  properties: {
    associationType: 'subnets'
    subnet: {
      id: subnetId
    }
  }
}

// Frontend: Azure generates the FQDN (read-only); CNAME api.mycompany.com to it
resource frontend 'Microsoft.ServiceNetworking/trafficControllers/frontends@2023-11-01' = {
  parent: trafficController
  name: '${agcName}-frontend'
  location: location
  properties: {}
}

output trafficControllerId string = trafficController.id
output frontendFqdn string = frontend.properties.fqdn
```
Deploy:
```bash
az deployment group create \
  --resource-group myResourceGroup \
  --template-file infra/modules/agc.bicep \
  --parameters agcName=agc-rag-copilot \
    subnetId=/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/agc-subnet
```
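Rather than hand-editing the `{sub-id}`-style placeholders, you can assemble the subnet ID and sanity-check the subnet size before deploying. A sketch with hypothetical helper names; the /24 minimum is an assumption based on AGC's subnet guidance at the time of writing, so confirm it against current documentation.

```python
import ipaddress


def subnet_resource_id(sub_id: str, rg: str, vnet: str, subnet: str) -> str:
    """Build the ARM ID expected by the Bicep `subnetId` parameter."""
    return (
        f"/subscriptions/{sub_id}/resourceGroups/{rg}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def subnet_is_large_enough(cidr: str, max_prefix_len: int = 24) -> bool:
    """True if the subnet is /24 or larger (assumed AGC minimum; verify in docs)."""
    return ipaddress.ip_network(cidr, strict=True).prefixlen <= max_prefix_len


if __name__ == "__main__":
    print(subnet_resource_id("0000-sub", "rg-ai", "aks-vnet", "agc-subnet"))
    print(subnet_is_large_enough("10.225.0.0/24"))  # True
    print(subnet_is_large_enough("10.225.0.0/26"))  # False
```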
Step 3: Configure Gateway API Resources
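```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: rag-gateway
  namespace: ai-workloads
  annotations:
    # Bind to the AGC deployed with Bicep (bring-your-own deployment)
    alb.networking.azure.io/alb-id: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.ServiceNetworking/trafficControllers/agc-rag-copilot
spec:
  gatewayClassName: azure-alb-external
  addresses:
    - type: alb.networking.azure.io/alb-frontend
      value: agc-rag-copilot-frontend
  listeners:
    - name: https-listener
      protocol: HTTPS
      port: 443
      hostname: "api.mycompany.com"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: tls-cert-secret
```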
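The listener expects a `kubernetes.io/tls` Secret named `tls-cert-secret` in the `ai-workloads` namespace. `kubectl create secret tls` is the usual route; the sketch below shows what that Secret contains if a pipeline generates it instead. The function name is hypothetical and the PEM bytes are placeholders you supply.

```python
import base64
import json


def tls_secret_manifest(name: str, namespace: str, cert_pem: bytes, key_pem: bytes) -> dict:
    """Build a kubernetes.io/tls Secret; values under `data` must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/tls",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "tls.crt": base64.b64encode(cert_pem).decode("ascii"),
            "tls.key": base64.b64encode(key_pem).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # Placeholder bytes; read real PEM files instead, e.g. Path("tls.crt").read_bytes().
    manifest = tls_secret_manifest(
        "tls-cert-secret", "ai-workloads",
        b"-----BEGIN CERTIFICATE-----...", b"-----BEGIN PRIVATE KEY-----...",
    )
    # kubectl apply -f accepts JSON as well as YAML.
    print(json.dumps(manifest, indent=2))
```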
Step 4: Configure HTTPRoute for Path-Based Routing
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: chat-route
  namespace: ai-workloads
spec:
  parentRefs:
    - name: rag-gateway
  hostnames:
    - "api.mycompany.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/chat
          method: POST
      backendRefs:
        - name: rag-orchestrator-service
          port: 8000
```
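Gateway API defines `PathPrefix` as a match on whole path elements, so `/v1/chat` matches `/v1/chat` and `/v1/chat/stream` but not `/v1/chatty`, and this rule additionally requires `POST`. The sketch below mirrors that matching logic to make the behavior concrete; it is an illustration with hypothetical function names, not AGC's implementation.

```python
def path_prefix_matches(prefix: str, path: str) -> bool:
    """Element-wise prefix match, as Gateway API defines for PathPrefix."""
    prefix_parts = [p for p in prefix.split("/") if p]
    path_parts = [p for p in path.split("/") if p]
    return path_parts[: len(prefix_parts)] == prefix_parts


def chat_route_matches(method: str, path: str) -> bool:
    """Mirror of chat-route: PathPrefix /v1/chat AND method POST."""
    return method.upper() == "POST" and path_prefix_matches("/v1/chat", path)


if __name__ == "__main__":
    for method, path in [("POST", "/v1/chat"), ("POST", "/v1/chat/stream"),
                         ("POST", "/v1/chatty"), ("GET", "/v1/chat")]:
        print(method, path, chat_route_matches(method, path))
```

A `GET /v1/chat` falls through this rule, so health checks or other verbs need their own rules.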
📋 The Index Schema (Minimum Viable)
You cannot implement security trimming without the right fields. Here is the JSON definition for your Azure AI Search index:
```json
{
  "name": "rag-index-v1",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true },
    { "name": "content", "type": "Edm.String", "searchable": true },
    { "name": "embedding", "type": "Collection(Edm.Single)", "searchable": true, "dimensions": 1536, "vectorSearchProfile": "my-profile" },
    { "name": "metadata_storage_path", "type": "Edm.String" },
    { "name": "page_number", "type": "Edm.Int32" },
    { "name": "group_ids", "type": "Collection(Edm.String)", "filterable": true },
    { "name": "classification", "type": "Edm.String", "filterable": true }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "my-hnsw", "kind": "hnsw" } ],
    "profiles": [ { "name": "my-profile", "algorithm": "my-hnsw" } ]
  }
}
```
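Security trimming depends on `group_ids` existing and being filterable, and the service should reject an embedding field that points at an undefined vector profile, so a schema check in CI catches problems before index creation. A minimal sketch; the required fields and the 1536 dimension come from this guide, and `schema_problems` is a hypothetical name.

```python
import json

REQUIRED_FILTERABLE = {"group_ids", "classification"}
EXPECTED_DIMENSIONS = 1536  # must match the embedding model's output size


def schema_problems(index: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    fields = {f["name"]: f for f in index.get("fields", [])}
    if sum(1 for f in fields.values() if f.get("key")) != 1:
        problems.append("index must have exactly one key field")
    for name in sorted(REQUIRED_FILTERABLE):
        if not fields.get(name, {}).get("filterable"):
            problems.append(f"{name} must exist and be filterable")
    embedding = fields.get("embedding", {})
    if embedding.get("dimensions") != EXPECTED_DIMENSIONS:
        problems.append("embedding dimensions do not match the model")
    profiles = {p["name"] for p in index.get("vectorSearch", {}).get("profiles", [])}
    if embedding.get("vectorSearchProfile") not in profiles:
        problems.append("embedding references an undefined vector search profile")
    return problems


if __name__ == "__main__":
    sample = json.loads('{"fields": [{"name": "id", "type": "Edm.String", "key": true}]}')
    for problem in schema_problems(sample):
        print("-", problem)
```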
🔒 Step 4b: The Critical Security Filter
This single line is the difference between a prototype and production. Your RAG orchestrator MUST inject this filter into every AI Search query, populated with the group IDs from the caller's validated identity token (never from the request body):
```json
{
  "search": "user question",
  "filter": "group_ids/any(g: search.in(g, 'user_department_id'))"
}
```
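In code, the filter should be assembled from the caller's trusted group claims, and a missing claim should deny access rather than fall back to an unfiltered query. A sketch, assuming the group IDs are Entra ID object IDs (GUIDs) already extracted from a validated token; both function names are hypothetical.

```python
import re

# Assumption: group IDs are GUIDs; anything else is rejected.
GUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def build_security_filter(group_ids: list[str]) -> str:
    """Build the AI Search security-trimming filter from trusted group claims.

    search.in() splits its value list on spaces and commas by default, and a
    stray quote would break out of the OData string literal, so only
    well-formed GUIDs are accepted.
    """
    if not group_ids:
        # Deny by default: no groups means no documents, not an unfiltered search.
        raise PermissionError("caller has no group claims; refusing to search")
    for gid in group_ids:
        if not GUID_RE.fullmatch(gid):
            raise ValueError(f"unexpected group id: {gid!r}")
    joined = ",".join(group_ids)
    return f"group_ids/any(g: search.in(g, '{joined}'))"


def build_search_request(question: str, group_ids: list[str]) -> dict:
    """Request body for the AI Search query (hypothetical helper)."""
    return {"search": question, "filter": build_security_filter(group_ids)}
```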
Step 5: The Ingestion Pipeline (The Missing Link)
Most guides skip this, but it's 50% of the work. Networking is plumbing; data is gold. Here is the robust path from PDF to index:
- Extract (Layout Model): Do not use simple OCR. Use the `prebuilt-layout` model in Document Intelligence to identify paragraphs, tables, and headings. Keep the `page_number` metadata for every span!
- Chunk Strategy: Split by "semantic paragraph". Do not break sentences. Overlap by 50 tokens.
- Embed: Pass chunks to `text-embedding-ada-002` to get the 1536-dimensional vector.
- Index: Push the JSON payload (content + vector + `group_ids`) to Azure AI Search.
Dev Tip: Run this pipeline efficiently using an Azure Function with an Event Grid trigger on the ADLS `landing/` container.
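The chunking rule (whole paragraphs, roughly 50 tokens of overlap, page number kept on every chunk) can be sketched as below. Assumptions: paragraphs arrive as `(page_number, text)` pairs from the layout step, tokens are approximated by whitespace-separated words rather than the embedding model's tokenizer, and `max_tokens=400` is an illustrative budget. Because the overlap is counted in words, it may begin mid-sentence; a sentence-aware overlap is a straightforward refinement.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    content: str
    page_number: int  # page of the first new paragraph in this chunk


def chunk_paragraphs(paragraphs: list[tuple[int, str]], max_tokens: int = 400,
                     overlap_tokens: int = 50) -> list[Chunk]:
    """Pack whole paragraphs into chunks of roughly max_tokens words.

    Paragraphs are never split (a single paragraph longer than max_tokens
    becomes one oversized chunk). Each new chunk starts with the last
    `overlap_tokens` words of the previous chunk.
    """
    chunks: list[Chunk] = []
    current: list[str] = []
    start_page = None
    for page, text in paragraphs:
        words = text.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(Chunk(" ".join(current), start_page))
            current = current[-overlap_tokens:] if overlap_tokens > 0 else []
            start_page = page
        if start_page is None:
            start_page = page
        current.extend(words)
    if current:
        chunks.append(Chunk(" ".join(current), start_page))
    return chunks
```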
Step 6: Enable WAF Protection
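On AGC, WAF is attached through a `WebApplicationFirewallPolicy` resource that references an Azure WAF policy (`Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies`). The mode (Prevention) and the managed rule set are configured on that Azure policy, not in the Kubernetes manifest:
```yaml
apiVersion: alb.networking.azure.io/v1
kind: WebApplicationFirewallPolicy
metadata:
  name: waf-policy
  namespace: ai-workloads
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: rag-gateway
    namespace: ai-workloads
  webApplicationFirewall:
    id: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.Network/applicationGatewayWebApplicationFirewallPolicies/{waf-policy}
```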
3. Cost Analysis: What Will This Actually Cost?
*Estimates based on East US pricing (Jan 2026). Costs may vary by region.
Let's break down the total cost of ownership with real numbers across three deployment scenarios.
Scenario 1: Small Deployment (100 users, 10k queries/month)
| Component | Cost/Month |
|---|---|
| AGC | $92 |
| AKS (2 nodes) | $385 |
| APIM Basic | $150 |
| OpenAI (GPT-3.5) | $150 |
| AI Search Basic | $75 |
| Doc Intelligence | $15 |
| Networking | $100 |
| Storage & Logs | $100 |
| TOTAL | $1,067/month |
Scenario 2: Medium Deployment (1,000 users, 100k queries/month)
| Component | Cost/Month |
|---|---|
| AGC | $150 |
| AKS (5 nodes) | $770 |
| APIM Standard | $750 |
| OpenAI (GPT-4) | $1,400 |
| AI Search S1 | $250 |
| Doc Intelligence | $150 |
| Networking | $900 |
| Storage & Logs | $350 |
| TOTAL | $4,720/month |
Scenario 3: Enterprise Deployment (10,000 users, 1M queries/month)
| Component | Cost/Month |
|---|---|
| AGC | $500 |
| AKS (15 nodes) | $2,310 |
| APIM Premium | $3,000 |
| OpenAI (GPT-4 + PTU) | $8,000 |
| AI Search S2 | $1,000 |
| Doc Intelligence | $1,500 |
| Networking | $1,500 |
| Storage & Logs | $1,000 |
| TOTAL | $18,810/month |
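If you adapt these tables to your own quotes, recompute the totals instead of editing them by hand. A small sketch using the scenario figures above (USD per month, East US estimates); the dictionary layout is just one convenient representation.

```python
SCENARIOS = {
    "small": {"AGC": 92, "AKS (2 nodes)": 385, "APIM Basic": 150, "OpenAI (GPT-3.5)": 150,
              "AI Search Basic": 75, "Doc Intelligence": 15, "Networking": 100,
              "Storage & Logs": 100},
    "medium": {"AGC": 150, "AKS (5 nodes)": 770, "APIM Standard": 750, "OpenAI (GPT-4)": 1400,
               "AI Search S1": 250, "Doc Intelligence": 150, "Networking": 900,
               "Storage & Logs": 350},
    "enterprise": {"AGC": 500, "AKS (15 nodes)": 2310, "APIM Premium": 3000,
                   "OpenAI (GPT-4 + PTU)": 8000, "AI Search S2": 1000,
                   "Doc Intelligence": 1500, "Networking": 1500, "Storage & Logs": 1000},
}


def monthly_total(components: dict[str, int]) -> int:
    """Sum the monthly cost of every component in a scenario."""
    return sum(components.values())


if __name__ == "__main__":
    for name, components in SCENARIOS.items():
        total = monthly_total(components)
        biggest = max(components, key=components.get)
        share = components[biggest] / total
        print(f"{name}: ${total:,}/month (largest line item: {biggest}, {share:.0%})")
```

Running it also shows where optimization pays off: AKS is the largest line item in the small scenario, while OpenAI dominates the medium and enterprise scenarios.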
💡 Cost Optimization Strategies
- Reserved Capacity: Save 30-50% on Azure OpenAI PTU for predictable workloads
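- APIM Semantic Caching: Reduce OpenAI calls by 40-60% when many users ask the same or closely paraphrased questions
- Spot VMs: Save 70-90% on compute costs for interruptible batch workloads (such as ingestion), not for user-facing pods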
- Log Analytics Commitment Tiers: Save 15-30% on logging costs
4. 7-Week Migration Guide (NGINX → AGC)
This parallel deployment strategy is designed for zero downtime: NGINX keeps serving production traffic until AGC has carried 100% of it without issues.
Week 1: Preparation
- Audit current NGINX configuration
- Map Ingress resources to HTTPRoute
- Create conversion scripts
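For the conversion scripts, the community `ingress2gateway` tool from kubernetes-sigs is worth evaluating first. If you write your own, the core mapping for simple host and path rules looks like the sketch below. It is hypothetical and deliberately narrow: it handles only `host`, `Prefix`/`Exact` paths, and numeric service ports, and it ignores NGINX annotations (rewrites, auth, rate limits), which need manual review.

```python
# Ingress pathType -> HTTPRoute path match type
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str = "rag-gateway") -> list[dict]:
    """Convert a networking.k8s.io/v1 Ingress (as a dict) into HTTPRoute dicts.

    One HTTPRoute per Ingress rule. ImplementationSpecific paths, default
    backends, and all annotations are skipped and must be migrated by hand.
    """
    meta = ingress["metadata"]
    routes = []
    for i, rule in enumerate(ingress["spec"].get("rules", [])):
        route_rules = []
        for p in rule.get("http", {}).get("paths", []):
            match_type = PATH_TYPES.get(p.get("pathType"))
            if match_type is None:
                continue  # ImplementationSpecific: review manually
            svc = p["backend"]["service"]
            route_rules.append({
                "matches": [{"path": {"type": match_type, "value": p.get("path", "/")}}],
                "backendRefs": [{"name": svc["name"], "port": svc["port"]["number"]}],
            })
        route = {
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {"name": f"{meta['name']}-{i}",
                         "namespace": meta.get("namespace", "default")},
            "spec": {"parentRefs": [{"name": gateway_name}], "rules": route_rules},
        }
        if "host" in rule:
            route["spec"]["hostnames"] = [rule["host"]]
        routes.append(route)
    return routes
```

Skipped paths are the interesting output of Week 1: they are exactly the rules that need a human decision before AGC takes traffic.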
Week 2: Deploy AGC
- Deploy AGC infrastructure (Bicep)
- Deploy Gateway resources in test namespace
- Run smoke tests
Week 3: Canary Deployment
- Deploy production Gateway
- Configure weighted DNS traffic splitting between the NGINX and AGC frontends
- Route 10% of traffic to AGC
Weeks 4-6: Gradual Migration
- Week 4: 25% traffic to AGC
- Week 5: 50% traffic to AGC
- Week 6: 100% traffic to AGC
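Each weekly step is safer when it is gated on AGC's health rather than on the calendar alone. A sketch of such a gate, assuming you can read the AGC-side error rate for the current step (for example, from the monitoring queries in Section 5); the 10/25/50/100 weights come from this plan, and the 1% error budget is a hypothetical threshold to tune.

```python
WEIGHTS = [10, 25, 50, 100]  # percent of traffic on AGC, week 3 through week 6
MAX_ERROR_RATE = 0.01        # hypothetical error budget for promotion


def next_weight(current: int, agc_error_rate: float) -> int:
    """Return the AGC traffic weight for the next step.

    Promote one step if AGC is healthy; otherwise return 0 (all traffic on
    NGINX), which corresponds to the rollback procedure.
    """
    if agc_error_rate > MAX_ERROR_RATE:
        return 0
    if current not in WEIGHTS:
        return WEIGHTS[0]
    idx = WEIGHTS.index(current)
    return WEIGHTS[min(idx + 1, len(WEIGHTS) - 1)]
```

With weighted DNS, a weight `w` would translate to relative weights of `w` on the AGC record and `100 - w` on the NGINX record.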
Week 7: Decommission NGINX
- Verify zero traffic on NGINX
- Backup and remove NGINX
- Clean up DNS records
🔧 Rollback Procedure
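If issues occur, immediately switch DNS back to NGINX and monitor the traffic shift. Clients keep using cached records until the TTL expires, so lower the record's TTL before the canary starts; with a short TTL, expect the shift to take 5-15 minutes.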
5. Monitoring & Observability
Key Metrics to Monitor
```kusto
// AGC Capacity Utilization
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICENETWORKING"
| where MetricName == "CapacityUnits"
| summarize avg(Average), max(Maximum) by bin(TimeGenerated, 5m)

// OpenAI Token Usage
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| extend tokens = toint(parse_json(properties_s).usage.total_tokens)
| summarize sum(tokens) by bin(TimeGenerated, 1h)
```
Critical Alerts
- High Error Rate: >5% for 5 minutes
- AGC Capacity: >85% utilization
- OpenAI Quota: 429 errors detected
- Low Citation Rate: fewer than 80% of responses include a citation
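The four alert thresholds above can be expressed as a single evaluation function, which makes the alert logic unit-testable before you encode it in Azure Monitor. A sketch; `WindowStats` is a hypothetical set of aggregates over one 5-minute window, and the alert names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    requests: int
    errors: int              # failed requests in the window
    agc_utilization: float   # 0.0-1.0
    openai_429s: int
    responses: int
    cited_responses: int     # responses containing at least one citation


def firing_alerts(s: WindowStats) -> list[str]:
    """Return the alerts that fire for one 5-minute window."""
    alerts = []
    if s.requests and s.errors / s.requests > 0.05:
        alerts.append("high-error-rate")
    if s.agc_utilization > 0.85:
        alerts.append("agc-capacity")
    if s.openai_429s > 0:
        alerts.append("openai-quota")
    if s.responses and s.cited_responses / s.responses < 0.80:
        alerts.append("low-citation-rate")
    return alerts
```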
6. Troubleshooting Runbooks
Runbook 1: 403 Forbidden Errors (Private DNS Issue)
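Root Cause: Private DNS zones (for example, `privatelink.openai.azure.com`) are not linked to the AKS VNET, so pods resolve the service's public endpoint, which rejects the request when public network access is disabled.
Solution:
```bash
az network private-dns link vnet create \
  --resource-group myResourceGroup \
  --zone-name privatelink.openai.azure.com \
  --name aks-vnet-link \
  --virtual-network /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Network/virtualNetworks/aks-vnet \
  --registration-enabled false
```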
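To confirm the fix from inside the cluster (for example, from a debug pod), check that the service hostname now resolves to private addresses only. A sketch with hypothetical helper names; substitute your own resource's endpoint, e.g. `all_private(resolved_addresses("<your-resource>.openai.azure.com"))`.

```python
import ipaddress
import socket


def resolved_addresses(host: str, port: int = 443) -> list[str]:
    """All IP addresses the local resolver returns for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses: list[str]) -> bool:
    """True only if every address is private, as a working Private Endpoint yields."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

A public address in the result means the pod is still bypassing the private zone, which is the 403 symptom described above.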
Runbook 2: 429 Too Many Requests (Quota Exhaustion)
Immediate Fix: Increase TPM quota if available
Long-term Solution: Implement APIM retry policy and semantic caching
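APIM's retry policy handles throttling at the gateway, but clients should also back off instead of hammering a throttled endpoint. A sketch of client-side retry that honors a numeric `Retry-After` header when present and otherwise uses exponential backoff with full jitter. `send` is a hypothetical callable returning `(status, headers, body)`, so the logic is independent of any HTTP library; if your library's headers are case-insensitive, the lookup works as-is.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on HTTP 429, waiting Retry-After seconds or a jittered exponential delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)
        else:
            # Full jitter: random delay up to base_delay * 2^(attempt-1), capped.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        sleep(min(delay, max_delay))
    raise AssertionError("not reached: the last attempt always returns")
```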
Runbook 3: Slow Query Performance
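Solution: Reduce the number of AI Search results requested (`top`) from 50 to 10, and run independent calls, such as retrieval for separate sub-queries, in parallel instead of sequentially.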
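Running independent calls concurrently means total latency tracks the slowest call rather than the sum. A sketch with `asyncio.gather`; `search` is a hypothetical stand-in coroutine that you would replace with your async AI Search client call, passing `top=10`.

```python
import asyncio


async def search(query: str, top: int = 10) -> list[str]:
    """Stand-in for an async AI Search call; replace with your client."""
    await asyncio.sleep(0.1)  # simulated network latency
    return [f"{query}-doc{i}" for i in range(top)]


async def retrieve_all(queries: list[str], top: int = 10) -> dict[str, list[str]]:
    """Run all searches concurrently and map each query to its results."""
    results = await asyncio.gather(*(search(q, top=top) for q in queries))
    return dict(zip(queries, results))


if __name__ == "__main__":
    out = asyncio.run(retrieve_all(["refund policy", "shipping times", "warranty"]))
    print({q: len(docs) for q, docs in out.items()})
```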
7. Testing Strategies
Unit Testing
```python
# `orchestrator` is assumed to be a pytest fixture that returns your orchestrator instance
def test_citation_extraction(orchestrator):
    mock_response = {
        "choices": [{
            "message": {
                "content": "The refund policy is... [doc1.pdf, page 5]"
            }
        }]
    }
    citations = orchestrator.extract_citations(mock_response)
    assert len(citations) == 1
    assert citations[0].document == "doc1.pdf"
```
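The test assumes the model cites sources inline as `[file, page N]`. A minimal `extract_citations` that satisfies it might look like this; the `Citation` type and the bracket format are assumptions drawn from the test, not a fixed contract of the orchestrator (as a method, it would simply delegate to this function).

```python
import re
from dataclasses import dataclass

# Matches inline citations such as "[doc1.pdf, page 5]" (case-insensitive "page").
CITATION_RE = re.compile(r"\[([^\[\],]+?),\s*page\s+(\d+)\]", re.IGNORECASE)


@dataclass(frozen=True)
class Citation:
    document: str
    page: int


def extract_citations(response: dict) -> list[Citation]:
    """Return unique citations, in order of first appearance, from a chat completion."""
    content = response["choices"][0]["message"]["content"] or ""
    seen: dict[Citation, None] = {}
    for doc, page in CITATION_RE.findall(content):
        seen.setdefault(Citation(doc.strip(), int(page)), None)
    return list(seen)
```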
Load Testing
```bash
# Test with 100 concurrent users (--run-time only applies in headless or autostart mode)
locust -f tests/load/locustfile.py \
  --host https://api.mycompany.com \
  --headless \
  --users 100 \
  --spawn-rate 10 \
  --run-time 10m
```
8. Complete CI/CD Pipeline
7-stage GitHub Actions workflow for production deployment:
- Validate Infrastructure: Bicep validation and what-if analysis
- Security Scanning: Trivy, Checkov, APIM policy validation
- Build and Test: Unit tests, Docker build, image scanning
- Deploy Infrastructure: Bicep deployment to Azure
- Deploy Application: Push to ACR, deploy to AKS
- Smoke Tests: Health checks, chat endpoint validation
- Rollback: Automatic rollback on failure
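The smoke-test stage can be a short script that exits non-zero (failing the job and triggering the rollback stage) when either check fails. A sketch using only the standard library; the `/healthz` path and the `/v1/chat` request body are assumptions about your orchestrator, and `fetch` is injectable so the logic can be tested offline.

```python
import json
import sys
import urllib.request
from typing import Callable, Optional

Fetch = Callable[[str, Optional[bytes]], int]  # (url, body) -> HTTP status


def http_fetch(url: str, body: Optional[bytes] = None) -> int:
    """POST when a body is given, otherwise GET; returns the HTTP status."""
    req = urllib.request.Request(url, data=body, method="POST" if body else "GET",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def smoke_test(base_url: str, fetch: Fetch = http_fetch) -> list[str]:
    """Return the names of failed checks; an empty list means healthy."""
    checks = {
        "health": (f"{base_url}/healthz", None),  # assumed health endpoint
        "chat": (f"{base_url}/v1/chat",
                 json.dumps({"question": "ping"}).encode()),  # assumed body shape
    }
    failures = []
    for name, (url, body) in checks.items():
        try:
            if fetch(url, body) != 200:
                failures.append(name)
        except Exception:  # timeouts, connection errors, HTTPError for 4xx/5xx
            failures.append(name)
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    failed = smoke_test(sys.argv[1])
    print("failed checks:", failed or "none")
    sys.exit(1 if failed else 0)
```

In the workflow, a step such as `python smoke_test.py https://api.mycompany.com` would gate the rollback stage.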
9. YouTube Videos & Microsoft Learn References
Essential YouTube Videos
RAG at scale with Azure AI Search
Azure AI Document Intelligence: extraction patterns
Azure API Management: token control and guardrails
APIM as AI Gateway: securing and scaling AI APIs
Microsoft Learn References
Azure Architecture Center
Application Gateway for Containers
Azure AI Services
Summary
This implementation guide provides everything you need to deploy a production-grade Document Intelligence Copilot using Application Gateway for Containers. You now have:
- ✅ Complete AGC setup guide (5 steps)
- ✅ Detailed cost analysis (3 scenarios)
- ✅ 7-week migration strategy
- ✅ Monitoring and troubleshooting runbooks
- ✅ Testing strategies and CI/CD pipeline
- ✅ Curated resources for further learning
Back to Architecture Overview: Read the architecture overview and design principles