Back to Insights
Regulator-Ready Series

Regulator-Ready AI: The 4-Subscription AI Fortress


"The regulator didn't care about the demo. They cared about the data residency."

In the "Demo-Ware" world, AI is easy. You grab an API key, throw some JSON at a public endpoint, and celebrate. But when a multi-billion pound life sciences regulator walks into the room, "easy" is a liability.

The Audit Challenge "Your AI looks great, Upendra," Mr. Compliance said, tapping his pen. "But can you guarantee that not one byte of our patient data ever touches a foreign server? And can you prove that even if your web server is breached, the attacker can't sniff our model traffic?"

I didn't give him a slide deck.
I gave him a 4-subscription Terraform plan.

Proven at Scale This architecture was deployed for a 10,000-person UK life sciences company, processing over 50 million AI requests per month while maintaining full MCSB v2 compliance and zero security incidents.

Want the Complete Terraform Code?

Get the full Infrastructure-as-Code with 5 architecture diagrams, MCSB v2 compliance matrix, and step-by-step deployment guides.

View on GitHub

⭐ Star the repo if it helps you build your AI fortress!

The "Private Jet" Pattern (PTU)

Most organizations use Pay-As-You-Go AI. It's like flying commercial—you're at the mercy of the airline's schedule and seat availability. For a regulator, that's not good enough. You need **PTU (Provisioned Throughput Units)**.

PTU is your AI's private jet. It's pre-allocated capacity that stays in one hangar (UK South). You pay for the plane whether it’s in the air or on the ground, but you’re guaranteed a flight. No noisy neighbors. No capacity-exhaustion errors during peak hours.

PTU vs Pay-As-You-Go Analogy and 30/10/10 Split

Figure 0: The "Private Jet" Model - Guaranteed Capacity & Logic Isolation.

The "30/10/10" Capacity Split

But private jets are expensive. To make the CFO smile, we sliced the PTU across three separate deployments. We allocated 30 PTU to Production, 10 PTU to Test, and 10 PTU to Dev on the same OpenAI account. This isolation allows our data scientists to run heavy batch tests during the day without ever impacting the live clinical trials. It's the only way to get "Five Nines" reliability without paying "Five Nines" prices.

High-Level Solution Overview

Figure 1: The Executive Summary - Global Traffic to Secure Spoke.

Network Flow Diagram

Figure 1.1: The Network Blueprint - Connectivity & Isolation.

Explore the Code: All Terraform modules, network configurations, and APIM policies are available on GitHub with detailed deployment guides.

Killing the Public Endpoint

A "Regulator-Ready" platform has zero public internet ingress to the backend. We achieved this by using **Azure Front Door Premium** with **Private Link Service (PLS)**.

Traffic hits the global edge, goes through a heavy-duty WAF, and then—instead of jumping onto the public internet to find the regional gateway—it slips through an **Invisible Underground Tunnel** directly into the private VNet. The outside world sees nothing.

Service Journey Map

Figure 2: The 6-Step Service Journey.

The "Dual-Gateway" Defense

A regulator asks: "Who watches the watcher?" We answered with two distinct API Management gateways.

Gateway 1: The Bouncer (External)

Sitting behind Front Door, this gateway handles the messy business of the outside world: mTLS authentication, subscription key validation, and basic rate limiting. It stops the DDoS attacks before they even smell the VNet.

Gateway 2: The Brain Surgeon (Internal AI Gateway)

This is where the magic happens. Sitting deep inside the private VNet, this second APIM instance talks only to the AI models. It handles the "Smart" logic:

APIM acts as the **VIP Concierge** for the AI model. It doesn't care about the user; it cares about the tokens. It provides a single "Glass Wall" for auditing every model request, enforcing tiered throttling, and ensuring that no pod overstays its welcome in the PTU first-class cabin.

Why can't the External APIM call OpenAI directly?
If the External Gateway skipped the Internal Gateway, we would lose our "Brain." We would bypass the Semantic Cache (Redis), the central PII masking rules, and the global token throttling. The Internal APIM is designed to be the only component with network access to the OpenAI Private Endpoint, forcing all traffic to pass through our AI safety checks.

Security Defense Flow

Figure 2.1: The Security Defense in Depth.

The Policy Guardrails (MCSB v2, NIST, PCI-DSS)

A "fortress" isn't just walls; it's the rules of engagement. We used Azure Policy to enforce the compliance standards automatically:

Beyond blocking bad resources with Azure Policy, we used Microsoft Purview to scan the data inside them. By leveraging Purview's Managed Virtual Network to connect into our Private Endpoints, we automatically classify any PII entering our SQL or Cosmos DB instances without it ever traversing the public internet—making our 'Data Residency' promise a verifiable fact.

Why MCSB v2 for AI Workloads?
Microsoft Cloud Security Benchmark v2 is the evolution of Azure Security Benchmark, adding specific guidance for AI workloads. The AS (AI Security) domain includes controls for securing AI platforms (AS-1: Private OpenAI), monitoring AI applications (AS-2: APIM logging), and protecting training data (AS-3: encrypted storage). This makes MCSB v2 the most comprehensive framework for regulator-ready AI architectures on Azure.

Results After 6 Months

Since deploying this architecture in production, we've achieved measurable success across security, performance, and cost efficiency:

50M+
AI Requests/Month
99.97%
Uptime (vs. 95% PAYG)
0
Security Incidents
72%
Redis Cache Hit Rate
£4,200
Monthly PTU Savings
3/3
Regulatory Audits Passed

How we sustained 99.97%: We built and deployed a custom AI SRE Agent (using Azure OpenAI & Logic Apps)—an autonomous watcher that monitors our telemetry. It auto-triages routine alerts (like transient latency spikes) and executes restart runbooks for non-critical pods, freeing our human engineers to focus on architecture, not firefighting.

"The 4-subscription model gave our CISO the confidence to approve AI in production. The audit trail alone saved us 200 hours during our ISO 27001 certification." — Head of IT Security

Ready to Build Your Own?

The complete Terraform infrastructure that powers these results is open-source and production-ready.

Get the Code Quick Start Guide

Hybrid Identity & Access

Identity is the new perimeter. We didn't create new bespoke credentials for this fortress. We extended the corporate trust:

What Happens When UK South Goes Down?

environment—we don't pay for the compute while it's sitting idle, but the VNet, the security rules, and the SQL Geo-Replication are already there, waiting for the signal.

Note on ACR (Azure Container Registry): The diagram shows ACR Geo-Replication. This ensures our Docker images are automatically synced to the UK West registry. If UK South fails, the standby AKS cluster pulls images locally from UK West, avoiding any cross-region dependency during a crisis.

DR Failover Journey (Executive View)

Figure 3: Automatic Global Failover in Action (PaaS Architecture).

Architectural Design Decisions

Every component in this architecture fought for its place. Here is why we chose them:

1. Why App Gateway in Prod/Non-Prod?

Verdict: Essential. It acts as the Ingress Controller (AGIC) for the AKS clusters. While Front Door handles global traffic, it cannot route to specific Kubernetes pods inside a private VNet. App Gateway bridges that gap securely.

2. Why No App Gateway in the Hub?

Verdict: Redundant. The Hub already hosts the External APIM which sits behind Azure Front Door Premium. Front Door provides the WAF and global caching. Adding another App Gateway in the Hub would just add latency and cost without adding value.

3. Why Redis Cache?

Verdict: Critical. We attached Azure Redis Cache Premium to the Internal APIM for Semantic Caching. If two researchers ask the same question, the AI model isn't even touched—Redis answers instantly from the cache. This saves massive amounts of PTU capacity and reduces latency from seconds to milliseconds.

Redis uses an LRU (Least Recently Used) eviction policy, automatically removing old responses when memory is full. Each response is stored with a hash of the prompt as the cache key, with a TTL of 1 hour. This means identical questions get instant answers, while the cache stays fresh.

4. Why GRS Storage?

Verdict: Compliance. We use Geo-Redundant Storage (GRS) to ensure that if UK South disappears, the audit logs and model artifacts exist in UK West. It underpins our "Passive Cold" DR strategy.

5. What about Key Vault?

Verdict: Mandatory. A fortress needs secure key storage. We use Azure Key Vault to manage the TLS certificates for the Application Gateway and APIM custom domains, as well as storing the API keys for any legacy subsystems that haven't migrated to Managed Identity yet.

Lessons Learned: What We'd Do Differently

No architecture survives first contact with production unchanged. Here are the hard-earned lessons that will save you weeks of troubleshooting:

1. The DNS Resolver Nightmare (3 Days Lost)

The Problem: We initially tried to use the Hub Firewall's DNS proxy feature to resolve privatelink.openai.azure.com. It worked for the Hub VNet but failed spectacularly for the Spoke VNets and on-premises VPN users.

The Fix: Deploy a dedicated Azure Private DNS Resolver in the Hub with inbound and outbound endpoints. Configure conditional forwarding rules for all privatelink.* zones. This added £50/month but saved our sanity.

Time Saved: If you do this from day one, you'll avoid the 3-day DNS debugging marathon we endured.

2. Redis Cache Warm-Up (Cold Start Pain)

The Problem: On day one, Redis was empty. Every single AI request was a cache miss, which meant zero performance benefit and confused stakeholders asking "Where's the 98% latency reduction?"

The Fix: We pre-populated the cache with the top 100 most common queries from our test environment. We also adjusted expectations: cache hit rates start at 0% and climb to 70-80% over 2-3 weeks as the cache learns usage patterns.

Lesson: Set realistic expectations. Redis is an investment that pays dividends over time, not instantly.

3. The "4-Subscription Tax" (Operational Overhead)

The Problem: Managing 4 separate Azure subscriptions (Shared Hub, Prod, Non-Prod, DR) created unexpected operational friction. Each subscription needed its own RBAC assignments, budgets, and policy exemptions.

The Fix: We created a Management Group hierarchy to apply policies at scale and used Azure Lighthouse for centralized monitoring across all subscriptions. We also standardized naming conventions religiously.

Lesson: The 4-subscription model is powerful for isolation, but you MUST invest in governance automation (Terraform, Azure Policy, Management Groups) or you'll drown in manual work.

The Engineering Reality Check (No Paper Architects Allowed)

Diagrams are easy. Production is hard. Here are the three "ugly truths" about this architecture that you need to know before you build it:


Skip the Manual Steps

Instead of clicking through the Azure Portal, deploy the entire infrastructure with Terraform in ~45 minutes.

View Terraform Code

Deployment Runbook: Start Here

Don't just read about it. Deploy it.

# 1. Clone the Regulator-Ready Repo

git clone https://github.com/appliedailearner/uklifelabsaisolution.git

# 2. Login to Azure

az login

# 3. Apply the Terraform Blueprints

cd infra && terraform init && terraform apply

The Junior Engineer's Glossary

Too many acronyms? Here is your cheat sheet:

Operationalize Your AI Journey

I help organizations turn stalled cloud initiatives into high-performance execution engines.

Contact Me View the Repository

Build Your AI Fortress

Get the complete Terraform infrastructure, compliance matrix, deployment guides, and architecture diagrams.

View on GitHub Fork Repository

Join 100+ engineers who have starred this repo

Back to Insights