The UKLifeLabs AI Gateway Pattern

The room is calm until an auditor asks a simple question: "Show me where the data goes, which region it lands in, who can call the model, and what evidence you can produce."

That is the moment most AI pilots fail.

UKLifeLabs doesn’t ship pilots. We ship standards.

This post explains the AI Gateway architecture we use to keep copilots:

Fact check for geography: Azure’s UK regions are UK South and UK West. UKLifeLabs uses these two regions to meet sovereignty and resilience requirements.

Challenge 1: The Perimeter Fallacy

The Stakeholder: "We already pay for Palo Alto and FortiGate. Why are you proposing Application Gateway, Cloudflare, APIM, and Traffic Manager on top?"

The Principle: "Firewalls protect networks. They don’t govern APIs. Our Copilot is an API product, not a web page."

What Palo Alto + FortiGate do best

What they do not replace

Key Takeaway Network security is necessary. API governance is what makes it defensible.

Decision 2: Separation of Concerns (Ingress vs Firewall)

The Architecture: "Palo Alto decides whether a request can enter the estate. Application Gateway decides where the request goes inside Azure, with web-grade controls."

Why Application Gateway "shines" in this pattern

Application Gateway is built for Azure-native Layer 7 concerns:

The Reality: "Firewall rules should move slowly. App routing changes every sprint. Mix them and you create exceptions. Exceptions become bypasses. Bypasses become findings."

Decision 3: Edge Strategy (Cloudflare vs Azure Front Door)

This is not a "feature debate." It is an operating model decision.

What Azure Front Door is good at

Front Door is a strong Azure-native global edge service. For internet-first, globally distributed apps, it can be a great fit.

Why UKLifeLabs often keeps Cloudflare

UKLifeLabs prioritises a few realities common in UK financial services:

  1. Edge standardisation across vendors and clouds
    Cloudflare is often already the approved edge control plane. Replacing it is governance-heavy and slow.
  2. Edge security posture that’s already tuned
    Bot patterns, WAF rules, rate limits, and operational playbooks already exist. Rip-and-replace near go-live is a risk.
  3. A clean sovereignty story
    UKLifeLabs wants the processing story to be simple: "We operate within UK West and UK South." Adding another global edge layer can complicate both the narrative and troubleshooting.

Key Takeaway If Cloudflare is the enterprise edge standard, adding another edge (Front Door) often creates overlap, not extra safety.

Pattern 4: Multi-Region High Availability

Traffic Manager is in the pattern because it does a different job from Cloudflare.

Traffic Manager is DNS-based routing, used to steer clients to regional endpoints based on health and routing policies.

UKLifeLabs uses Traffic Manager to implement:

Cloudflare can load-balance too. Traffic Manager keeps regional failover logic inside the Azure operating model, which helps platform teams, incident response, and auditors.
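
To make the failover behaviour concrete, here is a minimal sketch of priority-style routing: pick the healthy regional endpoint with the lowest priority number. It is a toy model of the idea, not Traffic Manager's implementation; the `Endpoint` type, the hostnames, and the `resolve` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    name: str       # hypothetical hostname of a regional entry point
    region: str     # Azure region name
    priority: int   # lower number is preferred
    healthy: bool   # outcome of the most recent health probe


def resolve(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


# Illustrative configuration: UK South primary, UK West secondary.
ENDPOINTS = [
    Endpoint("copilot-uks.example.internal", "uksouth", 1, True),
    Endpoint("copilot-ukw.example.internal", "ukwest", 2, True),
]
```

With both regions healthy, `resolve(ENDPOINTS)` returns the UK South endpoint. If its probe fails, the same call returns UK West, and if nothing is healthy it returns `None`, so the caller can fail closed.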

Defense in Depth: Workforce Zero Trust

Common Question: "We already have Cloudflare. Why do we need Zscaler?"

Because Zscaler is typically about workforce and egress, not public ingress.

What Zscaler adds

The Differentiator: "Cloudflare protects the front door. Zscaler controls how our people reach internal services and what leaves the building."

The Target Architecture: The UKLifeLabs AI Gateway

Lane A: Chat (real-time)

  1. User enters via Cloudflare
  2. Traffic is enforced by Palo Alto (north-south)
  3. Application Gateway (WAF) routes to the right internal entry
  4. APIM enforces identity, product entitlements, quotas, and policy
  5. Backend calls:
    • Azure AI Search (retrieval)
    • Azure OpenAI / Foundry model deployment (generation)
  6. Response returns with citations + correlation IDs
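
Steps 4 to 6 of the chat lane can be sketched as a single handler: a policy decision first, then retrieval, then generation, with every response carrying citations and a correlation ID. This is an illustrative outline only. `handle_chat`, `Passage`, `PolicyDenied`, and the injected callables are hypothetical stand-ins for APIM policy, Azure AI Search, and the model deployment, not real SDK calls.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Passage:
    source_id: str  # identifier of the retrieved chunk, used as the citation
    text: str


class PolicyDenied(Exception):
    """Raised when the gateway policy step rejects a request."""


def handle_chat(
    subject: str,
    question: str,
    is_entitled: Callable[[str], bool],
    retrieve: Callable[[str], list[Passage]],
    generate: Callable[[str, list[Passage]], str],
) -> dict:
    # Mint the correlation ID before any decision so denials are traceable too.
    correlation_id = str(uuid.uuid4())

    # Step 4: policy is evaluated before any backend is touched.
    if not is_entitled(subject):
        raise PolicyDenied(f"{correlation_id}: {subject} is not entitled to chat")

    # Step 5: retrieval, then generation grounded on the retrieved passages.
    passages = retrieve(question)
    answer = generate(question, passages)

    # Step 6: the response envelope always carries citations and the ID.
    return {
        "correlation_id": correlation_id,
        "answer": answer,
        "citations": [p.source_id for p in passages],
    }
```

Passing the backends in as callables keeps the ordering testable without any Azure dependency. A real deployment would enforce the policy step in the gateway rather than in application code.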

Lane B: Ingestion (batch, controlled)

  1. Ingestion hits a separate APIM Product (different entitlements)
  2. Content is processed:
    • Document Intelligence (structure extraction)
    • chunking
    • indexing into Azure AI Search
  3. Index versions are promoted with approvals, not ad hoc pushes

Key Takeaway UKLifeLabs separates chat and ingestion because they require different controls, owners, and audit evidence.
UKLifeLabs AI Gateway Architecture Diagram
Figure: The high-level AI Gateway pattern separating Chat and Ingestion lanes.
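
The ingestion lane's two distinctive steps, chunking and approval-gated promotion, can be sketched as follows. This is a minimal illustration under stated assumptions: the word-window chunker, the `IndexRegistry` class, and the two-approver rule are hypothetical and are not taken from the UKLifeLabs implementation or from the Azure AI Search API.

```python
from dataclasses import dataclass, field
from typing import Optional


def chunk(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    # Stop once the remaining words are already covered by the previous window.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, last_start, step)]


@dataclass
class IndexVersion:
    version: str
    chunks: list[str]
    approvals: set[str] = field(default_factory=set)


class IndexRegistry:
    REQUIRED_APPROVALS = 2  # assumption: two distinct approvers per promotion

    def __init__(self) -> None:
        self.staged: dict[str, IndexVersion] = {}
        self.live: Optional[IndexVersion] = None

    def stage(self, version: str, chunks: list[str]) -> None:
        self.staged[version] = IndexVersion(version, chunks)

    def approve(self, version: str, approver: str) -> None:
        self.staged[version].approvals.add(approver)

    def promote(self, version: str) -> None:
        candidate = self.staged[version]
        if len(candidate.approvals) < self.REQUIRED_APPROVALS:
            raise PermissionError(f"{version} lacks the required approvals")
        self.live = candidate
```

Because approvals are stored as a set, the same approver signing twice still counts once, which is the property an auditor is likely to ask about.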

Governance Model: Corridors (Paths) vs Gates (Products)

Concept: "Paths are corridors. Products are gates."

Paths are corridors

Using only paths like /internal/* and /external/* makes governance soft:

Products are gates

APIM Products create explicit governance boundaries:

Key Takeaway Paths organise traffic. Products govern access. UKLifeLabs needs governance.
APIM Products vs Paths Diagram
Figure: Products act as gates, enforcing policy and quotas per consumer.
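
One way to picture the gate is as a lookup from subscription key to product, followed by a per-consumer quota check. The sketch below is a toy model, not APIM's policy engine: `Product`, `ProductGate`, the fixed one-minute window, and the returned status codes are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    apis: frozenset[str]      # APIs this product exposes
    calls_per_minute: int     # quota applied per subscription


class ProductGate:
    def __init__(self, products: dict[str, Product], subscriptions: dict[str, str]):
        self._products = products            # product name -> Product
        self._subscriptions = subscriptions  # subscription key -> product name
        self._windows: dict[str, tuple[int, int]] = {}  # key -> (minute, count)

    def check(self, key: str, api: str, now_seconds: float) -> int:
        """Return an HTTP-style status code for this call."""
        product_name = self._subscriptions.get(key)
        if product_name is None:
            return 401  # unknown consumer
        product = self._products[product_name]
        if api not in product.apis:
            return 403  # the corridor exists, but this gate does not open it
        minute = int(now_seconds // 60)
        start, count = self._windows.get(key, (minute, 0))
        if start != minute:
            start, count = minute, 0  # a new window resets the counter
        if count >= product.calls_per_minute:
            return 429  # quota exhausted for this consumer
        self._windows[key] = (start, count + 1)
        return 200
```

A path rule alone could only answer "is this URL reachable". The gate also answers "for whom, and how often", which is the part that produces audit evidence.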

Sovereignty Strategy: Regional by Design

UKLifeLabs treats data residency and processing locality as a first-order constraint.

Data Residency and Failover Strategy
Figure: Regional failover logic stays within the Azure operating perimeter.

Why "Regional" beats "Global" for regulated AI

UKLifeLabs standard

Key Takeaway Regional deployments reduce sovereignty risk. Policy makes it enforceable.
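
"Policy makes it enforceable" can be illustrated with a small check over a resource inventory. This is a sketch under assumptions: the inventory shape (`name`, `location`, `sku`) is hypothetical, and the rule that any SKU name starting with "Global" or "DataZone" is non-regional is a simplification that should be checked against the current Azure deployment type names. In practice this kind of rule would usually live in a policy engine rather than a script.

```python
ALLOWED_REGIONS = frozenset({"uksouth", "ukwest"})

# Assumption: SKU names with these prefixes denote non-regional processing.
NON_REGIONAL_SKU_PREFIXES = ("global", "datazone")


def residency_violations(resources: list[dict]) -> list[str]:
    """Return one message per resource setting that breaks the UK-only rule."""
    violations = []
    for resource in resources:
        name = resource["name"]
        location = resource["location"].lower()
        sku = resource.get("sku", "").lower()
        if location not in ALLOWED_REGIONS:
            violations.append(f"{name}: location '{location}' is outside the UK regions")
        if sku.startswith(NON_REGIONAL_SKU_PREFIXES):
            violations.append(f"{name}: sku '{sku}' is not a regional deployment")
    return violations
```

An empty list means the inventory passes. Anything else is a finding to fix before promotion.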

Capacity Strategy: Throughput Modes (TPM vs PTU)

This is where pilots die in production.

TPM is a taxi

PTU is a private car

UKLifeLabs’ rule:

Key Takeaway TPM proves value. PTU proves resilience.
TPM vs PTU Comparison
Figure: Choosing the right model hosting tier.
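
The cost side of the taxi-versus-private-car decision is a simple break-even calculation. The sketch below uses placeholder prices, not Azure list prices, and the function names are illustrative. It also looks at cost only: provisioned throughput may be chosen for predictable capacity even below the break-even volume.

```python
def payg_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Pay-as-you-go (TPM) cost: you pay for every token used."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(ptu_monthly_cost: float, price_per_million: float) -> float:
    """Monthly token volume at which pay-as-you-go equals the fixed PTU cost."""
    return ptu_monthly_cost / price_per_million * 1_000_000


def cheaper_option(tokens_per_month: float, price_per_million: float,
                   ptu_monthly_cost: float) -> str:
    """Return 'TPM' or 'PTU' on cost alone."""
    if payg_cost(tokens_per_month, price_per_million) < ptu_monthly_cost:
        return "TPM"
    return "PTU"


# Hypothetical inputs: 10 currency units per million tokens, 5,000 per month for PTU.
PRICE_PER_MILLION = 10.0
PTU_MONTHLY = 5000.0
```

With these placeholder numbers, 50 million tokens a month costs 500 on pay-as-you-go, and the break-even volume is 500 million tokens a month.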

Pattern: Regulatory-Aware RAG

UKLifeLabs treats Retrieval-Augmented Generation as a controlled pattern, not a dev convenience.

The baseline pattern (the "gold standard" starting point)

What UKLifeLabs adds for financial services regulation

Key Takeaway If you can’t reproduce how an answer was created, it’s not production.
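
Reproducibility implies recording, for every answer, which model deployment, prompt template, index version, and retrieved chunks produced it. Below is a minimal sketch of such a record, assuming a hypothetical `AnswerRecord` schema. The field names are illustrative and do not come from a published UKLifeLabs audit schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AnswerRecord:
    correlation_id: str
    model_deployment: str
    prompt_template_version: str
    index_version: str
    chunk_ids: tuple[str, ...]
    question_sha256: str   # hashes avoid storing raw text in the audit log
    answer_sha256: str


def fingerprint(record: AnswerRecord) -> str:
    """Hash everything that determines the answer, excluding the per-request ID."""
    payload = asdict(record)
    payload.pop("correlation_id")
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with the same fingerprint were answered under the same conditions. A changed index version or prompt template shows up as a different fingerprint, which is the comparison an investigator needs when asked why an answer changed.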

Future-Proofing: The "Agent-Ready" Gateway

We are not just building for chatbots. We are building for Autonomous Agents.

Using the "5 Blocks of AI Agents" pattern (as defined by Sam Bhagwat, Founder of Mastra), this architecture explicitly secures the critical components needed for agentic workflows:

Mapping the Gateway to Agent Architecture
  • Agent "Tools" = APIM Products
    Agents need a sandbox in which to execute actions safely. APIM provides the governance layer for tool use, so that an agent cannot, for example, verify a payment without leaving a trace.
  • Agent "Memory" = The Ingestion Lane
    Giving an agent "long-term memory" isn't magic; it's a disciplined RAG pipeline. Our Ingestion Lane builds the semantic index that agents read from.
  • Agent "Evals" = Gateway Observability
    You can't optimise what you can't see. The Gateway captures the inputs, outputs, and reasoning steps required to run systematic evals on agent performance.

Compliance Strategy: Evidence as a Product

Principle: "Auditors don’t want confidence. They want artifacts."

UKLifeLabs builds an evidence pack using:

Key Takeaway Evidence is designed upfront. If you "add it later," you add risk.

Operational Excellence: DevSecOps for AI

The Standard: "If gateway rules aren’t versioned, tested, and promoted like code, they will drift."

UKLifeLabs standardises:

Controls to Evidence Map
Figure: From Policy to Evidence - The Compliance Chain.
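
Drift is easiest to reason about as a diff between the configuration declared in version control and the configuration actually deployed. The sketch below assumes both are available as nested dictionaries. The `drift` function and the example keys are hypothetical, not a real APIM or Application Gateway export format.

```python
def drift(declared: dict, deployed: dict, prefix: str = "") -> list[str]:
    """List every setting where the deployed config differs from the declared one."""
    findings = []
    for key in sorted(set(declared) | set(deployed)):
        path = f"{prefix}{key}"
        if key not in deployed:
            findings.append(f"missing in deployed: {path}")
        elif key not in declared:
            findings.append(f"unmanaged in deployed: {path}")
        elif isinstance(declared[key], dict) and isinstance(deployed[key], dict):
            # Recurse so findings point at the exact nested setting.
            findings.extend(drift(declared[key], deployed[key], f"{path}."))
        elif declared[key] != deployed[key]:
            findings.append(
                f"changed: {path} declared={declared[key]!r} deployed={deployed[key]!r}"
            )
    return findings
```

Run in a pipeline, a non-empty result can fail the build, which turns "rules will drift" from a warning into a gate.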

Service Architecture: The Request Journey

Leadership doesn’t buy boxes. They buy services and journeys.

Business Journey: "Regulatory Query to Evidence"

  1. Request intake (who, why, case ID)
  2. Policy decision (is the user allowed, which corpus scope)
  3. Retrieval and reasoning (which sources were used)
  4. Response delivery (answer + citations)
  5. Evidence trail (correlation IDs + versioned artifacts)
Service Map Journey Diagram
Figure: Mapping technology services to the business journey.

Technology services that support the journey

Key Takeaway The "extra layers" stop looking extra when you map them to journey control points.

Architectural Assumptions & Constraints

This pattern is designed under specific constraints common to regulated industries. These boundary conditions define the valid use cases for this architecture.

Constraint | Assumption
Sovereignty | Data must reside and be processed within UK borders (no global routing).
Connectivity | ExpressRoute is the primary ingress; the public internet is for customer-facing endpoints only.
Identity | Microsoft Entra ID (formerly Azure AD) is the single source of truth for workforce and application identity.
Encryption | TLS 1.2+ is mandatory; all data at rest uses customer-managed keys (CMK) where possible.

Key takeaways (UKLifeLabs standards in one view)
