
Pick the Wrong Compute, Pay Forever: A Practical Azure AI Hosting Decision Tree

Mr. Technical Consultant opened the Agent Dashboard for UKLifeLabs. Yesterday, the demo was clean. Today, one graph was not. Token spend was rising, fast.

Mr. Project Manager pinged: "Plan by Friday. What are we hosting on? AKS or something simpler? One answer."

Mr. Cloud Engineer replied: "Pick the platform. I build today."

Mr. Cloud Architect added: "Landing zone first. Private endpoints, DNS, egress, no shortcuts."

Then Mr. Customer asked the only question that matters: "If this goes viral, do we stay in control, or do we burn money and trust?"

Mr. Technical Consultant knew the trap. Teams pick compute by habit. AI punishes that.

So he wrote one line on the whiteboard:

Compute is not where you run containers. Compute is how you control cost, scale, and blast radius.


The Cast

Lead Architect (Upendra Kumar)
Architecture, standards, trade-offs.

Cloud Engineer (Trinity)
Build, automation, operations.

Security Architect (Morpheus)
Identity, data controls, audit.

Technical Consultant (Upendra Kumar)
Delivery strategy, operating models, scale-ready roadmaps.

Customer Leadership (Mr. Customer)
Risk acceptance and operating model.

Project Manager (Mr. Project Manager)
Decisions, RACI, milestones.


🍽️
The Analogy
The "Restaurant Menu"

Imagine you're at a restaurant. The menu has three options: Full Kitchen Service (AKS), Food Truck (ACA), and Microwave Meal (ACI). Each serves different needs.

If you order the Full Kitchen for a simple sandwich, you're paying for chefs you don't need. If you order the Microwave for a wedding feast, you'll fail. The decision tree is your menu guide—it helps you order what you actually need, not what sounds impressive.

Decision Factors Guide

Cold-Start Tolerance (<1s / 1-5s / >5s): sub-second latency needs warm compute ($$$); tolerance above 5s allows aggressive scale-to-zero ($0).
Runtime Type (real-time / async / batch): async and batch work is a natural fit for ACA jobs; real-time requires detailed scaling rules.
GPU Requirement (now / later / never): "now" forces ACA or AKS; "later" means avoiding ACI lock-in today.
Platform Team? (yes / no): "no" disqualifies AKS immediately. Do not build what you cannot operate.
Token Quotas (yes / no): "yes" makes APIM mandatory. Direct access relies on trust, which is risky.
Chargeback Logic (required / optional): "required" means you need APIM to track tokens per tenant ID.
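The chargeback factor is the easiest to prototype. A minimal Python sketch of per-tenant token metering, assuming gateway log records that carry a tenant ID and token counts (the field names here are illustrative, not APIM's actual log schema):

```python
from collections import defaultdict

def tokens_per_tenant(records):
    """Aggregate gateway log records into a per-tenant token bill.

    Each record is assumed to carry tenant_id plus prompt/completion
    token counts; real APIM deployments emit these via policy, and the
    exact field names will differ.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[rec["tenant_id"]] += rec["prompt_tokens"] + rec["completion_tokens"]
    return dict(totals)

usage = [
    {"tenant_id": "research", "prompt_tokens": 1200, "completion_tokens": 300},
    {"tenant_id": "support",  "prompt_tokens": 400,  "completion_tokens": 100},
    {"tenant_id": "research", "prompt_tokens": 800,  "completion_tokens": 200},
]
```

With the sample records above, `tokens_per_tenant(usage)` attributes 2,500 tokens to "research" and 500 to "support": exactly the bill-per-department view Mr. Customer asks for later in this article.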

The Decision Tree

START
  |
  |-- Do you need Kubernetes-only features?
  |      (service mesh, complex scheduling, multi-tenant cluster governance,
  |       large dedicated GPU pools, strict node-level control)
  |         |-- YES --> AKS
  |         |-- NO  --> continue
  |
  |-- Is traffic bursty or unpredictable?
  |         |-- YES --> Azure Container Apps (ACA)
  |         |-- NO  --> continue
  |
  |-- Is it a simple "run and exit" CPU job?
  |         |-- YES --> ACI (CPU only)
  |         |-- NO  --> ACA by default
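The tree above reads directly as code. A minimal Python sketch, where the function and argument names are mine for illustration, not any Azure API:

```python
def recommend_compute(needs_k8s_features, bursty_traffic, run_and_exit_cpu_job):
    """Walk the decision tree top to bottom; the first YES wins."""
    if needs_k8s_features:        # service mesh, GPU pools, node-level control
        return "AKS"
    if bursty_traffic:            # scale-to-zero pays for itself here
        return "ACA"
    if run_and_exit_cpu_job:      # simple, short-lived, CPU only
        return "ACI"
    return "ACA"                  # the default when nothing else applies
```

Note the ordering matters: a team that needs Kubernetes-only features gets AKS even if its traffic is bursty, because the first question wins.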
            

Thresholds That Stop Debates




🚪
The Analogy
The "Nightclub Bouncer"

Your AI models are like a VIP nightclub. Without a bouncer (APIM), anyone can walk in, order unlimited drinks (tokens), and trash the place (cost spike).

The bouncer checks IDs (JWT validation), enforces drink limits (quotas), and keeps a guest list (audit logs). No bouncer = chaos. No APIM = uncontrolled spend pipe.

The Non-Negotiable Rulebook

Mr. Customer did not care about AKS vs ACA. He cared about control.

So Mr. Technical Consultant drew this:

Clients/Apps
   |
   v
APIM (One Rulebook)
   - Entra ID auth (JWT validation)
   - Token quota + rate limits
   - Tool allowlist (only approved downstream APIs)
   - Audit logs + correlation IDs
   |
   v
Compute (ACA/AKS) -> Model + Data (prefer Private Endpoints)
            

If your apps call model endpoints directly, you do not have an AI platform. You have an uncontrolled spend pipe.
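The token-quota line in that rulebook can be sketched in a few lines of Python. This is a fixed-window budget per client, purely illustrative: in production you would use APIM's built-in rate-limit and token-limit policies, not application code.

```python
import time

class TokenQuota:
    """Fixed-window token budget per client, the kind of rule APIM enforces.

    A sketch only; names and numbers are illustrative.
    """

    def __init__(self, tokens_per_window, window_seconds):
        self.limit = tokens_per_window
        self.window = window_seconds
        self.used = {}  # client_id -> (window_start, tokens_used)

    def allow(self, client_id, tokens, now=None):
        now = time.time() if now is None else now
        start, used = self.used.get(client_id, (now, 0))
        if now - start >= self.window:      # window rolled over: reset budget
            start, used = now, 0
        if used + tokens > self.limit:
            return False                    # over budget: reject before spend
        self.used[client_id] = (start, used + tokens)
        return True
```

With a 1,000-token budget per 60-second window, a client that has already spent 600 tokens is rejected when it asks for 600 more, and allowed again once the window resets. That rejection is the difference between a quota and a hope.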

APIM Gateway Architecture
Figure 1: APIM as the single control point for AI model access

Before Scenarios: One APIM Trap to Avoid

People say "APIM internal" without saying which tier and which networking model.

Add a paragraph like this to the design doc before anyone provisions: "APIM runs in internal VNet mode (classic VNet injection, which requires the Developer or Premium tier; the v2 tiers have their own networking options), with a private IP only, Private DNS zones resolving the gateway hostname, and no public endpoint."

This single clarification prevents weeks of rework.

Landing Zone Networking
Figure 2: Proper VNet integration, private endpoints, and DNS configuration

Five Scenarios. Five Reference Architectures.

Each scenario uses the same template so readers can act.

Scenario 1: Internal-Only Agents (Regulated Enterprise)

When: Employee copilots, internal knowledge agents, regulated data.

Non-negotiable constraints:

Bill of materials:

Critical policies:

What breaks first (plan for it):

90-minute lab outcome: A private agent API in ACA, fronted by APIM, calling a model endpoint through Private Link.

Internal APIM Pattern
Figure 3: Internal-only APIM pattern with VNet integration

Case Study: UKLifeLabs Picks the Right Home for Agentic Workloads

Mr. Project Manager asked, "We need a pilot by Friday. Does it go on the new AKS cluster?"

Mr. Cloud Engineer sighed. "The cluster isn't hardened for external tools yet."

Mr. Cloud Architect intervened. "The workload is bursty. The SRE team is booked."

Mr. Customer just wanted to know, "Can I see the bill per department?"

Mr. Technical Consultant smiled. "We don't need the cluster. We need a container."

Context

UKLifeLabs needed an internal research agent to query sensitive databases and summarize results.

Constraints

Decision

Azure Container Apps (ACA)

Architecture

Why not AKS in Phase 1?

Outcome

Mapping to the Decision Tree

Bursty traffic + No GPUs + No K8s-specific control needs = ACA is the correct default.


Scenario 2: Internet-Facing Customer AI (Public Entry, Private Model/Data)

When: Customer chat, partner agent APIs, public portals.

Non-negotiable constraints:

Bill of materials:

Critical policies:

What breaks first:

90-minute lab outcome: Public endpoint protected by WAF and APIM, backend is private.
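The "Entra ID auth (JWT validation)" step from the rulebook boils down to two claim checks plus a signature check. The sketch below covers only the claim checks on an already-decoded payload; a real gateway (APIM's validate-jwt policy) must also verify the token signature against Entra ID's published signing keys, and you should never skip that in production.

```python
import time

def claims_ok(payload, audience, now=None):
    """Check the two claims a gateway policy always enforces: audience and expiry.

    `payload` is an already-decoded JWT payload dict; signature
    verification is deliberately out of scope for this sketch.
    """
    now = time.time() if now is None else now
    return payload.get("aud") == audience and payload.get("exp", 0) > now
```

A token minted for a different audience, or one past its expiry, is rejected before any tokens are spent downstream.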

Network Isolation
Figure 4: Network isolation between public and private components

🚚
The Analogy
Food Truck vs Full Restaurant

ACA is like a food truck—shows up when there's demand, disappears when it's quiet, you don't manage the kitchen. Perfect for bursty traffic.

AKS is like owning a full restaurant—you control everything (the menu, the chefs, the schedule), but you're paying rent even when it's empty, and you need a chef on-call 24/7. Only worth it if you're serving hundreds of customers daily.

Scenario 3: Hybrid Agents (Azure Runtime + On-Prem Tools)

When: Agent must call on-prem systems (CMDB, ITSM, legacy APIs).

Non-negotiable constraints:

Bill of materials:

What breaks first:

90-minute lab outcome: An agent that calls one on-prem API and one Azure API, both governed by the same APIM rulebook.
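"Governed by the same APIM rulebook" includes the correlation IDs from Figure 1: the same request ID must travel to both the on-prem API and the Azure API so one agent run can be traced end to end. A minimal sketch; the header name is my choice, pick one and enforce it at the gateway:

```python
import uuid

def with_correlation_id(headers):
    """Return a copy of the outbound headers carrying a correlation ID.

    If the inbound request already has one, keep it; otherwise mint a
    new ID so the trace never starts mid-stream.
    """
    out = dict(headers)
    out.setdefault("x-correlation-id", str(uuid.uuid4()))
    return out
```

Apply this to every outbound call, Azure or on-prem, and the audit logs from both sides join on one key.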


Scenario 4: GPU Skills Without Becoming an AKS Platform Team

When: Vision, heavy embeddings, GPU bursts, custom inference.

Non-negotiable constraints:

Bill of materials:

What breaks first:

90-minute lab outcome: A GPU "skill endpoint" behind APIM, autoscaling based on demand.
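"Autoscaling based on demand" here means KEDA-style queue scaling: one replica per chunk of pending work, with scale-to-zero when the queue is empty. A sketch of the arithmetic; all numbers are illustrative, not ACA defaults:

```python
import math

def desired_replicas(queue_length, per_replica_capacity, min_replicas=0, max_replicas=10):
    """KEDA-style scaling: replicas proportional to queue depth.

    per_replica_capacity is the queue depth one GPU replica can work
    through; max_replicas caps GPU spend, min_replicas=0 gives
    scale-to-zero when the queue is empty.
    """
    want = math.ceil(queue_length / per_replica_capacity) if queue_length > 0 else 0
    return max(min_replicas, min(max_replicas, want))
```

With a capacity of 5 items per replica, 12 queued items scale to 3 replicas, an empty queue scales to 0, and a flood of 500 items is capped at 10. The cap is what keeps a viral spike from becoming a GPU bill.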


🔒
The Analogy
The "Secret Tunnel"

Your model is a celebrity living in a gated community (Azure). Public internet is the paparazzi-filled street.

A Private Endpoint is the secret underground tunnel—only authorized people (your apps) can use it, and no one on the street even knows it exists. No photos, no leaks, no unauthorized access.

Scenario 5: Batch Pipelines and Offline Jobs

When: Backfills, re-embedding runs, nightly summarization.

Non-negotiable constraints:

Bill of materials:

What breaks first:

90-minute lab outcome: A job that runs on a schedule, logs output, and exits cleanly.
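That lab outcome has a recognizable shape in code: process, log, return a clean exit code so the scheduler can alert and retry on failure. A sketch with the real work stubbed out; in practice the per-item work would still call the model endpoint through APIM, same as interactive traffic, and the module entry point would be wired as `sys.exit(run_batch(...))`:

```python
import logging

def run_batch(items):
    """The shape of a run-and-exit job: process, log, exit cleanly.

    The actual work (re-embedding, summarization) is a placeholder here.
    Returns 0 on success; a non-zero code lets the platform alert and retry.
    """
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    processed = 0
    for item in items:
        processed += 1  # placeholder for the real per-item work
    logging.info("processed %d items", processed)
    return 0
```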

Basic OpenAI Architecture
Figure 5: End-to-end Azure OpenAI architecture for context

The Moment It Clicked

Mr. Project Manager asked, "What do we tell the client?"

Mr. Technical Consultant wrote the final decision:

Mr. Cloud Architect nodded. "Now I can lock guardrails."

Mr. Cloud Engineer nodded. "Now I can build without rework."

Mr. Customer nodded. "Now I can trust the rollout."


Practical Checklist

  1. Put APIM in front of every model call and tool API.
  2. Use Entra ID at the gateway. Do not distribute model keys.
  3. Enforce token quotas and rate limits per client.
  4. Emit token metrics and correlate requests end-to-end.
  5. Use ACA unless you have a proven AKS platform requirement.
  6. Keep model and data private where required. Get Private DNS right.
  7. Treat DNS, egress, and quotas as production features, not "later".

Download the Implementation Toolkit

Get the complete implementation package including Terraform templates, APIM policies, and architecture diagrams:

Download AI Hosting Toolkit (v2.0)

What's included in v2.0:

  • Terraform Templates (VNet + ACA + OpenAI)
  • CFO-Ready Cost Calculator (.csv)
  • Go-Live Security Checklist (.md)
  • Full Architecture Icons & Labs List
  • 4 production-ready APIM policy XMLs
  • 5 architecture diagrams (PNG + SVG)
  • Comprehensive README with deployment guide
  • Cost optimization tips and troubleshooting

ZIP file (~3 MB) - Free download, no registration required


Resource Vault (Curated, High Signal)

Start Here (3 Links)

Build Labs (6 Links)

Watch

Secure AI APIs with APIM - essential security patterns for GenAI gateways.

Learn Live: AKS vs ACA vs ACI - detailed comparison starting at 15:18.

ACA Networking Deep Dive - virtual networks and security boundaries.


One-Line Takeaway

If you pick compute without constraints and ship agents without a gateway rulebook, you do not have an AI platform. You have a cost leak.

Unblock your cloud strategy. Start shipping.


Join the Architecture Insider

Get these decision trees delivered to your inbox.
