Azure Auth Series — Blog 8: Production Readiness#

In Blog 7, we put an API gateway in front of three microservices. APIM validates JWTs, forwards claims as headers, and rate-limits users. The system works — but if a service starts failing at 3 AM, nobody knows until users complain. There are no metrics, no alerts, no health checks, and the Notification and Audit services are publicly reachable by anyone who guesses the URL.

This blog adds observability and security hardening. One new Terraform module creates Application Insights, Log Analytics, and alert rules. Three lines of Python give every service automatic distributed tracing. And two Terraform changes lock down internal services and add health probes.

Source code: github.com/MinhQuanBuiSco/Azure/.../08_production

The Azure Auth Series#

Blog	Topic	What You'll Learn
1. Basic Login	Frontend auth	Sign in with Microsoft Entra ID
2. Protected API	Backend auth	Build a FastAPI backend that validates tokens
3. RBAC	Authorization	Control access based on user roles
4. Managed Identity	Zero secrets	Deploy to Azure without storing credentials
5. Multi-Tenant	Organizations	Let users from any org sign in
6. Service-to-Service	OBO + Client Credentials	Authenticate services to each other
7. API Gateway	Centralized auth	APIM validates once, backends trust headers
8. Production Readiness	You are here	Monitoring, alerts, and security hardening

What We're Building#

Architecture#

Blog 7 (gateway, no monitoring):
  Frontend → APIM → Task API → Notification Svc
                              → Audit Svc
  No metrics. No alerts. No health checks.
  Notification + Audit publicly reachable.

Blog 8 (production-ready):
  Frontend → APIM → Task API → Notification Svc (internal only)
                              → Audit Svc (internal only)
                   ↘
             Application Insights ← OpenTelemetry (all services)
                   ↘
             Azure Monitor Alerts → Email notifications

What Changed from Blog 7#

Aspect	Blog 7	Blog 8
Telemetry	None	Application Insights + Log Analytics
Tracing	None	OpenTelemetry auto-instrumentation
APIM diagnostics	None	W3C trace correlation, request/response logging
Alerts	None	High error rate + slow response → email
Health probes	None	Liveness + readiness on all 4 services
Auto-scaling	min=0, max=1	min=1, max=3 replicas
Notification ingress	External (public)	Internal only
Audit ingress	External (public)	Internal only
Python dependency	—	`azure-monitor-opentelemetry`

The App#

The frontend looks and works the same as Blog 7 — sign in, manage tasks, see notification and audit badges. The difference is what happens behind the scenes.

Production Landing Page

Every request now flows through Application Insights. Create a task and you'll see the full trace — APIM gateway → Task API → OBO call to Notification → Client Credentials call to Audit — all correlated with a single trace ID.

Dashboard with Tasks

Step 1: The Monitoring Module — Terraform#

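Blog 8 adds one new Terraform module: modules/monitoring/. It creates four kinds of resources: a Log Analytics workspace, an Application Insights instance, two metric alert rules, and an action group.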

infra/
├── main.tf
└── modules/
    ├── resource_group/
    ├── container_registry/
    ├── container_apps/        ← health probes + internal ingress
    ├── api_management/        ← APIM diagnostics
    └── monitoring/            ← NEW: App Insights + alerts

Log Analytics Workspace#

Workspace-based Application Insights stores its telemetry in a Log Analytics workspace, so that workspace is created first. Container Apps also uses it for platform logs.

resource "azurerm_log_analytics_workspace" "this" {
  name                = "${var.project_name}-logs"
  location            = var.location
  resource_group_name = var.resource_group_name
  sku                 = "PerGB2018"
  retention_in_days   = 30
}

Application Insights#

Application Insights is backed by the Log Analytics workspace. All telemetry from APIM and the three backend services flows into this single resource.

resource "azurerm_application_insights" "this" {
  name                = "${var.project_name}-appinsights"
  location            = var.location
  resource_group_name = var.resource_group_name
  application_type    = "web"
  workspace_id        = azurerm_log_analytics_workspace.this.id
}

Alert Rules#

Two metric alerts watch for problems:

# Alert: >10 failed requests in 5 minutes (Severity 2)
resource "azurerm_monitor_metric_alert" "high_error_rate" {
  name                = "${var.project_name}-high-error-rate"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_application_insights.this.id]
  severity            = 2
  frequency           = "PT1M"
  window_size         = "PT5M"
  description         = "Fires when more than 10 failed requests occur within 5 minutes"

  criteria {
    metric_namespace = "microsoft.insights/components"
    metric_name      = "requests/failed"
    aggregation      = "Count"
    operator         = "GreaterThan"
    threshold        = 10
  }

  action {
    action_group_id = azurerm_monitor_action_group.email.id
  }
}

# Alert: average response time >5s over 5 minutes (Severity 3)
resource "azurerm_monitor_metric_alert" "slow_response" {
  name                = "${var.project_name}-slow-response"
  resource_group_name = var.resource_group_name
  scopes              = [azurerm_application_insights.this.id]
  severity            = 3
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "microsoft.insights/components"
    metric_name      = "requests/duration"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 5000
  }

  action {
    action_group_id = azurerm_monitor_action_group.email.id
  }
}
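
To make the window_size and threshold pair concrete, here is a minimal Python sketch of how a count-over-window rule like high_error_rate could be evaluated. It is a simplified model, not Azure Monitor's actual implementation; the function name should_fire and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # mirrors window_size = "PT5M"
THRESHOLD = 10                  # mirrors threshold = 10


def should_fire(failure_times: list[datetime], now: datetime) -> bool:
    """True when more than THRESHOLD failures fall inside the window ending at `now`."""
    window_start = now - WINDOW
    recent = [t for t in failure_times if window_start < t <= now]
    # operator = "GreaterThan": exactly 10 failures does not fire, 11 does.
    return len(recent) > THRESHOLD


now = datetime(2025, 1, 1, 3, 0, 0)
# 11 failures spread over the last 100 seconds
burst = [now - timedelta(seconds=10 * i) for i in range(11)]
# 11 failures, all older than the five-minute window
stale = [now - timedelta(minutes=6, seconds=i) for i in range(11)]

print(should_fire(burst, now))       # True
print(should_fire(burst[:10], now))  # False
print(should_fire(stale, now))       # False
```

With frequency = "PT1M", a check of this kind would run once a minute over the trailing five minutes, so a short burst can keep the alert active for several evaluations.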

Action Group#

When an alert fires, Azure sends an email:

resource "azurerm_monitor_action_group" "email" {
  name                = "${var.project_name}-alerts"
  resource_group_name = var.resource_group_name
  short_name          = "blog08alert"

  email_receiver {
    name          = "admin"
    email_address = var.alert_email
  }
}

Here's what the alert email looks like when the high error rate rule fires and then resolves:

Azure Monitor Alert Email

The alert description matches exactly what we defined in Terraform: "Fires when more than 10 failed requests occur within 5 minutes."

Step 2: OpenTelemetry — Three Lines of Code#

The biggest change to the Python services is surprisingly small. Each service adds one new config variable and three lines in main.py.

Config#

# config.py — NEW in Blog 8
APPINSIGHTS_CONN_STR = os.getenv(
    "APPLICATIONINSIGHTS_CONNECTION_STRING", ""
)

Auto-Instrumentation#

# main.py — at the very top, before other imports
from config import APPINSIGHTS_CONN_STR

if APPINSIGHTS_CONN_STR:
    from azure.monitor.opentelemetry import configure_azure_monitor
    configure_azure_monitor(connection_string=APPINSIGHTS_CONN_STR)

That's it. azure-monitor-opentelemetry automatically instruments:

FastAPI requests — every HTTP request is traced with duration, status code, and URL
httpx dependency calls — every outgoing HTTP call (OBO to Notification, Client Credentials to Audit) is tracked as a dependency
Exceptions — unhandled errors are captured with full stack traces

The if APPINSIGHTS_CONN_STR: guard means local development still works without Application Insights — just leave the env var empty.
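
The same guard can be wrapped in a small function, which makes the "telemetry is optional" behavior easy to test. This is a sketch, not the repository's code; setup_telemetry is a hypothetical name. The import is done lazily, so the azure-monitor-opentelemetry package is only needed when a connection string is actually provided.

```python
def setup_telemetry(connection_string: str) -> bool:
    """Enable Azure Monitor export only when a connection string is set."""
    if not connection_string:
        # Local development: skip the import entirely, so the package
        # does not even need to be installed.
        return False
    # Imported lazily; this line only runs when telemetry is wanted.
    from azure.monitor.opentelemetry import configure_azure_monitor

    configure_azure_monitor(connection_string=connection_string)
    return True


# With an empty connection string the service starts without telemetry.
enabled = setup_telemetry("")
print("telemetry enabled:", enabled)
```

If you adopt this shape, the function still has to be called at the top of main.py, before FastAPI is imported.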

Requirements#

One new dependency per service:

# requirements.txt — NEW in Blog 8
azure-monitor-opentelemetry==1.6.4

This pulls in OpenTelemetry, the Azure Monitor exporter, and auto-instrumentation for FastAPI, httpx, and other common libraries.

Step 3: APIM Diagnostics — Gateway-Level Telemetry#

The backend services send telemetry through OpenTelemetry, but we also want APIM itself to report to Application Insights. This gives us gateway-level metrics: request counts, latency, and error rates before traffic even reaches the backends.

# modules/api_management/main.tf

resource "azurerm_api_management_logger" "appinsights" {
  name                = "appinsights-logger"
  api_management_name = azurerm_api_management.this.name
  resource_group_name = var.resource_group_name

  application_insights {
    instrumentation_key = var.appinsights_instrumentation_key
  }
}

resource "azurerm_api_management_diagnostic" "appinsights" {
  identifier               = "applicationinsights"
  api_management_name      = azurerm_api_management.this.name
  resource_group_name      = var.resource_group_name
  api_management_logger_id = azurerm_api_management_logger.appinsights.id

  sampling_percentage       = 100
  always_log_errors         = true
  log_client_ip             = true
  verbosity                 = "information"
  http_correlation_protocol = "W3C"

  frontend_request {
    body_bytes     = 0
    headers_to_log = ["X-User-OID", "X-Tenant-ID"]
  }

  frontend_response { body_bytes = 0 }
  backend_request   { body_bytes = 0 }
  backend_response  { body_bytes = 0 }
}

Key settings:

Setting	Value	Why
`sampling_percentage`	100	Log every request (fine for dev/low traffic)
`http_correlation_protocol`	W3C	Links APIM traces to backend OpenTelemetry traces
`headers_to_log`	`X-User-OID`, `X-Tenant-ID`	See which user/tenant made each request
`body_bytes`	0	Don't log request/response bodies (security)

The W3C correlation protocol is the key: APIM generates a traceparent header, and OpenTelemetry in the backends picks it up. One trace ID follows the request from APIM to the Task API and on to its calls to the Notification and Audit services.
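
For readers who have not seen a traceparent header, here is a small Python sketch of its version-00 layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) and of how a hop keeps the trace-id while minting a new span id. OpenTelemetry does this for you; the helper names parse_traceparent and child_traceparent are hypothetical, and the sketch skips details such as rejecting all-zero ids.

```python
import re
import secrets

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex trace-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(header: str) -> dict[str, str]:
    """Split a version-00 traceparent header into its three fields."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"not a valid version-00 traceparent: {header!r}")
    trace_id, parent_id, flags = match.groups()
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(incoming: str) -> str:
    """Header for an outgoing call: same trace-id, fresh span id."""
    ctx = parse_traceparent(incoming)
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    return f"00-{ctx['trace_id']}-{new_span_id}-{ctx['flags']}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outgoing = child_traceparent(incoming)
print(parse_traceparent(incoming)["trace_id"])
print(parse_traceparent(outgoing)["trace_id"])  # same trace-id as above
```

Because the trace-id survives every hop, Application Insights can stitch the gateway request and the backend spans into one end-to-end transaction.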

Step 4: Security Hardening — Internal Ingress#

In Blog 7, all three backend services had external ingress — anyone on the internet could call them directly. The Notification and Audit services should only be reachable from the Task API (via OBO and Client Credentials), not from the public internet.

Blog 8 changes their ingress to internal:

# Notification Service — Blog 7 vs Blog 8
ingress {
  external_enabled = false   # was: true
  target_port      = 8001
  transport        = "http"
}

# Audit Service — same change
ingress {
  external_enabled = false   # was: true
  target_port      = 8002
  transport        = "http"
}

With external_enabled = false, these services are only reachable from other Container Apps in the same environment. The Task API can still call them (they share a Container App Environment), but nothing outside that environment can.

This matters because when TRUST_GATEWAY=true, backends trust whatever is in the X-User-* headers. If someone could reach the Notification Service directly, they could forge those headers. Internal ingress eliminates that risk.
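
The risk is easy to see in a few lines of Python. This is an illustrative sketch only: caller_oid is a hypothetical helper, not the series' actual auth code, and the token-validation branch is left out.

```python
from typing import Optional


def caller_oid(headers: dict, trust_gateway: bool) -> Optional[str]:
    """Return the caller's object ID as a header-trusting backend would see it."""
    if trust_gateway:
        # No signature check: the header value is taken at face value.
        return headers.get("X-User-OID")
    # Without gateway trust the service would validate a JWT instead
    # (omitted here), so a bare header identifies nobody.
    return None


forged = {"X-User-OID": "any-oid-the-attacker-likes"}
print(caller_oid(forged, trust_gateway=True))   # any-oid-the-attacker-likes
print(caller_oid(forged, trust_gateway=False))  # None
```

A header is just a string the sender chooses, so trusting it is only safe when the network guarantees who the sender can be.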

Step 5: Health Probes and Auto-Scaling#

Health Probes#

Every service exposes a /health endpoint:

@app.get("/health")
async def health():
    return {"status": "healthy", "service": "task-api"}

Terraform configures two probes per Container App:

liveness_probe {
  transport               = "HTTP"
  path                    = "/health"
  port                    = 8000
  initial_delay           = 10
  interval_seconds        = 30
  failure_count_threshold = 3
}

readiness_probe {
  transport               = "HTTP"
  path                    = "/health"
  port                    = 8000
  interval_seconds        = 10
  failure_count_threshold = 3
}

Probe	Purpose	What happens on failure
Liveness	"Is the process alive?"	Container is restarted
Readiness	"Can it handle traffic?"	Container is removed from load balancer

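If a service crashes or hangs, Azure notices once the liveness probe has failed three times in a row (failure_count_threshold = 3). At a 30-second interval that takes up to about 90 seconds, after which the container is restarted automatically. No manual intervention needed.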
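
A probe only triggers action after failure_count_threshold consecutive failures, so the reaction time depends on both settings. The Python sketch below works out a rough upper bound for the values used here; it ignores probe timeouts and the initial delay, and the function name is hypothetical.

```python
def worst_case_detection_seconds(interval_seconds: int, failure_count_threshold: int) -> int:
    """Rough upper bound from the moment a container hangs to the failing threshold.

    Assumes the hang starts right after a successful probe, so every one of the
    next `failure_count_threshold` probes has to run and fail.
    """
    return interval_seconds * failure_count_threshold


liveness = worst_case_detection_seconds(interval_seconds=30, failure_count_threshold=3)
readiness = worst_case_detection_seconds(interval_seconds=10, failure_count_threshold=3)
print(liveness)   # 90
print(readiness)  # 30
```

With these settings a hung replica is pulled out of rotation by the readiness probe well before the liveness probe restarts it.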

Auto-Scaling#

template {
  min_replicas = 1    # was: 0 in Blog 7
  max_replicas = 3    # was: 1 in Blog 7
}

Setting	Blog 7	Blog 8	Why
`min_replicas`	0	1	Always-on — no cold start latency
`max_replicas`	1	3	Handle traffic spikes

With min_replicas = 1, there's always a warm instance ready. With max_replicas = 3, Container Apps can scale up when load increases and scale back down when it subsides.
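
As a rough mental model, the replica count tracks load divided by a per-replica target, clamped to the configured bounds. The Python sketch below assumes an HTTP concurrency rule with a target of 10 concurrent requests per replica, which is not shown in the Terraform above; the real scaler also applies polling and cool-down periods that are not modeled here.

```python
import math


def desired_replicas(
    concurrent_requests: int,
    target_per_replica: int = 10,  # assumed HTTP concurrency target
    min_replicas: int = 1,         # mirrors min_replicas = 1
    max_replicas: int = 3,         # mirrors max_replicas = 3
) -> int:
    """Replicas needed for the current load, clamped to [min, max]."""
    wanted = math.ceil(concurrent_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


for load in (0, 8, 15, 25, 500):
    print(load, "->", desired_replicas(load))
```

The clamp is the important part: idle traffic never drops below one warm replica, and a spike never pushes a service beyond three.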

Step 6: Wiring It Together — Root Module#

The root main.tf wires the monitoring module into the rest of the infrastructure:

# NEW in Blog 8
module "monitoring" {
  source = "./modules/monitoring"

  project_name        = var.project_name
  location            = module.resource_group.location
  resource_group_name = module.resource_group.name
  alert_email         = var.alert_email
}

module "container_apps" {
  source = "./modules/container_apps"
  # ... existing config ...

  # NEW: share Log Analytics + App Insights with Container Apps
  log_analytics_workspace_id    = module.monitoring.log_analytics_workspace_id
  appinsights_connection_string = module.monitoring.connection_string
}

module "api_management" {
  source = "./modules/api_management"
  # ... existing config ...

  # NEW: connect APIM to App Insights
  appinsights_instrumentation_key = module.monitoring.instrumentation_key
  appinsights_id                  = module.monitoring.app_insights_id
}

The monitoring module runs before Container Apps (because Container Apps needs the Log Analytics workspace ID) and before APIM (because APIM needs the instrumentation key). Terraform handles the dependency graph automatically.
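
The ordering Terraform derives is a topological sort of the reference graph. Here is a Python sketch of the idea at module level, using the standard-library graphlib. The edges to monitoring come from the wiring above; the edges to resource_group are an assumption, and Terraform actually resolves dependencies per resource, not per module.

```python
from graphlib import TopologicalSorter

# module -> set of modules whose outputs it references
depends_on = {
    "resource_group": set(),
    "monitoring": {"resource_group"},
    "container_apps": {"resource_group", "monitoring"},   # needs workspace ID
    "api_management": {"resource_group", "monitoring"},   # needs instrumentation key
}

# static_order() yields each module only after everything it depends on.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

container_apps and api_management do not reference each other, so their relative order is not fixed and Terraform is free to create them in parallel.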

The Container Apps module passes the App Insights connection string to each service as an environment variable:

env {
  name        = "APPLICATIONINSIGHTS_CONNECTION_STRING"
  secret_name = "appinsights-connection-string"
}

secret {
  name  = "appinsights-connection-string"
  value = var.appinsights_connection_string
}

The connection string is stored as a Container App secret — it never appears in plaintext in the Container App configuration.

Step 7: KQL Queries — Operational Intelligence#

Application Insights stores telemetry in Log Analytics, queryable with KQL (Kusto Query Language). Blog 8 includes five ready-to-use queries in the kql/ directory.

1. Failed Auth Requests#

Detect brute-force attempts or misconfigured clients:

requests
| where resultCode in ("401", "403")
| summarize count() by bin(timestamp, 5m), name, resultCode
| order by timestamp desc

2. Slow Requests#

Find endpoints where P95 latency exceeds 2 seconds:

requests
| summarize percentile(duration, 95) by bin(timestamp, 5m), name
| where percentile_duration_95 > 2000
| order by timestamp desc

3. Error Rate by Service#

Track which microservice is producing the most 5xx errors:

requests
| summarize
    errorCount=countif(toint(resultCode) >= 500),
    totalCount=count()
  by bin(timestamp, 5m), cloud_RoleName
| extend errorRate = round(100.0 * errorCount / totalCount, 2)
| order by timestamp desc
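
An error rate needs two counts per service: 5xx responses and all responses. The Python sketch below shows that calculation on a handful of hypothetical sample rows; the field names service and result_code are made up for the example and are not Application Insights column names.

```python
from collections import defaultdict

# Hypothetical request rows, one dict per request.
sample = [
    {"service": "task-api", "result_code": 200},
    {"service": "task-api", "result_code": 201},
    {"service": "task-api", "result_code": 200},
    {"service": "task-api", "result_code": 500},
    {"service": "audit-svc", "result_code": 200},
    {"service": "audit-svc", "result_code": 503},
    {"service": "notification-svc", "result_code": 200},
]


def error_rate_by_service(rows: list[dict]) -> dict[str, float]:
    """Percentage of 5xx responses per service, rounded to two decimals."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["service"]] += 1          # every request counts toward the total
        if row["result_code"] >= 500:
            errors[row["service"]] += 1      # only 5xx counts as an error
    return {svc: round(100.0 * errors[svc] / totals[svc], 2) for svc in totals}


print(error_rate_by_service(sample))
# {'task-api': 25.0, 'audit-svc': 50.0, 'notification-svc': 0.0}
```

The denominator has to include successful requests; if rows are filtered to 5xx before counting, every service reports 100%.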

4. Top Callers#

Identify the most active users across all endpoints:

requests
| extend userOid = tostring(
    customDimensions["X-User-OID"]
  )
| where isnotempty(userOid)
| summarize requestCount=count() by userOid, name
| top 20 by requestCount desc

5. Dependency Failures#

Track failed service-to-service calls (OBO, Client Credentials):

dependencies
| where success == false
| summarize failureCount=count()
  by bin(timestamp, 5m), target, name, resultCode
| order by timestamp desc

To run these queries: open the Azure Portal → Application Insights → Logs → paste the query → Run.

Application Insights in Action#

Here's what the Application Insights overview looks like after running the system for a while:

Application Insights Overview

The overview shows two key charts:

Failed requests: Spikes when something goes wrong (the alert fires if this exceeds 10 in 5 minutes)
Server response time: Average latency across all services (200ms average in this case)

Step 8: Deployment#

The setup.sh script automates everything. It runs six phases:

./setup.sh

Phase 1: Azure AD Setup
  Create 4 app registrations (same as Blog 7)
  Create 3 test users with role assignments

Phase 2: Terraform
  terraform init && terraform apply
  Provisions: RG + ACR + 4 Container Apps + APIM
    + App Insights + Log Analytics + Alert Rules    ← NEW
  (APIM Developer tier takes ~30-45 minutes)

Phase 3: Docker Build + Push
  Build 4 images with --platform linux/amd64
  Push to ACR

Phase 4: Update Container Apps
  az containerapp update → point to real Docker images

Phase 5: Write .env Files
  task-api/.env: TRUST_GATEWAY=false (local dev)
  frontend/.env.local: API URL = APIM gateway URL

Phase 6: Verify Monitoring                          ← NEW
  Confirm App Insights resource exists
  List configured alert rules
  Show internal-only service URLs

After deployment, the setup script prints a monitoring summary:

==> Application Insights: blog08-prod-appinsights
==> Alert rules configured:
    - High error rate: >10 failed requests in 5 minutes (Severity 2)
    - Slow response: avg >5s over 5 minutes (Severity 3)
    - Alerts sent to: your-email@example.com

==> Security hardening:
    - notification-svc: internal ingress only
    - audit-svc: internal ingress only
    - All services: health probes (liveness + readiness)
    - All services: auto-scaling min=1, max=3

Cleanup#

./cleanup.sh

Destroys all Azure resources to avoid ongoing charges.

How Production Readiness Changes the System#

Without Monitoring (Blog 7)#

User creates a task
  → APIM validates JWT ✓
  → Task API creates task ✓
  → OBO → Notification Service ✓
  → Client Credentials → Audit Service ✓
  → Nobody knows if it's slow or failing ✗

With Monitoring (Blog 8)#

User creates a task
  → APIM logs request to App Insights (W3C trace ID)
  → Task API creates task (traced via OpenTelemetry)
  → OBO → Notification Service (dependency call traced)
  → Client Credentials → Audit Service (dependency call traced)
  → All telemetry correlated under one trace ID
  → If error rate spikes → alert email sent automatically
  → If service crashes → health probe restarts it
  → If traffic spikes → auto-scale to 3 replicas

Common Pitfalls#

1. OpenTelemetry Import Order#

configure_azure_monitor() must be called before importing FastAPI or any instrumented library. If you import FastAPI first, the auto-instrumentation hooks won't be installed.

# ✓ CORRECT — configure before imports
from config import APPINSIGHTS_CONN_STR
if APPINSIGHTS_CONN_STR:
    from azure.monitor.opentelemetry import configure_azure_monitor
    configure_azure_monitor(connection_string=APPINSIGHTS_CONN_STR)

from auth import validate_token   # FastAPI imported inside auth.py

# ✗ WRONG — FastAPI imported before configure
from fastapi import FastAPI
from config import APPINSIGHTS_CONN_STR
if APPINSIGHTS_CONN_STR:
    configure_azure_monitor(...)  # Too late!
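
Why does the order matter? The Python sketch below reproduces the effect with a stand-in module, assuming the instrumentation works by swapping the application class on the framework module for a traced subclass. The module name framework and the classes App and TracedApp are hypothetical.

```python
import sys
import types

# A stand-in for a web framework module (hypothetical, not FastAPI itself).
framework = types.ModuleType("framework")


class App:
    instrumented = False


framework.App = App
sys.modules["framework"] = framework

# Early import: binds a local name to the ORIGINAL class object.
from framework import App as EarlyApp  # noqa: E402


# "Instrumentation": replace the class on the module with a traced subclass.
class TracedApp(App):
    instrumented = True


framework.App = TracedApp

# Late import: looks the attribute up again and sees the replacement.
from framework import App as LateApp  # noqa: E402

print(EarlyApp.instrumented)  # False
print(LateApp.instrumented)   # True
```

A name imported before the swap keeps pointing at the untraced class, which is why the configure call has to come first.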

2. Internal Ingress Doesn't Mean Private#

external_enabled = false restricts access to the Container App Environment, not to a specific app. Any Container App in the same environment can reach internal services. If you need stricter isolation, use separate environments or network security groups.

3. Alert Thresholds Need Tuning#

The default thresholds (10 failed requests, 5s response time) are reasonable starting points, but every system is different. After running in production for a week, review your actual baselines and adjust. Too sensitive = alert fatigue. Too relaxed = missed incidents.

4. Sampling in Production#

Blog 8 sets sampling_percentage = 100 on APIM diagnostics — every request is logged. For high-traffic production systems, reduce this to 10-25% to control costs and storage. Application Insights charges per GB ingested.
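
A back-of-the-envelope estimate helps when picking a percentage. The Python sketch below is only arithmetic; the request volume and the 1 KB of telemetry per request are made-up inputs, so measure your own ingestion before relying on numbers like these.

```python
def monthly_ingestion_gb(
    requests_per_day: int,
    kb_per_request: float,
    sampling_percentage: float,
) -> float:
    """Approximate GB of gateway telemetry ingested over a 30-day month."""
    sampled_requests = requests_per_day * 30 * sampling_percentage / 100
    return sampled_requests * kb_per_request / (1024 * 1024)  # KB -> GB


# Hypothetical workload: 1,000,000 requests/day, ~1 KB of telemetry each.
full = monthly_ingestion_gb(1_000_000, 1.0, sampling_percentage=100)
reduced = monthly_ingestion_gb(1_000_000, 1.0, sampling_percentage=10)
print(round(full, 2))     # 28.61
print(round(reduced, 2))  # 2.86
```

Under these assumptions, dropping from 100% to 10% cuts the gateway's share of ingestion by a factor of ten. Telemetry sent by the backends through OpenTelemetry is separate and not affected by this setting.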

5. Connection String as a Secret#

The Application Insights connection string is stored as a Container App secret, not a plaintext environment variable. This prevents it from appearing in az containerapp show output or the Azure Portal configuration blade.

Cost Considerations#

Resource	Approximate Cost
APIM Developer tier	~$50/month
Application Insights	Free tier (5 GB/month ingestion)
Log Analytics	Free tier (5 GB/month, 31-day retention)
Alert rules	Free (included with Azure Monitor)
Action group (email)	Free
4 Container Apps (0.25 CPU, 0.5 GB, min=1)	~$0.07/hr each when active

The monitoring additions (App Insights, Log Analytics, alerts) are all free-tier eligible. The main cost is still APIM Developer tier at ~$50/month.

Run ./cleanup.sh when you're done testing to avoid charges.

What We Built#

Blog 8 took the Blog 7 gateway setup and made it production-ready:

Layer	What It Does
Application Insights	Centralized telemetry for all services
OpenTelemetry	Auto-instruments FastAPI + httpx (3 lines of code)
APIM Diagnostics	Gateway-level traces with W3C correlation
Alert Rules	Email on error spikes or slow responses
Health Probes	Auto-restart crashed containers
Auto-Scaling	1–3 replicas per service
Internal Ingress	Notification + Audit locked from public internet
KQL Queries	5 ready-to-use operational queries

The entire monitoring layer was added with:

1 new Terraform module (monitoring)
3 lines of Python per service (OpenTelemetry setup)
1 new pip dependency (azure-monitor-opentelemetry)
2 Terraform changes per internal service (ingress + probes)

No application logic changed. The Task API, Notification Service, and Audit Service work exactly the same as Blog 7 — they just report what they're doing now.
