Building Production LLM Guardrails: A Complete Guide to Input/Output Safety with Azure

Learn how to build production-ready LLM guardrails using FastAPI and Azure. Implement 5 essential guardrail layers: prompt injection detection, jailbreak prevention, PII masking, secret scanning, and content filtering with comprehensive audit logging.

As LLMs become central to enterprise applications, robust guardrails are essential rather than optional. Prompt injection, jailbreak attacks, PII leakage, and credential exposure are not theoretical risks; they are happening in production today.

In this tutorial, I'll show you how to build production-ready LLM guardrails that sit between your applications and LLM providers, providing comprehensive safety scanning for every request and response.

By the end, you'll have a fully functional guardrails system with 5 layers of protection:

  • Input Guardrail #1: Detects and blocks prompt injection attacks using pattern matching and heuristics
  • Input Guardrail #2: Identifies jailbreak attempts including DAN, STAN, and other known techniques
  • Input Guardrail #3: Automatically detects and masks PII (emails, SSNs, phone numbers, credit cards)
  • Input Guardrail #4: Scans for leaked secrets (API keys, passwords, connection strings)
  • Input Guardrail #5: Provides content filtering for harmful requests
  • Audit Layer: Maintains a complete audit trail of all requests

Live Demo#

Here's what the Guardrails dashboard looks like in action:

[Screenshot: LLM Guardrails Dashboard]

The dashboard provides real-time visibility into request volumes, blocked threats, and guardrail metrics with time-range filtering.


Why You Need LLM Guardrails#

The Safety Challenge#

LLM applications face unique safety challenges that traditional API gateways don't address:

| Threat | Guardrail Needed | Impact if Unguarded |
| --- | --- | --- |
| Prompt Injection | Input validation guardrail | Complete bypass of application logic |
| Jailbreak Attacks | Jailbreak detection guardrail | Generation of harmful content |
| PII Leakage | PII masking guardrail | Privacy violations, GDPR/CCPA fines |
| Secret Exposure | Secret scanning guardrail | Account compromise, data breach |
| Harmful Content | Content filtering guardrail | Reputation damage, legal liability |

What Makes These Guardrails Enterprise-Ready?#

  • 5-Layer Protection: Five independent guardrails working in parallel
  • Configurable Policies: Block, filter, warn, or allow based on threat severity
  • PII Masking: Automatically redact sensitive data before it reaches the LLM
  • Audit Logging: Complete traceability stored in Azure Cosmos DB
  • Rate Limiting: Prevent abuse with Redis-backed rate limiting
  • OpenAI-Compatible API: Drop-in replacement for existing integrations

Guardrails Architecture#

High-Level Overview#

┌─────────────────────────────────────────────────────────────────┐
│                      Client Application                          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                     LLM GUARDRAILS LAYER                         │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │              INPUT GUARDRAILS (5 Layers)                  │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ │  │
│  │  │Guardrail│ │Guardrail│ │Guardrail│ │Guardrail│ │Guard│ │  │
│  │  │   #1    │ │   #2    │ │   #3    │ │   #4    │ │ #5  │ │  │
│  │  │Injection│ │Jailbreak│ │   PII   │ │ Secrets │ │Cont.│ │  │
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────┘ │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                │                                 │
│                    ┌───────────┴───────────┐                    │
│                    │  Action Determination  │                    │
│                    │  BLOCK / FILTER / WARN │                    │
│                    └───────────┬───────────┘                    │
│                                │                                 │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │                    Audit Logging                          │  │
│  │              Azure Cosmos DB / Redis Cache                │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                    Azure AI Foundry (GPT-4o)                     │
└─────────────────────────────────────────────────────────────────┘

Request Flow#

  1. Client sends request to /v1/chat/completions (OpenAI-compatible)
  2. Input guardrails run all 5 protection layers in parallel
  3. Action determination based on threat severity and configured policies
  4. If blocked: Return error with reason (request never reaches LLM)
  5. If filtered: Apply transformations (PII masking) before forwarding
  6. Forward to LLM: Send processed request to Azure AI Foundry
  7. Output guardrails: Optionally scan LLM output for sensitive data
  8. Audit logging: Record full request/response with guardrail results
  9. Return response to client
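To make steps 2 through 5 concrete, here is a minimal sketch of the flow using `asyncio.gather`. The layer functions, the `LayerResult` type, and the example email address are hypothetical stand-ins for illustration, not the project's actual detectors.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class LayerResult:
    layer: str
    action: str  # "block", "filter", or "allow"
    text: str    # possibly transformed text (e.g. with PII masked)


# Hypothetical stand-ins for the real detectors described in this post.
async def injection_layer(text: str) -> LayerResult:
    hit = "ignore all previous instructions" in text.lower()
    return LayerResult("prompt_injection", "block" if hit else "allow", text)


async def pii_layer(text: str) -> LayerResult:
    masked = text.replace("jane@example.com", "<EMAIL_ADDRESS>")
    return LayerResult("pii", "filter" if masked != text else "allow", masked)


async def run_input_guardrails(text: str) -> tuple[str, str]:
    """Run all layers concurrently; return (final_action, text_to_forward)."""
    results = await asyncio.gather(injection_layer(text), pii_layer(text))

    # Step 4: any blocking layer wins and nothing is forwarded to the LLM.
    if any(r.action == "block" for r in results):
        return "block", ""

    # Step 5: forward the transformed text. This sketch has a single
    # filtering layer; several filters would need to be applied in sequence.
    for r in results:
        if r.action == "filter":
            return "filter", r.text
    return "allow", text
```

Calling `asyncio.run(run_input_guardrails("mail jane@example.com"))` returns the `"filter"` action together with the masked text, which is what would then be forwarded in step 6.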

The 5 Guardrail Layers#

Guardrail #1: Prompt Injection Detection#

The first guardrail uses pattern matching to identify attempts to override system instructions:

import re
from dataclasses import dataclass, field


# Minimal result container (assumed shape; the project defines its own).
@dataclass
class ScanResult:
    detected: bool
    findings: list = field(default_factory=list)
    risk_score: float = 0.0


class PromptInjectionDetector:
    def __init__(self):
        # Each entry is (regex, description, severity in the range 0.0-1.0).
        self.patterns = [
            # Instruction Override
            (r"ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)",
             "Instruction override attempt", 0.8),

            # Developer Mode
            (r"(developer|debug|admin|root)\s+mode",
             "Developer mode request", 0.7),

            # System Prompt Extraction
            (r"(show|reveal|display|output|print)\s+(me\s+)?(your|the)\s+(system|initial)\s+(prompt|instructions)",
             "System prompt extraction attempt", 0.9),

            # Role Manipulation
            (r"(act|behave|respond)\s+as\s+(if\s+)?(you\s+)?(are|were)\s+",
             "Role manipulation attempt", 0.6),

            # Delimiter Injection
            (r"(```|---|\*\*\*|###)\s*(system|instructions|prompt)",
             "Delimiter injection attempt", 0.7),
        ]

    def detect(self, text: str) -> ScanResult:
        findings = []
        for pattern, description, severity in self.patterns:
            # re.search + group(0) reports the full matched text. re.findall
            # returns only the capture groups, and optional groups may be empty.
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                findings.append({
                    "type": "prompt_injection",
                    "description": description,
                    "severity": severity,
                    "match": match.group(0)
                })

        # The overall risk score is the highest severity among the findings.
        return ScanResult(
            detected=len(findings) > 0,
            findings=findings,
            risk_score=max(f["severity"] for f in findings) if findings else 0.0
        )

Detected Patterns Include:

  • Instruction override attempts ("Ignore all previous instructions")
  • Developer/debug mode requests
  • System prompt extraction ("Show me your instructions")
  • Role manipulation ("Act as if you are...")
  • Delimiter injection (using markdown to inject fake system prompts)
  • Encoding obfuscation (Base64, ROT13); these checks are not part of the pattern list shown above
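The regex list does not cover encoded payloads, so here is one way a Base64 check could work: decode suspicious runs and re-scan the decoded text. This is a sketch under my own assumptions (the 16-character minimum, the function names, and the single override pattern are illustrative), not the detector's actual implementation.

```python
import base64
import binascii
import re

# Runs of the base64 alphabet long enough to hide a short instruction.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
OVERRIDE = re.compile(
    r"ignore\s+(all\s+)?(previous|above|prior)\s+instructions", re.IGNORECASE
)


def decoded_candidates(text: str) -> list[str]:
    """Return printable strings obtained by base64-decoding suspicious runs."""
    decoded = []
    for match in B64_CANDIDATE.finditer(text):
        try:
            raw = base64.b64decode(match.group(0), validate=True)
            candidate = raw.decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64 or not text: ignore this run
        if candidate.isprintable():
            decoded.append(candidate)
    return decoded


def hides_override(text: str) -> bool:
    """True if any decoded run contains an instruction-override phrase."""
    return any(OVERRIDE.search(candidate) for candidate in decoded_candidates(text))
```

In a real detector you would likely feed each decoded candidate back through the full pattern list rather than a single regex.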

Guardrail #2: Jailbreak Detection#

This guardrail identifies known jailbreak techniques and signatures:

class JailbreakDetector:
    KNOWN_JAILBREAKS = [
        # DAN (Do Anything Now)
        "DAN", "Do Anything Now", "STAN", "DUDE",
        # Developer Mode
        "Developer Mode", "Maintenance Mode",
        # Evil Mode
        "Evil Mode", "Opposite Mode", "Bad AI",
    ]

    JAILBREAK_PATTERNS = [
        # Persona Jailbreaks
        (r"you\s+are\s+now\s+[a-z]+\s+with\s+no\s+restrictions",
         "Persona jailbreak with no restrictions", 0.9),

        # Two-Response Attacks
        (r"give\s+me\s+two\s+responses.*one\s+(censored|normal).*one\s+(uncensored|unrestricted)",
         "Two-response jailbreak attempt", 0.85),

        # Grandma Exploit
        (r"(grandma|grandmother).*used\s+to\s+(tell|read|say)",
         "Social engineering (grandma exploit)", 0.7),

        # Token Smuggling
        (r"\\u[0-9a-fA-F]{4}",
         "Unicode escape sequence (token smuggling)", 0.6),
    ]
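The excerpt above lists signatures but not how they are matched. A naive substring test on "DAN" would also fire on "dance" or on someone named Dan, so a keyword check probably needs word boundaries. The sketch below is one hedged way to do that; `find_jailbreak_keywords` and the `ACRONYMS` set are names I made up for illustration.

```python
import re

KNOWN_JAILBREAKS = ["DAN", "Do Anything Now", "STAN", "DUDE", "Developer Mode"]

# Short acronyms are matched case-sensitively so that "Dan" (a name) or
# "dude" (casual speech) do not trigger a finding.
ACRONYMS = {"DAN", "STAN", "DUDE"}


def find_jailbreak_keywords(text: str) -> list[str]:
    """Return the known jailbreak names that appear as whole words in text."""
    hits = []
    for keyword in KNOWN_JAILBREAKS:
        flags = 0 if keyword in ACRONYMS else re.IGNORECASE
        if re.search(rf"\b{re.escape(keyword)}\b", text, flags):
            hits.append(keyword)
    return hits
```

Keyword hits alone are weak evidence, so it seems reasonable to give them a lower severity than the regex patterns and combine both signals.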

Example Detection:

[Screenshot: Prompt Injection Blocked]

The guardrails blocked a prompt injection attempt with a 76% overall risk score, detecting the "Ignore all previous instructions" pattern.

Guardrail #3: PII Detection & Masking#

This guardrail uses Microsoft Presidio with spaCy NLP for entity recognition:

from presidio_analyzer import AnalyzerEngine, RecognizerResult
from presidio_anonymizer import AnonymizerEngine

class PIIDetector:
    ENTITIES = [
        "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
        "CREDIT_CARD", "US_SSN", "US_BANK_NUMBER", "IP_ADDRESS"
    ]

    def __init__(self):
        self.analyzer = AnalyzerEngine()
        self.anonymizer = AnonymizerEngine()

    def detect_and_mask(self, text: str) -> tuple[str, list[RecognizerResult]]:
        # Detect PII entities (each result carries entity type, span, and score)
        results = self.analyzer.analyze(
            text=text,
            entities=self.ENTITIES,
            language="en"
        )

        # Mask detected entities; the default operator replaces each span
        # with a placeholder such as <EMAIL_ADDRESS>
        masked_text = self.anonymizer.anonymize(
            text=text,
            analyzer_results=results
        ).text

        return masked_text, results

PII Guardrail in Action:

[Screenshot: PII Detection and Masking]

The guardrail detected 4 PII entities (email, names, phone number) and automatically masked them before forwarding to the LLM. The response shows placeholders like <EMAIL_ADDRESS> instead of actual data.

Risk Scoring by Entity Type:

| Entity Type | Risk Weight |
| --- | --- |
| US_SSN | 1.0 |
| CREDIT_CARD | 1.0 |
| US_BANK_NUMBER | 0.8 |
| EMAIL_ADDRESS | 0.5 |
| PHONE_NUMBER | 0.5 |
| IP_ADDRESS | 0.4 |
| PERSON | 0.3 |
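The table gives per-entity weights but not how they combine into one score. A simple option, and this is an assumption on my part, is to let the riskiest entity in the request set the score:

```python
RISK_WEIGHTS = {
    "US_SSN": 1.0,
    "CREDIT_CARD": 1.0,
    "US_BANK_NUMBER": 0.8,
    "EMAIL_ADDRESS": 0.5,
    "PHONE_NUMBER": 0.5,
    "IP_ADDRESS": 0.4,
    "PERSON": 0.3,
}


def pii_risk_score(entity_types: list[str], default_weight: float = 0.3) -> float:
    """Assumed aggregation: the highest-weighted entity present sets the score."""
    if not entity_types:
        return 0.0
    return max(RISK_WEIGHTS.get(t, default_weight) for t in entity_types)
```

With this rule a message containing a name and an email scores 0.5, while anything containing an SSN scores 1.0 regardless of what else is present. `default_weight` is a hypothetical fallback for entity types missing from the table.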

Guardrail #4: Secret Scanning#

This guardrail detects API keys, passwords, and credentials using precise regex patterns:

import re

class SecretScanner:
    # Each entry is (regex, secret type, severity in the range 0.0-1.0).
    SECRET_PATTERNS = [
        # AWS Credentials
        (r"AKIA[0-9A-Z]{16}", "AWS Access Key", 0.95),
        (r"aws_secret_access_key\s*=\s*['\"]?([A-Za-z0-9/+=]{40})", "AWS Secret Key", 1.0),

        # Azure Credentials (GUIDs are hexadecimal, so this matches any UUID;
        # hence the lower severity)
        (r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}",
         "Azure Client ID/Subscription ID", 0.6),

        # API Keys
        (r"sk-[a-zA-Z0-9]{48}", "OpenAI API Key", 0.95),
        (r"sk-ant-[a-zA-Z0-9-]{95}", "Anthropic API Key", 0.95),
        (r"ghp_[a-zA-Z0-9]{36}", "GitHub Personal Access Token", 0.9),

        # Generic Patterns
        (r"['\"]?password['\"]?\s*[:=]\s*['\"]([^'\"]{8,})['\"]", "Password", 0.8),
        (r"Bearer\s+[a-zA-Z0-9._-]{20,}", "Bearer Token", 0.85),

        # Connection Strings
        (r"mongodb(\+srv)?://[^\s]+", "MongoDB Connection String", 0.9),
        (r"postgresql://[^\s]+", "PostgreSQL Connection String", 0.9),
        (r"redis://[^\s]+", "Redis Connection String", 0.85),

        # Private Keys (PEM headers, including the unlabeled PKCS#8 and OpenSSH forms)
        (r"-----BEGIN\s+((RSA|DSA|EC|OPENSSH)\s+)?PRIVATE\s+KEY-----", "Private Key", 1.0),
        (r"-----BEGIN\s+PGP\s+PRIVATE\s+KEY\s+BLOCK-----", "Private Key", 1.0),
    ]

    def scan(self, text: str) -> list[SecretFinding]:
        findings = []
        for pattern, secret_type, severity in self.SECRET_PATTERNS:
            matches = re.finditer(pattern, text, re.IGNORECASE)
            for match in matches:
                findings.append(SecretFinding(
                    type=secret_type,
                    severity=severity,
                    location=(match.start(), match.end()),
                    # _redact (not shown) masks the matched value before it is stored
                    redacted=self._redact(match.group(), secret_type)
                ))
        return findings

Redaction Strategy:

| Secret Type | Redaction Example |
| --- | --- |
| AWS Access Key | AKIA****... (first 4 chars) |
| OpenAI API Key | sk-****...**** (prefix + suffix) |
| Private Key | -----BEGIN RSA... + [REDACTED] |
| Password | **** (fully redacted) |
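The `_redact` helper itself is not shown in this post, so here is a minimal sketch that reproduces the output formats in the table. The function body and the `[REDACTED]` fallback for other secret types are my assumptions.

```python
def redact(secret: str, secret_type: str) -> str:
    """Return a display-safe form of a matched secret (formats follow the table)."""
    if secret_type == "Password":
        return "****"                      # fully redacted
    if secret_type == "AWS Access Key":
        return secret[:4] + "****..."      # keep only the "AKIA" prefix
    if secret_type == "OpenAI API Key":
        return secret[:3] + "****...****"  # keep only the "sk-" prefix
    if secret_type == "Private Key":
        # Keep the header up to the key algorithm, drop everything else.
        return secret.split(" PRIVATE", 1)[0] + "... [REDACTED]"
    return "[REDACTED]"                    # assumed default for other types
```

Keeping a short prefix lets you tell in the audit log which kind of credential leaked without storing enough of it to be usable.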

Guardrail #5: Content Filtering#

The final guardrail enforces safety policies using both local patterns and Azure AI Content Safety:

class ContentFilter:
    HARMFUL_PATTERNS = [
        (r"how\s+to\s+(make|create|build)\s+a?\s*(bomb|explosive|weapon)",
         "Harmful instructions request", "high"),
        (r"(hack|break\s+into|exploit)\s+(the|a)?\s*(website|server|system)",
         "Malicious activity request", "high"),
        (r"(write|create|generate)\s+(malware|ransomware|virus|trojan)",
         "Malware creation request", "critical"),
    ]

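    async def check_with_azure(self, text: str) -> ContentSafetyResult:
        """Use Azure AI Content Safety for additional analysis."""
        # AnalyzeTextOptions comes from azure.ai.contentsafety.models, and
        # awaiting the call requires the client from azure.ai.contentsafety.aio.
        response = await self.content_safety_client.analyze_text(
            AnalyzeTextOptions(text=text)
        )

        # Note: the per-category *_result attributes match the 1.0.0b1 preview
        # SDK; later releases return the same data as a `categories_analysis` list.
        categories = {
            "Hate": response.hate_result.severity,
            "Violence": response.violence_result.severity,
            "Sexual": response.sexual_result.severity,
            "SelfHarm": response.self_harm_result.severity,
        }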

        # Block if any category exceeds threshold
        should_block = any(
            severity >= self.blocking_threshold
            for severity in categories.values()
        )

        return ContentSafetyResult(
            categories=categories,
            should_block=should_block
        )
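The class above shows the Azure path but not how the local `HARMFUL_PATTERNS` are evaluated. The sketch below is one hedged possibility: match each pattern, then block when any finding meets a severity threshold. The `check_local` name, the `SEVERITY_RANK` ordering, and the default threshold are assumptions for illustration.

```python
import re

HARMFUL_PATTERNS = [
    (r"how\s+to\s+(make|create|build)\s+a?\s*(bomb|explosive|weapon)",
     "Harmful instructions request", "high"),
    (r"(write|create|generate)\s+(malware|ransomware|virus|trojan)",
     "Malware creation request", "critical"),
]

# Assumed ordering of the severity labels used by the local patterns.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def check_local(text: str, threshold: str = "medium") -> tuple[bool, list[str]]:
    """Return (should_block, descriptions of matched patterns)."""
    findings = [
        (description, severity)
        for pattern, description, severity in HARMFUL_PATTERNS
        if re.search(pattern, text, re.IGNORECASE)
    ]
    should_block = any(
        SEVERITY_RANK[severity] >= SEVERITY_RANK[threshold]
        for _, severity in findings
    )
    return should_block, [description for description, _ in findings]
```

Running the cheap local check first means an obvious violation can be rejected without paying for the Azure call.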

Guardrails Playground#

The Guardrails Playground allows you to test various attack scenarios:

[Screenshot: Guardrails Playground]

The interactive playground provides pre-built test cases for injection attacks, jailbreaks, PII exposure, and secret leakage.

Test Categories:

  • Injection: Instruction override, delimiter injection, role manipulation
  • Jailbreak: DAN, STAN, developer mode, persona attacks
  • PII: Personal information, financial data
  • Secrets: API keys, passwords, connection strings
  • Benign: Normal queries to test false positive rates

Audit Logging#

Every request is logged to Azure Cosmos DB with full context:

[Screenshot: Audit Logs]

The audit log view shows all requests with status, detected threats, PII presence, token usage, and response times.

Audit Record Schema:

class AuditRecord(BaseModel):
    request_id: str           # Unique identifier
    timestamp: datetime       # Request time
    endpoint: str            # API endpoint called
    model: str               # LLM model used
    status: str              # allowed/blocked/filtered/error

    # Security Results
    threats_detected: list[ThreatInfo]
    pii_detected: list[PIIInfo]
    secrets_detected: list[SecretInfo]

    # Metrics
    input_tokens: int
    output_tokens: int
    response_time_ms: int

    # Request Context
    client_ip: str
    api_key_id: str          # Hashed API key
    user_id: Optional[str]
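The schema stores `api_key_id` as a hashed API key but does not say how the hash is computed. One reasonable approach, sketched here under my own assumptions (a keyed HMAC-SHA256 with a server-side `pepper`, truncated to 16 hex characters), keeps the identifier stable for correlation without storing the key itself.

```python
import hashlib
import hmac


def api_key_id(api_key: str, pepper: bytes) -> str:
    """Derive a stable, non-reversible identifier for an API key.

    `pepper` is a hypothetical server-side secret; using a keyed hash means a
    leaked audit log cannot be brute-forced against candidate keys without it.
    """
    digest = hmac.new(pepper, api_key.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]
```

The same key always maps to the same identifier, so you can still group audit records by caller.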

Guardrail Configuration & Policies#

Each guardrail supports configurable policies per threat type:

[Screenshot: Settings Page]

The settings page shows all enabled guardrails and current configuration.

Guardrail Policy Configuration:

class SecurityPolicies(BaseModel):
    prompt_injection: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="medium",
        message="Potential prompt injection detected. Request blocked."
    )

    jailbreak: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="high",
        message="Jailbreak attempt detected. Request blocked."
    )

    pii_detection: PolicyConfig = PolicyConfig(
        action="filter",  # Mask PII instead of blocking
        severity_threshold="low",
        message="PII detected and masked."
    )

    secret_scanning: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="critical",
        message="Credentials detected. Request blocked."
    )

    content_filter: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="medium",
        message="Content policy violation. Request blocked."
    )

Action Types:

| Action | Behavior |
| --- | --- |
| block | Reject request immediately, return error |
| filter | Apply transformations (masking), continue |
| warn | Log warning, allow request to proceed |
| allow | Take no action, just log |
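When several guardrails fire on the same request, their actions have to be merged into one decision. The post does not spell out the rule, so the sketch below assumes two things: a policy only applies when the finding meets its severity threshold, and the strictest resulting action wins.

```python
# Assumed orderings; a higher number means stricter or more severe.
ACTION_PRIORITY = {"allow": 0, "warn": 1, "filter": 2, "block": 3}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def layer_action(policy_action: str, policy_threshold: str, finding_severity: str) -> str:
    """Apply a policy only when the finding meets its severity threshold."""
    if SEVERITY_RANK[finding_severity] >= SEVERITY_RANK[policy_threshold]:
        return policy_action
    return "allow"


def final_action(actions: list[str]) -> str:
    """Pick the strictest action across all layers (allow if none fired)."""
    return max(actions, key=ACTION_PRIORITY.__getitem__, default="allow")
```

For example, a "high" finding against a policy with a "critical" threshold resolves to `allow`, while a masked-PII `filter` combined with an injection `block` resolves to `block`.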

API Endpoints#

OpenAI-Compatible Endpoints#

# Chat Completions (with security scanning)
POST /v1/chat/completions
Content-Type: application/json

{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}

# Text Completions (with security scanning)
POST /v1/completions

Security-Specific Endpoints#

# Standalone Security Scan (no LLM call)
POST /v1/security/scan
{
  "text": "Your text to scan...",
  "include_details": true
}

# Query Audit Logs
GET /api/audit?status=blocked&start_date=2024-01-01

# Get Analytics Summary
GET /api/audit/summary?period=24h

# Threat Analytics
GET /api/analytics/threats?period=7d
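As a quick illustration of calling the standalone scan endpoint from Python, here is a sketch using only the standard library. The base URL is a placeholder, and any authentication header your deployment requires is not shown because the post does not specify one.

```python
import json
import urllib.request


def build_scan_request(base_url: str, text: str, include_details: bool = True) -> urllib.request.Request:
    """Build a POST request for the /v1/security/scan endpoint."""
    body = json.dumps({"text": text, "include_details": include_details}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/security/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires a running gateway (hypothetical local address):
# with urllib.request.urlopen(build_scan_request("http://localhost:8000", "text to scan")) as resp:
#     print(json.load(resp))
```

Because the scan endpoint makes no LLM call, it is a cheap way to pre-check user input in batch jobs or CI tests.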

Deployment Architecture#

Azure Resources#

┌─────────────────────────────────────────────────────────────────┐
│                      Azure Container Apps                        │
│  ┌─────────────────┐              ┌─────────────────┐          │
│  │    Frontend     │              │    Backend      │          │
│  │   (React SPA)   │◄────────────►│   (FastAPI)     │          │
│  │   Port: 80      │              │   Port: 8000    │          │
│  └─────────────────┘              └────────┬────────┘          │
└─────────────────────────────────────────────┼──────────────────┘
                                              │
              ┌───────────────────────────────┼───────────────────┐
              │                               │                   │
              ▼                               ▼                   ▼
┌─────────────────────┐   ┌─────────────────────────┐   ┌─────────────────┐
│  Azure AI Foundry   │   │    Azure Cosmos DB      │   │  Azure Redis    │
│  - GPT-4o           │   │    - Audit Logs         │   │  - Rate Limits  │
│  - GPT-4o-mini      │   │    - Serverless Mode    │   │  - Session Cache│
└─────────────────────┘   └─────────────────────────┘   └─────────────────┘
              │
              ▼
┌─────────────────────────┐
│ Azure Content Safety    │
│ - Text Analysis         │
│ - Prompt Shields        │
└─────────────────────────┘

Infrastructure as Code (Terraform)#

# Azure Container Apps Environment
resource "azurerm_container_app_environment" "llm_gateway" {
  name                = "cae-llm-gateway"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

# Backend Container App
resource "azurerm_container_app" "backend" {
  name                         = "ca-backend"
  container_app_environment_id = azurerm_container_app_environment.llm_gateway.id
  resource_group_name          = azurerm_resource_group.main.name
  revision_mode                = "Single"

  template {
    container {
      name   = "backend"
      image  = "your-registry.azurecr.io/llm-gateway-backend:latest"
      cpu    = 1.0
      memory = "2Gi"

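      # Each secret_name must match a `secret` block declared on this
      # container app (not shown in this excerpt).
      env {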
        name        = "AZURE_AI_ENDPOINT"
        secret_name = "azure-ai-endpoint"
      }
      env {
        name        = "COSMOS_CONNECTION_STRING"
        secret_name = "cosmos-connection"
      }
    }

    min_replicas = 1
    max_replicas = 10
  }
}

Performance Considerations#

Guardrail Latency Optimization#

Each guardrail is optimized for low-latency evaluation:

| Guardrail Layer | Avg Latency |
| --- | --- |
| #1 Prompt Injection | 5-10ms |
| #2 Jailbreak Detection | 5-15ms |
| #3 PII Detection | 50-100ms (first request), 10-30ms (cached) |
| #4 Secret Scanning | 5-10ms |
| #5 Content Filter (local) | 5-10ms |
| #5 Content Filter (Azure) | 100-200ms |

Total guardrail overhead: ~100-300ms for all 5 layers
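To see why running the layers concurrently matters, here is the arithmetic on the upper bounds from the table, using the cached PII figure. It assumes every layer, including the Azure call, can run at the same time, which is a simplification.

```python
# Upper-bound latencies in milliseconds, taken from the table above.
UPPER_BOUND_MS = {
    "prompt_injection": 10,
    "jailbreak": 15,
    "pii_cached": 30,
    "secrets": 10,
    "content_local": 10,
    "content_azure": 200,
}

# Run one after another, the layers add up.
sequential_ms = sum(UPPER_BOUND_MS.values())  # 275

# Run concurrently, the slowest layer dominates.
parallel_ms = max(UPPER_BOUND_MS.values())    # 200
```

In both cases the Azure Content Safety call dominates, so that is the layer to make optional or cache if you need to cut latency further.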

Optimization Techniques#

  1. Lazy Initialization: Presidio/spaCy models are loaded on first use rather than at import time
  2. Parallel Scanning: All detectors run concurrently
  3. Result Caching: Redis cache for repeated queries
  4. Early Exit: Stop scanning on critical threat detection
  5. Warm-up: Trigger that first load during application startup so no user request pays the model-loading cost
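Lazy initialization and warm-up fit together nicely: build the expensive object once behind a cached accessor, then call that accessor at startup. The sketch below uses `functools.lru_cache`; the `time.sleep` call and the returned `object()` are stand-ins for loading the real models.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_analyzer() -> object:
    """Build the expensive object once; later calls reuse the cached instance."""
    time.sleep(0.05)  # hypothetical stand-in for loading spaCy/Presidio models
    return object()


def warm_up() -> None:
    """Call during application startup so the first request does not pay the load cost."""
    get_analyzer()
```

In a FastAPI app you would typically call `warm_up()` from a startup or lifespan hook.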

Next Steps#

Now that you have working LLM guardrails, consider these enhancements:

  1. Custom Guardrails: Add industry-specific PII types or threat patterns
  2. ML-Based Guardrails: Train classifiers on your organization's attack data
  3. Output Guardrails: Scan LLM outputs for hallucination and sensitive data leakage
  4. Integration: Connect to SIEM systems for security monitoring
  5. A/B Testing: Compare detection rates between guardrail configurations

Source Code#

The complete source code for this project is available on GitHub:

LLM Security Gateway - GitHub Repository

The repository includes:

  • Full backend implementation with all 5 security detectors
  • React frontend with dashboard, audit logs, and playground
  • Terraform infrastructure-as-code for Azure deployment
  • Docker Compose for local development
  • Comprehensive configuration examples

Conclusion#

Building safe LLM applications requires a defense-in-depth approach. These production LLM guardrails provide:

  • 5-layer protection against prompt injection, jailbreaks, and data leakage
  • Configurable policies that balance safety with usability
  • Complete audit trail for compliance and forensics
  • OpenAI-compatible API for easy integration

As LLM attacks grow more sophisticated, robust guardrails are essential for any production deployment.

Get started by cloning the GitHub repository and following the setup instructions. Happy building!