Building Production LLM Guardrails: A Complete Guide to Input/Output Safety with Azure#

As LLMs become central to enterprise applications, implementing robust guardrails is no longer optional—it's essential. Prompt injection, jailbreak attacks, PII leakage, and credential exposure are no longer theoretical risks—they're happening in production right now.

In this tutorial, I'll show you how to build production-ready LLM guardrails that sit between your applications and LLM providers, providing comprehensive safety scanning for every request and response.

By the end, you'll have a fully functional guardrails system with 5 layers of protection:

Input Guardrail #1: Detects and blocks prompt injection attacks using pattern matching and heuristics
Input Guardrail #2: Identifies jailbreak attempts including DAN, STAN, and other known techniques
Input Guardrail #3: Automatically detects and masks PII (emails, SSNs, phone numbers, credit cards)
Input Guardrail #4: Scans for leaked secrets (API keys, passwords, connection strings)
Input Guardrail #5: Provides content filtering for harmful requests
Audit Layer: Maintains a complete audit trail of all requests

Live Demo#

Here's what the Guardrails dashboard looks like in action:

LLM Guardrails Dashboard

The dashboard provides real-time visibility into request volumes, blocked threats, and guardrail metrics with time-range filtering.

Why You Need LLM Guardrails#

The Safety Challenge#

LLM applications face unique safety challenges that traditional API gateways don't address:

Threat	Guardrail Needed	Impact if Unguarded
Prompt Injection	Input validation guardrail	Complete bypass of application logic
Jailbreak Attacks	Jailbreak detection guardrail	Generation of harmful content
PII Leakage	PII masking guardrail	Privacy violations, GDPR/CCPA fines
Secret Exposure	Secret scanning guardrail	Account compromise, data breach
Harmful Content	Content filtering guardrail	Reputation damage, legal liability

What Makes These Guardrails Enterprise-Ready?#

5-Layer Protection: Five independent guardrails working in parallel
Configurable Policies: Block, filter, warn, or allow based on threat severity
PII Masking: Automatically redact sensitive data before it reaches the LLM
Audit Logging: Complete traceability stored in Azure Cosmos DB
Rate Limiting: Prevent abuse with Redis-backed rate limiting
OpenAI-Compatible API: Drop-in replacement for existing integrations

Guardrails Architecture#

High-Level Overview#

┌─────────────────────────────────────────────────────────────────┐
│                      Client Application                          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                     LLM GUARDRAILS LAYER                         │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │              INPUT GUARDRAILS (5 Layers)                  │  │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ │  │
│  │  │Guardrail│ │Guardrail│ │Guardrail│ │Guardrail│ │Guard│ │  │
│  │  │   #1    │ │   #2    │ │   #3    │ │   #4    │ │ #5  │ │  │
│  │  │Injection│ │Jailbreak│ │   PII   │ │ Secrets │ │Cont.│ │  │
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────┘ │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                │                                 │
│                    ┌───────────┴───────────┐                    │
│                    │  Action Determination  │                    │
│                    │  BLOCK / FILTER / WARN │                    │
│                    └───────────┬───────────┘                    │
│                                │                                 │
│  ┌─────────────────────────────┴─────────────────────────────┐  │
│  │                    Audit Logging                          │  │
│  │              Azure Cosmos DB / Redis Cache                │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ↓
┌─────────────────────────────────────────────────────────────────┐
│                    Azure AI Foundry (GPT-4o)                     │
└─────────────────────────────────────────────────────────────────┘

Request Flow#

Client sends request to /v1/chat/completions (OpenAI-compatible)
Input guardrails run all 5 protection layers in parallel
Action determination based on threat severity and configured policies
If blocked: Return error with reason (request never reaches LLM)
If filtered: Apply transformations (PII masking) before forwarding
Forward to LLM: Send processed request to Azure AI Foundry
Output guardrails: Optionally scan LLM output for sensitive data
Audit logging: Record full request/response with guardrail results
Return response to client

The 5 Guardrail Layers#

Guardrail #1: Prompt Injection Detection#

The first guardrail uses pattern matching to identify attempts to override system instructions:

class PromptInjectionDetector:
    def __init__(self):
        self.patterns = [
            # Instruction Override
            (r"ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)",
             "Instruction override attempt", 0.8),

            # Developer Mode
            (r"(developer|debug|admin|root)\s+mode",
             "Developer mode request", 0.7),

            # System Prompt Extraction
            (r"(show|reveal|display|output|print)\s+(me\s+)?(your|the)\s+(system|initial)\s+(prompt|instructions)",
             "System prompt extraction attempt", 0.9),

            # Role Manipulation
            (r"(act|behave|respond)\s+as\s+(if\s+)?(you\s+)?(are|were)\s+",
             "Role manipulation attempt", 0.6),

            # Delimiter Injection
            (r"(```|---|\*\*\*|###)\s*(system|instructions|prompt)",
             "Delimiter injection attempt", 0.7),
        ]

    def detect(self, text: str) -> ScanResult:
        findings = []
        for pattern, description, severity in self.patterns:
            matches = re.findall(pattern, text, re.IGNORECASE)
            if matches:
                findings.append({
                    "type": "prompt_injection",
                    "description": description,
                    "severity": severity,
                    "match": matches[0] if isinstance(matches[0], str) else matches[0][0]
                })

        return ScanResult(
            detected=len(findings) > 0,
            findings=findings,
            risk_score=max(f["severity"] for f in findings) if findings else 0.0
        )

Detected Patterns Include:

Instruction override attempts ("Ignore all previous instructions")
Developer/debug mode requests
System prompt extraction ("Show me your instructions")
Role manipulation ("Act as if you are...")
Delimiter injection (using markdown to inject fake system prompts)
Encoding obfuscation (Base64, ROT13)

Guardrail #2: Jailbreak Detection#

This guardrail identifies known jailbreak techniques and signatures:

class JailbreakDetector:
    KNOWN_JAILBREAKS = [
        # DAN (Do Anything Now)
        "DAN", "Do Anything Now", "STAN", "DUDE",
        # Developer Mode
        "Developer Mode", "Maintenance Mode",
        # Evil Mode
        "Evil Mode", "Opposite Mode", "Bad AI",
    ]

    JAILBREAK_PATTERNS = [
        # Persona Jailbreaks
        (r"you\s+are\s+now\s+[a-z]+\s+with\s+no\s+restrictions",
         "Persona jailbreak with no restrictions", 0.9),

        # Two-Response Attacks
        (r"give\s+me\s+two\s+responses.*one\s+(censored|normal).*one\s+(uncensored|unrestricted)",
         "Two-response jailbreak attempt", 0.85),

        # Grandma Exploit
        (r"(grandma|grandmother).*used\s+to\s+(tell|read|say)",
         "Social engineering (grandma exploit)", 0.7),

        # Token Smuggling
        (r"\\u[0-9a-fA-F]{4}",
         "Unicode escape sequence (token smuggling)", 0.6),
    ]

Example Detection:

Prompt Injection Blocked

The guardrails blocked a prompt injection attempt with 76% overall risk score, detecting the "Ignore all previous instructions" pattern.

Guardrail #3: PII Detection & Masking#

This guardrail uses Microsoft Presidio with spaCy NLP for entity recognition:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

class PIIDetector:
    ENTITIES = [
        "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER",
        "CREDIT_CARD", "US_SSN", "US_BANK_NUMBER", "IP_ADDRESS"
    ]

    def __init__(self):
        self.analyzer = AnalyzerEngine()
        self.anonymizer = AnonymizerEngine()

    def detect_and_mask(self, text: str) -> tuple[str, list[PIIEntity]]:
        # Detect PII entities
        results = self.analyzer.analyze(
            text=text,
            entities=self.ENTITIES,
            language="en"
        )

        # Mask detected entities
        masked_text = self.anonymizer.anonymize(
            text=text,
            analyzer_results=results
        ).text

        return masked_text, results

PII Guardrail in Action:

PII Detection and Masking

The guardrail detected 4 PII entities (email, names, phone number) and automatically masked them before forwarding to the LLM. The response shows placeholders like <EMAIL_ADDRESS> instead of actual data.

Risk Scoring by Entity Type:

Entity Type	Risk Weight
US_SSN	1.0
CREDIT_CARD	1.0
US_BANK_NUMBER	0.8
EMAIL_ADDRESS	0.5
PHONE_NUMBER	0.5
IP_ADDRESS	0.4
PERSON	0.3

Guardrail #4: Secret Scanning#

This guardrail detects API keys, passwords, and credentials using precise regex patterns:

class SecretScanner:
    SECRET_PATTERNS = [
        # AWS Credentials
        (r"AKIA[0-9A-Z]{16}", "AWS Access Key", 0.95),
        (r"aws_secret_access_key\s*=\s*['\"]?([A-Za-z0-9/+=]{40})", "AWS Secret Key", 1.0),

        # Azure Credentials
        (r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}",
         "Azure Client ID/Subscription ID", 0.6),

        # API Keys
        (r"sk-[a-zA-Z0-9]{48}", "OpenAI API Key", 0.95),
        (r"sk-ant-[a-zA-Z0-9-]{95}", "Anthropic API Key", 0.95),
        (r"ghp_[a-zA-Z0-9]{36}", "GitHub Personal Access Token", 0.9),

        # Generic Patterns
        (r"['\"]?password['\"]?\s*[:=]\s*['\"]([^'\"]{8,})['\"]", "Password", 0.8),
        (r"Bearer\s+[a-zA-Z0-9._-]{20,}", "Bearer Token", 0.85),

        # Connection Strings
        (r"mongodb(\+srv)?://[^\s]+", "MongoDB Connection String", 0.9),
        (r"postgresql://[^\s]+", "PostgreSQL Connection String", 0.9),
        (r"redis://[^\s]+", "Redis Connection String", 0.85),

        # Private Keys
        (r"-----BEGIN\s+(RSA|DSA|EC|PGP)\s+PRIVATE\s+KEY-----", "Private Key", 1.0),
    ]

    def scan(self, text: str) -> list[SecretFinding]:
        findings = []
        for pattern, secret_type, severity in self.SECRET_PATTERNS:
            matches = re.finditer(pattern, text, re.IGNORECASE)
            for match in matches:
                findings.append(SecretFinding(
                    type=secret_type,
                    severity=severity,
                    location=(match.start(), match.end()),
                    redacted=self._redact(match.group(), secret_type)
                ))
        return findings

Redaction Strategy:

Secret Type	Redaction Example
AWS Access Key	`AKIA****...` (first 4 chars)
OpenAI API Key	`sk-**...**` (prefix + suffix)
Private Key	`-----BEGIN RSA...` + `[REDACTED]`
Password	`****` (fully redacted)

Guardrail #5: Content Filtering#

The final guardrail enforces safety policies using both local patterns and Azure AI Content Safety:

class ContentFilter:
    HARMFUL_PATTERNS = [
        (r"how\s+to\s+(make|create|build)\s+a?\s*(bomb|explosive|weapon)",
         "Harmful instructions request", "high"),
        (r"(hack|break\s+into|exploit)\s+(the|a)?\s*(website|server|system)",
         "Malicious activity request", "high"),
        (r"(write|create|generate)\s+(malware|ransomware|virus|trojan)",
         "Malware creation request", "critical"),
    ]

    async def check_with_azure(self, text: str) -> ContentSafetyResult:
        """Use Azure AI Content Safety for additional analysis."""
        response = await self.content_safety_client.analyze_text(
            AnalyzeTextOptions(text=text)
        )

        categories = {
            "Hate": response.hate_result.severity,
            "Violence": response.violence_result.severity,
            "Sexual": response.sexual_result.severity,
            "SelfHarm": response.self_harm_result.severity,
        }

        # Block if any category exceeds threshold
        should_block = any(
            severity >= self.blocking_threshold
            for severity in categories.values()
        )

        return ContentSafetyResult(
            categories=categories,
            should_block=should_block
        )

Guardrails Playground#

The Guardrails Playground allows you to test various attack scenarios:

Guardrails Playground

The interactive playground provides pre-built test cases for injection attacks, jailbreaks, PII exposure, and secret leakage.

Test Categories:

Injection: Instruction override, delimiter injection, role manipulation
Jailbreak: DAN, STAN, developer mode, persona attacks
PII: Personal information, financial data
Secrets: API keys, passwords, connection strings
Benign: Normal queries to test false positive rates

Audit Logging#

Every request is logged to Azure Cosmos DB with full context:

Audit Logs

The audit log view shows all requests with status, detected threats, PII presence, token usage, and response times.

Audit Record Schema:

class AuditRecord(BaseModel):
    request_id: str           # Unique identifier
    timestamp: datetime       # Request time
    endpoint: str            # API endpoint called
    model: str               # LLM model used
    status: str              # allowed/blocked/filtered/error

    # Security Results
    threats_detected: list[ThreatInfo]
    pii_detected: list[PIIInfo]
    secrets_detected: list[SecretInfo]

    # Metrics
    input_tokens: int
    output_tokens: int
    response_time_ms: int

    # Request Context
    client_ip: str
    api_key_id: str          # Hashed API key
    user_id: Optional[str]

Guardrail Configuration & Policies#

Each guardrail supports configurable policies per threat type:

Settings Page

The settings page shows all enabled guardrails and current configuration.

Guardrail Policy Configuration:

class SecurityPolicies(BaseModel):
    prompt_injection: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="medium",
        message="Potential prompt injection detected. Request blocked."
    )

    jailbreak: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="high",
        message="Jailbreak attempt detected. Request blocked."
    )

    pii_detection: PolicyConfig = PolicyConfig(
        action="filter",  # Mask PII instead of blocking
        severity_threshold="low",
        message="PII detected and masked."
    )

    secret_scanning: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="critical",
        message="Credentials detected. Request blocked."
    )

    content_filter: PolicyConfig = PolicyConfig(
        action="block",
        severity_threshold="medium",
        message="Content policy violation. Request blocked."
    )

Action Types:

Action	Behavior
block	Reject request immediately, return error
filter	Apply transformations (masking), continue
warn	Log warning, allow request to proceed
allow	Take no action, just log

API Endpoints#

OpenAI-Compatible Endpoints#

# Chat Completions (with security scanning)
POST /v1/chat/completions
Content-Type: application/json

{
  "model": "gpt-4o",
  "messages": [
    {"role": "user", "content": "Hello, how are you?"}
  ]
}

# Text Completions (with security scanning)
POST /v1/completions

Security-Specific Endpoints#

# Standalone Security Scan (no LLM call)
POST /v1/security/scan
{
  "text": "Your text to scan...",
  "include_details": true
}

# Query Audit Logs
GET /api/audit?status=blocked&start_date=2024-01-01

# Get Analytics Summary
GET /api/audit/summary?period=24h

# Threat Analytics
GET /api/analytics/threats?period=7d

Deployment Architecture#

Azure Resources#

┌─────────────────────────────────────────────────────────────────┐
│                      Azure Container Apps                        │
│  ┌─────────────────┐              ┌─────────────────┐          │
│  │    Frontend     │              │    Backend      │          │
│  │   (React SPA)   │◄────────────►│   (FastAPI)     │          │
│  │   Port: 80      │              │   Port: 8000    │          │
│  └─────────────────┘              └────────┬────────┘          │
└─────────────────────────────────────────────┼──────────────────┘
                                              │
              ┌───────────────────────────────┼───────────────────┐
              │                               │                   │
              ▼                               ▼                   ▼
┌─────────────────────┐   ┌─────────────────────────┐   ┌─────────────────┐
│  Azure AI Foundry   │   │    Azure Cosmos DB      │   │  Azure Redis    │
│  - GPT-4o           │   │    - Audit Logs         │   │  - Rate Limits  │
│  - GPT-4o-mini      │   │    - Serverless Mode    │   │  - Session Cache│
└─────────────────────┘   └─────────────────────────┘   └─────────────────┘
              │
              ▼
┌─────────────────────────┐
│ Azure Content Safety    │
│ - Text Analysis         │
│ - Prompt Shields        │
└─────────────────────────┘

Infrastructure as Code (Terraform)#

# Azure Container Apps Environment
resource "azurerm_container_app_environment" "llm_gateway" {
  name                = "cae-llm-gateway"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
}

# Backend Container App
resource "azurerm_container_app" "backend" {
  name                         = "ca-backend"
  container_app_environment_id = azurerm_container_app_environment.llm_gateway.id
  resource_group_name          = azurerm_resource_group.main.name
  revision_mode                = "Single"

  template {
    container {
      name   = "backend"
      image  = "your-registry.azurecr.io/llm-gateway-backend:latest"
      cpu    = 1.0
      memory = "2Gi"

      env {
        name        = "AZURE_AI_ENDPOINT"
        secret_name = "azure-ai-endpoint"
      }
      env {
        name        = "COSMOS_CONNECTION_STRING"
        secret_name = "cosmos-connection"
      }
    }

    min_replicas = 1
    max_replicas = 10
  }
}

Performance Considerations#

Guardrail Latency Optimization#

Each guardrail is optimized for low-latency evaluation:

Guardrail Layer	Avg Latency
#1 Prompt Injection	5-10ms
#2 Jailbreak Detection	5-15ms
#3 PII Detection	50-100ms (first request), 10-30ms (cached)
#4 Secret Scanning	5-10ms
#5 Content Filter (local)	5-10ms
#5 Content Filter (Azure)	100-200ms

Total guardrail overhead: ~100-300ms for all 5 layers

Optimization Techniques#

Lazy Initialization: Presidio/spaCy models loaded on first use
Parallel Scanning: All detectors run concurrently
Result Caching: Redis cache for repeated queries
Early Exit: Stop scanning on critical threat detection
Warm-up: Pre-load models during application startup

Next Steps#

Now that you have working LLM guardrails, consider these enhancements:

Custom Guardrails: Add industry-specific PII types or threat patterns
ML-Based Guardrails: Train classifiers on your organization's attack data
Output Guardrails: Scan LLM outputs for hallucination and sensitive data leakage
Integration: Connect to SIEM systems for security monitoring
A/B Testing: Compare detection rates between guardrail configurations

Source Code#

The complete source code for this project is available on GitHub:

LLM Security Gateway - GitHub Repository

The repository includes:

Full backend implementation with all 5 security detectors
React frontend with dashboard, audit logs, and playground
Terraform infrastructure-as-code for Azure deployment
Docker Compose for local development
Comprehensive configuration examples

Resources#

Conclusion#

Building safe LLM applications requires a defense-in-depth approach. These production LLM guardrails provide:

5-layer protection against prompt injection, jailbreaks, and data leakage
Configurable policies that balance safety with usability
Complete audit trail for compliance and forensics
OpenAI-compatible API for easy integration

As LLM attacks become more sophisticated, having robust guardrails is no longer optional—it's essential for any production deployment.

Get started by cloning the GitHub repository and following the setup instructions. Happy building!