Building Production LLM Guardrails: A Complete Guide to Input/Output Safety with Azure#
As LLMs become central to enterprise applications, implementing robust guardrails is no longer optional—it's essential. Prompt injection, jailbreak attacks, PII leakage, and credential exposure are no longer theoretical risks—they're happening in production right now.
In this tutorial, I'll show you how to build production-ready LLM guardrails that sit between your applications and LLM providers, providing comprehensive safety scanning for every request and response.
By the end, you'll have a fully functional guardrails system with 5 layers of protection:
- Input Guardrail #1: Detects and blocks prompt injection attacks using pattern matching and heuristics
- Input Guardrail #2: Identifies jailbreak attempts including DAN, STAN, and other known techniques
- Input Guardrail #3: Automatically detects and masks PII (emails, SSNs, phone numbers, credit cards)
- Input Guardrail #4: Scans for leaked secrets (API keys, passwords, connection strings)
- Input Guardrail #5: Provides content filtering for harmful requests
- Audit Layer: Maintains a complete audit trail of all requests
Live Demo#
Here's what the Guardrails dashboard looks like in action:

The dashboard provides real-time visibility into request volumes, blocked threats, and guardrail metrics with time-range filtering.
Why You Need LLM Guardrails#
The Safety Challenge#
LLM applications face unique safety challenges that traditional API gateways don't address:
| Threat | Guardrail Needed | Impact if Unguarded |
|---|---|---|
| Prompt Injection | Input validation guardrail | Complete bypass of application logic |
| Jailbreak Attacks | Jailbreak detection guardrail | Generation of harmful content |
| PII Leakage | PII masking guardrail | Privacy violations, GDPR/CCPA fines |
| Secret Exposure | Secret scanning guardrail | Account compromise, data breach |
| Harmful Content | Content filtering guardrail | Reputation damage, legal liability |
What Makes These Guardrails Enterprise-Ready?#
- 5-Layer Protection: Five independent guardrails working in parallel
- Configurable Policies: Block, filter, warn, or allow based on threat severity
- PII Masking: Automatically redact sensitive data before it reaches the LLM
- Audit Logging: Complete traceability stored in Azure Cosmos DB
- Rate Limiting: Prevent abuse with Redis-backed rate limiting
- OpenAI-Compatible API: Drop-in replacement for existing integrations
Guardrails Architecture#
High-Level Overview#
┌─────────────────────────────────────────────────────────────────┐
│ Client Application │
└─────────────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────────┐
│ LLM GUARDRAILS LAYER │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ INPUT GUARDRAILS (5 Layers) │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────┐ │ │
│ │ │Guardrail│ │Guardrail│ │Guardrail│ │Guardrail│ │Guard│ │ │
│ │ │ #1 │ │ #2 │ │ #3 │ │ #4 │ │ #5 │ │ │
│ │ │Injection│ │Jailbreak│ │ PII │ │ Secrets │ │Cont.│ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────┴───────────┐ │
│ │ Action Determination │ │
│ │ BLOCK / FILTER / WARN │ │
│ └───────────┬───────────┘ │
│ │ │
│ ┌─────────────────────────────┴─────────────────────────────┐ │
│ │ Audit Logging │ │
│ │ Azure Cosmos DB / Redis Cache │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────────┐
│ Azure AI Foundry (GPT-4o) │
└─────────────────────────────────────────────────────────────────┘
Request Flow#
- Client sends request to
/v1/chat/completions(OpenAI-compatible) - Input guardrails run all 5 protection layers in parallel
- Action determination based on threat severity and configured policies
- If blocked: Return error with reason (request never reaches LLM)
- If filtered: Apply transformations (PII masking) before forwarding
- Forward to LLM: Send processed request to Azure AI Foundry
- Output guardrails: Optionally scan LLM output for sensitive data
- Audit logging: Record full request/response with guardrail results
- Return response to client
The 5 Guardrail Layers#
Guardrail #1: Prompt Injection Detection#
The first guardrail uses pattern matching to identify attempts to override system instructions:
class PromptInjectionDetector: def __init__(self): self.patterns = [ # Instruction Override (r"ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)", "Instruction override attempt", 0.8), # Developer Mode (r"(developer|debug|admin|root)\s+mode", "Developer mode request", 0.7), # System Prompt Extraction (r"(show|reveal|display|output|print)\s+(me\s+)?(your|the)\s+(system|initial)\s+(prompt|instructions)", "System prompt extraction attempt", 0.9), # Role Manipulation (r"(act|behave|respond)\s+as\s+(if\s+)?(you\s+)?(are|were)\s+", "Role manipulation attempt", 0.6), # Delimiter Injection (r"(```|---|\*\*\*|###)\s*(system|instructions|prompt)", "Delimiter injection attempt", 0.7), ] def detect(self, text: str) -> ScanResult: findings = [] for pattern, description, severity in self.patterns: matches = re.findall(pattern, text, re.IGNORECASE) if matches: findings.append({ "type": "prompt_injection", "description": description, "severity": severity, "match": matches[0] if isinstance(matches[0], str) else matches[0][0] }) return ScanResult( detected=len(findings) > 0, findings=findings, risk_score=max(f["severity"] for f in findings) if findings else 0.0 )
Detected Patterns Include:
- Instruction override attempts ("Ignore all previous instructions")
- Developer/debug mode requests
- System prompt extraction ("Show me your instructions")
- Role manipulation ("Act as if you are...")
- Delimiter injection (using markdown to inject fake system prompts)
- Encoding obfuscation (Base64, ROT13)
Guardrail #2: Jailbreak Detection#
This guardrail identifies known jailbreak techniques and signatures:
class JailbreakDetector: KNOWN_JAILBREAKS = [ # DAN (Do Anything Now) "DAN", "Do Anything Now", "STAN", "DUDE", # Developer Mode "Developer Mode", "Maintenance Mode", # Evil Mode "Evil Mode", "Opposite Mode", "Bad AI", ] JAILBREAK_PATTERNS = [ # Persona Jailbreaks (r"you\s+are\s+now\s+[a-z]+\s+with\s+no\s+restrictions", "Persona jailbreak with no restrictions", 0.9), # Two-Response Attacks (r"give\s+me\s+two\s+responses.*one\s+(censored|normal).*one\s+(uncensored|unrestricted)", "Two-response jailbreak attempt", 0.85), # Grandma Exploit (r"(grandma|grandmother).*used\s+to\s+(tell|read|say)", "Social engineering (grandma exploit)", 0.7), # Token Smuggling (r"\\u[0-9a-fA-F]{4}", "Unicode escape sequence (token smuggling)", 0.6), ]
Example Detection:

The guardrails blocked a prompt injection attempt with 76% overall risk score, detecting the "Ignore all previous instructions" pattern.
Guardrail #3: PII Detection & Masking#
This guardrail uses Microsoft Presidio with spaCy NLP for entity recognition:
from presidio_analyzer import AnalyzerEngine from presidio_anonymizer import AnonymizerEngine class PIIDetector: ENTITIES = [ "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "US_SSN", "US_BANK_NUMBER", "IP_ADDRESS" ] def __init__(self): self.analyzer = AnalyzerEngine() self.anonymizer = AnonymizerEngine() def detect_and_mask(self, text: str) -> tuple[str, list[PIIEntity]]: # Detect PII entities results = self.analyzer.analyze( text=text, entities=self.ENTITIES, language="en" ) # Mask detected entities masked_text = self.anonymizer.anonymize( text=text, analyzer_results=results ).text return masked_text, results
PII Guardrail in Action:

The guardrail detected 4 PII entities (email, names, phone number) and automatically masked them before forwarding to the LLM. The response shows placeholders like <EMAIL_ADDRESS> instead of actual data.
Risk Scoring by Entity Type:
| Entity Type | Risk Weight |
|---|---|
| US_SSN | 1.0 |
| CREDIT_CARD | 1.0 |
| US_BANK_NUMBER | 0.8 |
| EMAIL_ADDRESS | 0.5 |
| PHONE_NUMBER | 0.5 |
| IP_ADDRESS | 0.4 |
| PERSON | 0.3 |
Guardrail #4: Secret Scanning#
This guardrail detects API keys, passwords, and credentials using precise regex patterns:
class SecretScanner: SECRET_PATTERNS = [ # AWS Credentials (r"AKIA[0-9A-Z]{16}", "AWS Access Key", 0.95), (r"aws_secret_access_key\s*=\s*['\"]?([A-Za-z0-9/+=]{40})", "AWS Secret Key", 1.0), # Azure Credentials (r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}", "Azure Client ID/Subscription ID", 0.6), # API Keys (r"sk-[a-zA-Z0-9]{48}", "OpenAI API Key", 0.95), (r"sk-ant-[a-zA-Z0-9-]{95}", "Anthropic API Key", 0.95), (r"ghp_[a-zA-Z0-9]{36}", "GitHub Personal Access Token", 0.9), # Generic Patterns (r"['\"]?password['\"]?\s*[:=]\s*['\"]([^'\"]{8,})['\"]", "Password", 0.8), (r"Bearer\s+[a-zA-Z0-9._-]{20,}", "Bearer Token", 0.85), # Connection Strings (r"mongodb(\+srv)?://[^\s]+", "MongoDB Connection String", 0.9), (r"postgresql://[^\s]+", "PostgreSQL Connection String", 0.9), (r"redis://[^\s]+", "Redis Connection String", 0.85), # Private Keys (r"-----BEGIN\s+(RSA|DSA|EC|PGP)\s+PRIVATE\s+KEY-----", "Private Key", 1.0), ] def scan(self, text: str) -> list[SecretFinding]: findings = [] for pattern, secret_type, severity in self.SECRET_PATTERNS: matches = re.finditer(pattern, text, re.IGNORECASE) for match in matches: findings.append(SecretFinding( type=secret_type, severity=severity, location=(match.start(), match.end()), redacted=self._redact(match.group(), secret_type) )) return findings
Redaction Strategy:
| Secret Type | Redaction Example |
|---|---|
| AWS Access Key | AKIA****... (first 4 chars) |
| OpenAI API Key | sk-****...**** (prefix + suffix) |
| Private Key | -----BEGIN RSA... + [REDACTED] |
| Password | **** (fully redacted) |
Guardrail #5: Content Filtering#
The final guardrail enforces safety policies using both local patterns and Azure AI Content Safety:
class ContentFilter: HARMFUL_PATTERNS = [ (r"how\s+to\s+(make|create|build)\s+a?\s*(bomb|explosive|weapon)", "Harmful instructions request", "high"), (r"(hack|break\s+into|exploit)\s+(the|a)?\s*(website|server|system)", "Malicious activity request", "high"), (r"(write|create|generate)\s+(malware|ransomware|virus|trojan)", "Malware creation request", "critical"), ] async def check_with_azure(self, text: str) -> ContentSafetyResult: """Use Azure AI Content Safety for additional analysis.""" response = await self.content_safety_client.analyze_text( AnalyzeTextOptions(text=text) ) categories = { "Hate": response.hate_result.severity, "Violence": response.violence_result.severity, "Sexual": response.sexual_result.severity, "SelfHarm": response.self_harm_result.severity, } # Block if any category exceeds threshold should_block = any( severity >= self.blocking_threshold for severity in categories.values() ) return ContentSafetyResult( categories=categories, should_block=should_block )
Guardrails Playground#
The Guardrails Playground allows you to test various attack scenarios:

The interactive playground provides pre-built test cases for injection attacks, jailbreaks, PII exposure, and secret leakage.
Test Categories:
- Injection: Instruction override, delimiter injection, role manipulation
- Jailbreak: DAN, STAN, developer mode, persona attacks
- PII: Personal information, financial data
- Secrets: API keys, passwords, connection strings
- Benign: Normal queries to test false positive rates
Audit Logging#
Every request is logged to Azure Cosmos DB with full context:

The audit log view shows all requests with status, detected threats, PII presence, token usage, and response times.
Audit Record Schema:
class AuditRecord(BaseModel): request_id: str # Unique identifier timestamp: datetime # Request time endpoint: str # API endpoint called model: str # LLM model used status: str # allowed/blocked/filtered/error # Security Results threats_detected: list[ThreatInfo] pii_detected: list[PIIInfo] secrets_detected: list[SecretInfo] # Metrics input_tokens: int output_tokens: int response_time_ms: int # Request Context client_ip: str api_key_id: str # Hashed API key user_id: Optional[str]
Guardrail Configuration & Policies#
Each guardrail supports configurable policies per threat type:

The settings page shows all enabled guardrails and current configuration.
Guardrail Policy Configuration:
class SecurityPolicies(BaseModel): prompt_injection: PolicyConfig = PolicyConfig( action="block", severity_threshold="medium", message="Potential prompt injection detected. Request blocked." ) jailbreak: PolicyConfig = PolicyConfig( action="block", severity_threshold="high", message="Jailbreak attempt detected. Request blocked." ) pii_detection: PolicyConfig = PolicyConfig( action="filter", # Mask PII instead of blocking severity_threshold="low", message="PII detected and masked." ) secret_scanning: PolicyConfig = PolicyConfig( action="block", severity_threshold="critical", message="Credentials detected. Request blocked." ) content_filter: PolicyConfig = PolicyConfig( action="block", severity_threshold="medium", message="Content policy violation. Request blocked." )
Action Types:
| Action | Behavior |
|---|---|
| block | Reject request immediately, return error |
| filter | Apply transformations (masking), continue |
| warn | Log warning, allow request to proceed |
| allow | Take no action, just log |
API Endpoints#
OpenAI-Compatible Endpoints#
# Chat Completions (with security scanning) POST /v1/chat/completions Content-Type: application/json { "model": "gpt-4o", "messages": [ {"role": "user", "content": "Hello, how are you?"} ] } # Text Completions (with security scanning) POST /v1/completions
Security-Specific Endpoints#
# Standalone Security Scan (no LLM call) POST /v1/security/scan { "text": "Your text to scan...", "include_details": true } # Query Audit Logs GET /api/audit?status=blocked&start_date=2024-01-01 # Get Analytics Summary GET /api/audit/summary?period=24h # Threat Analytics GET /api/analytics/threats?period=7d
Deployment Architecture#
Azure Resources#
┌─────────────────────────────────────────────────────────────────┐
│ Azure Container Apps │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Frontend │ │ Backend │ │
│ │ (React SPA) │◄────────────►│ (FastAPI) │ │
│ │ Port: 80 │ │ Port: 8000 │ │
│ └─────────────────┘ └────────┬────────┘ │
└─────────────────────────────────────────────┼──────────────────┘
│
┌───────────────────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────┐ ┌─────────────────────────┐ ┌─────────────────┐
│ Azure AI Foundry │ │ Azure Cosmos DB │ │ Azure Redis │
│ - GPT-4o │ │ - Audit Logs │ │ - Rate Limits │
│ - GPT-4o-mini │ │ - Serverless Mode │ │ - Session Cache│
└─────────────────────┘ └─────────────────────────┘ └─────────────────┘
│
▼
┌─────────────────────────┐
│ Azure Content Safety │
│ - Text Analysis │
│ - Prompt Shields │
└─────────────────────────┘
Infrastructure as Code (Terraform)#
# Azure Container Apps Environment resource "azurerm_container_app_environment" "llm_gateway" { name = "cae-llm-gateway" location = azurerm_resource_group.main.location resource_group_name = azurerm_resource_group.main.name } # Backend Container App resource "azurerm_container_app" "backend" { name = "ca-backend" container_app_environment_id = azurerm_container_app_environment.llm_gateway.id resource_group_name = azurerm_resource_group.main.name revision_mode = "Single" template { container { name = "backend" image = "your-registry.azurecr.io/llm-gateway-backend:latest" cpu = 1.0 memory = "2Gi" env { name = "AZURE_AI_ENDPOINT" secret_name = "azure-ai-endpoint" } env { name = "COSMOS_CONNECTION_STRING" secret_name = "cosmos-connection" } } min_replicas = 1 max_replicas = 10 } }
Performance Considerations#
Guardrail Latency Optimization#
Each guardrail is optimized for low-latency evaluation:
| Guardrail Layer | Avg Latency |
|---|---|
| #1 Prompt Injection | 5-10ms |
| #2 Jailbreak Detection | 5-15ms |
| #3 PII Detection | 50-100ms (first request), 10-30ms (cached) |
| #4 Secret Scanning | 5-10ms |
| #5 Content Filter (local) | 5-10ms |
| #5 Content Filter (Azure) | 100-200ms |
Total guardrail overhead: ~100-300ms for all 5 layers
Optimization Techniques#
- Lazy Initialization: Presidio/spaCy models loaded on first use
- Parallel Scanning: All detectors run concurrently
- Result Caching: Redis cache for repeated queries
- Early Exit: Stop scanning on critical threat detection
- Warm-up: Pre-load models during application startup
Next Steps#
Now that you have working LLM guardrails, consider these enhancements:
- Custom Guardrails: Add industry-specific PII types or threat patterns
- ML-Based Guardrails: Train classifiers on your organization's attack data
- Output Guardrails: Scan LLM outputs for hallucination and sensitive data leakage
- Integration: Connect to SIEM systems for security monitoring
- A/B Testing: Compare detection rates between guardrail configurations
Source Code#
The complete source code for this project is available on GitHub:
LLM Security Gateway - GitHub Repository
The repository includes:
- Full backend implementation with all 5 security detectors
- React frontend with dashboard, audit logs, and playground
- Terraform infrastructure-as-code for Azure deployment
- Docker Compose for local development
- Comprehensive configuration examples
Resources#
- GitHub Repository
- Azure AI Foundry Documentation
- Microsoft Presidio
- Azure AI Content Safety
- OWASP LLM Top 10
- FastAPI Documentation
Conclusion#
Building safe LLM applications requires a defense-in-depth approach. These production LLM guardrails provide:
- 5-layer protection against prompt injection, jailbreaks, and data leakage
- Configurable policies that balance safety with usability
- Complete audit trail for compliance and forensics
- OpenAI-compatible API for easy integration
As LLM attacks become more sophisticated, having robust guardrails is no longer optional—it's essential for any production deployment.
Get started by cloning the GitHub repository and following the setup instructions. Happy building!