Deep Research Agent Series, Blog 6: Security & Production Hardening
You've built an AI research agent that works. Now make it safe. In this final post, we add six layers of security — from Cognito JWT authentication and WAF rate limiting to Bedrock Guardrails for AI content safety and GitHub OIDC federation for zero-credential deployments. Production isn't just about features that work; it's about features that work safely under adversarial conditions.
Series Navigation
| Part | Topic | Status |
|---|---|---|
| Blog 1 | Architecture & Vision | Published |
| Blog 2 | Multi-Agent Orchestration | Published |
| Blog 3 | Smart Search & Source Intelligence | Published |
| Blog 4 | Real-Time Streaming with WebSocket | Published |
| Blog 5 | Cloud-Native Infrastructure on AWS | Published |
| Blog 6 | Security & Production Hardening | You are here |
The Security Stack
Security isn't a single feature — it's a stack. Our platform implements six layers, each addressing a different threat surface:
- CloudFront + WAF — Edge protection, rate limiting, bot control
- Cognito JWT Auth — User identity and access tokens
- VPC Data Perimeter — Network-level isolation
- Secrets Manager — Zero secrets in code
- Bedrock Guardrails — AI content safety
- GitHub OIDC — Zero-credential CI/CD
Each layer operates independently. If one fails, the others still protect you. Let's walk through each one.
Cognito JWT Authentication
Authentication is the front door. Every API request and WebSocket connection must carry a valid JWT issued by Amazon Cognito. Here's the full verification middleware:
```python
import httpx
import jwt
from jwt import PyJWTError as JWTError

# `settings` is the application's config object, defined elsewhere in the project.

_jwks_cache: dict | None = None


async def get_jwks() -> dict:
    """Fetch and cache JWKS from Cognito."""
    global _jwks_cache
    if _jwks_cache:
        return _jwks_cache
    url = (
        f"https://cognito-idp.{settings.aws_region}.amazonaws.com"
        f"/{settings.cognito_user_pool_id}/.well-known/jwks.json"
    )
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        resp.raise_for_status()  # never cache an error response as the key set
        _jwks_cache = resp.json()
    return _jwks_cache


async def verify_token(token: str) -> dict:
    """Verify a Cognito JWT access token."""
    # Dev-mode bypass: no user pool configured, so return a synthetic user.
    if not settings.cognito_user_pool_id:
        return {"sub": "dev-user", "email": "dev@local"}

    jwks = await get_jwks()
    unverified_header = jwt.get_unverified_header(token)

    # Pick the signing key whose key ID matches the token header.
    key = None
    for k in jwks.get("keys", []):
        if k["kid"] == unverified_header.get("kid"):
            key = k
            break
    if not key:
        raise JWTError("No matching key found in JWKS")

    claims = jwt.decode(
        token,
        # PyJWT expects a key object rather than the raw JWK dict.
        jwt.PyJWK(key).key,
        algorithms=["RS256"],
        issuer=(
            f"https://cognito-idp.{settings.aws_region}.amazonaws.com"
            f"/{settings.cognito_user_pool_id}"
        ),
        options={"verify_aud": False},
    )

    if claims.get("token_use") != "access":
        raise JWTError("Expected access token")
    if (
        settings.cognito_app_client_id
        and claims.get("client_id") != settings.cognito_app_client_id
    ):
        raise JWTError("client_id mismatch")
    return claims
```
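Two design decisions here are worth calling out. First, the JWKS is fetched once and cached in memory. Cognito rotates keys infrequently, so a process-level cache avoids a network round-trip on every request; the trade-off is that a rotated key is not picked up until the process restarts. Second, the dev-mode bypass: when cognito_user_pool_id is not configured, the middleware returns a synthetic user. This lets you run the full stack locally without standing up Cognito. It also means a production deployment that is missing this setting would accept every request, so the setting is worth validating at startup.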
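If restarting to pick up rotated keys is not acceptable, the cache can expire and can refetch when it sees an unknown key ID. The following is a minimal sketch of that idea, not the project's implementation: `JwksCache`, its `fetch` callable, and the one-hour TTL are all hypothetical, and it is synchronous for brevity where the middleware above is async.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Cache a JWKS document and refetch it after `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[], dict],           # hypothetical: downloads the JWKS
        ttl_seconds: float = 3600.0,         # assumed TTL, tune as needed
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._jwks: Optional[dict] = None
        self._fetched_at = 0.0

    def _refresh(self) -> None:
        self._jwks = self._fetch()
        self._fetched_at = self._clock()

    def _lookup(self, kid: str) -> Optional[dict]:
        for k in (self._jwks or {}).get("keys", []):
            if k.get("kid") == kid:
                return k
        return None

    def find_key(self, kid: str) -> Optional[dict]:
        """Return the JWK with this key ID, refetching once if it is unknown."""
        expired = (
            self._jwks is None
            or self._clock() - self._fetched_at > self._ttl
        )
        if expired:
            self._refresh()
        key = self._lookup(kid)
        if key is None and not expired:
            # Unknown kid on an unexpired cache: the keys may have rotated.
            self._refresh()
            key = self._lookup(kid)
        return key
```

Because every unknown key ID triggers a refetch, a real deployment would probably also cap how often that can happen, so that forged tokens cannot be used to hammer the JWKS endpoint.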
Why Access Tokens, Not ID Tokens?
Many tutorials validate ID tokens for API authorization. That misuses the token. ID tokens come from OpenID Connect and contain user profile data (email, name, profile picture); they are intended for the frontend to display user information. Access tokens contain authorization data (scopes, client_id, token_use) and are the correct token type for API authorization.
This distinction matters practically: access tokens don't carry an aud (audience) claim in Cognito's implementation. Instead, we verify client_id to ensure the token was issued for our application. We also set verify_aud: False in the decode options to prevent PyJWT from rejecting a perfectly valid access token.
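To make the difference concrete, here is a small sketch of the two claim sets and the checks applied after signature verification. The payloads are illustrative placeholders, not real tokens, and `check_access_claims` is a hypothetical helper that mirrors the checks in the middleware above.

```python
from typing import Optional

# Illustrative payloads with placeholder values; real tokens carry more claims.
ID_TOKEN_CLAIMS = {
    "token_use": "id",
    "sub": "user-123",
    "aud": "example-client-id",      # ID tokens carry the audience claim
    "email": "user@example.com",
}
ACCESS_TOKEN_CLAIMS = {
    "token_use": "access",
    "sub": "user-123",
    "client_id": "example-client-id",  # access tokens carry client_id instead
    "scope": "openid profile",
}


def check_access_claims(claims: dict, expected_client_id: Optional[str]) -> None:
    """Raise ValueError unless the claims belong to an access token for our client."""
    if claims.get("token_use") != "access":
        raise ValueError("Expected access token")
    if expected_client_id and claims.get("client_id") != expected_client_id:
        raise ValueError("client_id mismatch")


if __name__ == "__main__":
    check_access_claims(ACCESS_TOKEN_CLAIMS, "example-client-id")  # passes
    try:
        check_access_claims(ID_TOKEN_CLAIMS, "example-client-id")
    except ValueError as exc:
        print(f"rejected: {exc}")
```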
WebSocket Authentication
WebSocket connections present a unique challenge: the browser WebSocket API doesn't let you set custom headers on the handshake request. You can't send an Authorization: Bearer <token> header the way you would with an HTTP request.
The solution is straightforward — pass the token as a query parameter:
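```python
# Assuming FastAPI, which the decorator and Query(...) suggest.
from fastapi import Query, WebSocket

# `app`, `verify_token`, and `JWTError` come from the application and the
# auth module shown earlier.


@app.websocket("/ws/chat/{session_id}")
async def websocket_chat(
    websocket: WebSocket,
    session_id: str,
    token: str = Query(default=""),
):
    """WebSocket endpoint with token-based auth."""
    try:
        claims = await verify_token(token)
    except JWTError:
        await websocket.close(code=4001, reason="Unauthorized")
        return

    await websocket.accept()
    user_id = claims.get("sub", "anonymous")
    # ... proceed with authenticated connection
```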
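The token is validated before the WebSocket is accepted. If verification fails, we close with a 4001 code, a custom close code that the frontend can catch and redirect to login. One caveat: under the ASGI spec, closing before accept rejects the handshake with an HTTP 403, so browsers typically report close code 1006 rather than 4001. If the frontend depends on seeing 4001, accept the connection first and then close it.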
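On the client side, the only requirement is that the token ends up URL-encoded in the query string. A minimal sketch using the standard library follows; `build_ws_url` and the example host are hypothetical, and only the `/ws/chat/{session_id}` path and the `token` parameter come from the endpoint above.

```python
from urllib.parse import quote, urlencode


def build_ws_url(base_url: str, session_id: str, token: str) -> str:
    """Build the WebSocket URL with the access token as a query parameter."""
    # Encode the session ID as a single path segment.
    path = f"/ws/chat/{quote(session_id, safe='')}"
    # urlencode escapes characters such as '+' that may appear in a token.
    query = urlencode({"token": token})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    print(build_ws_url("wss://example.com", "session-42", "header.payload.signature"))
```

Query strings tend to show up in access logs, so it is worth checking that the `token` parameter is excluded or redacted wherever request URLs are logged.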
WAF Protection
AWS WAF sits in front of CloudFront, inspecting every request before it reaches your application. Our configuration uses four rule groups in priority order:
- AllowAppPaths (priority 0): Explicitly allow `/ws/*`, `/api/*`, and `/health` before any managed rules fire
- Rate Limiting (priority 1): Cap at 2000 requests per 5 minutes per IP
- AWS Managed Rules (priority 2): OWASP top 10, SQL injection, cross-site scripting
- Bot Control (priority 3): Block known bad bots and scrapers
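The ordering matters. AWS managed rules are aggressive: they can false-positive on WebSocket binary frames, JSON payloads with special characters, or API requests with encoded content. By placing AllowAppPaths at priority 0, we ensure that legitimate application traffic is always permitted. The managed rules then catch anything that falls through: direct access attempts, probing, and abuse. Keep in mind that Allow is a terminating action in AWS WAF, so a request that matches AllowAppPaths is not evaluated by the lower-priority rules, including the rate limit. If the rate limit should also cover `/api/*` and `/ws/*` traffic, it has to be evaluated before the allow rule.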
Rate limiting at 2000 requests per 5 minutes translates to roughly 6-7 requests per second sustained. That's generous enough for a single user running multiple research sessions simultaneously, but tight enough to stop automated abuse.
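The arithmetic, and roughly what the rule looks like, can be sketched as follows. The dictionary follows the shape of a WAFv2 rate-based rule, but the rule name, metric name, and the `Block` action are assumptions; the post only says the rule caps traffic.

```python
LIMIT = 2000          # requests allowed per window, from the text above
WINDOW_SECONDS = 300  # the 5-minute window from the text above


def sustained_rps(limit: int, window_seconds: int) -> float:
    """Average requests per second a single IP can sustain under the limit."""
    return limit / window_seconds


# Sketch of the rate-based rule; names and the Block action are assumed.
rate_limit_rule = {
    "Name": "RateLimitPerIp",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {"Limit": LIMIT, "AggregateKeyType": "IP"}
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "RateLimitPerIp",
    },
}

if __name__ == "__main__":
    print(f"{sustained_rps(LIMIT, WINDOW_SECONDS):.2f} requests/second")
```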
VPC Data Perimeter
Network isolation goes beyond security groups. Our VPC endpoints implement a data perimeter policy based on the AWS whitepaper pattern — every VPC endpoint is scoped to allow traffic only from the owning AWS account:
```json
{
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:PrincipalAccount": "123456789012"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "Bool": {
          "aws:PrincipalIsAWSService": "true"
        }
      }
    }
  ]
}
```
This prevents a critical attack vector: data exfiltration via AWS services. Without this policy, a compromised workload could use credentials from a different account to call S3, Secrets Manager, or Bedrock through your endpoints, shipping your data elsewhere. The aws:PrincipalAccount condition restricts every API call through the endpoint to principals in your account. The aws:PrincipalIsAWSService exception allows AWS services like CloudTrail and Config to function normally: they call with AWS service principals, which are not identities in your account and wouldn't pass the account check otherwise.
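Since the same policy is attached to every endpoint, it helps to generate it from the account ID rather than paste it. Here is a minimal sketch of such a generator; `data_perimeter_policy` is a hypothetical helper, and how the resulting document gets attached to each endpoint is left to the infrastructure code.

```python
import json


def data_perimeter_policy(account_id: str) -> dict:
    """Build the two-statement endpoint policy for one AWS account."""
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError("AWS account IDs are 12-digit strings")

    def allow_when(condition: dict) -> dict:
        # Both statements differ only in their condition block.
        return {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "*",
            "Resource": "*",
            "Condition": condition,
        }

    return {
        "Statement": [
            allow_when({"StringEquals": {"aws:PrincipalAccount": account_id}}),
            allow_when({"Bool": {"aws:PrincipalIsAWSService": "true"}}),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(data_perimeter_policy("123456789012"), indent=2))
```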
Secrets Management
Zero secrets in code, environment variables, or git. Every secret is stored in AWS Secrets Manager and retrieved at runtime:
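```python
import json

import boto3

# `settings` is the application's config object, defined elsewhere in the project.


def _load_tavily_key() -> str:
    """Load Tavily API key from Secrets Manager."""
    if settings.tavily_api_key_secret_arn:
        secrets = boto3.client(
            "secretsmanager", region_name=settings.aws_region
        )
        secret = secrets.get_secret_value(
            SecretId=settings.tavily_api_key_secret_arn
        )
        return json.loads(secret["SecretString"])["TAVILY_API_KEY"]
    # Local development fallback: key set directly in configuration.
    return settings.tavily_api_key or ""
```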
The pattern is consistent across all secrets — Tavily API key, Google OAuth client credentials, any third-party API keys. CDK creates the secrets during deployment with initial placeholder values. After deployment, you update the secret values once through the console or CLI. The application retrieves them at startup and caches for the process lifetime.
The fallback to settings.tavily_api_key supports local development where you might set the key directly in a .env file. In production, the ARN is always configured, so the Secrets Manager path is taken.
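The "retrieve once, cache for the process lifetime" behavior can be sketched with `functools.lru_cache`. This is one possible way to do it, not necessarily how the project does; `make_secret_loader` is a hypothetical helper, and the client is passed in so that anything exposing `get_secret_value` (such as a boto3 Secrets Manager client) can be used.

```python
import json
from functools import lru_cache
from typing import Any, Callable


def make_secret_loader(client: Any) -> Callable[[str, str], str]:
    """Return a loader that fetches each (secret, field) pair once per process."""

    @lru_cache(maxsize=None)
    def load(secret_arn: str, field: str) -> str:
        # Only the first call per (ARN, field) reaches Secrets Manager.
        resp = client.get_secret_value(SecretId=secret_arn)
        return json.loads(resp["SecretString"])[field]

    return load
```

With a real client this would be used as `load = make_secret_loader(boto3.client("secretsmanager"))` followed by `load(arn, "TAVILY_API_KEY")`. A cached value survives until the process restarts, so rotating a secret requires a redeploy or an explicit `load.cache_clear()`.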
Bedrock Guardrails
Amazon Bedrock Guardrails adds a safety layer between the LLM and your users. Instead of hoping the model behaves, you enforce it:
- Content filtering — Block responses containing hate speech, violence, or sexual content
- PII detection — Automatically redact personal information from outputs
- Prompt injection detection — Catch attempts to override system instructions
- Topic denial — Block the model from discussing out-of-scope topics
Guardrails attach directly to the Bedrock model at creation time:
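```python
# Assumed import path for the Strands SDK used earlier in the series.
from strands.models import BedrockModel

# `settings` is the application's config object, defined elsewhere in the project.


def _create_model() -> BedrockModel:
    """Create a Bedrock model with optional guardrails."""
    kwargs = {
        "model_id": settings.bedrock_model_id,
        "region_name": settings.aws_region,
    }
    # Attach the guardrail only when one is configured.
    if settings.bedrock_guardrail_id:
        kwargs["guardrail_id"] = settings.bedrock_guardrail_id
        kwargs["guardrail_version"] = settings.bedrock_guardrail_version
    return BedrockModel(**kwargs)
```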
Every agent in the system — orchestrator, researchers, and critique — uses this same factory function. That means guardrails are enforced uniformly across all LLM calls, not just user-facing ones. A researcher agent that encounters toxic content in a web search result will have its output filtered before it reaches the synthesis stage.
The guardrail configuration itself is defined in CDK and versioned alongside infrastructure. When you update content policies, you publish a new guardrail version and update the deployment — no application code changes required.
GitHub OIDC Federation
The final layer eliminates the most common source of credential leaks: CI/CD secrets. Instead of storing long-lived AWS access keys in GitHub, we use OpenID Connect federation. GitHub Actions assumes an IAM role directly — no stored credentials anywhere.
The flow works like this: GitHub Actions requests an OIDC token from GitHub's identity provider during workflow execution. That token is presented to AWS STS via AssumeRoleWithWebIdentity. AWS validates the token against the OIDC provider configuration, checks the role's trust policy, and issues short-lived credentials (1 hour TTL).
The IAM role trust policy is scoped to specific repositories and branches:
```json
{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
    },
    "StringLike": {
      "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
    }
  }
}
```
The benefits are substantial: no credential rotation schedules, no risk of leaked secrets in logs or pull requests, and a full audit trail in CloudTrail showing exactly which workflow run assumed which role. If you ever need to revoke access, you update the trust policy — no secret invalidation dance.
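To see why the `sub` condition matters, it helps to model how the two conditions are evaluated against a token's claims. The sketch below is a simplified approximation: `string_like` imitates IAM's `StringLike` wildcards, `trust_policy_allows` is a hypothetical helper, and real STS also verifies the token's signature and expiry, which is omitted here.

```python
import re

EXPECTED_AUD = "sts.amazonaws.com"
SUB_PATTERN = "repo:your-org/your-repo:ref:refs/heads/main"


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_policy_allows(claims: dict, sub_pattern: str = SUB_PATTERN) -> bool:
    """Check the aud and sub conditions from the trust policy above."""
    return (
        claims.get("aud") == EXPECTED_AUD
        and string_like(sub_pattern, claims.get("sub", ""))
    )


if __name__ == "__main__":
    main_push = {"aud": EXPECTED_AUD, "sub": "repo:your-org/your-repo:ref:refs/heads/main"}
    pull_request = {"aud": EXPECTED_AUD, "sub": "repo:your-org/your-repo:pull_request"}
    print(trust_policy_allows(main_push))      # workflow on main
    print(trust_policy_allows(pull_request))   # workflow triggered by a pull request
```

Loosening the pattern to `repo:your-org/your-repo:*` would let pull-request workflows assume the role as well, which is why the branch-specific subject is the safer default.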
Observability
Security without visibility is incomplete. The platform includes a CloudWatch-based observability stack:
- Dashboard: CPU utilization, memory usage, request counts, and error rates on a single pane
- Alarms: Automated alerts for sustained high CPU (>80%), 5xx error spikes, and memory pressure on Fargate tasks
- Structured logging: All application logs use `structlog` in JSON format, making them searchable and parseable in CloudWatch Logs Insights
Structured logging is particularly important for security. When a JWT verification fails or a guardrail triggers, the structured log entry includes the user ID, session ID, and rejection reason — everything you need for incident investigation without manual log parsing.
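As an illustration of what such an entry contains, here is a standard-library sketch that emits one JSON line per security event. It is a stand-in for the project's structlog setup, not a copy of it; `security_event` and the field names are hypothetical.

```python
import json
from datetime import datetime, timezone


def security_event(event: str, **fields: object) -> str:
    """Serialize one security event as a single JSON log line."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "warning",
        "event": event,
        **fields,
    }
    # default=str keeps the logger from failing on non-JSON types.
    return json.dumps(record, sort_keys=True, default=str)


if __name__ == "__main__":
    print(
        security_event(
            "jwt_rejected",
            user_id="user-123",
            session_id="session-42",
            reason="client_id mismatch",
        )
    )
```

Because each field is a top-level JSON key, a log query can filter on `event` or `reason` directly instead of parsing free-form messages.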
Series Recap
Over six posts, we built a production-grade AI research platform from the ground up:
| Blog | Title | What We Built |
|---|---|---|
| 1 | Architecture & Vision | System design, tech stack, research pipeline |
| 2 | Multi-Agent Orchestration | Parallel agents with Strands SDK, critique loop |
| 3 | Smart Search & Source Intelligence | Tavily integration, credibility scoring, circuit breaker |
| 4 | Real-Time Streaming | WebSocket protocol, CloudFront keepalive |
| 5 | Cloud-Native Infrastructure | 9 CDK stacks, VPC, Fargate, CloudFront |
| 6 | Security & Production Hardening | Six layers of defense (this post) |
The complete project demonstrates how to build, deploy, and secure a production-grade AI application on AWS. From multi-agent orchestration to real-time streaming to defense-in-depth security, every layer is designed for production — not as a prototype that "works on my machine," but as infrastructure you can hand off to a team and operate with confidence.