Deep Research Agent Series — Blog 6: Security & Production Hardening#

You've built an AI research agent that works. Now make it safe. In this final post, we add six layers of security — from Cognito JWT authentication and WAF rate limiting to Bedrock Guardrails for AI content safety and GitHub OIDC federation for zero-credential deployments. Production isn't just about features that work; it's about features that work safely under adversarial conditions.

Part	Topic	Status
Blog 1	Architecture & Vision	Published
Blog 2	Multi-Agent Orchestration	Published
Blog 3	Smart Search & Source Intelligence	Published
Blog 4	Real-Time Streaming with WebSocket	Published
Blog 5	Cloud-Native Infrastructure on AWS	Published
Blog 6	Security & Production Hardening	You are here

The Security Stack#

Security isn't a single feature — it's a stack. Our platform implements six layers, each addressing a different threat surface:

CloudFront + WAF — Edge protection, rate limiting, bot control
Cognito JWT Auth — User identity and access tokens
VPC Data Perimeter — Network-level isolation
Secrets Manager — Zero secrets in code
Bedrock Guardrails — AI content safety
GitHub OIDC — Zero-credential CI/CD

Each layer operates independently. If one fails, the others still protect you. Let's walk through each one.

Cognito JWT Authentication#

Authentication is the front door. Every API request and WebSocket connection must carry a valid JWT issued by Amazon Cognito. Here's the full verification middleware:

import jwt
from jwt import PyJWTError as JWTError
import httpx

_jwks_cache: dict | None = None

async def get_jwks() -> dict:
    """Fetch and cache JWKS from Cognito."""
    global _jwks_cache
    if _jwks_cache:
        return _jwks_cache
    url = (
        f"https://cognito-idp.{settings.aws_region}.amazonaws.com"
        f"/{settings.cognito_user_pool_id}/.well-known/jwks.json"
    )
    async with httpx.AsyncClient() as client:
        resp = await client.get(url)
        _jwks_cache = resp.json()
    return _jwks_cache

async def verify_token(token: str) -> dict:
    """Verify a Cognito JWT access token."""
    if not settings.cognito_user_pool_id:
        return {"sub": "dev-user", "email": "dev@local"}

    jwks = await get_jwks()
    unverified_header = jwt.get_unverified_header(token)

    key = None
    for k in jwks.get("keys", []):
        if k["kid"] == unverified_header.get("kid"):
            key = k
            break

    if not key:
        raise JWTError("No matching key found in JWKS")

    claims = jwt.decode(
        token,
        key,
        algorithms=["RS256"],
        issuer=(
            f"https://cognito-idp.{settings.aws_region}.amazonaws.com"
            f"/{settings.cognito_user_pool_id}"
        ),
        options={"verify_aud": False},
    )

    if claims.get("token_use") != "access":
        raise JWTError("Expected access token")
    if (
        settings.cognito_app_client_id
        and claims.get("client_id") != settings.cognito_app_client_id
    ):
        raise JWTError("client_id mismatch")
    return claims

Several design decisions here are worth calling out. First, the JWKS is fetched once and cached in memory. Cognito rotates keys infrequently, so a process-level cache avoids a network round-trip on every request. Second, the dev-mode bypass — when cognito_user_pool_id is not configured, the middleware returns a synthetic user. This lets you run the full stack locally without standing up Cognito.

Why Access Tokens, Not ID Tokens?#

Many tutorials validate ID tokens for API authorization. This is wrong per the OAuth 2.0 spec. ID tokens contain user profile data — email, name, profile picture — and are intended for the frontend to display user information. Access tokens contain authorization data — scopes, client_id, token_use — and are the correct token type for API authorization.

This distinction matters practically: access tokens don't carry an aud (audience) claim in Cognito's implementation. Instead, we verify client_id to ensure the token was issued for our application. We also set verify_aud: False in the decode options to prevent PyJWT from rejecting a perfectly valid access token.

WebSocket Authentication#

WebSocket connections present a unique challenge: the WebSocket protocol doesn't support custom headers during the handshake. You can't send an Authorization: Bearer <token> header the way you would with an HTTP request.

The solution is straightforward — pass the token as a query parameter:

@app.websocket("/ws/chat/{session_id}")
async def websocket_chat(
    websocket: WebSocket,
    session_id: str,
    token: str = Query(default=""),
):
    """WebSocket endpoint with token-based auth."""
    try:
        claims = await verify_token(token)
    except JWTError:
        await websocket.close(code=4001, reason="Unauthorized")
        return

    await websocket.accept()
    user_id = claims.get("sub", "anonymous")
    # ... proceed with authenticated connection

The token is validated before the WebSocket is accepted. If verification fails, we close with a 4001 code — a custom close code that the frontend can catch and redirect to login.

WAF Protection#

AWS WAF sits in front of CloudFront, inspecting every request before it reaches your application. Our configuration uses four rule groups in priority order:

AllowAppPaths (priority 0) — Explicitly allow /ws/*, /api/*, and /health before any managed rules fire
Rate Limiting (priority 1) — Cap at 2000 requests per 5 minutes per IP
AWS Managed Rules (priority 2) — OWASP top 10, SQL injection, cross-site scripting
Bot Control (priority 3) — Block known bad bots and scrapers

The ordering matters. AWS managed rules are aggressive — they can false-positive on WebSocket binary frames, JSON payloads with special characters, or API requests with encoded content. By placing AllowAppPaths at priority 0, we ensure that legitimate application traffic is always permitted. The managed rules then catch anything that falls through — direct access attempts, probing, and abuse.

Rate limiting at 2000 requests per 5 minutes translates to roughly 6-7 requests per second sustained. That's generous enough for a single user running multiple research sessions simultaneously, but tight enough to stop automated abuse.

VPC Data Perimeter#

Network isolation goes beyond security groups. Our VPC endpoints implement a data perimeter policy based on the AWS whitepaper pattern — every VPC endpoint is scoped to allow traffic only from the owning AWS account:

{
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalAccount": "123456789012"
                }
            }
        },
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "aws:PrincipalIsAWSService": "true"
                }
            }
        }
    ]
}

This prevents a critical attack vector: data exfiltration via AWS services. Without this policy, a compromised workload could call S3, Secrets Manager, or Bedrock endpoints in a different account, shipping your data elsewhere. The aws:PrincipalAccount condition locks every API call to your account. The aws:PrincipalIsAWSService exception allows AWS services like CloudTrail and Config to function normally — they use service-linked roles that wouldn't pass the account check otherwise.

Secrets Management#

Zero secrets in code, environment variables, or git. Every secret is stored in AWS Secrets Manager and retrieved at runtime:

def _load_tavily_key() -> str:
    """Load Tavily API key from Secrets Manager."""
    if settings.tavily_api_key_secret_arn:
        secrets = boto3.client(
            "secretsmanager", region_name=settings.aws_region
        )
        secret = secrets.get_secret_value(
            SecretId=settings.tavily_api_key_secret_arn
        )
        return json.loads(secret["SecretString"])["TAVILY_API_KEY"]
    return settings.tavily_api_key or ""

The pattern is consistent across all secrets — Tavily API key, Google OAuth client credentials, any third-party API keys. CDK creates the secrets during deployment with initial placeholder values. After deployment, you update the secret values once through the console or CLI. The application retrieves them at startup and caches for the process lifetime.

The fallback to settings.tavily_api_key supports local development where you might set the key directly in a .env file. In production, the ARN is always configured, so the Secrets Manager path is taken.

Bedrock Guardrails#

Amazon Bedrock Guardrails adds a safety layer between the LLM and your users. Instead of hoping the model behaves, you enforce it:

Content filtering — Block responses containing hate speech, violence, or sexual content
PII detection — Automatically redact personal information from outputs
Prompt injection detection — Catch attempts to override system instructions
Topic denial — Block the model from discussing out-of-scope topics

Guardrails attach directly to the Bedrock model at creation time:

def _create_model() -> BedrockModel:
    """Create a Bedrock model with optional guardrails."""
    kwargs = {
        "model_id": settings.bedrock_model_id,
        "region_name": settings.aws_region,
    }
    if settings.bedrock_guardrail_id:
        kwargs["guardrail_id"] = settings.bedrock_guardrail_id
        kwargs["guardrail_version"] = settings.bedrock_guardrail_version
    return BedrockModel(**kwargs)

Every agent in the system — orchestrator, researchers, and critique — uses this same factory function. That means guardrails are enforced uniformly across all LLM calls, not just user-facing ones. A researcher agent that encounters toxic content in a web search result will have its output filtered before it reaches the synthesis stage.

The guardrail configuration itself is defined in CDK and versioned alongside infrastructure. When you update content policies, you publish a new guardrail version and update the deployment — no application code changes required.

GitHub OIDC Federation#

The final layer eliminates the most common source of credential leaks: CI/CD secrets. Instead of storing long-lived AWS access keys in GitHub, we use OpenID Connect federation. GitHub Actions assumes an IAM role directly — no stored credentials anywhere.

The flow works like this: GitHub Actions requests an OIDC token from GitHub's identity provider during workflow execution. That token is presented to AWS STS via AssumeRoleWithWebIdentity. AWS validates the token against the OIDC provider configuration, checks the role's trust policy, and issues short-lived credentials (1 hour TTL).

The IAM role trust policy is scoped to specific repositories and branches:

{
    "Effect": "Allow",
    "Principal": {
        "Federated": "arn:aws:iam::oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
        "StringEquals": {
            "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
        },
        "StringLike": {
            "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
        }
    }
}

The benefits are substantial: no credential rotation schedules, no risk of leaked secrets in logs or pull requests, and a full audit trail in CloudTrail showing exactly which workflow run assumed which role. If you ever need to revoke access, you update the trust policy — no secret invalidation dance.

Observability#

Security without visibility is incomplete. The platform includes a CloudWatch-based observability stack:

Dashboard — CPU utilization, memory usage, request counts, and error rates on a single pane
Alarms — Automated alerts for sustained high CPU (>80%), 5xx error spikes, and memory pressure on Fargate tasks
Structured logging — All application logs use structlog in JSON format, making them searchable and parseable in CloudWatch Logs Insights

Structured logging is particularly important for security. When a JWT verification fails or a guardrail triggers, the structured log entry includes the user ID, session ID, and rejection reason — everything you need for incident investigation without manual log parsing.

Series Recap#

Over six posts, we built a production-grade AI research platform from the ground up:

Blog	Title	What We Built
1	Architecture & Vision	System design, tech stack, research pipeline
2	Multi-Agent Orchestration	Parallel agents with Strands SDK, critique loop
3	Smart Search & Source Intelligence	Tavily integration, credibility scoring, circuit breaker
4	Real-Time Streaming	WebSocket protocol, CloudFront keepalive
5	Cloud-Native Infrastructure	9 CDK stacks, VPC, Fargate, CloudFront
6	Security & Production Hardening	Six layers of defense (this post)

The complete project demonstrates how to build, deploy, and secure a production-grade AI application on AWS. From multi-agent orchestration to real-time streaming to defense-in-depth security, every layer is designed for production — not as a prototype that "works on my machine," but as infrastructure you can hand off to a team and operate with confidence.

Deep Research Agent Series — Blog 6: Security & Production Hardening