The operations layer focuses on performance optimization, security hardening, and scalability considerations to ensure Cloud Sayings runs efficiently and securely in production.

Performance Considerations

Latency Optimization

  1. Cache-First Strategy: Warm invocations use cache (<10ms)
  2. Background Processing: Cache building doesn't block requests
  3. Parallel Execution: Multiple LLM calls in parallel (ThreadPoolExecutor)
  4. API Key Caching: 5-minute TTL reduces SSM calls
  5. DynamoDB Caching: 5-minute TTL for single-item lookups
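
The parallel-execution item above can be sketched with Python's ThreadPoolExecutor. This is a minimal illustration, not the system's actual code; call_provider is a hypothetical stand-in for a real per-provider LLM call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_provider(name: str) -> dict:
    # Hypothetical stand-in; in the real system this would hit an LLM API.
    return {"provider": name, "saying": f"saying from {name}"}

def generate_in_parallel(providers: list[str], timeout: float = 10.0) -> list[dict]:
    """Fan out one call per provider and collect whatever finishes in time."""
    results = []
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {pool.submit(call_provider, p): p for p in providers}
        for future in as_completed(futures, timeout=timeout):
            results.append(future.result())
    return results
```

Because each provider call is I/O-bound (waiting on an HTTP response), threads overlap the waits, so total latency approaches the slowest single call rather than the sum.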

Cost Optimization

  1. Cache Reduces LLM Calls: Warm invocations avoid API calls
  2. Batch Generation: Cache building generates 3 sayings per call
  3. Deduplication: Prevents duplicate API calls
  4. Timeout Management: Timeouts cut off hanging requests before they keep accruing cost

Token Cost Calculation

Token costs are calculated per model for accurate cost tracking. The rates below are a blended average of input and output pricing:

  • Claude Haiku 4.5: $0.0025/1K tokens
  • Claude Sonnet 4.5: $0.015/1K tokens
  • GPT-4.1-mini: $0.001/1K tokens
  • GPT-4.1: $0.005/1K tokens
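
The per-model calculation can be sketched as a lookup table over the blended rates above. The dictionary keys are hypothetical model identifiers, not necessarily the strings the system uses:

```python
# Blended $/1K-token rates from the table above (input/output averaged).
COST_PER_1K_TOKENS = {
    "claude-haiku-4.5": 0.0025,
    "claude-sonnet-4.5": 0.015,
    "gpt-4.1-mini": 0.001,
    "gpt-4.1": 0.005,
}

def estimate_cost(model: str, total_tokens: int) -> float:
    """Estimate request cost from the total token count and the model's rate."""
    rate = COST_PER_1K_TOKENS[model]
    return round(total_tokens / 1000 * rate, 6)
```

For example, a 2,000-token GPT-4.1 request estimates to $0.01.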

Scalability

  1. Stateless Design: Lambda functions are stateless (except in-memory cache)
  2. Horizontal Scaling: API Gateway + Lambda auto-scales
  3. DynamoDB: Pay-per-request, no capacity planning needed
  4. CloudFront: Global CDN for static assets

Cold Start Mitigation

  1. First 3 Requests: Always served from DynamoDB (fast, no LLM dependency)
  2. Background Cache Building: Doesn't block user requests
  3. Lambda Provisioned Concurrency: Can be enabled for critical paths (costs extra)
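
The request-number-based routing described above can be sketched as follows. This is a simplified illustration with hypothetical names, assuming the handler tracks a per-container request counter:

```python
import random

def get_saying(request_number: int, cache: list, dynamodb_sayings: list) -> dict:
    """Route by request number: early (cold-start) requests avoid the LLM
    entirely; later requests serve from the warm in-memory cache."""
    if request_number <= 3 or not cache:
        # Cold path: fast DynamoDB read, no LLM dependency.
        return {"source": "dynamodb", "saying": random.choice(dynamodb_sayings)}
    # Warm path: serve from the in-memory cache (<10 ms).
    return {"source": "cache", "saying": cache.pop(0)}
```

While the cold path serves the first requests, cache building runs in the background, so by request four the warm path is usually available.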

Security Architecture

IAM Roles & Policies

Lambda Execution Role requires:

  • logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents (CloudWatch Logs)
  • dynamodb:GetItem, dynamodb:BatchGetItem, dynamodb:Scan, dynamodb:UpdateItem (DynamoDB)
  • ssm:GetParameter, ssm:GetParameters (Parameter Store)
  • cloudwatch:PutMetricData (Custom metrics)
  • lambda:InvokeFunction (Dashboard stats invocation)

Least Privilege: Each Lambda has minimal required permissions. IAM policies are scoped to specific resources (e.g., specific Parameter Store parameters, specific DynamoDB tables).

API Gateway Security

  • CORS: Configured for specific origins (production) or * (development)
  • Rate Limiting: API Gateway throttling (configurable)
  • No Authentication: Public API (by design, as this is a learning tool)
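
For the CORS item, a Lambda proxy response would carry the allowed origin in its headers. A minimal sketch (the helper name and origin value are illustrative, not the system's actual code):

```python
import json

def make_response(body: dict, origin: str = "*") -> dict:
    """Build a Lambda proxy response with CORS headers. In production the
    origin would be restricted to the site's domain rather than '*'."""
    return {
        "statusCode": 200,
        "headers": {
            "Access-Control-Allow-Origin": origin,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```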

Data Security

  • API Keys: Encrypted in Parameter Store (KMS)
  • DynamoDB: Encryption at rest (AWS managed)
  • Logs: No sensitive data in logs (API keys never logged)

Network Security

  • VPC: Lambda functions can run in VPC (not required for this use case)
  • Private Endpoints: Can use VPC endpoints for DynamoDB/SSM (reduces internet traffic)

Security Considerations

  1. Encryption: Parameters stored as SecureString (encrypted with KMS)
  2. IAM Permissions: Lambda role requires:
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters"
      ],
      "Resource": [
        "arn:aws:ssm:us-east-1:*:parameter/api-key-*"
      ]
    }
  3. Least Privilege: Lambda can only read specific parameters
  4. No Hardcoding: API keys never appear in code or logs
  5. Rotation: API keys can be rotated by updating Parameter Store (cache refreshes after TTL)
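
The 5-minute TTL cache behind items 1 and 5 can be sketched as a wrapper around any fetch function. In the real system the fetch would be an SSM GetParameter call with decryption; here it is pluggable so the caching logic stands alone:

```python
import time

def make_ttl_cache(fetch, ttl_seconds: float = 300.0):
    """Wrap a fetch function (e.g. an SSM GetParameter call with
    WithDecryption=True) in a TTL cache, so repeated lookups within the
    window skip the Parameter Store round trip."""
    cache: dict[str, tuple[float, str]] = {}

    def get(name: str) -> str:
        now = time.monotonic()
        hit = cache.get(name)
        if hit and now - hit[0] < ttl_seconds:
            return hit[1]  # still fresh: no network call
        value = fetch(name)
        cache[name] = (now, value)
        return value

    return get
```

Rotation works naturally with this design: updating the parameter in Parameter Store takes effect once the cached entry expires, at most one TTL later.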

Monitoring & Observability

CloudWatch Logs

Structured JSON logging for all Lambda functions:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Cache hit",
  "extra_fields": {
    "source": "haiku",
    "cache_size": 5,
    "request_number": 7
  }
}
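
A formatter producing the shape above can be built on the standard logging module. This is one plausible implementation, not necessarily the system's own:

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, matching the shape shown above.
    Callers pass structured data via extra={"extra_fields": {...}}."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
            "extra_fields": getattr(record, "extra_fields", {}),
        }
        return json.dumps(entry)
```

One JSON object per line is what makes the CloudWatch Logs Insights queries below possible: each field becomes directly queryable.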

CloudWatch Metrics

Custom metrics tracked:

  • TotalExecutionTime: End-to-end request time
  • APILatency: LLM API call time
  • TokensUsed: Total tokens per request
  • Cost: Estimated cost per request
  • CacheHit: Cache hit/miss rate
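
These metrics would be published via CloudWatch's PutMetricData API. A sketch of shaping one such call, kept as a pure function so the boto3 client stays out of the example (the namespace and values are hypothetical):

```python
def build_metric_payload(namespace: str, metrics: dict) -> dict:
    """Shape one PutMetricData call for the custom metrics listed above.
    `metrics` maps metric name -> (value, unit); a boto3 CloudWatch client
    would then send it as cloudwatch.put_metric_data(**payload)."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {"MetricName": name, "Value": value, "Unit": unit}
            for name, (value, unit) in metrics.items()
        ],
    }
```

Batching several metrics into one call keeps per-request overhead (and PutMetricData request count) low.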

CloudWatch Logs Insights Queries

Example queries for troubleshooting:

# Error tracking by model
fields @timestamp, error_type, error_message, model
| filter error_type != ""
| stats count() by error_type, model

# Average response time by source
fields @timestamp, source, execution_time
| stats avg(execution_time) by source

# Cache hit rate
fields @timestamp, cache_hit
| stats count() by cache_hit

Conclusion

This architecture document provides a comprehensive overview of the Cloud Sayings system, covering:

  • LLM Gateway: Adapter pattern with provider abstraction
  • Caching: Intelligent in-memory cache with deduplication and service tracking
  • Dashboard: Near real-time analytics with filtering and provider-specific metrics
  • Prompting: Provider-specific prompts optimized for each LLM
  • Secrets: Parameter Store with caching for API keys

The system is designed for production use with:

  • High availability (multiple fallback layers)
  • Cost efficiency (cache reduces LLM calls)
  • Observability (structured logging and metrics)
  • Security (encrypted secrets, least-privilege IAM)

Key Takeaways:
  • Request-number-based flow optimizes for both cold and warm starts
  • Provider abstraction enables easy addition of new LLM providers
  • Intelligent caching reduces costs while maintaining fast response times
  • Structured logging and metrics enable effective troubleshooting and optimization
  • Security is built into every layer, from encrypted secrets to least-privilege IAM policies