The operations layer focuses on performance optimization, security hardening, and scalability considerations to ensure Cloud Sayings runs efficiently and securely in production.
Performance Considerations
Latency Optimization
- Cache-First Strategy: Warm invocations use cache (<10ms)
- Background Processing: Cache building doesn't block requests
- Parallel Execution: Multiple LLM calls run in parallel via ThreadPoolExecutor (see the sketch after this list)
- API Key Caching: 5-minute TTL reduces SSM calls
- DynamoDB Caching: 5-minute TTL for single-item lookups
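The parallel-execution bullet above can be illustrated with a minimal sketch. This is not the actual adapter code; call_haiku and call_gpt are hypothetical stand-ins for the real provider adapters.

# Illustrative sketch: issue multiple LLM calls concurrently instead of sequentially.
from concurrent.futures import ThreadPoolExecutor

def call_haiku(prompt):
    return f"haiku saying for: {prompt}"      # placeholder for the Anthropic adapter

def call_gpt(prompt):
    return f"gpt saying for: {prompt}"        # placeholder for the OpenAI adapter

def generate_in_parallel(prompt):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(fn, prompt) for fn in (call_haiku, call_gpt)]
        return [f.result() for f in futures]  # result() waits for each call to finish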
Cost Optimization
- Cache Reduces LLM Calls: Warm invocations avoid API calls
- Batch Generation: Cache building generates 3 sayings per call
- Deduplication: Prevents duplicate API calls (see the sketch after this list)
- Timeout Management: Prevents hanging requests, which would otherwise accrue cost
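One possible reading of the deduplication bullet, consistent with the cache description in the conclusion, is that each generated saying is normalized and stored only once, so repeats never consume cache slots or trigger replacement generation. The sketch below is an illustrative interpretation, not the actual cache code.

# Illustrative sketch: keep each saying at most once in the in-memory cache.
_cache = []            # sayings available to serve
_seen = set()          # normalized sayings already generated by this container

def _normalize(text):
    return " ".join(text.lower().split())

def add_saying(saying):
    key = _normalize(saying)
    if key in _seen:   # duplicate: drop it rather than caching it again
        return False
    _seen.add(key)
    _cache.append(saying)
    return True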
Token Cost Calculation
Token costs are calculated per model for accurate cost tracking:
- Claude Haiku 4.5: $0.0025/1K tokens (average input/output)
- Claude Sonnet 4.5: $0.015/1K tokens (average input/output)
- GPT-4.1-mini: $0.001/1K tokens (average input/output)
- GPT-4.1: $0.005/1K tokens (average input/output)
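A minimal sketch of how per-request cost could be estimated from these blended rates. The model identifiers and function name are illustrative, not the system's actual keys.

# Blended $/1K-token rates, mirroring the list above (average of input/output pricing).
RATES_PER_1K_TOKENS = {
    "claude-haiku-4.5": 0.0025,
    "claude-sonnet-4.5": 0.015,
    "gpt-4.1-mini": 0.001,
    "gpt-4.1": 0.005,
}

def estimate_cost(model, total_tokens):
    """Estimate request cost in USD from total tokens used."""
    return (total_tokens / 1000) * RATES_PER_1K_TOKENS[model]

# Example: 800 tokens on Claude Haiku 4.5 -> 0.8 * 0.0025 = $0.002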
Scalability
- Stateless Design: Lambda functions are stateless (except in-memory cache)
- Horizontal Scaling: API Gateway + Lambda auto-scales
- DynamoDB: Pay-per-request, no capacity planning needed
- CloudFront: Global CDN for static assets
Cold Start Mitigation
- First 3 Requests: Always served from DynamoDB (fast, no LLM dependency); see the routing sketch after this list
- Background Cache Building: Doesn't block user requests
- Lambda Provisioned Concurrency: Can be enabled for critical paths (costs extra)
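A sketch of the request-number-based routing implied by the first bullet. The counter and helper functions are placeholders, not the actual handler.

# Illustrative sketch: route the first few requests per container to DynamoDB.
REQUEST_COUNT = 0  # per-container counter; resets on every cold start

def fetch_from_dynamodb():
    return {"saying": "a saying from DynamoDB", "source": "dynamodb"}   # placeholder

def pop_from_cache():
    return {"saying": "a cached saying", "source": "cache"}             # placeholder

def cache_has_items():
    return False                                                        # placeholder

def get_saying():
    global REQUEST_COUNT
    REQUEST_COUNT += 1
    if REQUEST_COUNT <= 3:        # first 3 requests: DynamoDB, no LLM dependency
        return fetch_from_dynamodb()
    if cache_has_items():         # warm path: in-memory cache
        return pop_from_cache()
    return fetch_from_dynamodb()  # cache empty: fall back to DynamoDB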
Security Architecture
IAM Roles & Policies
Lambda Execution Role requires:
- logs:CreateLogGroup, logs:CreateLogStream, logs:PutLogEvents (CloudWatch Logs)
- dynamodb:GetItem, dynamodb:BatchGetItem, dynamodb:Scan, dynamodb:UpdateItem (DynamoDB)
- ssm:GetParameter, ssm:GetParameters (Parameter Store)
- cloudwatch:PutMetricData (Custom metrics)
- lambda:InvokeFunction (Dashboard stats invocation)
Least Privilege: Each Lambda has minimal required permissions. IAM policies are scoped to specific resources (e.g., specific Parameter Store parameters, specific DynamoDB tables).
API Gateway Security
- CORS: Configured for specific origins (production) or * (development)
- Rate Limiting: API Gateway throttling (configurable)
- No Authentication: Public API (by design, as this is a learning tool)
Data Security
- API Keys: Encrypted in Parameter Store (KMS)
- DynamoDB: Encryption at rest (AWS managed)
- Logs: No sensitive data in logs (API keys never logged)
Network Security
- VPC: Lambda functions can run in a VPC (not required for this use case)
- Private Endpoints: VPC endpoints can be used for DynamoDB/SSM, keeping that traffic off the public internet
Parameter Store Security
- Encryption: Parameters stored as SecureString (encrypted with KMS)
- IAM Permissions: Lambda role requires:
{
  "Effect": "Allow",
  "Action": [
    "ssm:GetParameter",
    "ssm:GetParameters"
  ],
  "Resource": [
    "arn:aws:ssm:us-east-1:*:parameter/api-key-*"
  ]
}
- Least Privilege: Lambda can only read specific parameters
- No Hardcoding: API keys never appear in code or logs
- Rotation: API keys can be rotated by updating Parameter Store; the in-memory cache picks up the new value once its TTL expires (see the sketch below)
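A minimal sketch of reading an encrypted parameter with a short in-memory TTL, assuming boto3 and the 5-minute TTL described earlier; the parameter name api-key-anthropic is illustrative.

import time
import boto3

ssm = boto3.client("ssm")
_param_cache = {}      # name -> (value, fetched_at)
TTL_SECONDS = 300      # 5-minute TTL

def get_api_key(name="api-key-anthropic"):     # illustrative parameter name
    cached = _param_cache.get(name)
    if cached and time.time() - cached[1] < TTL_SECONDS:
        return cached[0]                       # cache hit: no SSM call
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    value = resp["Parameter"]["Value"]
    _param_cache[name] = (value, time.time())
    return value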
Monitoring & Observability
CloudWatch Logs
Structured JSON logging for all Lambda functions:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "message": "Cache hit",
  "extra_fields": {
    "source": "haiku",
    "cache_size": 5,
    "request_number": 7
  }
}
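A sketch of emitting log lines in this shape from Python; in Lambda, anything written to stdout ends up in CloudWatch Logs. The helper name log_json is illustrative.

import json
from datetime import datetime, timezone

def log_json(level, message, **extra_fields):
    # Emit one structured JSON log line.
    print(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "extra_fields": extra_fields,
    }))

# Produces a line shaped like the sample entry above
log_json("INFO", "Cache hit", source="haiku", cache_size=5, request_number=7)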
CloudWatch Metrics
Custom metrics tracked:
- TotalExecutionTime: End-to-end request time
- APILatency: LLM API call time
- TokensUsed: Total tokens per request
- Cost: Estimated cost per request
- CacheHit: Cache hit/miss rate
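A sketch of publishing these metrics with boto3's put_metric_data; the CloudSayings namespace is an assumption, not necessarily the one the system uses.

import boto3

cloudwatch = boto3.client("cloudwatch")

def publish_metrics(execution_time_ms, api_latency_ms, tokens, cost, cache_hit):
    cloudwatch.put_metric_data(
        Namespace="CloudSayings",   # assumed namespace
        MetricData=[
            {"MetricName": "TotalExecutionTime", "Value": execution_time_ms, "Unit": "Milliseconds"},
            {"MetricName": "APILatency", "Value": api_latency_ms, "Unit": "Milliseconds"},
            {"MetricName": "TokensUsed", "Value": tokens, "Unit": "Count"},
            {"MetricName": "Cost", "Value": cost, "Unit": "None"},
            {"MetricName": "CacheHit", "Value": 1 if cache_hit else 0, "Unit": "Count"},
        ],
    )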
CloudWatch Logs Insights Queries
Example queries for troubleshooting:
# Error tracking by model
fields @timestamp, error_type, error_message, model
| filter error_type != ""
| stats count() by error_type, model
# Average response time by source
fields @timestamp, source, execution_time
| stats avg(execution_time) by source
# Cache hit rate
fields @timestamp, cache_hit
| stats count() by cache_hit
Conclusion
This architecture document provides a comprehensive overview of the Cloud Sayings system, covering:
- LLM Gateway: Adapter pattern with provider abstraction
- Caching: Intelligent in-memory cache with deduplication and service tracking
- Dashboard: Near real-time analytics with filtering and provider-specific metrics
- Prompting: Provider-specific prompts optimized for each LLM
- Secrets: Parameter Store with caching for API keys
The system is designed for production use with:
- High availability (multiple fallback layers)
- Cost efficiency (cache reduces LLM calls)
- Observability (structured logging and metrics)
- Security (encrypted secrets, least-privilege IAM)
Key Takeaways:
- Request-number-based flow optimizes for both cold and warm starts
- Provider abstraction enables easy addition of new LLM providers
- Intelligent caching reduces costs while maintaining fast response times
- Structured logging and metrics enable effective troubleshooting and optimization
- Security is built into every layer, from encrypted secrets to least-privilege IAM policies