The infrastructure layer manages secrets, orchestrates data flow through the system, and implements comprehensive error handling to ensure reliable operation at scale.
The system uses AWS Systems Manager Parameter Store (not Secrets Manager) for API key storage:
┌─────────────────────────────────────────────────────────┐
│ getSaying Lambda │
│ │
│ 1. Check in-memory API key cache (5-min TTL) │
│ 2. If cache miss: │
│ a. Call SSM get_parameter(Name, WithDecryption) │
│ b. Retrieve encrypted SecureString │
│ c. Update cache │
│ 3. Return API key │
└─────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ AWS Systems Manager Parameter Store │
│ │
│ Parameter: /api-key/anthropic │
│ Type: SecureString (encrypted with KMS) │
│ Value: sk-ant-... (Anthropic API key) │
│ │
│ Parameter: /api-key/openai │
│ Type: SecureString (encrypted with KMS) │
│ Value: sk-... (OpenAI API key) │
└─────────────────────────────────────────────────────────┘
Parameter names are supplied through environment variables, and the Lambda's IAM policy is scoped to a wildcard resource rather than individual parameter names:

- CLAUDE_PARAMETER_NAME environment variable: name of the Anthropic key parameter
- OPENAI_PARAMETER_NAME environment variable: name of the OpenAI key parameter
- IAM resource arn:aws:ssm:us-east-1:*:parameter/api-key-* to allow access to API key parameters without exposing specific names

    import time
    import logging
    import boto3
    from botocore.exceptions import ClientError

    logger = logging.getLogger(__name__)
    ssm = boto3.client('ssm')

    API_KEY_CACHE_TTL = 300  # seconds (5-minute TTL)
    _api_key_cache = {}
    _api_key_cache_timestamp = 0.0

    def get_api_key(parameter_name: str) -> str:
        global _api_key_cache, _api_key_cache_timestamp

        current_time = time.time()

        # Return cached key if still within the TTL window
        if parameter_name in _api_key_cache and \
                current_time - _api_key_cache_timestamp < API_KEY_CACHE_TTL:
            return _api_key_cache[parameter_name]

        try:
            response = ssm.get_parameter(
                Name=parameter_name,
                WithDecryption=True  # Decrypt SecureString
            )
            api_key = response['Parameter']['Value']

            # Update cache
            _api_key_cache[parameter_name] = api_key
            _api_key_cache_timestamp = current_time

            return api_key
        except ClientError as e:
            # Structured logging, matching the pattern shown in the
            # error-handling section below
            logger.error("Error getting API key", extra={
                'extra_fields': {
                    'parameter_name': parameter_name,
                    'error_type': type(e).__name__,
                    'error_message': str(e)
                }
            }, exc_info=True)
            raise

Parameter Store provides cost-effective secret storage with low-latency access: standard parameters are free, eliminating the recurring costs associated with dedicated secrets services while maintaining secure storage for API keys.
The in-memory caching layer reduces API calls to Parameter Store, improving performance and minimizing service interactions. Since API keys change infrequently in this application, the combination of Parameter Store with Lambda-level caching provides the right balance of security, performance, and operational simplicity.
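For illustration, a call site might resolve the parameter names from the environment variables above before fetching the keys (a minimal sketch; the os.environ handling here is an assumption, not lifted from the actual handler):

    import os

    # Hypothetical call site: resolve the SSM parameter names from the
    # environment, then fetch the decrypted keys (cached for 5 minutes).
    anthropic_key = get_api_key(os.environ['CLAUDE_PARAMETER_NAME'])
    openai_key = get_api_key(os.environ['OPENAI_PARAMETER_NAME'])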
┌─────────────────────────────────────────────────────────┐
│ 1. User clicks "Get Saying" (Frontend) │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 2. API Gateway: GET /sayings │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 3. getSaying Lambda Handler │
│ a. Check request number (tracked in CacheManager) │
│ b. Select source based on request flow logic │
└────────────────────┬────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Request 1-3 │ │ Request 4 │
│ DynamoDB │ │ Live LLM │
│ (fast) │ │ (25% each) │
└───────┬───────┘ └───────┬───────┘
│ │
│ ▼
│ ┌──────────────────┐
│ │ Select LLM │
│ │ Get API Key │
│ │ Create Adapter │
│ │ Select Service │
│ │ Generate Prompt │
│ │ Call LLM API │
│ │ Parse Response │
│ └────────┬─────────┘
│ │
└───────────────────────┼───────────┐
│ │
▼ ▼
┌───────────────────────────────┐
│ Return Response │
│ (saying, source, metrics) │
└───────────┬───────────────────┘
│
▼
┌───────────────────────────────┐
│ Frontend Display │
│ (saying + source attribution) │
└───────────────────────────────┘
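The source selection in step 3 can be sketched as follows, assuming the per-container request counter tracked by CacheManager (the model identifiers and function name are illustrative):

    import random

    # Illustrative identifiers; the real handler derives these from config.
    LIVE_MODELS = ['claude-haiku', 'claude-sonnet', 'gpt-4.1', 'gpt-4.1-mini']

    def select_source(request_number: int) -> str:
        # Requests 1-3 in each cycle: serve a stored saying from DynamoDB
        # (fast path). Request 4: call a live LLM, choosing one of the four
        # models uniformly (25% each).
        if request_number % 4 != 0:
            return 'dynamodb'
        return random.choice(LIVE_MODELS)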
While requests #1-3 are served from DynamoDB, background threads build the cache:
┌─────────────────────────────────────────────────────────┐
│ Background Thread Pool (ThreadPoolExecutor)             │
│                          │                              │
│ ┌────────────────────┐   │  ┌────────────────────┐      │
│ │ Thread 1: Haiku    │   │  │ Thread 2: Sonnet   │      │
│ │ - Get API key      │   │  │ - Get API key      │      │
│ │ - Select 3 services│   │  │ - Select 3 services│      │
│ │ - Generate prompt  │   │  │ - Generate prompt  │      │
│ │ - Call Claude API  │   │  │ - Call Claude API  │      │
│ │ - Parse response   │   │  │ - Parse response   │      │
│ │ - Add to cache     │   │  │ - Add to cache     │      │
│ └────────────────────┘   │  └────────────────────┘      │
│                          │                              │
│ ┌────────────────────┐   │  ┌────────────────────┐      │
│ │ Thread 3: GPT-4.1  │   │  │ Thread 4: GPT-     │      │
│ │ - Get API key      │   │  │   4.1-mini         │      │
│ │ - Select 3 services│   │  │ - Get API key      │      │
│ │ - Generate prompt  │   │  │ - Select 3 services│      │
│ │ - Call OpenAI API  │   │  │ - Generate prompt  │      │
│ │ - Parse response   │   │  │ - Call OpenAI API  │      │
│ │ - Retry if needed  │   │  │ - Retry if needed  │      │
│ │ - Add to cache     │   │  │ - Add to cache     │      │
│ └────────────────────┘   │  └────────────────────┘      │
└──────────────────────────┴──────────────────────────────┘
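A compact sketch of this fan-out, assuming a generate_and_cache helper that performs the per-model steps shown in the diagram (the helper and model names are assumptions):

    from concurrent.futures import ThreadPoolExecutor

    # Illustrative identifiers; the real handler builds these from config.
    CACHE_MODELS = ['claude-haiku', 'claude-sonnet', 'gpt-4.1', 'gpt-4.1-mini']

    def build_cache(generate_and_cache) -> None:
        # One worker per model: each thread fetches its API key, selects
        # 3 services, generates a prompt, calls the LLM (retrying if
        # needed), parses the response, and adds it to the cache.
        with ThreadPoolExecutor(max_workers=len(CACHE_MODELS)) as pool:
            for model in CACHE_MODELS:
                pool.submit(generate_and_cache, model)
        # Exiting the with-block waits for all four threads to finish.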
┌─────────────────────────────────────────────────────────┐
│ User clicks "👍" or "👎" (Frontend) │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ API Gateway: POST /feedback │
│ Body: { │
│ saying: "...", │
│ source: "Generated by Claude Haiku 4.5", │
│ feedbackType: "POSITIVE", │
│ executionTime: 1234.5, │
│ isCached: false, │
│ originalLlmTime: 1200.0 │
│ } │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ process_feedback Lambda │
│ 1. Parse source (extract vendor/model) │
│ 2. Look up saying_id in main_humor table │
│ 3. Update main_humor (upvotes/downvotes) │
│ 4. Store feedback in saying_feedback table │
│ 5. Record CloudWatch metrics │
└─────────────────────────────────────────────────────────┘
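The handler's five steps map to two DynamoDB writes plus a metrics call (a sketch: the table names come from the diagram above, but the key schema, counter names, and parse_source/lookup_saying_id helpers are assumptions):

    import boto3

    dynamodb = boto3.resource('dynamodb')
    main_humor = dynamodb.Table('main_humor')
    saying_feedback = dynamodb.Table('saying_feedback')

    def process_feedback(saying: str, source: str, feedback_type: str) -> None:
        # 1-2. Parse vendor/model out of the source string, e.g.
        # "Generated by Claude Haiku 4.5", and look up the saying's id.
        vendor, model = parse_source(source)      # assumed helper
        saying_id = lookup_saying_id(saying)      # assumed helper

        # 3. Increment the matching counter on the saying record.
        counter = 'upvotes' if feedback_type == 'POSITIVE' else 'downvotes'
        main_humor.update_item(
            Key={'saying_id': saying_id},
            UpdateExpression=f'ADD {counter} :one',
            ExpressionAttributeValues={':one': 1},
        )

        # 4. Store the raw feedback event.
        saying_feedback.put_item(Item={
            'saying_id': saying_id,
            'feedback_type': feedback_type,
            'vendor': vendor,
            'model': model,
        })
        # 5. CloudWatch metrics are recorded separately (see Metrics below).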
Request Flow:
1. Try cache (if available)
└─> If empty, try DynamoDB
└─> If error, return error message
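In code, this fallback chain reduces to a few lines (a sketch; the cache_manager interface and the DynamoDB helper are assumptions):

    import logging

    logger = logging.getLogger(__name__)

    def get_saying():
        # 1. Prefer a pre-built cached saying; 2. fall back to DynamoDB;
        # 3. return a clean error payload if both paths fail.
        try:
            saying = cache_manager.pop()              # assumed cache interface
            if saying is not None:
                return saying
            return fetch_saying_from_dynamodb()       # assumed helper
        except Exception:
            logger.error("Error retrieving saying", exc_info=True)
            return {'error': 'Unable to retrieve a saying'}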
Error handling covers the following cases:

- Timeout
- Rate Limiting (see the retry sketch after this list)
- API Errors
- Item Not Found
- Throttling
- Access Denied / Parameter Not Found
- OpenAI Cache Building
- Anthropic Cache Building
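For the rate-limiting and timeout cases ("Retry if needed" in the cache-building diagram), the usual shape is exponential backoff (a sketch; the attempt count and base delay are assumptions):

    import time

    def call_with_retry(call_llm, max_attempts: int = 3, base_delay: float = 1.0):
        # Retry transient failures (timeouts, rate limits) with exponential
        # backoff; re-raise once the attempts are exhausted.
        for attempt in range(max_attempts):
            try:
                return call_llm()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))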
All errors are logged with structured JSON:
logger.error("Error in Anthropic adapter", extra={
'extra_fields': {
'model': self.config.model_name,
'error_type': type(e).__name__,
'error_message': str(e),
'api_time_ms': metrics['api_time']
}
}, exc_info=True) # Includes stack trace
This enables CloudWatch Logs Insights queries:
fields @timestamp, error_type, error_message, model
| filter error_type != ""
| stats count() by error_type, model
Errors are tracked in CloudWatch Metrics:
- TotalExecutionTime (always recorded)
- APILatency (if API call was made)
- TokensUsed (if API call succeeded)
- Cost (if API call succeeded)
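Emitting these might look like the following (a sketch; the namespace and Model dimension are assumptions):

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    def record_metrics(model: str, execution_ms: float, api_ms=None,
                       tokens=None, cost=None):
        # TotalExecutionTime is always recorded; the API-dependent metrics
        # are emitted only when the corresponding call happened/succeeded.
        dims = [{'Name': 'Model', 'Value': model}]  # assumed dimension
        data = [{'MetricName': 'TotalExecutionTime', 'Value': execution_ms,
                 'Unit': 'Milliseconds', 'Dimensions': dims}]
        if api_ms is not None:
            data.append({'MetricName': 'APILatency', 'Value': api_ms,
                         'Unit': 'Milliseconds', 'Dimensions': dims})
        if tokens is not None:
            data.append({'MetricName': 'TokensUsed', 'Value': tokens,
                         'Unit': 'Count', 'Dimensions': dims})
        if cost is not None:
            data.append({'MetricName': 'Cost', 'Value': cost,
                         'Unit': 'None', 'Dimensions': dims})
        cloudwatch.put_metric_data(Namespace='SayingsApp',  # assumed namespace
                                   MetricData=data)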