The infrastructure layer manages secrets, orchestrates data flow through the system, and implements comprehensive error handling to ensure reliable operation at scale.
The system uses AWS Systems Manager Parameter Store (not Secrets Manager) for API key storage:
┌─────────────────────────────────────────────────────────┐
│ getSaying Lambda │
│ │
│ 1. Check in-memory API key cache (5-min TTL) │
│ 2. If cache miss: │
│ a. Call SSM get_parameter(Name, WithDecryption) │
│ b. Retrieve encrypted SecureString │
│ c. Update cache │
│ 3. Return API key │
└─────────────────────┬───────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ AWS Systems Manager Parameter Store │
│ │
│ Parameter: /api-key/anthropic │
│ Type: SecureString (encrypted with KMS) │
│ Value: sk-ant-... (Anthropic API key) │
│ │
│ Parameter: /api-key/openai │
│ Type: SecureString (encrypted with KMS) │
│ Value: sk-... (OpenAI API key) │
└─────────────────────────────────────────────────────────┘
Parameter names are supplied through environment variables, and the Lambda's IAM policy is scoped to a wildcard resource rather than individual parameter names:

- CLAUDE_PARAMETER_NAME environment variable: name of the Anthropic key parameter
- OPENAI_PARAMETER_NAME environment variable: name of the OpenAI key parameter
- IAM resource arn:aws:ssm:us-east-1:*:parameter/api-key-* to allow access to API key parameters without exposing specific names

    import time
    import logging
    import boto3
    from botocore.exceptions import ClientError

    logger = logging.getLogger(__name__)
    ssm = boto3.client('ssm')

    API_KEY_CACHE_TTL = 300  # seconds (5-minute TTL)
    _api_key_cache = {}
    _api_key_cache_timestamp = 0.0

    def get_api_key(parameter_name: str) -> str:
        global _api_key_cache, _api_key_cache_timestamp

        current_time = time.time()

        # Return cached key if still within the TTL window
        if parameter_name in _api_key_cache and \
                current_time - _api_key_cache_timestamp < API_KEY_CACHE_TTL:
            return _api_key_cache[parameter_name]

        try:
            response = ssm.get_parameter(
                Name=parameter_name,
                WithDecryption=True  # Decrypt SecureString
            )
            api_key = response['Parameter']['Value']

            # Update cache
            _api_key_cache[parameter_name] = api_key
            _api_key_cache_timestamp = current_time

            return api_key
        except ClientError as e:
            # Structured logging, matching the pattern shown in the
            # error-handling section below
            logger.error("Error getting API key", extra={
                'extra_fields': {
                    'parameter_name': parameter_name,
                    'error_type': type(e).__name__,
                    'error_message': str(e)
                }
            }, exc_info=True)
            raise

Parameter Store provides cost-effective secret storage with low-latency access: standard parameters are free, eliminating the recurring costs associated with dedicated secrets services while maintaining secure storage for API keys.
The in-memory caching layer reduces API calls to Parameter Store, improving performance and minimizing service interactions. Since API keys change infrequently in this application, the combination of Parameter Store with Lambda-level caching provides the right balance of security, performance, and operational simplicity.
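For illustration, a call site might resolve the parameter names from the environment variables above before fetching the keys (a minimal sketch; the os.environ handling here is an assumption, not lifted from the actual handler):

    import os

    # Hypothetical call site: resolve the SSM parameter names from the
    # environment, then fetch the decrypted keys (cached for 5 minutes).
    anthropic_key = get_api_key(os.environ['CLAUDE_PARAMETER_NAME'])
    openai_key = get_api_key(os.environ['OPENAI_PARAMETER_NAME'])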
┌─────────────────────────────────────────────────────────┐
│ 1. User clicks "Get Saying" (Frontend) │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 2. API Gateway: GET /sayings │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ 3. getSaying Lambda Handler │
│ a. Check request number (tracked in CacheManager) │
│ b. Select source based on request flow logic │
└────────────────────┬────────────────────────────────────┘
│
┌────────────┴────────────┐
│ │
▼ ▼
┌───────────────┐ ┌───────────────┐
│ Request 1-3 │ │ Request 4 │
│ DynamoDB │ │ Live LLM │
│ (fast) │ │ (25% each) │
└───────┬───────┘ └───────┬───────┘
│ │
│ ▼
│ ┌──────────────────┐
│ │ Select LLM │
│ │ Get API Key │
│ │ Create Adapter │
│ │ Select Service │
│ │ Generate Prompt │
│ │ Call LLM API │
│ │ Parse Response │
│ └────────┬─────────┘
│ │
└───────────────────────┼───────────┐
│ │
▼ ▼
┌───────────────────────────────┐
│ Return Response │
│ (saying, source, metrics) │
└───────────┬───────────────────┘
│
▼
┌───────────────────────────────┐
│ Frontend Display │
│ (saying + source attribution) │
└───────────────────────────────┘
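The source selection in step 3 can be sketched as follows, assuming the per-container request counter tracked by CacheManager (the model identifiers and function name are illustrative):

    import random

    # Illustrative identifiers; the real handler derives these from config.
    LIVE_MODELS = ['claude-haiku', 'claude-sonnet', 'gpt-4.1', 'gpt-4.1-mini']

    def select_source(request_number: int) -> str:
        # Requests 1-3 in each cycle: serve a stored saying from DynamoDB
        # (fast path). Request 4: call a live LLM, choosing one of the four
        # models uniformly (25% each).
        if request_number % 4 != 0:
            return 'dynamodb'
        return random.choice(LIVE_MODELS)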
While requests #1-3 are served from DynamoDB, background threads build the cache:
┌─────────────────────────────────────────────────────────┐
│ Background Thread Pool (ThreadPoolExecutor)             │
│                          │                              │
│ ┌────────────────────┐   │  ┌────────────────────┐      │
│ │ Thread 1: Haiku    │   │  │ Thread 2: Sonnet   │      │
│ │ - Get API key      │   │  │ - Get API key      │      │
│ │ - Select 3 services│   │  │ - Select 3 services│      │
│ │ - Generate prompt  │   │  │ - Generate prompt  │      │
│ │ - Call Claude API  │   │  │ - Call Claude API  │      │
│ │ - Parse response   │   │  │ - Parse response   │      │
│ │ - Add to cache     │   │  │ - Add to cache     │      │
│ └────────────────────┘   │  └────────────────────┘      │
│                          │                              │
│ ┌────────────────────┐   │  ┌────────────────────┐      │
│ │ Thread 3: GPT-4.1  │   │  │ Thread 4: GPT-     │      │
│ │ - Get API key      │   │  │   4.1-mini         │      │
│ │ - Select 3 services│   │  │ - Get API key      │      │
│ │ - Generate prompt  │   │  │ - Select 3 services│      │
│ │ - Call OpenAI API  │   │  │ - Generate prompt  │      │
│ │ - Parse response   │   │  │ - Call OpenAI API  │      │
│ │ - Retry if needed  │   │  │ - Retry if needed  │      │
│ │ - Add to cache     │   │  │ - Add to cache     │      │
│ └────────────────────┘   │  └────────────────────┘      │
└──────────────────────────┴──────────────────────────────┘
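A compact sketch of this fan-out, assuming a generate_and_cache helper that performs the per-model steps shown in the diagram (the helper and model names are assumptions):

    from concurrent.futures import ThreadPoolExecutor

    # Illustrative identifiers; the real handler builds these from config.
    CACHE_MODELS = ['claude-haiku', 'claude-sonnet', 'gpt-4.1', 'gpt-4.1-mini']

    def build_cache(generate_and_cache) -> None:
        # One worker per model: each thread fetches its API key, selects
        # 3 services, generates a prompt, calls the LLM (retrying if
        # needed), parses the response, and adds it to the cache.
        with ThreadPoolExecutor(max_workers=len(CACHE_MODELS)) as pool:
            for model in CACHE_MODELS:
                pool.submit(generate_and_cache, model)
        # Exiting the with-block waits for all four threads to finish.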
┌─────────────────────────────────────────────────────────┐
│ User clicks "👍" or "👎" (Frontend) │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ API Gateway: POST /feedback │
│ Body: { │
│ saying: "...", │
│ source: "Generated by Claude Haiku 4.5", │
│ feedbackType: "POSITIVE", │
│ executionTime: 1234.5, │
│ isCached: false, │
│ originalLlmTime: 1200.0 │
│ } │
└────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ process_feedback Lambda │
│ 1. Parse source (extract vendor/model) │
│ 2. Look up saying_id in main_humor table │
│ 3. Update main_humor (upvotes/downvotes) │
│ 4. Store feedback in saying_feedback table │
│ 5. Record CloudWatch metrics │
└─────────────────────────────────────────────────────────┘
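The handler's five steps map to two DynamoDB writes plus a metrics call (a sketch: the table names come from the diagram above, but the key schema, counter names, and parse_source/lookup_saying_id helpers are assumptions):

    import boto3

    dynamodb = boto3.resource('dynamodb')
    main_humor = dynamodb.Table('main_humor')
    saying_feedback = dynamodb.Table('saying_feedback')

    def process_feedback(saying: str, source: str, feedback_type: str) -> None:
        # 1-2. Parse vendor/model out of the source string, e.g.
        # "Generated by Claude Haiku 4.5", and look up the saying's id.
        vendor, model = parse_source(source)      # assumed helper
        saying_id = lookup_saying_id(saying)      # assumed helper

        # 3. Increment the matching counter on the saying record.
        counter = 'upvotes' if feedback_type == 'POSITIVE' else 'downvotes'
        main_humor.update_item(
            Key={'saying_id': saying_id},
            UpdateExpression=f'ADD {counter} :one',
            ExpressionAttributeValues={':one': 1},
        )

        # 4. Store the raw feedback event.
        saying_feedback.put_item(Item={
            'saying_id': saying_id,
            'feedback_type': feedback_type,
            'vendor': vendor,
            'model': model,
        })
        # 5. CloudWatch metrics are recorded separately (see Metrics below).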
Request Flow:
1. Try cache (if available)
└─> If empty, try DynamoDB
└─> If error, return error message
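In code, this fallback chain reduces to a few lines (a sketch; the cache_manager interface and the DynamoDB helper are assumptions):

    import logging

    logger = logging.getLogger(__name__)

    def get_saying():
        # 1. Prefer a pre-built cached saying; 2. fall back to DynamoDB;
        # 3. return a clean error payload if both paths fail.
        try:
            saying = cache_manager.pop()              # assumed cache interface
            if saying is not None:
                return saying
            return fetch_saying_from_dynamodb()       # assumed helper
        except Exception:
            logger.error("Error retrieving saying", exc_info=True)
            return {'error': 'Unable to retrieve a saying'}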
Error handling covers the following cases:

- Timeout
- Rate Limiting (see the retry sketch after this list)
- API Errors
- Item Not Found
- Throttling
- Access Denied / Parameter Not Found
- OpenAI Cache Building
- Anthropic Cache Building
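For the rate-limiting and timeout cases ("Retry if needed" in the cache-building diagram), the usual shape is exponential backoff (a sketch; the attempt count and base delay are assumptions):

    import time

    def call_with_retry(call_llm, max_attempts: int = 3, base_delay: float = 1.0):
        # Retry transient failures (timeouts, rate limits) with exponential
        # backoff; re-raise once the attempts are exhausted.
        for attempt in range(max_attempts):
            try:
                return call_llm()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))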
All errors are logged with structured JSON:
logger.error("Error in Anthropic adapter", extra={
'extra_fields': {
'model': self.config.model_name,
'error_type': type(e).__name__,
'error_message': str(e),
'api_time_ms': metrics['api_time']
}
}, exc_info=True) # Includes stack trace
This enables CloudWatch Logs Insights queries:
fields @timestamp, error_type, error_message, model
| filter error_type != ""
| stats count() by error_type, model
Errors are tracked in CloudWatch Metrics:
- TotalExecutionTime (always recorded)
- APILatency (if API call was made)
- TokensUsed (if API call succeeded)
- Cost (if API call succeeded)
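Emitting these might look like the following (a sketch; the namespace and Model dimension are assumptions):

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    def record_metrics(model: str, execution_ms: float, api_ms=None,
                       tokens=None, cost=None):
        # TotalExecutionTime is always recorded; the API-dependent metrics
        # are emitted only when the corresponding call happened/succeeded.
        dims = [{'Name': 'Model', 'Value': model}]  # assumed dimension
        data = [{'MetricName': 'TotalExecutionTime', 'Value': execution_ms,
                 'Unit': 'Milliseconds', 'Dimensions': dims}]
        if api_ms is not None:
            data.append({'MetricName': 'APILatency', 'Value': api_ms,
                         'Unit': 'Milliseconds', 'Dimensions': dims})
        if tokens is not None:
            data.append({'MetricName': 'TokensUsed', 'Value': tokens,
                         'Unit': 'Count', 'Dimensions': dims})
        if cost is not None:
            data.append({'MetricName': 'Cost', 'Value': cost,
                         'Unit': 'None', 'Dimensions': dims})
        cloudwatch.put_metric_data(Namespace='SayingsApp',  # assumed namespace
                                   MetricData=data)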