Cloud Sayings Architecture

Analytics & Prompting

The analytics dashboard provides near real-time insights into model performance and user feedback, while the prompting strategy ensures consistent, high-quality analogies optimized for each LLM provider.

Dashboard Analytics

Architecture

The dashboard aggregates feedback data from DynamoDB and CloudWatch to provide near real-time analytics:

┌─────────────────────────────────────────────────────────┐
│              getDashboardStats Lambda                    │
│                                                          │
│  1. Check 2-minute in-memory cache                      │
│  2. If cache miss:                                       │
│     a. Scan saying_feedback table (DynamoDB)            │
│     b. Query CloudWatch for total invocations           │
│     c. Aggregate by provider/model                      │
│     d. Extract AWS services from saying text             │
│     e. Calculate success rates, scores, response times  │
│     f. Update cache                                      │
│  3. Return JSON response                                │
└─────────────────────────────────────────────────────────┘
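
A minimal sketch of this flow in the Lambda handler, assuming boto3 and the saying_feedback table; get_cached_stats, put_cached_stats, and aggregate_provider_stats are illustrative helpers (sketched in the sections that follow), and the CloudWatch invocation query is omitted for brevity:

import json
import boto3

dynamodb = boto3.resource("dynamodb")
feedback_table = dynamodb.Table("saying_feedback")

def lambda_handler(event, context):
    # 1. Serve from the 2-minute in-memory cache when possible
    cached = get_cached_stats()
    if cached is not None:
        return {"statusCode": 200, "body": json.dumps(cached)}

    # 2a. Cache miss: paginated scan of the feedback table
    items, kwargs = [], {}
    while True:
        page = feedback_table.scan(**kwargs)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

    # 2b-2e. Aggregate by provider/model (the CloudWatch totals and
    # service extraction also happen here; omitted in this sketch)
    stats = aggregate_provider_stats(items)

    # 2f. Refresh the cache, then 3. return JSON
    put_cached_stats(stats)
    return {"statusCode": 200, "body": json.dumps(stats)}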

Data Aggregation

Provider-Level Metrics:

For each feedback record (see the aggregation sketch after this list):

  • Count positive/negative feedback
  • Track execution times (filtering cached responses)
  • Calculate success rate: (positive / total) * 100
  • Calculate score: positive - negative
  • Calculate average response time: sum(execution_times) / len(execution_times)
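
A condensed sketch of this aggregation; field names such as vendor and feedback are assumptions about the record shape:

from collections import defaultdict

def aggregate_provider_stats(feedback_items):
    provider_stats = defaultdict(lambda: {"positive": 0, "negative": 0, "execution_times": []})

    for item in feedback_items:
        vendor = item.get("vendor", "unknown")
        key = "positive" if item.get("feedback") == "positive" else "negative"
        provider_stats[vendor][key] += 1
        # Response-time collection (with cache filtering) is covered in the next section

    results = {}
    for vendor, s in provider_stats.items():
        total = s["positive"] + s["negative"]
        times = s["execution_times"]
        results[vendor] = {
            "success_rate": (s["positive"] / total) * 100 if total else 0,
            "score": s["positive"] - s["negative"],
            "avg_response_time_ms": sum(times) / len(times) if times else None,
        }
    return results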

Response Time Calculation

The system uses a priority-based approach for response time tracking:

  1. Priority 1: original_llm_time (most accurate - from new data)
    • Stored when feedback is submitted for a cached response
    • Represents actual LLM API call time
  2. Priority 2: is_cached flag (explicit marking)
    • If is_cached=True and no original_llm_time, skip (likely cache lookup)
  3. Priority 3: Heuristic for historical data
    • If execution_time < 100ms, assume cached (skip)
    • If execution_time ≥ 100ms, assume direct LLM call (use it)

CACHE_THRESHOLD_MS = 100

if original_llm_time is not None:
    # Use original LLM API time (most accurate)
    exec_time_float = float(original_llm_time)
    provider_stats[vendor]['execution_times'].append(exec_time_float)
elif execution_time is not None:
    exec_time_float = float(execution_time)
    if is_cached:
        # Skip cached responses without original LLM time
        pass
    elif exec_time_float < CACHE_THRESHOLD_MS:
        # Likely cached - skip it (heuristic)
        pass
    else:
        # Likely direct LLM call - use it (heuristic for historical data)
        provider_stats[vendor]['execution_times'].append(exec_time_float)

AWS Service Extraction

The dashboard extracts AWS service names from saying text using regex patterns:

  1. Try full service name match (Amazon X or AWS X)
  2. Try abbreviation match (S3, Lambda, EC2, etc.)
  3. Try canonical name matching
  4. Fallback to "Unknown"

Service mappings handle variations (a simplified extraction sketch follows this list):

  • "DynamoDB" → "Amazon DynamoDB"
  • "Lambda" → "AWS Lambda"
  • "EC2" → "Amazon EC2"

Caching Strategy

  • Cache TTL: 2 minutes
  • Cache Storage: Module-level in-memory variable (per Lambda instance)
  • Cache Invalidation: Automatic after TTL expires
  • Purpose: Near real-time updates while reducing DynamoDB scan costs
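
A minimal sketch of the per-instance cache, assuming module-level variables that persist across warm invocations (names are illustrative and pair with the handler sketch above):

import time

CACHE_TTL_SECONDS = 120     # 2-minute TTL
_cached_stats = None        # lives in the module scope of one Lambda instance
_cache_expires_at = 0.0

def get_cached_stats():
    if _cached_stats is not None and time.time() < _cache_expires_at:
        return _cached_stats                 # cache hit: skip the DynamoDB scan
    return None                              # cold start or expired TTL

def put_cached_stats(stats):
    global _cached_stats, _cache_expires_at
    _cached_stats = stats
    _cache_expires_at = time.time() + CACHE_TTL_SECONDS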

LLM Prompting Strategy

Prompt Architecture

The system uses two distinct prompt types:

  1. Single Saying Prompt: For live LLM calls (request #4)
  2. Cache Building Prompt: For batch generation (3 sayings at once)

Provider-Specific Prompt Differences

Anthropic (Claude) Prompts

Philosophy: Detailed, structured, context-rich

Single Saying Prompt Structure:

### ROLE
You are a witty cloud computing expert...

### TASK
Create exactly ONE clever analogy for the AWS service: {service_name}

### CONTEXT
Category: {category_name}
Related services in this category:
- Service 1
- Service 2
...

### OUTPUT FORMAT
The analogy MUST follow this exact format:
"[Service Name] is like [clever comparison] - [technical insight]!"

### REQUIREMENTS
1. Use the exact service name: {service_name}
2. Maximum 120 characters total
3. Must end with an exclamation mark (!)
...

### STYLE GUIDELINES
- Tone: Upbeat, constructive, and positive
- Avoid: ninjas, martial arts, butlers, chefs
...

### EXAMPLES
Example 1: "Amazon S3 is like..."
Example 2: "AWS Lambda is like..."
...

### YOUR TASK
Now create an analogy for {service_name}...
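
For reference, a hedged sketch of sending the assembled template with the Anthropic Python SDK; the model identifier is a placeholder:

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def generate_saying_claude(prompt: str) -> str:
    # `prompt` is the full ### ROLE / ### TASK / ... template shown above
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",   # placeholder model identifier
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text.strip()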

OpenAI (GPT) Prompts

Philosophy: Concise, direct, optimized for single responses

Single Saying Prompt Structure:

System: You are a witty cloud computing expert who creates brand-positive,
        upbeat analogies that make AWS services relatable and memorable.

User: Generate exactly one analogy for {service_name}.

Requirements:
- Format: "[{service_name}] is like [comparison] - [insight]!"
- Under 120 characters total
- End with exclamation mark
- Output ONLY the analogy, nothing else
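
A comparable sketch for the OpenAI Python SDK, wiring these system/user messages into a chat completion; the model name and max_tokens value are placeholders:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_saying_gpt(service_name: str) -> str:
    system_msg = (
        "You are a witty cloud computing expert who creates brand-positive, "
        "upbeat analogies that make AWS services relatable and memorable."
    )
    user_msg = (
        f"Generate exactly one analogy for {service_name}.\n\n"
        "Requirements:\n"
        f'- Format: "[{service_name}] is like [comparison] - [insight]!"\n'
        "- Under 120 characters total\n"
        "- End with exclamation mark\n"
        "- Output ONLY the analogy, nothing else"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # placeholder model name
        max_tokens=100,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
    )
    return response.choices[0].message.content.strip()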

Key Differences

Aspect              | Anthropic (Claude)                                | OpenAI (GPT)
--------------------|---------------------------------------------------|-------------------------------
Structure           | Hierarchical sections (ROLE, TASK, CONTEXT, etc.) | Flat system/user messages
Detail Level        | Highly detailed with examples                     | Concise and direct
Context             | Full category context                             | Minimal context
Examples            | Multiple examples provided                        | No examples in prompt
Format Instructions | Repeated in multiple sections                     | Single clear instruction block
Length              | ~500-800 tokens                                   | ~200-300 tokens

Service Selection Strategy

Critical: The system explicitly selects services rather than asking the LLM to pick randomly:

def get_llm_sayings_for_cache(provider: str, model_name: str):
    # Select 3 different services explicitly
    selected_services = []
    for _ in range(3):
        selection = select_random_service()  # Avoids recently used
        selected_services.append(selection.service_name)

    # Create prompt with explicit services
    services_list = '\n'.join([f"- {service}" for service in selected_services])
    # ... prompt includes these exact services

This ensures:

  • Variety: Services are explicitly different
  • Control: No reliance on LLM randomness
  • Tracking: Recently used services are avoided
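
One possible shape of select_random_service with recently-used tracking; everything here (the catalog, the deque size, the reset rule) is illustrative rather than the actual implementation:

import random
from collections import deque, namedtuple

ServiceSelection = namedtuple("ServiceSelection", ["service_name", "category_name"])

ALL_SERVICES = [                      # assumed catalog, abbreviated
    ServiceSelection("Amazon S3", "Storage"),
    ServiceSelection("AWS Lambda", "Compute"),
    ServiceSelection("Amazon DynamoDB", "Database"),
]

_recently_used = deque(maxlen=10)     # remember the last N picks to avoid repeats

def select_random_service() -> ServiceSelection:
    candidates = [s for s in ALL_SERVICES if s.service_name not in _recently_used]
    if not candidates:                # everything was used recently: allow all again
        candidates = list(ALL_SERVICES)
    selection = random.choice(candidates)
    _recently_used.append(selection.service_name)
    return selection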

Response Parsing

The system uses multiple parsing strategies to extract sayings from LLM responses:

Strategy 1: Newline Splitting

import re

sayings = []
lines = [line.strip() for line in content.split('\n') if line.strip()]
for line in lines:
    saying = line.strip().strip('"\'')
    saying = re.sub(r'^\d+[\.\)]\s*', '', saying)  # Remove numbering
    saying = re.sub(r'^[-*]\s*', '', saying)  # Remove markdown bullets
    if saying and saying.endswith('!') and len(saying) <= 130:
        sayings.append(saying)

Strategy 2: Quoted Sayings

quoted_matches = re.findall(r'"[^"]+!"', content)

Strategy 3: Pattern Matching

pattern = r'([A-Za-z0-9\s\-]+(?:AWS|Amazon)?\s+[A-Za-z0-9\s\-]+)\s+is\s+like\s+([^!]+)!'
matches = re.findall(pattern, content, re.IGNORECASE)

Character Limit Handling:

  • Target: 120 characters per saying
  • Acceptance: Up to 130 characters (allows slight overflow from OpenAI)
  • Truncation: If >130 chars, truncate at last complete sentence (ending with !)
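
A small sketch of this acceptance/truncation rule, with the thresholds taken from the text above:

MAX_CHARS = 130                            # target is 120, but up to 130 is accepted

def enforce_length(saying: str) -> str | None:
    if len(saying) <= MAX_CHARS:
        return saying                      # accept, allowing slight overflow past 120
    cut = saying[:MAX_CHARS].rfind("!")    # truncate at the last complete sentence
    return saying[:cut + 1] if cut != -1 else None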