Cloud Sayings is a serverless web application designed to compare the effectiveness of different Large Language Models (LLMs). While it generates witty analogies for AWS services, the real purpose is to provide a mechanism for side-by-side model comparison, allowing users to evaluate how different models (Claude-3, GPT-4, etc.) perform the same task. The system is designed for high availability, low latency, and cost efficiency through intelligent caching and provider abstraction.
Cloud Sayings leverages a serverless architecture on AWS, combining intelligent caching with multi-provider LLM integration to enable real-time comparison of model outputs. The system presents identical prompts to multiple models simultaneously, allowing users to evaluate differences in tone, accuracy, latency, and behavior across providers.
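Under the hood, this comparison depends on a common provider abstraction so the handler can fan one prompt out to every configured model. The sketch below shows one way such an abstraction might look; the names (`LLMProvider`, `SayingResult`, `EchoProvider`) and the metrics captured are illustrative assumptions, not the project's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import time


@dataclass
class SayingResult:
    """One model's answer, tagged for side-by-side display."""
    saying: str
    source: str          # e.g. "claude-3" or "gpt-4"
    latency_ms: float


class LLMProvider(ABC):
    """Common surface each vendor integration implements (assumed)."""
    name: str = "unknown"

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the raw completion for a prompt."""

    def generate_saying(self, prompt: str) -> SayingResult:
        # Time the call so latency can be compared across providers.
        start = time.perf_counter()
        text = self.generate(prompt)
        elapsed_ms = (time.perf_counter() - start) * 1000
        return SayingResult(saying=text, source=self.name, latency_ms=elapsed_ms)


class EchoProvider(LLMProvider):
    """Stand-in for local testing; real subclasses call vendor SDKs."""
    name = "echo"

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


def compare(providers: list[LLMProvider], prompt: str) -> list[SayingResult]:
    """Present the identical prompt to every provider simultaneously."""
    return [p.generate_saying(prompt) for p in providers]
```

A real deployment would register one subclass per vendor SDK; the handler only ever sees the shared `generate_saying()` surface, which is what makes swapping or adding models cheap.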
The system uses a request-number-based flow rather than pure probability-based selection, optimizing for both performance and user experience.
┌─────────────────────────────────────────────────────────┐
│ 1. User clicks "Get Saying" (Frontend)                  │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 2. API Gateway: GET /sayings                            │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ 3. getSaying Lambda Handler                             │
│    a. Check request number (tracked in CacheManager)    │
│    b. Select source based on request flow logic         │
└────────────────────┬────────────────────────────────────┘
                     │
        ┌────────────┴────────────┐
        │                         │
        ▼                         ▼
┌───────────────┐         ┌───────────────┐
│ Request 1-3   │         │ Request 4     │
│ DynamoDB      │         │ Live LLM      │
│ (fast)        │         │ (25% each)    │
└───────┬───────┘         └───────┬───────┘
        │                         │
        │                         ▼
        │                ┌──────────────────┐
        │                │ Request 5+       │
        │                │ Cache Lookup     │
        │                │ (from available  │
        │                │ LLMs)            │
        │                └────────┬─────────┘
        │                         │
        └────────────┬────────────┘
                     │
                     ▼
     ┌───────────────────────────────┐
     │ Return Response               │
     │ (saying, source, metrics)     │
     └───────────────┬───────────────┘
                     │
                     ▼
     ┌───────────────────────────────┐
     │ Frontend Display              │
     │ (saying + source attribution) │
     └───────────────────────────────┘
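A rough sketch of the routing the diagram describes is below. The thresholds are read directly off the diagram; the provider list is an assumption ("25% each" implies four configured models, but the actual names and count live in the project's configuration).

```python
import random

# Thresholds read off the diagram above.
CACHE_WARMUP_REQUESTS = 3   # requests 1-3 are served from DynamoDB
LIVE_LLM_REQUEST = 4        # request 4 goes to a live model

# Illustrative names: "(25% each)" implies four configured providers.
PROVIDERS = ["claude-3", "gpt-4", "llama-3", "mistral"]


def select_source(request_number: int) -> str:
    """Pick a response source from the request's position in the session."""
    if request_number <= CACHE_WARMUP_REQUESTS:
        return "dynamodb"                  # fast, pre-seeded sayings
    if request_number == LIVE_LLM_REQUEST:
        return random.choice(PROVIDERS)    # uniform pick: 25% per provider
    return "llm-cache"                     # request 5+: cached LLM outputs
```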
The request-based architecture balances performance across different Lambda states. During cold starts, the system serves cached responses from DynamoDB with sub-50ms latency while building the in-memory cache in the background. Once warm, the Lambda maintains a local cache that eliminates LLM API calls during high-traffic periods.
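A minimal sketch of that two-tier lookup, assuming a `CacheManager` instance held at module scope; the table name, key schema, and method names here are hypothetical:

```python
import boto3


class CacheManager:
    """Serve from the warm in-memory cache when possible; fall back to
    DynamoDB otherwise. Also tracks the request number used for routing."""

    def __init__(self, table_name: str = "sayings"):  # table name assumed
        self._table = boto3.resource("dynamodb").Table(table_name)
        self._memory: dict[str, str] = {}
        self.request_count = 0

    def get_saying(self, key: str) -> str | None:
        self.request_count += 1
        if key in self._memory:
            # Warm Lambda: answer without any network hop.
            return self._memory[key]
        item = self._table.get_item(Key={"id": key}).get("Item")
        if item:
            # Cold start or first touch: serve from DynamoDB and warm
            # the local cache for subsequent invocations.
            self._memory[key] = item["saying"]
            return item["saying"]
        return None


# Instantiated once at import time so the cache and request counter
# persist across warm invocations of the same execution environment.
cache = CacheManager()
```

Because Lambda reuses its execution environment between invocations, anything held at module scope, including this in-memory dict and the request counter, survives for as long as the function stays warm.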
This design provides immediate responses for initial requests while reducing LLM costs for active sessions: users get fast cached responses when they are available and live LLM generations when fresh content is called for, a responsive experience that scales cost-effectively.