Assay Architecture

Event-Driven Serverless Design for Document Intelligence

5. Scalability & Performance

Automatic Scaling

Workers: Cloud Functions scale automatically based on Pub/Sub message volume. During peak upload times, more workers spin up to handle the load. During quiet periods, workers scale down to zero, minimizing costs.

Frontend: The Next.js application is statically generated and served via the Firebase Hosting CDN, providing global edge caching and automatic scaling.

Database: Firestore automatically scales based on read/write volume, with no manual sharding required.

Parallel Processing

Multiple processing stages run in parallel, significantly reducing total processing time:

  • Metadata + Signals Extraction: Run simultaneously after text extraction
  • Theme Matching + Strategy Selection: Can run in parallel after signals extraction
  • Chunk Summarization: In hierarchical mode, chunks are processed in parallel

Performance Impact: A document that might take 60 seconds sequentially can complete in 20-30 seconds with parallel execution.
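The fan-out described above can be sketched with Promise.all. This is a minimal illustration, not Assay's actual worker code; the stage function signatures are placeholders supplied by the caller.

```typescript
// Hypothetical sketch of running two extraction stages concurrently.
// The stage functions are illustrative placeholders, not Assay's
// actual Cloud Function names.
async function processInParallel<M, S>(
  text: string,
  extractMetadata: (text: string) => Promise<M>,
  extractSignals: (text: string) => Promise<S>,
): Promise<{ metadata: M; signals: S }> {
  // Both stages start immediately; total latency is the max of the
  // stage times, not their sum.
  const [metadata, signals] = await Promise.all([
    extractMetadata(text),
    extractSignals(text),
  ]);
  return { metadata, signals };
}
```

The same pattern would cover theme matching + strategy selection, and per-chunk summarization in hierarchical mode (a Promise.all over an array of chunk calls).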

Real-Time Updates

Firestore onSnapshot listeners push real-time updates to the frontend, so users see progress as it happens. No polling is required; updates appear as soon as each processing stage completes.

Update Flow:

  1. Worker updates Firestore display/{documentId}
  2. Firestore triggers onSnapshot listener in frontend
  3. Frontend React component re-renders with new data
  4. User sees progress update immediately
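The flow above can be modeled with a small, self-contained stand-in. In the real frontend, steps 2-3 are handled by the Firestore client's onSnapshot; the class below only emulates its subscribe/notify behavior so the sketch stays runnable.

```typescript
// Minimal in-memory stand-in for the Firestore listener flow.
// (The real onSnapshot also fires once with the current snapshot on
// subscribe; that is omitted here for brevity.)
type Listener = (data: Record<string, unknown>) => void;

class DisplayStore {
  private docs = new Map<string, Record<string, unknown>>();
  private listeners = new Map<string, Listener[]>();

  // Steps 2-3: register a listener; it fires on every subsequent write.
  onSnapshot(id: string, cb: Listener): () => void {
    const list = this.listeners.get(id) ?? [];
    list.push(cb);
    this.listeners.set(id, list);
    // Returned function unsubscribes, like Firestore's onSnapshot.
    return () => this.listeners.set(id, list.filter((l) => l !== cb));
  }

  // Step 1: a worker writes display/{documentId}; listeners are notified.
  update(id: string, data: Record<string, unknown>): void {
    this.docs.set(id, data);
    for (const cb of this.listeners.get(id) ?? []) cb(data);
  }
}
```

In the React frontend, the callback would be a state setter, so step 4 (the re-render) follows automatically.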

Processing Quality Tiers

Users can select a processing quality tier, which affects speed and cost:

Fast (Flash) Quality:

  • Uses Gemini 2.5 Flash for all LLM operations
  • Faster processing (typically 20-30 seconds)
  • Lower cost per document
  • Sufficient quality for most use cases

Premium (Pro) Quality:

  • Uses Gemini 2.5 Pro for summary generation
  • Slower processing (typically 1-3 minutes)
  • Higher cost per document
  • Higher quality summaries with more detail

Note: Metadata extraction always uses Gemini 2.5 Pro, regardless of user selection, to ensure accuracy.
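The tier routing described above might be centralized in a helper like this sketch. The model identifiers follow the tiers as stated; the operation names are illustrative, not Assay's actual identifiers.

```typescript
// Hypothetical tier-to-model routing for the quality tiers above.
type Tier = "fast" | "premium";
type Operation = "metadata" | "summary";

function selectModel(tier: Tier, op: Operation): string {
  // Metadata extraction always uses the Pro model, regardless of tier.
  if (op === "metadata") return "gemini-2.5-pro";
  // Summaries follow the user's selected tier.
  return tier === "premium" ? "gemini-2.5-pro" : "gemini-2.5-flash";
}
```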

File Size Limits

  • Maximum file size: 5MB (enforced at upload and storage trigger)
  • Rationale: Balances processing time, cost, and user experience
  • Large documents: Documents larger than 5MB should be split into smaller sections before upload
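A minimal sketch of the upload-side check. Whether the limit is a decimal (5,000,000 bytes) or binary (5 × 1024 × 1024 bytes) megabyte is an assumption; binary is shown here.

```typescript
// Assumed binary interpretation of the 5MB limit described above.
const MAX_FILE_BYTES = 5 * 1024 * 1024;

function validateFileSize(bytes: number): void {
  if (bytes > MAX_FILE_BYTES) {
    throw new Error(
      `File is ${(bytes / 1024 / 1024).toFixed(1)} MB; the limit is 5 MB. ` +
        "Please split the document into smaller sections.",
    );
  }
}
```

The same check would run in both places the text mentions: the upload UI and the storage trigger, so oversized files are rejected even if the client is bypassed.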

6. Document Discovery & Similarity

Beyond processing individual documents, Assay enables intelligent document discovery through theme-based search and similarity matching.

Theme-Based Search

Users can search for documents by selecting themes from the hierarchical taxonomy. The system uses fuzzy matching to find relevant themes, then queries documents that match those themes.

Search Implementation:

  • Client-Side: Uses Fuse.js for fast, fuzzy matching of theme labels
  • Searchable Fields: Theme label (weight: 0.5), Theme id (weight: 0.3), Theme synonyms[] (weight: 0.2)
  • Type-Ahead: Real-time suggestions as you type
  • Multi-Theme Selection: Select multiple themes to find documents matching any of them

Query Optimization:

  • Uses Firestore array-contains-any queries (supports up to 10 values per query)
  • Max 2 queries: One for L1 themes, one for L0 themes
  • Results merged and deduplicated client-side
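The merge-and-dedupe step might look like the sketch below. The Firestore query itself is abstracted behind a callback so the logic stays self-contained, and the field names themesL1/themesL0 are assumptions, not confirmed schema.

```typescript
// Sketch of the two-query search with client-side merge/dedupe.
// runQuery stands in for a Firestore array-contains-any query.
interface DocHit {
  id: string;
  themes: string[];
}

async function searchByThemes(
  l1Ids: string[],
  l0Ids: string[],
  runQuery: (
    field: "themesL1" | "themesL0", // assumed field names
    ids: string[],
  ) => Promise<DocHit[]>,
): Promise<DocHit[]> {
  // array-contains-any accepts at most 10 values, so cap each list.
  const queries: Promise<DocHit[]>[] = [];
  if (l1Ids.length) queries.push(runQuery("themesL1", l1Ids.slice(0, 10)));
  if (l0Ids.length) queries.push(runQuery("themesL0", l0Ids.slice(0, 10)));
  const results = (await Promise.all(queries)).flat();
  // Deduplicate client-side: a document can match both queries.
  const byId = new Map(results.map((d) => [d.id, d]));
  return [...byId.values()];
}
```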

Similar Document Discovery

When viewing a document, users can discover similar documents based on theme overlap. The system calculates similarity scores by comparing theme sets between documents, prioritizing documents with more specific theme matches.

Jaccard Similarity Algorithm

Similarity is calculated using a weighted Jaccard coefficient that measures the proportion of overlapping themes between documents:

L1_Jaccard = |L1_intersection| / |L1_union|
L0_Jaccard = |L0_intersection| / |L0_union|
Final_Score = (0.8 × L1_Jaccard) + (0.2 × L0_Jaccard)

Why Jaccard?

  • Normalized: Accounts for document size (documents with many themes don't dominate)
  • Proportional: Measures overlap as a proportion, not absolute count
  • Built-in Diminishing Returns: Sharing 5 themes out of 10 is more significant than 5 out of 50

Weighting:

  • L1 Overlap: Weight = 0.8 (specific themes matter more)
  • L0 Overlap: Weight = 0.2 (broad domains matter less)
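The scoring above transcribes directly into code; this sketch follows the stated formulas and weights.

```typescript
// Plain Jaccard coefficient: |intersection| / |union|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Final_Score = (0.8 × L1_Jaccard) + (0.2 × L0_Jaccard)
function similarityScore(
  l1A: string[],
  l0A: string[],
  l1B: string[],
  l0B: string[],
): number {
  const l1 = jaccard(new Set(l1A), new Set(l1B));
  const l0 = jaccard(new Set(l0A), new Set(l0B));
  return 0.8 * l1 + 0.2 * l0;
}
```

For example, two documents sharing one of three distinct L1 themes and all L0 themes score 0.8 × (1/3) + 0.2 × 1 ≈ 0.47.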

Query Process:

  1. Fetch source document themes (L1 and L0)
  2. Query Firestore for documents with overlapping themes (using array-contains-any)
  3. Calculate Jaccard similarity for each candidate
  4. Sort by score (descending), then by L1 count, then by L0 count, then by recency
  5. Group results: Private documents (max 15) and Public documents (max 15)
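Step 4's tie-break ordering can be expressed as a single comparator. The field names here are illustrative, and recency is assumed to be an epoch timestamp.

```typescript
// Sketch of the ranking in step 4: score desc, then L1 theme count,
// then L0 theme count, then recency.
interface Candidate {
  score: number;
  l1Count: number;
  l0Count: number;
  createdAt: number; // epoch millis; larger = more recent (assumed)
}

function rankCandidates(cands: Candidate[]): Candidate[] {
  // Each `||` falls through to the next tie-breaker when the
  // previous comparison is 0 (a tie).
  return [...cands].sort(
    (a, b) =>
      b.score - a.score ||
      b.l1Count - a.l1Count ||
      b.l0Count - a.l0Count ||
      b.createdAt - a.createdAt,
  );
}
```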

Result Grouping:

  • Private Documents: User's own documents (max 15 results)
  • Public Documents: Other users' public documents (max 15 results)
  • Results show "why matched" (which themes overlap)

7. Integration Patterns

Assay supports multiple integration patterns, enabling users to interact with their document library through various interfaces:

Web Interface

The primary interface is a Next.js web application that provides:

  • Real-time document processing updates
  • Theme-based search and document viewing
  • Upload and document management
  • Library dashboard with personalized insights

Technology Stack:

  • Frontend: Next.js 16 (React), TypeScript
  • Styling: Tailwind CSS, shadcn/ui components
  • State Management: React hooks, Firestore real-time listeners
  • Hosting: Firebase Hosting (CDN, automatic scaling)

REST API Integration

Cloud Functions expose HTTPS endpoints that enable programmatic access to document processing, search, and retrieval capabilities.

API Structure:

  • Base URL: https://api.assay.cirrusly-clever.com/api/v1
  • Authentication: API key in X-API-Key header
  • Format: ask_live_{keyId}_{keySecret}
  • Rate Limiting: Daily quotas (configurable per key, default limits apply)

API Key Management:

  • Lifespan: 24 hours (default), up to 7 days (maximum)
  • Limit: Maximum 2 active API keys per user
  • Security: Key ID (non-secret) + hashed secret, header-only (no query params)
  • Quotas: Daily request limits with automatic reset every 24 hours
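Client-side, the key format and header rule might be handled as below. The assumption that keyId contains no underscores is not stated in the text and is flagged in the code.

```typescript
// Sketch of handling the ask_live_{keyId}_{keySecret} format.
// Assumes keyId contains no underscores (not confirmed by the spec).
function parseApiKey(key: string): { keyId: string; keySecret: string } {
  const m = key.match(/^ask_live_([^_]+)_(.+)$/);
  if (!m) throw new Error("Malformed API key");
  return { keyId: m[1], keySecret: m[2] };
}

function authHeaders(key: string): Record<string, string> {
  parseApiKey(key); // fail fast on malformed keys
  // The key travels only in this header, never in query parameters.
  return { "X-API-Key": key };
}
```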

Available Endpoints:

  • GET /api/v1/health - Health check
  • GET /api/v1/me - Current user/key info
  • GET /api/v1/documents/search - Unified search
  • GET /api/v1/documents/:id - Get document
  • GET /api/v1/documents/:id/summary - Get summaries (comprehensive, casual, or FAQ)
  • GET /api/v1/documents/:id/similar - Get similar documents
  • GET /api/v1/themes - Browse canonical themes

Read-Only Design: The API is read-only; all LLM synthesis happens on the user's side using their own API keys. Assay only provides data retrieval.

Model Context Protocol (MCP) Integration

Assay supports the Model Context Protocol, enabling integration with AI assistants and other tools that support the protocol.

MCP Server:

  • Location: User's local machine (runs as Node.js process)
  • Protocol: JSON-RPC 2.0 over stdio
  • Authentication: Firebase ID token (short-lived, 1 hour)
  • Transport: Standard input/output (stdio)
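On this transport, each message is a single JSON-RPC 2.0 object written as one line on stdin/stdout. A sketch of building a tools/call request (the tool name is from Assay's tool set; the argument shape is illustrative):

```typescript
// Sketch of a JSON-RPC 2.0 request as carried by the MCP stdio
// transport: one JSON object per line.
function buildToolCall(id: number, tool: string, args: object): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  });
}
```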

Available Tools (13 Total):

Search & Discovery:

  • search_documents - Search by theme, author, or title
  • search_by_theme - Search documents by specific canonical theme
  • search_by_author - Find documents by specific authors
  • search_by_title - Search document titles
  • search_by_keywords - Search in keywords, concepts, and phrases
  • browse_themes - Explore the canonical theme taxonomy
  • browse_all_documents - Browse all documents with optional filters

Document Retrieval:

  • get_document_summary - Get comprehensive, casual, or FAQ summaries
  • get_similar_documents - Find related documents using Jaccard similarity
  • get_library_insight - Get personalized research profile insight

Advanced Analysis:

  • ask_question - Ask questions about your library with AI-powered answer synthesis
  • compare_documents - Compare up to 10 documents with AI-powered comparison synthesis
  • produce_faq - Generate FAQs from multiple documents by theme

MCP Interaction Modes:

Mode 1: MCP Only (Basic Integration)

  • Tools return raw data from the user's document collection
  • Claude Desktop synthesizes responses using retrieved data
  • Flexible and cost-effective (no additional API costs)
  • Relies on Claude's interpretation of the data

Mode 2: MCP + Skills (Enhanced Integration)

  • Same tools, but with the Assay Skill providing behavioral guidelines
  • More structured and directed responses
  • Optional server-side synthesis using Claude API (requires API key or credits)
  • Enforces verification patterns and proper attribution