Assay Architecture

Event-Driven Serverless Design for Document Intelligence

5. Scalability & Performance

Automatic Scaling

Workers: Cloud Functions scale automatically based on Pub/Sub message volume. During peak upload times, more workers spin up to handle the load. During quiet periods, workers scale down to zero, minimizing costs.

Frontend: The Next.js application is statically generated and served via the Firebase Hosting CDN, providing global edge caching and automatic scaling.

Database: Firestore automatically scales based on read/write volume, with no manual sharding required.

Parallel Processing

Multiple processing stages run in parallel, significantly reducing total processing time:

  • Metadata + Signals Extraction: Run simultaneously after text extraction
  • Theme Matching + Strategy Selection: Can run in parallel after signals extraction
  • Chunk Summarization: In hierarchical mode, chunks are processed in parallel

Performance Impact: A document that might take 60 seconds sequentially can complete in 20-30 seconds with parallel execution.
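The fan-out described above can be sketched with Promise.all. This is a minimal illustration, not Assay's actual worker code; the stage function signatures are placeholders supplied by the caller.

```typescript
// Hypothetical sketch of running two extraction stages concurrently.
// The stage functions are illustrative placeholders, not Assay's
// actual Cloud Function names.
async function processInParallel<M, S>(
  text: string,
  extractMetadata: (text: string) => Promise<M>,
  extractSignals: (text: string) => Promise<S>,
): Promise<{ metadata: M; signals: S }> {
  // Both stages start immediately; total latency is the max of the
  // stage times, not their sum.
  const [metadata, signals] = await Promise.all([
    extractMetadata(text),
    extractSignals(text),
  ]);
  return { metadata, signals };
}
```

The same pattern would cover theme matching + strategy selection, and per-chunk summarization in hierarchical mode (a Promise.all over an array of chunk calls).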

Real-Time Updates

Firestore onSnapshot listeners push real-time updates to the frontend, so users see progress as it happens. No polling is required; updates appear as soon as each processing stage completes.

Update Flow:

  1. Worker updates Firestore display/{documentId}
  2. Firestore triggers onSnapshot listener in frontend
  3. Frontend React component re-renders with new data
  4. User sees progress update immediately
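The flow above can be modeled with a small, self-contained stand-in. In the real frontend, steps 2-3 are handled by the Firestore client's onSnapshot; the class below only emulates its subscribe/notify behavior so the sketch stays runnable.

```typescript
// Minimal in-memory stand-in for the Firestore listener flow.
// (The real onSnapshot also fires once with the current snapshot on
// subscribe; that is omitted here for brevity.)
type Listener = (data: Record<string, unknown>) => void;

class DisplayStore {
  private docs = new Map<string, Record<string, unknown>>();
  private listeners = new Map<string, Listener[]>();

  // Steps 2-3: register a listener; it fires on every subsequent write.
  onSnapshot(id: string, cb: Listener): () => void {
    const list = this.listeners.get(id) ?? [];
    list.push(cb);
    this.listeners.set(id, list);
    // Returned function unsubscribes, like Firestore's onSnapshot.
    return () => this.listeners.set(id, list.filter((l) => l !== cb));
  }

  // Step 1: a worker writes display/{documentId}; listeners are notified.
  update(id: string, data: Record<string, unknown>): void {
    this.docs.set(id, data);
    for (const cb of this.listeners.get(id) ?? []) cb(data);
  }
}
```

In the React frontend, the callback would be a state setter, so step 4 (the re-render) follows automatically.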

Processing Quality Tiers

Users can select a processing quality tier, which affects speed and cost:

Fast (Flash) Quality:

  • Uses Gemini 2.5 Flash for all LLM operations
  • Faster processing (typically 20-30 seconds)
  • Lower cost per document
  • Sufficient quality for most use cases

Premium (Pro) Quality:

  • Uses Gemini 2.5 Pro for summary generation
  • Slower processing (typically 1-3 minutes)
  • Higher cost per document
  • Higher quality summaries with more detail

Note: Metadata extraction always uses Gemini 2.5 Pro, regardless of user selection, to ensure accuracy.
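The tier routing described above might be centralized in a helper like this sketch. The model identifiers follow the tiers as stated; the operation names are illustrative, not Assay's actual identifiers.

```typescript
// Hypothetical tier-to-model routing for the quality tiers above.
type Tier = "fast" | "premium";
type Operation = "metadata" | "summary";

function selectModel(tier: Tier, op: Operation): string {
  // Metadata extraction always uses the Pro model, regardless of tier.
  if (op === "metadata") return "gemini-2.5-pro";
  // Summaries follow the user's selected tier.
  return tier === "premium" ? "gemini-2.5-pro" : "gemini-2.5-flash";
}
```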

File Size Limits

  • Maximum file size: 5MB (enforced at upload and storage trigger)
  • Rationale: Balances processing time, cost, and user experience
  • Large documents: Documents larger than 5MB should be split into smaller sections before upload
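A minimal sketch of the upload-side check. Whether the limit is a decimal (5,000,000 bytes) or binary (5 × 1024 × 1024 bytes) megabyte is an assumption; binary is shown here.

```typescript
// Assumed binary interpretation of the 5MB limit described above.
const MAX_FILE_BYTES = 5 * 1024 * 1024;

function validateFileSize(bytes: number): void {
  if (bytes > MAX_FILE_BYTES) {
    throw new Error(
      `File is ${(bytes / 1024 / 1024).toFixed(1)} MB; the limit is 5 MB. ` +
        "Please split the document into smaller sections.",
    );
  }
}
```

The same check would run in both places the text mentions: the upload UI and the storage trigger, so oversized files are rejected even if the client is bypassed.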

6. Document Discovery & Similarity

Beyond processing individual documents, Assay enables intelligent document discovery through theme-based search and similarity matching.

Theme-Based Search

Users can search for documents by selecting themes from the hierarchical taxonomy. The system uses fuzzy matching to find relevant themes, then queries documents that match those themes.

Search Implementation:

  • Client-Side: Uses Fuse.js for fast, fuzzy matching of theme labels
  • Searchable Fields: Theme label (weight: 0.5), Theme id (weight: 0.3), Theme synonyms[] (weight: 0.2)
  • Type-Ahead: Real-time suggestions as you type
  • Multi-Theme Selection: Select multiple themes to find documents matching any of them

Query Optimization:

  • Uses Firestore array-contains-any queries (supports up to 10 values per query)
  • Max 2 queries: One for L1 themes, one for L0 themes
  • Results merged and deduplicated client-side
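The merge-and-dedupe step might look like the sketch below. The Firestore query itself is abstracted behind a callback so the logic stays self-contained, and the field names themesL1/themesL0 are assumptions, not confirmed schema.

```typescript
// Sketch of the two-query search with client-side merge/dedupe.
// runQuery stands in for a Firestore array-contains-any query.
interface DocHit {
  id: string;
  themes: string[];
}

async function searchByThemes(
  l1Ids: string[],
  l0Ids: string[],
  runQuery: (
    field: "themesL1" | "themesL0", // assumed field names
    ids: string[],
  ) => Promise<DocHit[]>,
): Promise<DocHit[]> {
  // array-contains-any accepts at most 10 values, so cap each list.
  const queries: Promise<DocHit[]>[] = [];
  if (l1Ids.length) queries.push(runQuery("themesL1", l1Ids.slice(0, 10)));
  if (l0Ids.length) queries.push(runQuery("themesL0", l0Ids.slice(0, 10)));
  const results = (await Promise.all(queries)).flat();
  // Deduplicate client-side: a document can match both queries.
  const byId = new Map(results.map((d) => [d.id, d]));
  return [...byId.values()];
}
```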

Similar Document Discovery

When viewing a document, users can discover similar documents based on theme overlap. The system calculates similarity scores by comparing theme sets between documents, prioritizing documents with more specific theme matches.

Jaccard Similarity Algorithm

Similarity is calculated using a weighted Jaccard coefficient that measures the proportion of overlapping themes between documents:

L1_Jaccard = |L1_intersection| / |L1_union|
L0_Jaccard = |L0_intersection| / |L0_union|
Final_Score = (0.8 × L1_Jaccard) + (0.2 × L0_Jaccard)

Why Jaccard?

  • Normalized: Accounts for document size (documents with many themes don't dominate)
  • Proportional: Measures overlap as a proportion, not absolute count
  • Built-in Diminishing Returns: Sharing 5 themes out of 10 is more significant than 5 out of 50

Weighting:

  • L1 Overlap: Weight = 0.8 (specific themes matter more)
  • L0 Overlap: Weight = 0.2 (broad domains matter less)
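The scoring above transcribes directly into code; this sketch follows the stated formulas and weights.

```typescript
// Plain Jaccard coefficient: |intersection| / |union|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// Final_Score = (0.8 × L1_Jaccard) + (0.2 × L0_Jaccard)
function similarityScore(
  l1A: string[],
  l0A: string[],
  l1B: string[],
  l0B: string[],
): number {
  const l1 = jaccard(new Set(l1A), new Set(l1B));
  const l0 = jaccard(new Set(l0A), new Set(l0B));
  return 0.8 * l1 + 0.2 * l0;
}
```

For example, two documents sharing one of three distinct L1 themes and all L0 themes score 0.8 × (1/3) + 0.2 × 1 ≈ 0.47.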

Query Process:

  1. Fetch source document themes (L1 and L0)
  2. Query Firestore for documents with overlapping themes (using array-contains-any)
  3. Calculate Jaccard similarity for each candidate
  4. Sort by score (descending), then by L1 count, then by L0 count, then by recency
  5. Group results: Private documents (max 15) and Public documents (max 15)
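Step 4's tie-break ordering can be expressed as a single comparator. The field names here are illustrative, and recency is assumed to be an epoch timestamp.

```typescript
// Sketch of the ranking in step 4: score desc, then L1 theme count,
// then L0 theme count, then recency.
interface Candidate {
  score: number;
  l1Count: number;
  l0Count: number;
  createdAt: number; // epoch millis; larger = more recent (assumed)
}

function rankCandidates(cands: Candidate[]): Candidate[] {
  // Each `||` falls through to the next tie-breaker when the
  // previous comparison is 0 (a tie).
  return [...cands].sort(
    (a, b) =>
      b.score - a.score ||
      b.l1Count - a.l1Count ||
      b.l0Count - a.l0Count ||
      b.createdAt - a.createdAt,
  );
}
```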

Result Grouping:

  • Private Documents: User's own documents (max 15 results)
  • Public Documents: Other users' public documents (max 15 results)
  • Results show "why matched" (which themes overlap)

7. Integration Patterns

Assay supports multiple integration patterns, enabling users to interact with their document library through various interfaces:

Web Interface

The primary interface is a Next.js web application that provides:

  • Real-time document processing updates
  • Theme-based search and document viewing
  • Upload and document management
  • Library dashboard with personalized insights

Technology Stack:

  • Frontend: Next.js 16 (React), TypeScript
  • Styling: Tailwind CSS, shadcn/ui components
  • State Management: React hooks, Firestore real-time listeners
  • Hosting: Firebase Hosting (CDN, automatic scaling)

REST API Integration

Cloud Functions expose HTTPS endpoints that enable programmatic access to document processing, search, and retrieval capabilities.

API Structure:

  • Base URL: https://api.assay.cirrusly-clever.com/api/v1
  • Authentication: API key in X-API-Key header
  • Format: ask_live_{keyId}_{keySecret}
  • Rate Limiting: Daily quotas (configurable per key, default limits apply)

API Key Management:

  • Lifespan: 24 hours (default), up to 7 days (maximum)
  • Limit: Maximum 2 active API keys per user
  • Security: Key ID (non-secret) + hashed secret, header-only (no query params)
  • Quotas: Daily request limits with automatic reset every 24 hours
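Client-side, the key format and header rule might be handled as below. The assumption that keyId contains no underscores is not stated in the text and is flagged in the code.

```typescript
// Sketch of handling the ask_live_{keyId}_{keySecret} format.
// Assumes keyId contains no underscores (not confirmed by the spec).
function parseApiKey(key: string): { keyId: string; keySecret: string } {
  const m = key.match(/^ask_live_([^_]+)_(.+)$/);
  if (!m) throw new Error("Malformed API key");
  return { keyId: m[1], keySecret: m[2] };
}

function authHeaders(key: string): Record<string, string> {
  parseApiKey(key); // fail fast on malformed keys
  // The key travels only in this header, never in query parameters.
  return { "X-API-Key": key };
}
```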

Available Endpoints:

  • GET /api/v1/health - Health check
  • GET /api/v1/me - Current user/key info
  • GET /api/v1/documents/search - Unified search
  • GET /api/v1/documents/:id - Get document
  • GET /api/v1/documents/:id/summary - Get summaries (comprehensive, casual, or FAQ)
  • GET /api/v1/documents/:id/similar - Get similar documents
  • GET /api/v1/themes - Browse canonical themes

Read-Only Design: The API is read-only; all LLM synthesis happens on the user's side using their own API keys. Assay only provides data retrieval.

Model Context Protocol (MCP) Integration

Assay supports the Model Context Protocol, enabling integration with AI assistants and other tools that support the protocol.

MCP Server:

  • Location: User's local machine (runs as Node.js process)
  • Protocol: JSON-RPC 2.0 over stdio
  • Authentication: Firebase ID token (short-lived, 1 hour)
  • Transport: Standard input/output (stdio)
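On this transport, each message is a single JSON-RPC 2.0 object written as one line on stdin/stdout. A sketch of building a tools/call request (the tool name is from Assay's tool set; the argument shape is illustrative):

```typescript
// Sketch of a JSON-RPC 2.0 request as carried by the MCP stdio
// transport: one JSON object per line.
function buildToolCall(id: number, tool: string, args: object): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  });
}
```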

Available Tools (13 Total):

Search & Discovery:

  • search_documents - Search by theme, author, or title
  • search_by_theme - Search documents by specific canonical theme
  • search_by_author - Find documents by specific authors
  • search_by_title - Search document titles
  • search_by_keywords - Search in keywords, concepts, and phrases
  • browse_themes - Explore the canonical theme taxonomy
  • browse_all_documents - Browse all documents with optional filters

Document Retrieval:

  • get_document_summary - Get comprehensive, casual, or FAQ summaries
  • get_similar_documents - Find related documents using Jaccard similarity
  • get_library_insight - Get personalized research profile insight

Advanced Analysis:

  • ask_question - Ask questions about your library with AI-powered answer synthesis
  • compare_documents - Compare up to 10 documents with AI-powered comparison synthesis
  • produce_faq - Generate FAQs from multiple documents by theme

MCP Interaction Modes:

Mode 1: MCP Only (Basic Integration)

  • Tools return raw data from the user's document collection
  • Claude Desktop synthesizes responses using retrieved data
  • Flexible and cost-effective (no additional API costs)
  • Relies on Claude's interpretation of the data

Mode 2: MCP + Skills (Enhanced Integration)

  • Same tools, but with the Assay Skill providing behavioral guidelines
  • More structured and directed responses
  • Optional server-side synthesis using Claude API (requires API key or credits)
  • Enforces verification patterns and proper attribution