AgentDock Core Documentation

Memory & Storage Testing System PRD

Product Requirements Document
Version: 1.0
Date: July 9, 2025
Status: Draft
Owner: AgentDock Core Team

Executive Summary

The Memory & Storage Testing System ensures production-ready reliability, performance, and accuracy of AgentDock's core memory infrastructure. This system validates hybrid vector+text search, cross-adapter compatibility, and real-world performance scenarios across PostgreSQL, SQLite, and vector-enabled variants.

Business Impact

  • Risk Mitigation: Prevent memory system failures in production deployments
  • Performance Assurance: Hold hybrid search response times under 200ms for 95% of operations
  • Compatibility Validation: Ensure seamless operation across managed and self-hosted databases
  • Developer Confidence: Enable rapid feature development with comprehensive safety nets

Problem Statement

Current State

  • Incomplete Test Coverage: PostgreSQL Vector adapter had zero tests when this work began
  • Missing E2E Validation: No real embedding pipeline testing
  • Performance Unknowns: No load testing for 10K+ memory scenarios
  • Adapter Inconsistency: Different storage adapters lack unified test suites
  • Production Gaps: Managed service compatibility untested

Success Metrics

  • Test Coverage: 95% function coverage across all memory operations
  • Performance SLA: <200ms response time for hybrid search operations
  • Reliability: 99.9% uptime in production memory operations
  • Accuracy: ≥85% relevance in hybrid search results vs pure vector search

Product Overview

Core Components

1. Storage Adapter Test Suite

Comprehensive testing for all storage adapters with unified test contracts.

Adapters Covered:

  • PostgreSQL (with ts_rank_cd text search)
  • PostgreSQL Vector (with pgvector + hybrid search)
  • SQLite (with FTS5)
  • SQLite Vec (with vec0 + FTS5 BM25)

2. Memory Operations Validation

End-to-end testing of memory lifecycle operations across all storage types.

Operations Tested:

  • Store, Recall, Update, Delete (CRUD)
  • Batch operations and transactions
  • Connection discovery and graph traversal
  • Decay calculations and archival

3. Vector & Hybrid Search Testing

Validation of vector similarity and hybrid search accuracy.

Search Types:

  • Pure vector similarity (cosine, euclidean, dot product)
  • Pure text search (FTS5 BM25, ts_rank_cd)
  • Hybrid search (70% vector + 30% text)
  • Reciprocal Rank Fusion (RRF) algorithms
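
The three similarity measures listed above differ mainly in how they treat vector magnitude. The following is a minimal TypeScript sketch of each; the function names are illustrative and are not taken from the AgentDock codebase.

```typescript
// Illustrative helpers; names are assumptions, not AgentDock APIs.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  const denom = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (denom === 0) throw new Error('zero-length vector');
  return dotProduct(a, b) / denom;
}

// Euclidean distance: smaller means more similar.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}
```

Cosine similarity ignores magnitude, so for unit-normalized embeddings it ranks results the same way as the dot product. Euclidean distance is a distance rather than a similarity, so tests should assert ascending order for it.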

4. Performance & Scale Testing

Load testing and performance validation for production scenarios.

Scale Scenarios:

  • 10K+ memories with concurrent access
  • 100+ concurrent users
  • Large batch operations (1K+ memories)
  • Connection discovery across large graphs

User Stories

Memory System Developer

As a memory system developer
I want comprehensive test coverage for all storage adapters
So that I can confidently deploy new memory features without breaking existing functionality

Acceptance Criteria:

  • All storage adapters pass identical test suites
  • Test failures clearly indicate the root cause
  • Tests can be run locally with minimal setup
  • CI/CD pipeline runs all tests automatically

DevOps Engineer

As a DevOps engineer deploying AgentDock
I want performance and compatibility validation
So that I can ensure reliable operation in production environments

Acceptance Criteria:

  • Performance tests validate SLA requirements
  • Compatibility tests cover managed services (RDS, Supabase)
  • Load tests simulate realistic production scenarios
  • Resource usage is measured and documented

AI Application Developer

As an AI application developer using AgentDock
I want reliable memory operations
So that my agents maintain consistent conversational context

Acceptance Criteria:

  • Memory recall accuracy is ≥85% for semantic queries
  • Response times are consistently <200ms
  • Cross-session memory persistence works reliably
  • Memory connections enhance recall relevance

Functional Requirements

FR1: Storage Adapter Test Framework

Priority: P0 (Critical)

FR1.1: Unified Test Contracts

  • All storage adapters implement identical test suites
  • Test isolation prevents cross-contamination
  • Graceful degradation when extensions are unavailable
  • Error handling validation for all failure modes

FR1.2: Memory Operations Testing

// Test contract example
interface MemoryOperationsTestSuite {
  testBasicCRUD(): Promise<void>;
  testUserIsolation(): Promise<void>;
  testBatchOperations(): Promise<void>;
  testConnections(): Promise<void>;
  testPerformance(): Promise<void>;
}
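
One way to apply a unified contract is a shared function that runs the same checks against any adapter. The sketch below exercises it with an in-memory stand-in. `MemoryAdapter`, `InMemoryAdapter`, and `runContract` are hypothetical names for illustration, and the real AgentDock adapter surface may differ.

```typescript
// Hypothetical minimal adapter surface; the real AgentDock adapter API may differ.
interface MemoryRecord { id: string; userId: string; content: string; }

interface MemoryAdapter {
  store(userId: string, content: string): Promise<string>;
  recall(userId: string, query: string): Promise<MemoryRecord[]>;
  remove(userId: string, id: string): Promise<void>;
}

// In-memory stand-in used only to exercise the contract.
class InMemoryAdapter implements MemoryAdapter {
  private records = new Map<string, MemoryRecord>();
  private nextId = 1;

  async store(userId: string, content: string): Promise<string> {
    const id = `mem-${this.nextId++}`;
    this.records.set(id, { id, userId, content });
    return id;
  }

  async recall(userId: string, query: string): Promise<MemoryRecord[]> {
    const needle = query.toLowerCase();
    return Array.from(this.records.values()).filter(
      (r) => r.userId === userId && r.content.toLowerCase().includes(needle)
    );
  }

  async remove(userId: string, id: string): Promise<void> {
    const record = this.records.get(id);
    if (record && record.userId === userId) this.records.delete(id);
  }
}

// Shared contract: returns the names of failed checks (an empty list means pass).
async function runContract(adapter: MemoryAdapter): Promise<string[]> {
  const failures: string[] = [];
  const id = await adapter.store('user-a', 'prefers dark mode');

  const own = await adapter.recall('user-a', 'dark mode');
  if (!own.some((r) => r.id === id)) failures.push('basic recall');

  const other = await adapter.recall('user-b', 'dark mode');
  if (other.length !== 0) failures.push('user isolation');

  await adapter.remove('user-a', id);
  const afterDelete = await adapter.recall('user-a', 'dark mode');
  if (afterDelete.length !== 0) failures.push('delete');

  return failures;
}
```

Returning failure names instead of throwing on the first problem lets a single run report every broken check for an adapter, which supports the goal of failures that point at their root cause.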

FR2: Vector Search Validation

Priority: P0 (Critical)

FR2.1: Embedding Pipeline Testing

  • Real OpenAI API integration with text-embedding-3-small
  • Embedding dimension validation (1536 dimensions)
  • Cost tracking and API rate limiting
  • Fallback mechanisms when the API is unavailable
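
Dimension validation can be a small pure function that runs before any vector reaches storage. This is a sketch only: `validateEmbedding` is a hypothetical helper, and 1536 is the dimension count stated in this document.

```typescript
// Dimension count stated in this document for text-embedding-3-small.
const EXPECTED_DIMENSIONS = 1536;

// Returns a list of problems; an empty list means the vector is usable.
function validateEmbedding(
  vector: number[],
  expected: number = EXPECTED_DIMENSIONS
): string[] {
  const problems: string[] = [];
  if (vector.length !== expected) {
    problems.push(`expected ${expected} dimensions, got ${vector.length}`);
  }
  if (vector.some((v) => !Number.isFinite(v))) {
    problems.push('vector contains NaN or infinite values');
  }
  return problems;
}
```

Checking for non-finite values alongside length catches corrupted responses that would otherwise only surface later as failed similarity calculations.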

FR2.2: Hybrid Search Accuracy

  • Vector similarity vs text search comparison
  • 70% vector + 30% text weight validation
  • Relevance ranking consistency
  • Cross-adapter result comparison
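
The 70/30 weighting only behaves as intended when both inputs share a scale, and raw text-rank scores and vector similarities generally do not. The sketch below assumes min-max normalization to [0, 1] before weighting. That normalization step and the function names are assumptions of this example, not a description of the adapters' actual scoring.

```typescript
// Weights from this document; min-max normalization is an assumption of this sketch.
const VECTOR_WEIGHT = 0.7;
const TEXT_WEIGHT = 0.3;

// Rescale raw scores to [0, 1]; a list of identical scores maps to all 1s.
function minMaxNormalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map((s) => (s - min) / (max - min));
}

// Weighted sum of two scores that are already on the same [0, 1] scale.
function hybridScore(vectorScore: number, textScore: number): number {
  return VECTOR_WEIGHT * vectorScore + TEXT_WEIGHT * textScore;
}
```

A weight-validation test can then assert the boundary cases: a perfect vector match with no text match scores 0.7, and the reverse scores 0.3.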

FR3: Performance & Scale Testing

Priority: P1 (High)

FR3.1: Load Testing

  • 10K+ memories with concurrent recall operations
  • 100+ concurrent users performing memory operations
  • Large batch storage and update operations
  • Memory connection discovery at scale

FR3.2: Performance SLA Validation

  • <200ms response time for hybrid search
  • <100ms response time for vector-only search
  • <50ms response time for text-only search
  • Memory usage and garbage collection impact

FR4: Production Scenario Testing

Priority: P1 (High)

FR4.1: Managed Service Compatibility

  • PostgreSQL RDS with pgvector extension
  • Supabase PostgreSQL configuration
  • Azure Database for PostgreSQL
  • Google Cloud SQL compatibility

FR4.2: Self-Hosted Configuration

  • PostgreSQL with manual pgvector installation
  • SQLite with vec0 extension compilation
  • Docker containerized testing environments
  • Local development setup validation

Non-Functional Requirements

Performance Requirements

  • Response Time: <200ms for 95% of hybrid search operations
  • Throughput: Support 1000+ memory operations per second
  • Concurrency: Handle 100+ concurrent users without degradation
  • Memory Usage: <2GB RAM for 100K stored memories
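
A target stated as "95% of operations" implies a percentile over many samples, not a single timing. The sketch below uses the nearest-rank method; `percentile` and `meetsLatencySla` are hypothetical helper names.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// True when the 95th-percentile latency is under the limit (200ms by default).
function meetsLatencySla(samplesMs: number[], limitMs: number = 200): boolean {
  return percentile(samplesMs, 95) < limitMs;
}
```

With 20 samples, one slow outlier still passes, while two slow samples push the 95th percentile over the limit.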

Reliability Requirements

  • Uptime: 99.9% availability for memory operations
  • Data Integrity: Zero data loss during failures
  • Graceful Degradation: Fall back to text search when vector search is unavailable
  • Error Recovery: Automatic retry with exponential backoff
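
Retry with exponential backoff can be sketched as a delay schedule plus a loop that rethrows once the schedule is exhausted. The function names, the default values, and the injectable `wait` parameter are assumptions made for this example so that tests can run without real delays.

```typescript
// Delay before each retry: baseMs * 2^attempt, capped at maxMs.
function backoffDelays(retries: number, baseMs: number, maxMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) =>
    Math.min(baseMs * 2 ** attempt, maxMs)
  );
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs op, retrying on failure; rethrows the last error once retries are exhausted.
async function withRetry<T>(
  op: () => Promise<T>,
  retries: number = 3,
  baseMs: number = 100,
  maxMs: number = 2000,
  wait: (ms: number) => Promise<void> = sleep
): Promise<T> {
  const delays = backoffDelays(retries, baseMs, maxMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= delays.length) throw err;
      await wait(delays[attempt]);
    }
  }
}
```

Production code commonly adds random jitter to each delay so that many clients do not retry in lockstep; it is left out here to keep the schedule deterministic for testing.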

Security Requirements

  • User Isolation: Complete data separation between users
  • SQL Injection Protection: Parameterized queries only
  • API Key Security: Secure handling of OpenAI API keys
  • Access Control: Memory operations require proper authorization
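
User isolation and injection protection meet in the query itself: every value, including the user ID, travels as a bound parameter and is never concatenated into SQL. The sketch below builds a query object in the `{ text, values }` shape that node-postgres accepts. Whether AgentDock uses that driver is an assumption, and the table and column names are hypothetical.

```typescript
// Shape follows the { text, values } query config accepted by node-postgres.
interface ParameterizedQuery { text: string; values: unknown[]; }

// Table and column names are hypothetical; user input appears only in `values`.
function buildRecallQuery(
  userId: string,
  agentId: string,
  search: string,
  limit: number = 20
): ParameterizedQuery {
  return {
    text:
      'SELECT id, content FROM memories ' +
      'WHERE user_id = $1 AND agent_id = $2 AND content ILIKE $3 ' +
      'ORDER BY created_at DESC LIMIT $4',
    values: [userId, agentId, `%${search}%`, limit],
  };
}
```

A security test can assert that hostile input never changes the SQL text. Note that `%` and `_` inside the search term still act as `ILIKE` wildcards, which affects matching but not injection safety.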

Compatibility Requirements

  • Database Versions: PostgreSQL 12+, SQLite 3.38+
  • Extension Dependencies: pgvector 0.5+, sqlite-vec (vec0) latest
  • Node.js Versions: 18.x, 20.x LTS
  • Operating Systems: Linux, macOS, Windows

Technical Architecture

Test Infrastructure

Database Setup

# Docker Compose for test environment
services:
  postgres-vector:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: agentdock_test
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    ports:
      - "5432:5432"
      
  sqlite-vec:
    build:
      context: ./test-infrastructure
      dockerfile: Dockerfile.sqlite-vec
    volumes:
      - ./test-data:/data

Test Data Generation

// Realistic test data sets
const testDataSets = {
  smallDataset: {
    memories: 100,
    users: 5,
    agents: 3,
    connections: 50,
  },

  mediumDataset: {
    memories: 10000,
    users: 50,
    agents: 10,
    connections: 5000,
  },

  largeDataset: {
    memories: 100000,
    users: 500,
    agents: 50,
    connections: 50000,
  },
} as const;
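
Dataset sizes like these are easiest to reuse when a deterministic generator expands them into records, so that every adapter sees identical input. The sketch below is one possible shape. The record fields and the `generateDataset` name are assumptions, and connections are kept within a single user so that isolation checks stay meaningful.

```typescript
// Field names mirror the dataset sizes in this document; record shapes are assumptions.
interface DatasetSpec { memories: number; users: number; agents: number; connections: number; }
interface GeneratedMemory { id: string; userId: string; agentId: string; content: string; }
interface GeneratedConnection { fromId: string; toId: string; }

function generateDataset(spec: DatasetSpec): {
  memories: GeneratedMemory[];
  connections: GeneratedConnection[];
} {
  if (spec.connections > spec.memories - spec.users) {
    throw new Error('too many connections for this sketch');
  }
  // Round-robin assignment keeps the output identical across runs.
  const memories: GeneratedMemory[] = Array.from({ length: spec.memories }, (_, i) => ({
    id: `mem-${i}`,
    userId: `user-${i % spec.users}`,
    agentId: `agent-${i % spec.agents}`,
    content: `Synthetic memory ${i}`,
  }));
  // Memory i and memory i + users share an owner, so no link crosses users.
  const connections: GeneratedConnection[] = Array.from(
    { length: spec.connections },
    (_, i) => ({ fromId: memories[i].id, toId: memories[i + spec.users].id })
  );
  return { memories, connections };
}
```

For example, `generateDataset({ memories: 100, users: 5, agents: 3, connections: 50 })` yields the small dataset. Synthetic content like this exercises storage paths but not search relevance, which needs realistic text.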

Test Categories

Unit Tests

  • Individual memory operations
  • Storage adapter implementations
  • Vector similarity calculations
  • Text search algorithms

Integration Tests

  • Memory system component interactions
  • Storage adapter compatibility
  • Embedding service integration
  • Connection graph operations

End-to-End Tests

  • Complete user workflows
  • Real embedding pipeline
  • Cross-adapter scenarios
  • Production configuration testing

Performance Tests

  • Load testing scenarios
  • Stress testing limits
  • Memory usage profiling
  • Response time validation

Implementation Plan

Phase 1: Foundation (Week 1-2) ✅ COMPLETED

Goal: Establish core testing infrastructure

Deliverables

  • PostgreSQL Vector adapter test suite (535 lines, comprehensive coverage)
  • SQLite Vec memory operations tests (completing the partial implementation)
  • Unified test contracts for all adapters (test-helpers.ts)
  • CI/CD pipeline with database setup

Success Criteria

  • ✅ All storage adapters have >90% test coverage
  • ✅ CI/CD pipeline runs successfully (pnpm build passes)
  • ✅ Local development environment setup documented

Phase 2: Integration (Week 3-4) ✅ COMPLETED

Goal: Validate cross-component functionality

Deliverables

  • RecallService E2E integration tests (825 lines, comprehensive)
  • Real embedding pipeline testing with OpenAI API (mock service pattern)
  • Cross-adapter result comparison validation
  • Hybrid search accuracy benchmarking (70% vector + 30% text)

Success Criteria

  • ✅ RecallService works with all storage adapters
  • ✅ Embedding pipeline handles API failures gracefully
  • ✅ Hybrid search accuracy ≥85% vs pure vector search

Phase 3: Performance (Week 5-6)

Goal: Ensure production-ready performance

Deliverables

  • Load testing suite for 10K+ memories
  • Concurrent user testing (100+ users)
  • Performance regression detection
  • Resource usage optimization

Success Criteria

  • All performance SLAs met
  • Load tests pass without failures
  • Resource usage within acceptable limits

Phase 4: Production Readiness (Week 7-8)

Goal: Validate production deployment scenarios

Deliverables

  • Managed service compatibility testing
  • Production configuration validation
  • Disaster recovery testing
  • Documentation and runbooks

Success Criteria

  • All managed services tested successfully
  • Production configurations validated
  • Disaster recovery procedures documented

Test Specifications

Memory Operations Test Suite

Basic CRUD Operations

// Planned cases, registered with test.todo so the suite runs before they are implemented
describe('Memory CRUD Operations', () => {
  test.todo('store creates memory with proper isolation');
  test.todo('recall filters by user/agent correctly');
  test.todo('update modifies memory safely');
  test.todo('delete removes memory completely');
  test.todo('getById returns correct memory');
  test.todo('getStats provides accurate counts');
});

Vector Operations Testing

// Planned cases, registered with test.todo so the suite runs before they are implemented
describe('Vector Operations', () => {
  test.todo('storeMemoryWithEmbedding stores vector correctly');
  test.todo('searchByVector finds similar memories');
  test.todo('hybridSearch combines vector + text scores');
  test.todo('updateMemoryEmbedding modifies vectors');
  test.todo('getMemoryEmbedding retrieves vectors');
});

Hybrid Search Validation

// Planned cases, registered with test.todo so the suite runs before they are implemented
describe('Hybrid Search', () => {
  test.todo('70% vector + 30% text weight distribution');
  test.todo('PostgreSQL ts_rank_cd text scoring');
  test.todo('SQLite FTS5 BM25 text scoring');
  test.todo('Reciprocal Rank Fusion algorithm');
  test.todo('result ranking consistency');
});
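
Reciprocal Rank Fusion combines ranked lists by position alone, which sidesteps the problem of comparing vector and text scores on different scales. The sketch below implements the standard formula, in which each list contributes 1 / (k + rank) per item. The constant k = 60 is the commonly used default, and whether the adapters use that value is an assumption.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}
```

Given a vector ranking of `['a', 'b', 'c']` and a text ranking of `['b', 'c', 'a']`, the fused order is `b`, `a`, `c`: `b` places first and second, which beats the first and third places of `a`.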

Performance Test Specifications

Load Testing

describe('Performance Tests', () => {
  test('10K memories storage performance', async () => {
    const startTime = Date.now();
    await storeMemories(10000);
    const duration = Date.now() - startTime;
    expect(duration).toBeLessThan(30000); // 30 seconds
  }, 60000); // raise the per-test timeout above the 30-second budget

  test('concurrent recall operations', async () => {
    // Start 100 recalls at once and wait for all of them
    const promises = Array.from({ length: 100 }, () =>
      recallMemories('test query')
    );
    const results = await Promise.all(promises);
    expect(results.every(r => r.length > 0)).toBe(true);
  });

  test('hybrid search response time', async () => {
    // Single-sample check; the 95th-percentile target needs repeated measurements
    const startTime = Date.now();
    await hybridSearch('complex semantic query');
    const duration = Date.now() - startTime;
    expect(duration).toBeLessThan(200); // 200ms SLA
  });
});

E2E Test Scenarios

User Journey: Learning Session

describe('E2E: Learning Session', () => {
  test('complete learning workflow', async () => {
    // 1. Store working memory during learning
    const workingId = await storeWorkingMemory(
      'Learning about React hooks'
    );

    // 2. Convert to episodic memory after practice
    const episodicId = await storeEpisodicMemory(
      'Successfully built React app with hooks'
    );

    // 3. Extract semantic knowledge
    const semanticId = await storeSemanticMemory(
      'React hooks manage state in functional components'
    );

    // 4. Learn procedural pattern
    const proceduralId = await learnProceduralPattern(
      'need state management',
      'use React hooks'
    );

    // 5. Test recall with hybrid search
    const results = await hybridSearch('React state management');
    
    // toContainMemories is a custom matcher and must be registered with expect.extend
    expect(results).toContainMemories([
      workingId, episodicId, semanticId, proceduralId
    ]);
    expect(results[0].score).toBeGreaterThan(0.8);
  });
});

Cross-Adapter Compatibility

describe('E2E: Cross-Adapter Compatibility', () => {
  test('same results across storage adapters', async () => {
    const testQuery = 'machine learning algorithms';
    const testMemories = generateTestMemories(100);

    // Store memories in all adapters
    await Promise.all([
      postgresAdapter.batchStore(testMemories),
      postgresVectorAdapter.batchStore(testMemories),
      sqliteAdapter.batchStore(testMemories),
      sqliteVecAdapter.batchStore(testMemories)
    ]);

    // Query all adapters
    const [pgResults, pgvResults, sqliteResults, sqliteVecResults] = 
      await Promise.all([
        postgresAdapter.recall(testQuery),
        postgresVectorAdapter.hybridSearch(testQuery),
        sqliteAdapter.recall(testQuery),
        sqliteVecAdapter.hybridSearch(testQuery)
      ]);

    // Validate consistency: hybrid search should return at least as many results as text-only recall
    expect(pgvResults.length).toBeGreaterThanOrEqual(pgResults.length);
    expect(sqliteVecResults.length).toBeGreaterThanOrEqual(sqliteResults.length);
    expect(compareRelevance(pgvResults, sqliteVecResults)).toBeGreaterThan(0.8);
  });
});
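
The `compareRelevance` helper used above is not defined in this document. One possible definition, offered only as a sketch, is the Jaccard overlap of the top-k result IDs. It assumes the same memories carry the same IDs in every adapter, and the `ScoredResult` shape is hypothetical.

```typescript
// Hypothetical result shape; assumes memory IDs match across adapters.
interface ScoredResult { id: string; score: number; }

// Jaccard overlap of the top-k result IDs: 0 means disjoint, 1 means identical sets.
function compareRelevance(a: ScoredResult[], b: ScoredResult[], k: number = 10): number {
  const topA = new Set(a.slice(0, k).map((r) => r.id));
  const topB = new Set(b.slice(0, k).map((r) => r.id));
  if (topA.size === 0 && topB.size === 0) return 1;
  let shared = 0;
  topA.forEach((id) => {
    if (topB.has(id)) shared++;
  });
  return shared / (topA.size + topB.size - shared);
}
```

Set overlap ignores ordering inside the top k. If rank order matters for the comparison, a rank correlation measure would be a stricter alternative.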

Risk Assessment

High Risk

  • PostgreSQL Vector Testing Gap: No tests existed before the Phase 1 test suite
  • Performance Unknowns: No load testing for scale scenarios
  • Production Compatibility: Managed services untested

Medium Risk

  • Embedding API Dependencies: OpenAI rate limits and costs
  • Extension Dependencies: pgvector and vec0 availability
  • Test Environment Complexity: Multiple database setups

Low Risk

  • Test Maintenance: New features require test updates
  • CI/CD Performance: Longer build times with comprehensive tests

Success Criteria

Functional Success

  • All storage adapters pass comprehensive test suites
  • Hybrid search accuracy ≥85% vs pure vector search
  • Zero data loss or corruption in any test scenario
  • Complete user isolation across all operations

Performance Success

  • <200ms response time for 95% of hybrid searches
  • Support 1000+ memory operations per second
  • Handle 100+ concurrent users without degradation
  • Memory usage <2GB for 100K stored memories

Quality Success

  • 95% test coverage across all memory operations
  • Zero critical bugs in production deployment
  • Successful deployment to all supported platforms
  • Developer productivity maintained with fast test execution

Appendix

Test Data Samples

// Realistic memory content for testing
const testMemories = [
  {
    content: "The user prefers dark mode in applications",
    type: "semantic",
    importance: 0.7,
    keywords: ["ui", "preferences", "dark-mode"]
  },
  {
    content: "Successfully debugged authentication issue by checking JWT token expiration",
    type: "episodic",
    importance: 0.9,
    tags: ["debugging", "authentication", "jwt"]
  },
  {
    content: "When API returns 500 error, check database connection timeout",
    type: "procedural",
    importance: 0.8,
    pattern: "api-error-debugging"
  }
];

Performance Benchmarks

// Expected performance baselines
const performanceBaselines = {
  vectorSearch: {
    small: "< 50ms for 1K memories",
    medium: "< 100ms for 10K memories", 
    large: "< 200ms for 100K memories"
  },
  hybridSearch: {
    small: "< 100ms for 1K memories",
    medium: "< 200ms for 10K memories",
    large: "< 500ms for 100K memories"
  },
  storage: {
    single: "< 10ms per memory",
    batch: "< 5ms per memory in batch of 100"
  }
};

Document Control

  • Created: July 9, 2025
  • Last Updated: July 9, 2025
  • Next Review: July 16, 2025
  • Approvers: Engineering Lead, Product Manager, QA Lead