Memory & Storage Testing System PRD
Product Requirements Document
Version: 1.0
Date: July 9, 2025
Status: Draft
Owner: AgentDock Core Team
Executive Summary
The Memory & Storage Testing System ensures production-ready reliability, performance, and accuracy of AgentDock's core memory infrastructure. This system validates hybrid vector+text search, cross-adapter compatibility, and real-world performance scenarios across PostgreSQL, SQLite, and vector-enabled variants.
Business Impact
- Risk Mitigation: Prevent memory system failures in production deployments
- Performance Assurance: Guarantee <200ms response times for memory operations
- Compatibility Validation: Ensure seamless operation across managed and self-hosted databases
- Developer Confidence: Enable rapid feature development with comprehensive safety nets
Problem Statement
Current State
- Incomplete Test Coverage: PostgreSQL Vector adapter has zero tests
- Missing E2E Validation: No real embedding pipeline testing
- Performance Unknowns: No load testing for 10K+ memory scenarios
- Adapter Inconsistency: Different storage adapters lack unified test suites
- Production Gaps: Managed service compatibility untested
Success Metrics
- Test Coverage: 95% function coverage across all memory operations
- Performance SLA: <200ms response time for hybrid search operations
- Reliability: 99.9% uptime in production memory operations
- Accuracy: ≥85% relevance in hybrid search results vs pure vector search
Product Overview
Core Components
1. Storage Adapter Test Suite
Comprehensive testing for all storage adapters with unified test contracts.
Adapters Covered:
- PostgreSQL (with ts_rank_cd text search)
- PostgreSQL Vector (with pgvector + hybrid search)
- SQLite (with FTS5)
- SQLite Vec (with vec0 + FTS5 BM25)
2. Memory Operations Validation
End-to-end testing of memory lifecycle operations across all storage types.
Operations Tested:
- Store, Recall, Update, Delete (CRUD)
- Batch operations and transactions
- Connection discovery and graph traversal
- Decay calculations and archival
3. Vector & Hybrid Search Testing
Validation of vector similarity and hybrid search accuracy.
Search Types:
- Pure vector similarity (cosine, euclidean, dot product)
- Pure text search (FTS5 BM25, ts_rank_cd)
- Hybrid search (70% vector + 30% text)
- Reciprocal Rank Fusion (RRF) algorithms
4. Performance & Scale Testing
Load testing and performance validation for production scenarios.
Scale Scenarios:
- 10K+ memories with concurrent access
- 100+ concurrent users
- Large batch operations (1K+ memories)
- Connection discovery across large graphs
User Stories
Memory System Developer
As a memory system developer
I want comprehensive test coverage for all storage adapters
So that I can confidently deploy new memory features without breaking existing functionality
Acceptance Criteria:
- All storage adapters pass identical test suites
- Test failures clearly indicate the root cause
- Tests can be run locally with minimal setup
- CI/CD pipeline runs all tests automatically
DevOps Engineer
As a DevOps engineer deploying AgentDock
I want performance and compatibility validation
So that I can ensure reliable operation in production environments
Acceptance Criteria:
- Performance tests validate SLA requirements
- Compatibility tests cover managed services (RDS, Supabase)
- Load tests simulate realistic production scenarios
- Resource usage is measured and documented
AI Application Developer
As an AI application developer using AgentDock
I want reliable memory operations
So that my agents maintain consistent conversational context
Acceptance Criteria:
- Memory recall accuracy is ≥85% for semantic queries
- Response times are consistently <200ms
- Cross-session memory persistence works reliably
- Memory connections enhance recall relevance
Functional Requirements
FR1: Storage Adapter Test Framework
Priority: P0 (Critical)
FR1.1: Unified Test Contracts
- All storage adapters implement identical test suites
- Test isolation prevents cross-contamination
- Graceful degradation when extensions unavailable
- Error handling validation for all failure modes
FR1.2: Memory Operations Testing
// Test contract example
interface MemoryOperationsTestSuite {
testBasicCRUD(): Promise<void>;
testUserIsolation(): Promise<void>;
testBatchOperations(): Promise<void>;
testConnections(): Promise<void>;
testPerformance(): Promise<void>;
}
FR2: Vector Search Validation
Priority: P0 (Critical)
FR2.1: Embedding Pipeline Testing
- Real OpenAI API integration with text-embedding-3-small
- Embedding dimension validation (1536 dimensions)
- Cost tracking and API rate limiting
- Fallback mechanisms when API unavailable
FR2.2: Hybrid Search Accuracy
- Vector similarity vs text search comparison
- 70% vector + 30% text weight validation
- Relevance ranking consistency
- Cross-adapter result comparison
FR3: Performance & Scale Testing
Priority: P1 (High)
FR3.1: Load Testing
- 10K+ memories with concurrent recall operations
- 100+ concurrent users performing memory operations
- Large batch storage and update operations
- Memory connection discovery at scale
FR3.2: Performance SLA Validation
- <200ms response time for hybrid search
- <100ms response time for vector-only search
- <50ms response time for text-only search
- Memory usage and garbage collection impact
FR4: Production Scenario Testing
Priority: P1 (High)
FR4.1: Managed Service Compatibility
- PostgreSQL RDS with pgvector extension
- Supabase PostgreSQL configuration
- Azure Database for PostgreSQL
- Google Cloud SQL compatibility
FR4.2: Self-Hosted Configuration
- PostgreSQL with manual pgvector installation
- SQLite with vec0 extension compilation
- Docker containerized testing environments
- Local development setup validation
Non-Functional Requirements
Performance Requirements
- Response Time: <200ms for 95% of hybrid search operations
- Throughput: Support 1000+ memory operations per second
- Concurrency: Handle 100+ concurrent users without degradation
- Memory Usage: <2GB RAM for 100K stored memories
Reliability Requirements
- Uptime: 99.9% availability for memory operations
- Data Integrity: Zero data loss during failures
- Graceful Degradation: Fallback to text search when vector unavailable
- Error Recovery: Automatic retry with exponential backoff
Security Requirements
- User Isolation: Complete data separation between users
- SQL Injection Protection: Parameterized queries only
- API Key Security: Secure handling of OpenAI API keys
- Access Control: Memory operations require proper authorization
Compatibility Requirements
- Database Versions: PostgreSQL 12+, SQLite 3.38+
- Extension Dependencies: pgvector 0.5+, sqlite-vec (vec0) latest
- Node.js Versions: 18.x, 20.x LTS
- Operating Systems: Linux, macOS, Windows
Technical Architecture
Test Infrastructure
Database Setup
# Docker Compose for test environment
services:
postgres-vector:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: agentdock_test
POSTGRES_USER: test
POSTGRES_PASSWORD: test
ports:
- "5432:5432"
sqlite-vec:
build:
context: ./test-infrastructure
dockerfile: Dockerfile.sqlite-vec
volumes:
- ./test-data:/data
Test Data Generation
// Realistic test data sets
interface TestDataSets {
smallDataset: {
memories: 100;
users: 5;
agents: 3;
connections: 50;
};
mediumDataset: {
memories: 10000;
users: 50;
agents: 10;
connections: 5000;
};
largeDataset: {
memories: 100000;
users: 500;
agents: 50;
connections: 50000;
};
}
Test Categories
Unit Tests
- Individual memory operations
- Storage adapter implementations
- Vector similarity calculations
- Text search algorithms
Integration Tests
- Memory system component interactions
- Storage adapter compatibility
- Embedding service integration
- Connection graph operations
End-to-End Tests
- Complete user workflows
- Real embedding pipeline
- Cross-adapter scenarios
- Production configuration testing
Performance Tests
- Load testing scenarios
- Stress testing limits
- Memory usage profiling
- Response time validation
Implementation Plan
Phase 1: Foundation (Week 1-2) ✅ COMPLETED
Goal: Establish core testing infrastructure
Deliverables
- PostgreSQL Vector adapter test suite (535 lines, comprehensive coverage)
- SQLite Vec memory operations tests (complete partial implementation)
- Unified test contracts for all adapters (test-helpers.ts)
- CI/CD pipeline with database setup
Success Criteria
- ✅ All storage adapters have >90% test coverage
- ✅ CI/CD pipeline runs successfully (pnpm build passes)
- ✅ Local development environment setup documented
Phase 2: Integration (Week 3-4) ✅ COMPLETED
Goal: Validate cross-component functionality
Deliverables
- RecallService E2E integration tests (825 lines, comprehensive)
- Real embedding pipeline testing with OpenAI API (mock service pattern)
- Cross-adapter result comparison validation
- Hybrid search accuracy benchmarking (70% vector + 30% text)
Success Criteria
- ✅ RecallService works with all storage adapters
- ✅ Embedding pipeline handles API failures gracefully
- ✅ Hybrid search accuracy ≥85% vs pure vector search
Phase 3: Performance (Week 5-6)
Goal: Ensure production-ready performance
Deliverables
- Load testing suite for 10K+ memories
- Concurrent user testing (100+ users)
- Performance regression detection
- Resource usage optimization
Success Criteria
- All performance SLAs met
- Load tests pass without failures
- Resource usage within acceptable limits
Phase 4: Production Readiness (Week 7-8)
Goal: Validate production deployment scenarios
Deliverables
- Managed service compatibility testing
- Production configuration validation
- Disaster recovery testing
- Documentation and runbooks
Success Criteria
- All managed services tested successfully
- Production configurations validated
- Disaster recovery procedures documented
Test Specifications
Memory Operations Test Suite
Basic CRUD Operations
describe('Memory CRUD Operations', () => {
test('store creates memory with proper isolation');
test('recall filters by user/agent correctly');
test('update modifies memory safely');
test('delete removes memory completely');
test('getById returns correct memory');
test('getStats provides accurate counts');
});
Vector Operations Testing
describe('Vector Operations', () => {
test('storeMemoryWithEmbedding stores vector correctly');
test('searchByVector finds similar memories');
test('hybridSearch combines vector + text scores');
test('updateMemoryEmbedding modifies vectors');
test('getMemoryEmbedding retrieves vectors');
});
Hybrid Search Validation
describe('Hybrid Search', () => {
test('70% vector + 30% text weight distribution');
test('PostgreSQL ts_rank_cd text scoring');
test('SQLite FTS5 BM25 text scoring');
test('Reciprocal Rank Fusion algorithm');
test('result ranking consistency');
});
Performance Test Specifications
Load Testing
describe('Performance Tests', () => {
test('10K memories storage performance', async () => {
const startTime = Date.now();
await storeMemories(10000);
const duration = Date.now() - startTime;
expect(duration).toBeLessThan(30000); // 30 seconds
});
test('concurrent recall operations', async () => {
const promises = Array(100).fill(0).map(() =>
recallMemories('test query')
);
const results = await Promise.all(promises);
expect(results.every(r => r.length > 0)).toBe(true);
});
test('hybrid search response time', async () => {
const startTime = Date.now();
await hybridSearch('complex semantic query');
const duration = Date.now() - startTime;
expect(duration).toBeLessThan(200); // 200ms SLA
});
});
E2E Test Scenarios
User Journey: Learning Session
describe('E2E: Learning Session', () => {
test('complete learning workflow', async () => {
// 1. Store working memory during learning
const workingId = await storeWorkingMemory(
'Learning about React hooks'
);
// 2. Convert to episodic memory after practice
const episodicId = await storeEpisodicMemory(
'Successfully built React app with hooks'
);
// 3. Extract semantic knowledge
const semanticId = await storeSemanticMemory(
'React hooks manage state in functional components'
);
// 4. Learn procedural pattern
const proceduralId = await learnProceduralPattern(
'need state management',
'use React hooks'
);
// 5. Test recall with hybrid search
const results = await hybridSearch('React state management');
expect(results).toContainMemories([
workingId, episodicId, semanticId, proceduralId
]);
expect(results[0].score).toBeGreaterThan(0.8);
});
});
Cross-Adapter Compatibility
describe('E2E: Cross-Adapter Compatibility', () => {
test('same results across storage adapters', async () => {
const testQuery = 'machine learning algorithms';
const testMemories = generateTestMemories(100);
// Store memories in all adapters
await Promise.all([
postgresAdapter.batchStore(testMemories),
postgresVectorAdapter.batchStore(testMemories),
sqliteAdapter.batchStore(testMemories),
sqliteVecAdapter.batchStore(testMemories)
]);
// Query all adapters
const [pgResults, pgvResults, sqliteResults, sqliteVecResults] =
await Promise.all([
postgresAdapter.recall(testQuery),
postgresVectorAdapter.hybridSearch(testQuery),
sqliteAdapter.recall(testQuery),
sqliteVecAdapter.hybridSearch(testQuery)
]);
// Validate consistency
expect(pgvResults.length).toBeGreaterThan(pgResults.length);
expect(sqliteVecResults.length).toBeGreaterThan(sqliteResults.length);
expect(compareRelevance(pgvResults, sqliteVecResults)).toBeGreaterThan(0.8);
});
});
Risk Assessment
High Risk
- PostgreSQL Vector Testing Gap: Zero tests currently exist
- Performance Unknowns: No load testing for scale scenarios
- Production Compatibility: Managed services untested
Medium Risk
- Embedding API Dependencies: OpenAI rate limits and costs
- Extension Dependencies: pgvector and vec0 availability
- Test Environment Complexity: Multiple database setups
Low Risk
- Test Maintenance: New features require test updates
- CI/CD Performance: Longer build times with comprehensive tests
Success Criteria
Functional Success
- All storage adapters pass comprehensive test suites
- Hybrid search accuracy ≥85% vs pure vector search
- Zero data loss or corruption in any test scenario
- Complete user isolation across all operations
Performance Success
- <200ms response time for 95% of hybrid searches
- Support 1000+ memory operations per second
- Handle 100+ concurrent users without degradation
- Memory usage <2GB for 100K stored memories
Quality Success
- 95% test coverage across all memory operations
- Zero critical bugs in production deployment
- Successful deployment to all supported platforms
- Developer productivity maintained with fast test execution
Appendix
Test Data Samples
// Realistic memory content for testing
const testMemories = [
{
content: "The user prefers dark mode in applications",
type: "semantic",
importance: 0.7,
keywords: ["ui", "preferences", "dark-mode"]
},
{
content: "Successfully debugged authentication issue by checking JWT token expiration",
type: "episodic",
importance: 0.9,
tags: ["debugging", "authentication", "jwt"]
},
{
content: "When API returns 500 error, check database connection timeout",
type: "procedural",
importance: 0.8,
pattern: "api-error-debugging"
}
];
Performance Benchmarks
// Expected performance baselines
const performanceBaselines = {
vectorSearch: {
small: "< 50ms for 1K memories",
medium: "< 100ms for 10K memories",
large: "< 200ms for 100K memories"
},
hybridSearch: {
small: "< 100ms for 1K memories",
medium: "< 200ms for 10K memories",
large: "< 500ms for 100K memories"
},
storage: {
single: "< 10ms per memory",
batch: "< 5ms per memory in batch of 100"
}
};
Document Control
- Created: July 9, 2025
- Last Updated: July 9, 2025
- Next Review: July 16, 2025
- Approvers: Engineering Lead, Product Manager, QA Lead