GitHub Repository: https://github.com/Sector-F-Labs/reservoir
Reservoir
Abstract
Reservoir is a stateful proxy server for OpenAI-compatible Chat Completions APIs. It maintains conversation history in a Neo4j graph database and automatically injects relevant context into requests based on semantic similarity and recency.
Problem Statement
OpenAI-compatible Chat Completions APIs are stateless. Each request must include the complete conversation history for the model to maintain context. This creates several problems:
- Manual conversation state management
- Token limit constraints as conversations grow
- Inability to reference semantically related conversations
- No persistent storage of conversation data
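To make the statelessness concrete, the sketch below shows the usual client-side workaround: every turn resends the entire history. The model name and helper function are illustrative, not part of Reservoir.

```python
# Without a stateful proxy, the client must resend the entire
# conversation on every turn. Illustrative sketch only: the model
# name and history management here are assumptions, not Reservoir.
from openai import OpenAI

client = OpenAI()  # stateless: the API keeps no memory between calls
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # The full, ever-growing history travels with every request,
    # spending tokens on context the model has already seen.
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer
```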
Solution
Reservoir acts as an intermediary that:
- Stores all messages in a Neo4j graph database
- Computes embeddings using BGE-Large-EN-v1.5 (current default)
- Creates semantic relationships (synapses) between similar messages
- Automatically injects relevant context into new requests
- Manages token limits through intelligent truncation
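Embedding computation happens inside Reservoir, but the default model can be loaded directly for reference. The sketch below is an illustration using the sentence-transformers library, not Reservoir's actual pipeline.

```python
# Illustrative only: loads the same embedding model Reservoir uses by
# default (published on Hugging Face as BAAI/bge-large-en-v1.5).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")  # 1024-dim vectors
embedding = model.encode("How do I rotate my API keys?", normalize_embeddings=True)
print(embedding.shape)  # (1024,)
```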
Architecture
```mermaid
sequenceDiagram
    participant App
    participant Reservoir
    participant Neo4j
    participant LLM as OpenAI/Ollama

    App->>Reservoir: Request (e.g. /v1/chat/completions/$USER/my-application)
    Reservoir->>Reservoir: Check if last message exceeds token limit (return error if true)
    Reservoir->>Reservoir: Tag with Trace ID + Partition
    Reservoir->>Neo4j: Store original request message(s)
    %% --- Context Enrichment Steps ---
    Reservoir->>Neo4j: Query for similar & recent messages
    Neo4j-->>Reservoir: Return relevant context messages
    Reservoir->>Reservoir: Inject context messages into request payload
    %% --- End Enrichment Steps ---
    Reservoir->>Reservoir: Check total token count & truncate if needed (preserving system/last messages)
    Reservoir->>LLM: Forward enriched & potentially truncated request
    LLM-->>Reservoir: Return LLM response
    Reservoir->>Neo4j: Store LLM response message
    Reservoir->>App: Return LLM response
```
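The context-enrichment step in the diagram amounts to splicing retrieved messages into the outgoing payload. The function below is a hypothetical sketch of that step; the names and payload shapes are illustrative, not Reservoir internals.

```python
# Hypothetical sketch of context enrichment: retrieved messages are
# spliced in after the system prompt and before the new user turn.
def inject_context(payload: dict, context_msgs: list[dict]) -> dict:
    messages = payload["messages"]
    # Keep any system messages in their leading position.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    payload["messages"] = system + context_msgs + rest
    return payload

payload = {"model": "gpt-4o", "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "And how does that affect billing?"},
]}
context = [{"role": "user", "content": "Earlier: we discussed API key rotation."}]
print(inject_context(payload, context)["messages"])
```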
Supported Providers
- OpenAI (gpt-4, gpt-4o, gpt-3.5-turbo)
- Ollama (local models)
- Mistral AI
- Google Gemini
- Any OpenAI-compatible endpoint
Data Model
Conversations are stored as a graph structure:
- MessageNode: Individual conversation messages
- EmbeddingNode: Vector representations for semantic search
- SYNAPSE: Relationships between semantically similar messages
- RESPONDED_WITH: Sequential conversation flow
- HAS_EMBEDDING: Message-to-embedding associations
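Given that model, related messages can be traversed with ordinary Cypher. The sketch below uses the official Neo4j Python driver; the labels and relationship types match the list above, while the connection details and property names (such as `content`) are assumptions.

```python
# Sketch: traverse the conversation graph with the Neo4j driver.
# Labels/relationships follow the data model above; property names
# are assumptions made for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (m:MessageNode)-[:HAS_EMBEDDING]->(:EmbeddingNode),
      (m)-[:SYNAPSE]-(related:MessageNode)
RETURN m.content AS message, related.content AS related_message
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["message"], "<->", record["related_message"])
driver.close()
```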
Semantic Relationships
Reservoir creates a SYNAPSE relationship between two messages when the cosine similarity of their embeddings exceeds 0.85. This enables:
- Cross-conversation context injection
- Topic thread identification
- Semantic search capabilities
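The synapse rule reduces to a cosine-similarity check against the 0.85 threshold. A minimal sketch (the threshold comes from above; everything else is illustrative):

```python
import numpy as np

SYNAPSE_THRESHOLD = 0.85  # from the rule above

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_link(emb_a: np.ndarray, emb_b: np.ndarray) -> bool:
    # A SYNAPSE relationship is created only above the threshold.
    return cosine_similarity(emb_a, emb_b) > SYNAPSE_THRESHOLD
```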
Usage
Replace the OpenAI API endpoint:

```
https://api.openai.com/v1/chat/completions
```

with the Reservoir endpoint:

```
http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions
```
The system organizes conversations using a partition/instance hierarchy for multi-tenant isolation.
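Because the route is OpenAI-compatible, existing clients only need a new base URL. A sketch using the official OpenAI Python SDK; the partition (`alice`) and instance (`reservoir`) values are example placeholders.

```python
# Point an existing OpenAI client at Reservoir by changing base_url.
# The SDK appends /chat/completions, completing the Reservoir route.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:3017/partition/alice/instance/reservoir/v1",
    api_key="sk-...",  # forwarded to the upstream provider
)

# No history needs to be sent: Reservoir injects relevant context.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What did we decide about key rotation?"}],
)
print(response.choices[0].message.content)
```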
Implementation
Start the server:

```sh
cargo run -- start
```
The server initializes a vector index in Neo4j and listens on port 3017.
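The index it creates should be comparable to a standard Neo4j 5 vector index. The sketch below shows an equivalent manual setup; the index, label, and property names are assumptions, while 1024 dimensions matches BGE-Large-EN-v1.5 and cosine matches the synapse rule above.

```python
# Sketch: the kind of vector index Reservoir initializes in Neo4j.
# Names are assumptions; Neo4j 5 vector index syntax.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("""
        CREATE VECTOR INDEX message_embeddings IF NOT EXISTS
        FOR (e:EmbeddingNode) ON (e.embedding)
        OPTIONS {indexConfig: {
            `vector.dimensions`: 1024,
            `vector.similarity_function`: 'cosine'
        }}
    """)
driver.close()
```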
Documentation
Technical documentation is available at sectorflabs.com/reservoir.
Local documentation can be built with:

```sh
make book
```
Reference Implementation
A reference talk demonstrating the system architecture:
License
BSD 3-Clause License