Introduction
Reservoir is first and foremost a memory system for interactions with large language models, designed to build a Retrieval-Augmented Generation (RAG) database of useful context from language model interactions over time. It maintains conversation history in a Neo4j graph database and automatically injects relevant context into requests based on semantic similarity and recency. Reservoir can also act as an optional stateful proxy server for OpenAI-compatible Chat Completions APIs.
Problem Statement
By default, Language Model interactions are stateless. Each request must include the complete conversation history for the model to maintain context. This creates several technical challenges:
- Manual conversation state management: Applications must implement their own conversation storage and retrieval systems
- Token limit constraints: As conversations grow, they exceed model token limits
- Inability to reference semantically related conversations: Previous relevant discussions cannot be automatically incorporated
- No persistent storage: Conversation data is lost when applications terminate
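To make the first two points concrete, here is a minimal sketch of the bookkeeping an application must do on its own when the API is stateless. It assumes an already-configured client from the standard OpenAI Python library (used throughout this book); the entire history list is resent on every call, grows toward the token limit, and is lost when the process exits.

# Sketch only: manual conversation-state management without Reservoir.
# `client` is assumed to be an already-configured openai.OpenAI instance.
history = []

def ask(client, user_text):
    history.append({"role": "user", "content": user_text})
    # The full history must be resent with every request.
    completion = client.chat.completions.create(model="gpt-4", messages=history)
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply  # nothing is persisted once the process exits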
Technical Solution
Reservoir addresses these limitations by acting as an intermediary layer that:
- Stores all messages in a Neo4j graph database with full conversation history
- Computes embeddings using BGE-Large-EN-v1.5 for semantic similarity calculation
- Creates semantic relationships (synapses) between messages when cosine similarity exceeds 0.85
- Automatically injects relevant context into new requests based on similarity and recency
- Manages token limits through intelligent truncation while preserving system and user messages
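As a rough illustration of the synapse rule above, the following sketch (using numpy; the 0.85 threshold is taken from the description, while the function names are ours, not Reservoir's internals) shows the kind of cosine-similarity check that decides whether two message embeddings get linked:

import numpy as np

SYNAPSE_THRESHOLD = 0.85  # threshold quoted above

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_link(embedding_a, embedding_b):
    # A SYNAPSE relationship is only created when similarity clears the threshold
    return cosine_similarity(embedding_a, embedding_b) >= SYNAPSE_THRESHOLD

print(should_link(np.array([1.0, 0.0]), np.array([0.9, 0.1])))  # True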
Architecture Overview
Reservoir is a command line tool that intercepts API calls, enriches them with relevant context, and forwards requests to the target Language Model provider. It can also run as an HTTP proxy, acting as an intermediary between clients and API endpoints. All conversation data remains local to the deployment environment.
Data Model
Conversations are stored as a graph structure:
- MessageNode: Individual messages with metadata and embeddings
- EmbeddingNode: Vector representations for semantic search operations
- SYNAPSE: Relationships between semantically similar messages
- RESPONDED_WITH: Sequential conversation flow relationships
- HAS_EMBEDDING: Message-to-embedding associations
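The node and relationship names above can be explored directly in Neo4j. The following is a hypothetical sketch using the official neo4j Python driver; the connection details match the installation defaults used later in this book, but the property names (trace_id, content) are assumptions for illustration only:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Follow SYNAPSE links from one message to its semantically similar neighbours
    result = session.run(
        "MATCH (m:MessageNode)-[:SYNAPSE]-(related:MessageNode) "
        "WHERE m.trace_id = $trace_id "
        "RETURN related.content AS content",
        trace_id="abc123",
    )
    for record in result:
        print(record["content"])

driver.close()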
Supported Providers (Proxy Mode)
The system supports multiple Language Model providers through a unified interface:
- OpenAI (gpt-4, gpt-4o, gpt-3.5-turbo)
- Ollama (local model execution)
- Mistral AI
- Google Gemini
- Any OpenAI-compatible endpoint
Implementation Details
The server initializes a vector index in Neo4j for efficient semantic search and listens on a configurable port (default: 3017). Conversations are organized using a partition/instance hierarchy enabling multi-tenant isolation.
Use Cases
- Stateful chat applications: Eliminate manual conversation state management
- Cross-session context: Maintain context across application restarts
- Semantic search: Retrieve relevant historical conversations
- Multi-provider workflows: Maintain context when switching between Language Model providers
- Research and development: Build persistent knowledge bases from Language Model interactions
For implementation details, see the Quick Start guide.
Getting Started
Welcome to Reservoir! This section will guide you through everything you need to get up and running with Reservoir quickly and efficiently.
What You'll Learn
In this section, you'll learn how to:
- Install Reservoir - Set up Reservoir on your system with all prerequisites
- Quick Start - Get Reservoir running in minutes with basic configuration
- Your First Chat - Send your first LLM conversation through Reservoir
Prerequisites
Before you begin, make sure you have:
- Neo4j database running (local or remote)
- Rust toolchain installed (for building from source)
- API keys for your preferred LLM providers (OpenAI, Mistral, etc.)
Getting Help
If you run into any issues during setup, check out our Help & Support section for troubleshooting guides and frequently asked questions.
Let's get started!
Installation
This guide will walk you through installing and setting up Reservoir on your system.
Prerequisites
Before installing Reservoir, make sure you have the following dependencies installed:
Required Dependencies
- Rust and Cargo (latest stable version)

  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source ~/.cargo/env

- Neo4j Database (version 4.4 or later)

  Option A: Using Docker (Recommended)

  docker run \
    --name neo4j \
    -p7474:7474 -p7687:7687 \
    -d \
    -v $HOME/neo4j/data:/data \
    -v $HOME/neo4j/logs:/logs \
    -v $HOME/neo4j/import:/var/lib/neo4j/import \
    -v $HOME/neo4j/plugins:/plugins \
    --env NEO4J_AUTH=neo4j/password \
    neo4j:latest

  Option B: Native Installation
  - Download from Neo4j Download Center
  - Follow the installation instructions for your operating system

Optional Dependencies

- mdBook (for building documentation)

  cargo install mdbook

- Hurl (for running API tests)

  # macOS
  brew install hurl

  # Linux
  curl --location --remote-name https://github.com/Orange-OpenSource/hurl/releases/latest/download/hurl_amd64.deb
  sudo dpkg -i hurl_amd64.deb
Installing Reservoir
From Source (Recommended)
- Clone the repository

  git clone https://github.com/divanvisagie/reservoir.git
  cd reservoir

- Build the project

  cargo build --release

- Install the binary (optional)

  cargo install --path .
Using Cargo Install
Once Reservoir is published to crates.io, you'll be able to install it directly:
cargo install reservoir
Configuration
Environment Variables
Create a .env file in your project directory or set these environment variables:
# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1
# API Keys (set as needed)
OPENAI_API_KEY=your-openai-key-here
MISTRAL_API_KEY=your-mistral-key-here
GEMINI_API_KEY=your-gemini-key-here
# Custom Provider Endpoints (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions
RSV_MISTRAL_BASE_URL=https://api.mistral.ai/v1/chat/completions
Using direnv (Recommended)
If you're using direnv, you can create a .envrc file:
# .envrc
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=password
export RESERVOIR_PORT=3017
export OPENAI_API_KEY=your-openai-key-here
Then activate it:
direnv allow
Verification
1. Check Neo4j Connection
Make sure Neo4j is running and accessible:
# If using Docker
docker ps | grep neo4j
# Test connection (replace with your credentials)
curl -u neo4j:password http://localhost:7474/
2. Start Reservoir
# From the repository directory
cargo run -- start
# Or if you installed the binary
reservoir start
You should see output similar to:
2024-01-01T12:00:00Z [INFO] Initializing vector index in Neo4j...
2024-01-01T12:00:01Z [INFO] Server starting on http://127.0.0.1:3017
3. Test the Installation
Run the included tests to verify everything is working:
# Test all endpoints
./hurl/test.sh
# Or test individual endpoints
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl
4. Simple API Test
Test with a basic curl request:
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Hello, Reservoir!"
}
]
}'
Troubleshooting Installation
Common Issues
Neo4j Connection Failed
- Verify Neo4j is running: docker ps (or check your local Neo4j service)
- Check credentials in your environment variables
- Ensure ports 7474 and 7687 are not blocked
Cargo Build Fails
- Update Rust:
rustup update
- Clear cargo cache:
cargo clean
- Check for system dependency issues
Port Already in Use
- Change the port:
export RESERVOIR_PORT=3018
- Kill existing processes:
lsof -ti:3017 | xargs kill
API Key Issues
- Verify your API keys are set correctly:
echo $OPENAI_API_KEY
- Check for extra whitespace or quotes in environment variables
Getting Help
If you encounter issues:
- Check the Troubleshooting section
- Review the server logs for detailed error messages
- Verify all prerequisites are properly installed
- Test with the simplest possible configuration first
Next Steps
Once Reservoir is installed and running:
- Follow the Getting Started guide
- Try the Chat Gipitty Integration
- Explore the API Reference
- Check out Usage Examples
Quick Start
This guide will get you up and running with Reservoir in just a few minutes.
Before You Begin
Make sure you have:
- Reservoir installed (see Installation)
- Neo4j running locally
- At least one API key configured (OpenAI, Mistral, or Gemini)
Step 1: Start the Server
Open a terminal and start Reservoir:
cargo run -- start
You should see:
[INFO] Initializing vector index in Neo4j for semantic search
[INFO] Server starting on http://127.0.0.1:3017
Keep this terminal open - Reservoir is now running and ready to handle requests.
Step 2: Your First Chat Request
Open a new terminal and send your first chat request:
curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Hello! What is Reservoir?"
}
]
}'
The response will look like a standard OpenAI API response, but Reservoir has:
- Stored your message and the LLM's response
- Tagged them with your username and "quickstart" instance
- Made them available for future context enrichment
Step 3: See the Memory in Action
Send a follow-up question that references your previous conversation:
curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Can you elaborate on what you just told me?"
}
]
}'
Notice how the LLM understands "what you just told me" - that's Reservoir automatically injecting the previous conversation context!
Step 4: View Your Conversation History
Check what Reservoir has stored:
cargo run -- view 5 --partition "$USER" --instance quickstart
You'll see output like:
2024-01-01T12:00:00+00:00 [abc123] user: Hello! What is Reservoir?
2024-01-01T12:00:01+00:00 [abc123] assistant: Reservoir is a memory system for AI conversations...
2024-01-01T12:01:00+00:00 [def456] user: Can you elaborate on what you just told me?
2024-01-01T12:01:01+00:00 [def456] assistant: Certainly! Let me expand on Reservoir's capabilities...
Step 5: Try Different Models
Reservoir supports multiple providers. Try Ollama (no API key needed):
curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "What did we discuss earlier about Reservoir?"
}
]
}'
Even though you're using a different model (Ollama instead of OpenAI), Reservoir still provides the conversation context!
Understanding the URL Structure
The Reservoir API endpoint follows this pattern:
http://localhost:3017/partition/{partition}/instance/{instance}/v1/chat/completions
- Partition: Organizes conversations (typically your username)
- Instance: Sub-organizes within a partition (like "quickstart", "work", "personal")
- This keeps different contexts separate while allowing context sharing within each space
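If you are scripting against Reservoir, a tiny helper like the one below (our own illustration, not part of Reservoir) makes the URL pattern explicit:

import os

def reservoir_endpoint(partition, instance, port=3017):
    # Mirrors the URL pattern shown above
    return f"http://localhost:{port}/partition/{partition}/instance/{instance}/v1/chat/completions"

print(reservoir_endpoint(os.getenv("USER", "alice"), "quickstart"))
# http://localhost:3017/partition/alice/instance/quickstart/v1/chat/completions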
What Just Happened?
- Storage: Every message (yours and the LLM's) was stored in Neo4j
- Context Enrichment: Reservoir automatically found relevant past messages and included them in requests
- Multi-Provider: You used both OpenAI and Ollama with the same conversation history
- Organization: Your conversations were organized by partition and instance
Next Steps
Now that you've seen Reservoir in action, explore:
- Chat Gipitty Integration - Add memory to your existing cgip setup
- Python Integration - Use with the OpenAI Python library
- API Reference - Detailed API documentation
- Features - Learn about advanced features
Quick Reference
Common Commands
# Start the server
cargo run -- start
# View recent messages
cargo run -- view 10 --partition $USER --instance myapp
# Export conversations
cargo run -- export > backup.json
# Import conversations
cargo run -- import backup.json
# Search conversations
cargo run -- search "your query" --partition $USER
Environment Variables
export RESERVOIR_PORT=3017 # Server port
export NEO4J_URI=bolt://localhost:7687 # Neo4j connection
export OPENAI_API_KEY=your-key-here # OpenAI API key
export MISTRAL_API_KEY=your-key-here # Mistral API key
Ready to dive deeper? Check out the Usage Examples or learn about Chat Gipitty Integration!
Your First Chat
This guide will walk you through sending your first message through Reservoir and demonstrate how its memory and context features work.
Prerequisites
Before starting, make sure you have:
- Reservoir server running (cargo run -- start)
- Neo4j database accessible
- API keys set up (if using cloud providers)
Example 1: Testing with Ollama (Local)
Let's start with a local Ollama model since it doesn't require API keys:
Step 1: Send your first message
curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Hello! My name is Alice and I love programming in Python."
}
]
}'
Step 2: Ask a follow-up question
Now ask something that requires memory of the previous conversation:
curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "What programming language do I like?"
}
]
}'
Magic! The Language Model will remember that you like Python, even though you didn't include the previous conversation in your request. Reservoir handled that automatically!
Step 3: Continue the conversation
curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Can you suggest a Python project for someone at my skill level?"
}
]
}'
The Language Model will make suggestions based on knowing you're Alice who loves Python programming!
Example 2: Using OpenAI Models
If you have an OpenAI API key set up:
Step 1: Introduction with GPT-4
curl "http://127.0.0.1:3017/partition/$USER/instance/gpt-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Hi! I am working on a machine learning project about image classification."
}
]
}'
Step 2: Ask for specific help
curl "http://127.0.0.1:3017/partition/$USER/instance/gpt-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "What neural network architecture would you recommend for my project?"
}
]
}'
GPT-4 will remember you're working on image classification and provide relevant recommendations!
Example 3: Cross-Model Conversations
One of Reservoir's unique features is that conversation context can span multiple models:
Step 1: Start with Ollama
curl "http://127.0.0.1:3017/partition/$USER/instance/cross-model/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "I am learning about quantum computing basics."
}
]
}'
Step 2: Switch to GPT-4
curl "http://127.0.0.1:3017/partition/$USER/instance/cross-model/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Can you explain quantum superposition in more detail?"
}
]
}'
GPT-4 will know you're learning quantum computing and provide an explanation appropriate to your level!
Understanding the Results
What Reservoir Does Behind the Scenes
When you send a message, Reservoir:
- Stores your message in Neo4j with embeddings
- Searches for relevant context from previous conversations
- Injects relevant history into your request automatically
- Forwards the enriched request to the Language Model provider
- Stores the Language Model's response for future context
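Those five steps can be pictured with a small, self-contained sketch. The in-memory dictionary stands in for Neo4j and the fake provider call stands in for the real model, so this is a conceptual model rather than Reservoir's actual code:

HISTORY = {}  # (partition, instance) -> list of messages; stands in for Neo4j

def store_message(partition, instance, message):
    HISTORY.setdefault((partition, instance), []).append(message)

def find_relevant_context(partition, instance, limit=15):
    # Reservoir combines semantic search with recency; only recency is shown here
    return HISTORY.get((partition, instance), [])[-limit:]

def forward_to_provider(messages):
    # Placeholder for the real chat-completions call
    return {"role": "assistant", "content": f"(reply based on {len(messages)} messages)"}

def handle_chat_request(partition, instance, user_message):
    context = find_relevant_context(partition, instance)   # search past context
    store_message(partition, instance, user_message)        # store your message
    enriched = context + [user_message]                      # inject history
    response = forward_to_provider(enriched)                 # forward to the provider
    store_message(partition, instance, response)             # store the response
    return response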
Viewing Your Conversation History
You can see your stored conversations using the CLI:
# View last 5 messages in the first-chat instance
cargo run -- view 5 --partition $USER --instance first-chat
Sample output:
2025-06-21T09:10:01+00:00 [abc123] user: Hello! My name is Alice and I love programming in Python.
2025-06-21T09:10:02+00:00 [abc123] assistant: Hello Alice! It's great to meet a fellow Python enthusiast...
2025-06-21T09:11:10+00:00 [def456] user: What programming language do I like?
2025-06-21T09:11:12+00:00 [def456] assistant: You mentioned that you love programming in Python!
2025-06-21T09:12:00+00:00 [ghi789] user: Can you suggest a Python project for someone at my skill level?
Testing Different Scenarios
Scenario 1: Different Partitions
Try organizing conversations by topic using different partitions:
# Work-related conversations
curl "http://127.0.0.1:3017/partition/work/instance/coding/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "gemma3", "messages": [{"role": "user", "content": "I need help debugging a React component."}]}'
# Personal learning
curl "http://127.0.0.1:3017/partition/personal/instance/learning/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "gemma3", "messages": [{"role": "user", "content": "I want to learn guitar playing."}]}'
Each partition maintains separate conversation history!
Scenario 2: Web Search Integration
If using a model that supports web search:
curl "http://127.0.0.1:3017/partition/$USER/instance/research/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [{"role": "user", "content": "What are the latest trends in AI development?"}],
"web_search_options": {"enabled": true, "max_results": 5}
}'
Common Issues and Solutions
Server Not Responding
# Check if Reservoir is running
curl http://127.0.0.1:3017/health
# If not running, start it
cargo run -- start
"Model not found" Error
- For Ollama models: Make sure Ollama is running and the model is installed
- For cloud models: Check your API keys are set correctly
Empty Responses
- Check your internet connection for cloud providers
- Verify the model name is spelled correctly
- Ensure your API key has sufficient credits
Next Steps
Now that you've sent your first chat, explore these features:
- Python Integration - Use Reservoir from Python code
- Partitioning & Organization - Organize your conversations
- Chat Gipitty Integration - Add memory to your existing chat tools
- API Reference - Learn about advanced features
Congratulations! You've successfully used Reservoir to have a conversation with persistent memory. The Language Model now remembers everything from your conversation and can reference it in future chats!
Usage & Integration
Reservoir is designed to work seamlessly with your existing AI workflows and tools. This section covers various ways to integrate and use Reservoir in your projects.
Integration Options
Chat Applications
- Chat Gipitty Integration - Add persistent memory to your Chat Gipitty conversations
- Python with OpenAI Library - Use Reservoir with the popular OpenAI Python client
Direct API Usage
- Curl Examples - Command-line examples for testing and scripting
- Ollama Integration - Use Reservoir with local Ollama models
Common Use Cases
- Multi-session conversations - Maintain context across different chat sessions
- Cross-application memory - Share conversation history between different tools
- Local AI workflows - Keep conversations private while using local models
- Research and development - Build applications that learn from past interactions
Choosing Your Integration
- New to AI development? Start with Chat Gipitty Integration
- Python developer? Check out Python with OpenAI Library
- Command-line user? Try the Curl Examples
- Privacy-focused? Use Ollama Integration for fully local conversations
Each integration method maintains the same core benefits: persistent memory, context enrichment, and seamless AI conversations.
Ollama Client Integration
You can use reservoir as a memory system for the Ollama command line client by integrating it with a simple bash script.
You can place the following function in your ~/.bashrc or ~/.zshrc file, and it will use Reservoir to:
- Fetch relevant context from Reservoir (semantic search results and recent history)
- Prepend that context to your query
- Send the enriched request to the model
- Save the model's output back into Reservoir
function contextual_ollama_with_ingest() {
local user_query="$1"
# Validate input
if [ -z "$user_query" ]; then
echo "Usage: contextual_ollama_with_ingest 'Your question goes here'" >&2
return 1
fi
# Ingest the user's query into Reservoir
echo "$user_query" | reservoir ingest
# Generate dynamic system prompt with context
local system_prompt_content=$(
echo "the following is info from semantic search based on your query:"
reservoir search "$user_query" --semantic --link
echo "the following is recent history:"
reservoir view 10
)
local full_prompt_content=$(
echo "You are a helpful assistant. Use the following context to answer the user's question."
echo "$system_prompt_content"
echo "User's question: ${user_query}"
)
# Call ollama with enriched context
local assistant_response=$(ollama run gemma3 "$full_prompt_content")
# Store the assistant's response
echo "$assistant_response" | reservoir ingest --role assistant
# Display the response
echo "$assistant_response"
}
# Create a convenient alias
alias olm='contextual_ollama_with_ingest'
By adhering to POSIX conventions, Reservoir becomes the semantic memory for any shell interaction with a language model.
Chat Gipitty Integration
Reservoir was originally designed as a memory system for Chat Gipitty. This integration gives your cgip conversations persistent memory, context awareness, and the ability to search through your LLM interaction history.
What You Get
When you integrate Reservoir with Chat Gipitty, you get:
- Persistent Memory: Your conversations are remembered across sessions
- Semantic Search: Find relevant past discussions automatically
- Context Enrichment: Each response is informed by your conversation history
- Multi-Model Support: Switch between different LLM providers while maintaining context
Setup
Prerequisites
- Chat Gipitty installed and working
- Reservoir installed and running (see Installation)
- Your shell configured (bash or zsh)
Installation
Add this function to your ~/.bashrc or ~/.zshrc file:
function contextual_cgip_with_ingest() {
local user_query="$1"
# Validate input
if [ -z "$user_query" ]; then
echo "Usage: contextual_cgip_with_ingest 'Your question goes here'" >&2
return 1
fi
# Ingest the user's query into Reservoir
echo "$user_query" | reservoir ingest
# Generate dynamic system prompt with context
local system_prompt_content=$(
echo "the following is info from semantic search based on your query:"
reservoir search "$user_query" --semantic --link
echo "the following is recent history:"
reservoir view 10
)
# Call cgip with enriched context
local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
# Store the assistant's response
echo "$assistant_response" | reservoir ingest --role assistant
# Display the response
echo "$assistant_response"
}
# Create a convenient alias
alias gpty='contextual_cgip_with_ingest'
After adding this to your shell configuration, reload it:
# For bash
source ~/.bashrc
# For zsh
source ~/.zshrc
Usage
Basic Usage
Use the function directly:
contextual_cgip_with_ingest "Explain quantum computing in simple terms"
Or use the convenient alias:
gpty "What is machine learning?"
Follow-up Questions
The magic happens with follow-up questions:
gpty "Explain neural networks"
# ... LLM responds with explanation ...
gpty "How do they relate to what we discussed about machine learning earlier?"
# ... LLM responds with context from the previous conversation ...
Different Topics
Start a new topic, and Reservoir will find relevant context:
gpty "I'm learning Rust programming"
# ... later in a different session ...
gpty "Show me some advanced Rust patterns"
# Reservoir will remember you're learning Rust and provide appropriate context
How It Works
Here's what happens when you use the integrated function:
- Query Ingestion: Your question is stored in Reservoir
- Context Gathering: Reservoir searches for:
- Semantically similar past conversations
- Recent conversation history
- Context Injection: This context is provided to cgip as a system prompt
- Enhanced Response: cgip responds with awareness of your history
- Response Storage: The LLM's response is stored for future context
Advanced Configuration
Custom Search Parameters
You can modify the function to customize how context is gathered:
function contextual_cgip_with_ingest() {
local user_query="$1"
if [ -z "$user_query" ]; then
echo "Usage: contextual_cgip_with_ingest 'Your question goes here'" >&2
return 1
fi
echo "$user_query" | reservoir ingest
# Customize these parameters
local system_prompt_content=$(
echo "=== Relevant Context ==="
reservoir search "$user_query" --semantic --link --limit 5
echo ""
echo "=== Recent History ==="
reservoir view 15 --partition "$USER" --instance "cgip"
)
local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
echo "$assistant_response" | reservoir ingest --role assistant
echo "$assistant_response"
}
Partitioned Conversations
Organize your conversations by topic or project:
function gpty_work() {
local user_query="$1"
if [ -z "$user_query" ]; then
echo "Usage: gpty_work 'Your work-related question'" >&2
return 1
fi
echo "$user_query" | reservoir ingest --partition "$USER" --instance "work"
local system_prompt_content=$(
echo "Context from work conversations:"
reservoir search "$user_query" --semantic --partition "$USER" --instance "work"
echo "Recent work discussion:"
reservoir view 10 --partition "$USER" --instance "work"
)
local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
echo "$assistant_response" | reservoir ingest --role assistant --partition "$USER" --instance "work"
echo "$assistant_response"
}
function gpty_personal() {
# Similar function for personal conversations
# ... implement similarly with --instance "personal"
}
Model Selection
Use different models while maintaining context:
function gpty_creative() {
local user_query="$1"
echo "$user_query" | reservoir ingest
local system_prompt_content=$(
reservoir search "$user_query" --semantic --link
reservoir view 5
)
# Use a creative model via cgip configuration
local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}" --model gpt-4)
echo "$assistant_response" | reservoir ingest --role assistant
echo "$assistant_response"
}
Benefits of This Integration
Continuous Learning
- Your LLM assistant learns from every interaction
- Context builds up over time, making responses more personalized
- No need to re-explain your projects or preferences
Cross-Session Memory
- Resume conversations from days or weeks ago
- Reference past decisions and discussions
- Build on previous explanations and examples
Semantic Understanding
- Ask "What did we discuss about X?" and get relevant results
- Similar topics are automatically connected
- Context is found even if you use different wording
Privacy
- All your conversation history stays local
- No data sent to external services beyond the LLM API calls
- You control your data completely
Troubleshooting
Function Not Found
Make sure you've sourced your shell configuration:
source ~/.bashrc # or ~/.zshrc
No Context Being Added
Check that Reservoir is running:
# Should show Reservoir process
ps aux | grep reservoir
# Start if not running
cargo run -- start
Empty Search Results
Build up some conversation history first:
gpty "Tell me about artificial intelligence"
gpty "What are neural networks?"
gpty "How does machine learning work?"
# Now try a search
gpty "What did we discuss about AI?"
Permission Issues
Make sure the function has access to reservoir commands:
# Test individual commands
echo "test" | reservoir ingest
reservoir view 5
reservoir search "test"
Next Steps
- Explore API Reference to understand Reservoir's capabilities
- Learn about Partitioning to organize conversations
- Check out Python Integration for programmatic access
- See Troubleshooting if you encounter issues
The Chat Gipitty integration transforms your LLM interactions from isolated conversations into a connected, searchable knowledge base that grows smarter with every interaction.
Python Integration
Reservoir works seamlessly with the popular OpenAI Python library. You simply point the client to your Reservoir instance instead of directly to OpenAI, and Reservoir handles all the memory and context management automatically.
Setup
First, install the OpenAI Python library if you haven't already:
pip install openai
Basic Configuration
import os
from openai import OpenAI
INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"
OpenAI Models
Basic Usage with OpenAI
import os
from openai import OpenAI
INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"
client = OpenAI(
base_url=RESERVOIR_BASE_URL,
api_key=os.environ.get("OPENAI_API_KEY")
)
completion = client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": "Write a one-sentence bedtime story about a curious robot."
}
]
)
print(completion.choices[0].message.content)
With Web Search Options
For models that support web search (like gpt-4o-search-preview), you can enable web search capabilities:
completion = client.chat.completions.create(
model="gpt-4o-search-preview",
messages=[
{
"role": "user",
"content": "What are the latest trends in machine learning?"
}
],
extra_body={
"web_search_options": {
"enabled": True,
"max_results": 5
}
}
)
Ollama Models (Local)
Using Ollama (No API Key Required)
import os
from openai import OpenAI
INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"
client = OpenAI(
base_url=RESERVOIR_BASE_URL,
api_key="not-needed-for-ollama" # Ollama doesn't require API keys
)
completion = client.chat.completions.create(
model="llama3.2", # or "gemma3", or any Ollama model
messages=[
{
"role": "user",
"content": "Explain the concept of recursion with a simple example."
}
]
)
print(completion.choices[0].message.content)
Supported Models
Reservoir automatically routes requests to the appropriate provider based on the model name:
Model | Provider | API Key Required |
---|---|---|
gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo | OpenAI | Yes (OPENAI_API_KEY) |
gpt-4o-search-preview | OpenAI | Yes (OPENAI_API_KEY) |
llama3.2, gemma3, or any custom name | Ollama | No |
mistral-large-2402 | Mistral | Yes (MISTRAL_API_KEY) |
gemini-2.0-flash, gemini-2.5-flash-preview-05-20 | Gemini | Yes (GEMINI_API_KEY) |
Note: Any model name not explicitly configured will default to using Ollama.
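The routing in the table boils down to a prefix check plus an Ollama fallback; the snippet below is our own illustration of that rule, not Reservoir's source code:

def route_model(model):
    # Known prefixes map to their provider; anything else falls back to Ollama
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("mistral-"):
        return "mistral"
    if model.startswith("gemini-"):
        return "gemini"
    return "ollama"

assert route_model("gpt-4o") == "openai"
assert route_model("mistral-large-2402") == "mistral"
assert route_model("llama3.2") == "ollama"  # unrecognized names default to Ollama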
Environment Variables
You can customize provider endpoints and set API keys using environment variables:
import os
# Set environment variables (or use .env file)
os.environ['OPENAI_API_KEY'] = 'your-openai-key'
os.environ['MISTRAL_API_KEY'] = 'your-mistral-key'
os.environ['GEMINI_API_KEY'] = 'your-gemini-key'
# Custom provider endpoints (optional)
os.environ['RSV_OPENAI_BASE_URL'] = 'https://api.openai.com/v1/chat/completions'
os.environ['RSV_OLLAMA_BASE_URL'] = 'http://localhost:11434/v1/chat/completions'
os.environ['RSV_MISTRAL_BASE_URL'] = 'https://api.mistral.ai/v1/chat/completions'
Complete Example
Here's a complete example that demonstrates Reservoir's memory capabilities:
import os
from openai import OpenAI
def setup_reservoir_client():
"""Setup Reservoir client with proper configuration"""
instance = "chat-example"
partition = os.getenv("USER", "default")
port = os.getenv('RESERVOIR_PORT', '3017')
base_url = f"http://localhost:{port}/v1/partition/{partition}/instance/{instance}"
return OpenAI(
base_url=base_url,
api_key=os.environ.get("OPENAI_API_KEY", "not-needed-for-ollama")
)
def chat_with_memory(message, model="gpt-4"):
"""Send a message through Reservoir with automatic memory"""
client = setup_reservoir_client()
completion = client.chat.completions.create(
model=model,
messages=[
{
"role": "user",
"content": message
}
]
)
return completion.choices[0].message.content
# Example conversation that builds context
if __name__ == "__main__":
# First message
response1 = chat_with_memory("My name is Alice and I love Python programming.")
print("Assistant:", response1)
# Second message - Reservoir will automatically include context
response2 = chat_with_memory("What programming language do I like?")
print("Assistant:", response2) # Will know you like Python!
# Third message - Even more context
response3 = chat_with_memory("Can you suggest a project for me?")
print("Assistant:", response3) # Will suggest Python projects for Alice!
Benefits of Using Reservoir
When you use Reservoir with the OpenAI library, you get:
- Automatic Context: Previous conversations are automatically included
- Cross-Session Memory: Conversations persist across different Python sessions
- Smart Token Management: Reservoir handles token limits automatically
- Multi-Provider Support: Switch between different LLM providers seamlessly
- Local Storage: All your conversation data stays on your device
Next Steps
- Learn about Partitioning & Organization to organize your conversations
- Check out Token Management to understand how Reservoir handles context limits
- Explore the API Reference for more advanced usage patterns
Curl Examples
This page provides comprehensive examples of using Reservoir with curl commands. These examples are perfect for testing, scripting, or understanding the API structure.
Basic URL Structure
Instead of calling the provider directly, you call Reservoir with this URL pattern:
- Direct Provider:
https://api.openai.com/v1/chat/completions
- Through Reservoir:
http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions
Where:
- $USER is your system username (acts as the partition)
- reservoir is the instance name (you can use any name)
OpenAI Models
Basic GPT-4 Example
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Write a one-sentence bedtime story about a brave little toaster."
}
]
}'
GPT-4 with System Message
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant that explains complex topics in simple terms."
},
{
"role": "user",
"content": "Explain quantum computing to a 10-year-old."
}
]
}'
Web Search Integration
For models that support web search (like gpt-4o-search-preview):
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "What are the latest developments in AI?"
}
],
"web_search_options": {
"enabled": true,
"max_results": 5
}
}'
Ollama Models (Local)
Basic Ollama Example
No API key needed for Ollama models:
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Explain quantum computing in simple terms."
}
]
}'
Using Llama Models
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate fibonacci numbers."
}
]
}'
Other Providers
Mistral AI
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MISTRAL_API_KEY" \
-d '{
"model": "mistral-large-2402",
"messages": [
{
"role": "user",
"content": "Explain the differences between functional and object-oriented programming."
}
]
}'
Google Gemini
curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GEMINI_API_KEY" \
-d '{
"model": "gemini-2.0-flash",
"messages": [
{
"role": "user",
"content": "Compare different sorting algorithms and their time complexities."
}
]
}'
Partitioning Examples
Using Different Partitions
You can organize conversations by using different partition names:
# Work conversations
curl "http://127.0.0.1:3017/partition/work/instance/coding/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Review this code for security issues"}]
}'
# Personal conversations
curl "http://127.0.0.1:3017/partition/personal/instance/creative/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Help me write a short story"}]
}'
Using Different Instances
Different instances within the same partition:
# Development instance
curl "http://127.0.0.1:3017/partition/$USER/instance/development/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Debug this Python error"}]
}'
# Research instance
curl "http://127.0.0.1:3017/partition/$USER/instance/research/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Explain machine learning concepts"}]
}'
Testing Scenarios
Test Basic Connectivity
# Simple test with Ollama (no API key needed)
curl "http://127.0.0.1:3017/partition/test/instance/basic/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "Hello, can you hear me?"}]
}'
Test Memory Functionality
Send multiple requests to see memory in action:
# First message
curl "http://127.0.0.1:3017/partition/test/instance/memory/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "My favorite color is blue."}]
}'
# Second message - should remember the color
curl "http://127.0.0.1:3017/partition/test/instance/memory/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "What is my favorite color?"}]
}'
Error Handling
Invalid Model
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "nonexistent-model",
"messages": [{"role": "user", "content": "Hello"}]
}'
Missing API Key
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Will return error because OPENAI_API_KEY is required for GPT-4
Environment Variables
Set up your environment for easier testing:
export OPENAI_API_KEY="your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"
export RESERVOIR_URL="http://127.0.0.1:3017"
export USER_PARTITION="$USER"
Then use in requests:
curl "$RESERVOIR_URL/partition/$USER_PARTITION/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello from the environment!"}]
}'
Debugging Tips
Pretty Print JSON Response
Pipe the output through jq to format the JSON response:
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "Hello"}]
}' | jq
Verbose Output
Use the -v flag to see request/response headers:
curl -v "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "Hello"}]
}'
Save Response
Save the response to a file:
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [{"role": "user", "content": "Hello"}]
}' -o response.json
Next Steps
- Learn about API Reference for more endpoint details
- Check out Python Integration for programmatic usage
- Explore Partitioning & Organization to organize your conversations
Ollama Integration
Reservoir works seamlessly with Ollama, allowing you to use local AI models with persistent memory and context enrichment. This is perfect for privacy-focused workflows where you want to keep all your conversations completely local.
What is Ollama?
Ollama is a tool that makes it easy to run large language models locally on your machine. It supports popular models like Llama, Gemma, and many others, all running entirely on your hardware.
Benefits of Using Ollama with Reservoir
- Complete Privacy: All conversations stay on your device
- No API Keys: No need for cloud service API keys
- Offline Capable: Works without internet connection
- Cost Effective: No usage-based charges
- Full Control: Choose exactly which models to use
Setting Up Ollama
Step 1: Install Ollama
First, install Ollama from ollama.ai:
# On macOS
brew install ollama
# On Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Or download from https://ollama.ai/download
Step 2: Start Ollama Service
ollama serve
This starts the Ollama service on http://localhost:11434.
Step 3: Download Models
Download the models you want to use:
# Download Gemma 3 (Google's model)
ollama pull gemma3
# Download Llama 3.2 (Meta's model)
ollama pull llama3.2
# Download Mistral (Mistral AI's model)
ollama pull mistral
# See all available models
ollama list
Using Ollama with Reservoir
Regular Mode
By default, Reservoir routes any unrecognized model names to Ollama:
curl "http://127.0.0.1:3017/partition/$USER/instance/ollama-chat/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Explain machine learning in simple terms."
}
]
}'
No API key required!
Ollama Mode
Reservoir also provides a special "Ollama mode" that makes it a drop-in replacement for Ollama's API:
# Start Reservoir in Ollama mode
cargo run -- start --ollama
In Ollama mode, Reservoir:
- Uses the same API endpoints as Ollama
- Provides the same response format
- Adds memory and context enrichment automatically
- Makes existing Ollama clients work with persistent memory
Testing Ollama Mode
# Test with the standard Ollama endpoint format
curl "http://127.0.0.1:3017/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Hello, can you remember our previous conversations?"
}
]
}'
Popular Ollama Models
Gemma 3 (Google)
Excellent for general conversation and coding:
curl "http://127.0.0.1:3017/partition/$USER/instance/coding/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Write a Python function to sort a list of dictionaries by a specific key."
}
]
}'
Llama 3.2 (Meta)
Great for reasoning and complex tasks:
curl "http://127.0.0.1:3017/partition/$USER/instance/reasoning/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [
{
"role": "user",
"content": "Solve this logic puzzle: If all roses are flowers, and some flowers are red, can we conclude that some roses are red?"
}
]
}'
Mistral 7B
Efficient and good for general tasks:
curl "http://127.0.0.1:3017/partition/$USER/instance/general/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "mistral",
"messages": [
{
"role": "user",
"content": "Summarize the key points of quantum computing for a beginner."
}
]
}'
Python Integration with Ollama
Using the OpenAI library with local Ollama models:
import os
from openai import OpenAI
# Setup for Ollama through Reservoir
INSTANCE = "ollama-python"
PARTITION = os.getenv("USER", "default")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"
client = OpenAI(
base_url=RESERVOIR_BASE_URL,
api_key="not-needed-for-ollama" # Ollama doesn't require API keys
)
# Chat with memory using local model
completion = client.chat.completions.create(
model="gemma3",
messages=[
{
"role": "user",
"content": "My favorite hobby is gardening. What plants would you recommend for a beginner?"
}
]
)
print(completion.choices[0].message.content)
# Ask a follow-up that requires memory
follow_up = client.chat.completions.create(
model="gemma3",
messages=[
{
"role": "user",
"content": "What tools do I need to get started with my hobby?"
}
]
)
print(follow_up.choices[0].message.content)
# Will remember you're interested in gardening!
Environment Configuration
You can customize the Ollama endpoint if needed:
# Default Ollama endpoint
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
# Custom endpoint (if running Ollama on different port/host)
export RSV_OLLAMA_BASE_URL="http://192.168.1.100:11434/v1/chat/completions"
Performance Tips
Model Selection
- gemma3: Good balance of speed and quality
- llama3.2: Higher quality but slower
- mistral: Fast and efficient
- smaller models (7B parameters): Faster on limited hardware
- larger models (13B+): Better quality but require more resources
Hardware Considerations
- RAM: 8GB minimum, 16GB+ recommended for larger models
- GPU: Optional but significantly speeds up inference
- Storage: Models range from 4GB to 40GB+ each
Optimizing Performance
# Ollama uses GPU acceleration automatically when a supported GPU is available
# Monitor resource usage
ollama ps
Troubleshooting Ollama
Common Issues
Ollama Not Found
# Check if Ollama is running
curl http://localhost:11434/api/tags
# If not running, start it
ollama serve
Model Not Available
# List installed models
ollama list
# Pull missing model
ollama pull gemma3
Performance Issues
# Check system resources
ollama ps
# Try a smaller model
ollama pull gemma3:1b # 1B parameter version
Error Messages
- "connection refused": Ollama service isn't running
- "model not found": Model needs to be pulled with
ollama pull
- "out of memory": Try a smaller model or close other applications
Combining Local and Cloud Models
One of Reservoir's strengths is seamlessly switching between local and cloud models:
import os
from openai import OpenAI
# Same client setup
client = OpenAI(base_url=RESERVOIR_BASE_URL, api_key=os.environ.get("OPENAI_API_KEY", ""))
# Start with local model for initial draft
local_response = client.chat.completions.create(
model="gemma3", # Local Ollama model
messages=[{"role": "user", "content": "Write a draft email about project updates"}]
)
# Refine with cloud model for better quality
cloud_response = client.chat.completions.create(
model="gpt-4", # Cloud OpenAI model
messages=[{"role": "user", "content": "Please improve the writing quality and make it more professional"}]
)
Both responses will have access to the same conversation context!
Next Steps
- Python Integration - Use Ollama models from Python
- Features - Multi-Provider Support - Learn about mixing different providers
- Partitioning & Organization - Organize your local conversations
- Architecture - Data Model - Understand how conversations are stored
Ready to go private? 🔒 With Ollama and Reservoir, you have a completely local AI assistant with persistent memory!
API Overview
Reservoir provides an OpenAI-compatible API endpoint that acts as a smart proxy between your application and language model providers. This section covers the core API structure and basic usage patterns.
URL Structure
The Reservoir API follows this pattern:
/v1/partition/{partition}/instance/{instance}/chat/completions
Parameters
- {partition}: A broad category for organizing conversations (e.g., project name, application name, username)
- {instance}: A specific context within the partition (e.g., user ID, session ID, specific feature)
This structure allows you to organize conversations hierarchically and scope context enrichment appropriately.
Example URL Transformation
- Instead of:
https://api.openai.com/v1/chat/completions
- Use:
http://localhost:3017/v1/partition/$USER/instance/my-application/chat/completions
Here, $USER is your system username, and my-application is your application instance. All context enrichment and history retrieval are scoped to this specific partition/instance combination.
Basic Request Structure
Reservoir maintains full compatibility with the OpenAI Chat Completions API. You can use the same request structure, headers, and parameters you would use with OpenAI directly.
Required Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Body
The request body follows the same format as OpenAI's Chat Completions API:
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Your message here"
}
]
}
What Happens Behind the Scenes
When you make a request to Reservoir:
- Message Storage: Your message is stored with the specified partition/instance
- Context Enrichment: Reservoir finds relevant past conversations and recent history
- Token Management: The enriched context is checked against token limits
- Request Forwarding: The enriched request is forwarded to the appropriate LLM provider
- Response Storage: The LLM's response is stored for future context
Response Format
Responses maintain the same format as the underlying LLM provider (OpenAI, Ollama, etc.), so your existing code will work without modification.
Next Steps
- Chat Completions Endpoint - Detailed endpoint documentation
- Search & Retrieval - Finding past conversations
- Data Management - Import/export and management
- Command Line Interface - CLI usage and commands
Chat Completions Endpoint
The Chat Completions endpoint is Reservoir's core API, providing full OpenAI API compatibility with intelligent context enrichment. This endpoint automatically enhances your conversations with relevant historical context while maintaining the same request/response format as OpenAI's Chat Completions API.
Endpoint URL
POST /v1/partition/{partition}/instance/{instance}/chat/completions
URL Parameters
Parameter | Description | Example |
---|---|---|
partition | Top-level organization boundary | alice, project_name, $USER |
instance | Specific context within partition | coding, research, session_123 |
Example URLs
# User-specific coding assistant
POST /v1/partition/alice/instance/coding/chat/completions
# Project-specific documentation bot
POST /v1/partition/docs_project/instance/support/chat/completions
# Personal research assistant
POST /v1/partition/$USER/instance/research/chat/completions
# Default partition/instance (if not specified)
POST /v1/chat/completions # Uses partition=default, instance=default
Request Format
Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Body
Reservoir accepts the standard OpenAI Chat Completions request format:
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "How do I implement error handling in async functions?"
}
]
}
Supported Models
OpenAI Models:
- gpt-4.1
- gpt-4-turbo
- gpt-4o
- gpt-4o-mini
- gpt-3.5-turbo
- gpt-4o-search-preview
Local Models (via Ollama):
- llama3.1:8b
- llama3.1:70b
- mistral:7b
- codellama:latest
- Any Ollama-supported model
Message Roles
Role | Description | Usage |
---|---|---|
user | User input messages | Questions, requests, instructions |
assistant | LLM responses | Previous LLM responses in conversation |
system | System instructions | Behavior modification, context setting |
Context Enrichment Process
When you send a request, Reservoir automatically enhances it with relevant context:
1. Message Analysis
// Your original request
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "How do I handle database timeouts?"
}
]
}
2. Context Discovery
Reservoir finds relevant context through:
- Semantic Search: Messages similar to "database timeouts"
- Recent History: Last 15 messages from same partition/instance
- Synapse Connections: Related discussions via SYNAPSE relationships
3. Context Injection
// Enriched request sent to the Language Model
{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "The following is the result of a semantic search of the most related messages by cosine similarity to previous conversations"
},
{
"role": "user",
"content": "What's the best way to configure database connection pools?"
},
{
"role": "assistant",
"content": "For database connection pools, consider these settings..."
},
{
"role": "system",
"content": "The following are the most recent messages in the conversation in chronological order"
},
{
"role": "user",
"content": "I'm working on optimizing database queries"
},
{
"role": "assistant",
"content": "Here are some query optimization techniques..."
},
{
"role": "user",
"content": "How do I handle database timeouts?" // Your original message
}
]
}
Response Format
Reservoir returns responses in the standard OpenAI Chat Completions format:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677858242,
"model": "gpt-4",
"usage": {
"prompt_tokens": 13,
"completion_tokens": 7,
"total_tokens": 20
},
"choices": [
{
"message": {
"role": "assistant",
"content": "To handle database timeouts, you should implement retry logic with exponential backoff..."
},
"finish_reason": "stop",
"index": 0
}
]
}
Configuration and Model Selection
Environment Variables
Configure different LLM providers:
# OpenAI (default)
export OPENAI_API_KEY="your-openai-api-key"
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
# Ollama (local)
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
# Mistral
export MISTRAL_API_KEY="your-mistral-api-key"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"
# Gemini
export GEMINI_API_KEY="your-gemini-api-key"
Model Detection
Reservoir automatically routes requests based on model name:
- OpenAI models: gpt-* → OpenAI API
- Local models: llama*, mistral*, etc. → Ollama API
- Mistral models: mistral-* → Mistral API
Error Handling
Token Limit Errors
If your message exceeds model token limits:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Your last message is too long. It contains approximately 5000 tokens, which exceeds the maximum limit of 4096. Please shorten your message."
},
"finish_reason": "length",
"index": 0
}
]
}
API Connection Errors
{
"error": {
"message": "Failed to connect to OpenAI API: Connection timeout. Check your API key and network connection. Using model 'gpt-4' at 'https://api.openai.com/v1/chat/completions'"
}
}
Invalid Model Errors
{
"error": {
"message": "Invalid OpenAI model name: 'gpt-5'. Valid models are: ['gpt-4.1', 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo', 'gpt-4o-search-preview']"
}
}
Usage Examples
Basic Request
curl -X POST "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Explain async/await in Python"
}
]
}'
With System Message
curl -X POST "http://localhost:3017/v1/partition/docs/instance/writing/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a technical documentation expert. Provide clear, concise explanations."
},
{
"role": "user",
"content": "How should I document API endpoints?"
}
]
}'
Local Model (Ollama)
curl -X POST "http://localhost:3017/v1/partition/alice/instance/local/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [
{
"role": "user",
"content": "What are the benefits of using local LLMs?"
}
]
}'
Integration Examples
Python with OpenAI Library
import openai
# Configure the (pre-1.0) OpenAI Python client to use Reservoir instead of OpenAI directly
openai.api_base = "http://localhost:3017/v1/partition/alice/instance/coding"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "How do I optimize this database query?"}
    ]
)
print(response.choices[0].message.content)
JavaScript/Node.js
const OpenAI = require('openai');
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
baseURL: 'http://localhost:3017/v1/partition/myapp/instance/support'
});
async function chat(message) {
const completion = await openai.chat.completions.create({
messages: [{ role: 'user', content: message }],
model: 'gpt-4',
});
return completion.choices[0].message.content;
}
Streaming Responses
Reservoir supports streaming responses when the underlying model supports it:
import openai
openai.api_base = "http://localhost:3017/v1/partition/alice/instance/chat"
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain machine learning"}],
    stream=True
)
for chunk in response:
    # The final streamed chunk may carry no content, so guard before printing
    content = chunk.choices[0].delta.get("content")
    if content:
        print(content, end="")
Advanced Features
Web Search Integration
Some models support web search capabilities:
{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "What are the latest developments in AI?"
}
],
"web_search_options": {
"enabled": true
}
}
Message Storage
All messages (user and assistant) are automatically stored with:
- Embeddings: For semantic search and context enrichment
- Timestamps: For chronological ordering
- Partition/Instance: For data organization
- Trace IDs: For linking request/response pairs
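For intuition, each stored record roughly matches the MessageNode export format shown later in this documentation. The dataclass below is only an illustrative mirror of those fields, not Reservoir's internal type.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StoredMessage:
    # Illustrative mirror of the exported MessageNode fields (not the internal Rust type).
    trace_id: str               # links a user message to its assistant response
    partition: str              # organizational boundary, e.g. "alice"
    instance: str               # sub-boundary within the partition, e.g. "coding"
    role: str                   # "user" or "assistant"
    content: str                # the message text
    timestamp: str              # ISO 8601 creation time
    embedding: List[float]      # vector used for semantic search
    url: Optional[str] = None   # optional associated URL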
Context Control
Control context enrichment via configuration:
# Adjust context size
reservoir config --set semantic_context_size=20
reservoir config --set recent_context_size=15
# View current settings
reservoir config --get semantic_context_size
Performance Considerations
Token Management
- Reservoir automatically manages token limits for each model
- Context is intelligently truncated when necessary
- Priority given to most relevant and recent content
Caching
- Embeddings are cached to avoid recomputation
- Vector indices are optimized for fast similarity search
- Connection pooling for database efficiency
Latency
- Typical latency: 200-500ms for context enrichment
- Parallel processing of semantic search and recent history
- Optimized Neo4j queries for fast retrieval
The Chat Completions endpoint provides the full power of Reservoir's context enrichment while maintaining complete compatibility with existing OpenAI-based applications, making it easy to add conversational memory to any LLM application.
Search & Retrieval
Reservoir provides powerful search capabilities for finding relevant conversations and messages across your entire conversation history. The search system supports both keyword-based and semantic similarity searches, enabling you to discover related discussions even when they use different terminology.
Search Methods
Keyword Search
Traditional text-based search that finds exact matches or partial matches within message content.
CLI Usage:
# Basic keyword search
reservoir search "python programming"
# Search in specific partition
reservoir search --partition alice "machine learning"
Characteristics:
- Fast and precise for exact term matches
- Case-insensitive matching
- Supports partial word matching
- Best for finding specific technical terms or names
Semantic Search
Vector-based similarity search that finds conceptually related messages even when they use different words.
CLI Usage:
# Semantic search
reservoir search --semantic "machine learning concepts"
# Use RAG strategy (same as context enrichment)
reservoir search --link --semantic "database design"
Characteristics:
- Finds conceptually similar content
- Works across different terminology
- Uses BGE-Large-EN-v1.5 embeddings
- Powers Reservoir's context enrichment system
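To make the embedding step concrete, the sketch below computes BGE-Large-EN-v1.5 embeddings and a cosine similarity score with the sentence-transformers library. Reservoir generates its embeddings internally through its own pipeline, so this is an external illustration rather than its code.
# pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

a = model.encode("machine learning concepts")
b = model.encode("how neural networks are trained")

# Cosine similarity: closer to 1.0 means the two texts are semantically closer.
score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"similarity: {score:.3f}")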
Search Options
Partitioning
Scope your search to specific organizational boundaries:
# Search in specific partition
reservoir search --partition alice "neural networks"
# Search in specific instance within partition
reservoir search --partition alice --instance coding "API design"
Deduplication
Remove duplicate or highly similar results:
# Remove duplicate results
reservoir search --deduplicate --semantic "error handling"
RAG Strategy
Use the same search strategy that powers context enrichment:
# Use advanced search with synapse expansion
reservoir search --link --semantic "software architecture"
The --link option:
- Searches for semantically similar messages
- Expands results using synapse relationships
- Follows conversation threads
- Deduplicates automatically
- Limits results to most relevant matches
Search Implementation
Vector Similarity
Reservoir uses cosine similarity to find related messages:
- Query Embedding: Your search term is converted to a vector using BGE-Large-EN-v1.5
- Index Search: Neo4j's vector index finds similar message embeddings
- Scoring: Results are ranked by similarity score (0.0 to 1.0)
- Filtering: Results are filtered by partition/instance boundaries
Synapse Expansion
When using --link, the search expands beyond direct similarity:
- Initial Search: Find semantically similar messages
- Synapse Following: Explore connected messages via SYNAPSE relationships
- Thread Discovery: Follow conversation threads and related discussions
- Relevance Scoring: Combine similarity scores with relationship strength
- Result Limiting: Return top matches within context limits
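In rough pseudocode, the --link strategy can be sketched as below; the two callables are hypothetical stand-ins for Reservoir's vector-index lookup and SYNAPSE traversal, so treat this as an outline of the steps above rather than the actual implementation.
from typing import Callable, Iterable, List

def rag_search(
    query_embedding: List[float],
    search_index: Callable[[List[float], int], List[dict]],   # vector-index lookup (assumed)
    synapse_neighbours: Callable[[dict], Iterable[dict]],      # SYNAPSE expansion (assumed)
    top_k: int = 15,
) -> List[dict]:
    hits = search_index(query_embedding, top_k)                # 1. initial semantic search
    expanded = list(hits)
    for message in hits:                                       # 2-3. follow synapses and threads
        expanded.extend(synapse_neighbours(message))
    seen, unique = set(), []                                   # 4. deduplicate
    for m in expanded:
        key = (m.get("trace_id"), m.get("role"))
        if key not in seen:
            seen.add(key)
            unique.append(m)
    unique.sort(key=lambda m: m.get("score", 0.0), reverse=True)   # rank by relevance
    return unique[:top_k]                                      # 5. limit to the best matches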
Example Queries
Finding Programming Discussions
# Find all Python-related conversations
reservoir search --semantic "python programming"
# Find specific error discussions
reservoir search "TypeError"
# Find design pattern conversations
reservoir search --link --semantic "software design patterns"
Research and Analysis
# Find machine learning discussions
reservoir search --semantic "neural networks deep learning"
# Find database-related conversations
reservoir search --partition research --semantic "database optimization"
# Find recent discussions on a topic
reservoir view 50 | grep -i "kubernetes"
Cross-Conversation Discovery
# Find related discussions across all conversations
reservoir search --link --semantic "microservices architecture"
# Discover connections between topics
reservoir search --deduplicate --semantic "testing strategies"
Search Results Format
CLI Output
Search results include:
- Timestamp: When the message was created
- Partition/Instance: Organizational context
- Role: User or assistant message
- Content: The actual message text
- Score: Similarity score (for semantic search)
JSON Format
When exported, search results follow the MessageNode structure:
{
"trace_id": "abc123-def456",
"partition": "alice",
"instance": "coding",
"role": "user",
"content": "How do I implement error handling in async functions?",
"timestamp": "2024-01-15T10:30:00Z",
"embedding": [0.1, -0.2, 0.3, ...],
"url": null
}
Integration with Context Enrichment
The search system directly powers Reservoir's context enrichment:
- Automatic Search: Every user message triggers a semantic search
- Context Building: Search results become conversation context
- Relevance Filtering: Only high-quality matches (>0.85 similarity) are used
- Token Management: Results are truncated to fit model token limits
Performance Considerations
Vector Index
Reservoir maintains optimized vector indices for fast search:
CREATE VECTOR INDEX embedding1536
FOR (n:Embedding1536) ON (n.embedding)
OPTIONS {
indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}
}
Search Strategies
- Keyword Search: Fastest for exact matches
- Basic Semantic: Good balance of speed and relevance
- RAG Strategy (--link): Most comprehensive but slower
- Deduplication: Adds processing time but improves result quality
Optimization Tips
- Use Specific Partitions: Reduces search space
- Keyword for Exact Terms: Faster than semantic for specific names
- Semantic for Concepts: Better for finding related ideas
- Limit Result Count: Implicit in CLI, configurable in API
Advanced Usage
Combining with Other Commands
# Search and then view context
reservoir search --semantic "error handling" | head -5
reservoir view 10
# Search and ingest related information
echo "Related to error handling discussion" | reservoir ingest
# Export search results for analysis
reservoir search --semantic "API design" > api_discussions.txt
Scripting and Automation
#!/bin/bash
# Find and analyze topic discussions
TOPIC="$1"
echo "Searching for discussions about: $TOPIC"
# Semantic search with RAG strategy
reservoir search --link --semantic "$TOPIC" > "search_results_$TOPIC.txt"
# Count total discussions
TOTAL=$(wc -l < "search_results_$TOPIC.txt")
echo "Found $TOTAL related messages"
# Show recent activity
echo "Recent activity:"
reservoir view 20 | grep -i "$TOPIC" | head -3
The search system is designed to make your conversation history searchable and discoverable, turning your accumulated AI interactions into a valuable knowledge base that grows more useful over time.
Data Management
Reservoir provides comprehensive data management capabilities for backing up, migrating, and organizing your conversation data. The system supports full data export/import, individual message management, and flexible partitioning strategies.
Export and Import
Export All Data
Export your entire conversation history as JSON for backup or migration:
# Export all messages to stdout
reservoir export
# Save to file with timestamp
reservoir export > backup_$(date +%Y%m%d_%H%M%S).json
# Export and compress for storage
reservoir export | gzip > reservoir_backup.json.gz
Export Format: Each message is exported as a complete MessageNode with all metadata:
[
{
"trace_id": "550e8400-e29b-41d4-a716-446655440000",
"partition": "default",
"instance": "default",
"role": "user",
"content": "How do I implement error handling in async functions?",
"timestamp": "2024-01-15T10:30:00.000Z",
"embedding": [0.123, -0.456, 0.789, ...],
"url": null
},
{
"trace_id": "550e8400-e29b-41d4-a716-446655440001",
"partition": "default",
"instance": "default",
"role": "assistant",
"content": "Here are several approaches to error handling in async functions...",
"timestamp": "2024-01-15T10:30:15.000Z",
"embedding": [0.234, -0.567, 0.890, ...],
"url": null
}
]
Import Data
Import message data from JSON files:
# Import from a backup file
reservoir import backup_20240115.json
# Import from another Reservoir instance
reservoir import exported_conversations.json
# Import compressed backup
gunzip -c reservoir_backup.json.gz | reservoir import /dev/stdin
Import Behavior:
- Validates JSON format and MessageNode structure
- Preserves all metadata including timestamps and embeddings
- Maintains partition/instance organization
- Skips duplicate messages (based on trace_id)
- Rebuilds relationships and synapses
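If you need the same duplicate-skipping behaviour outside Reservoir, for example when merging two export files before importing, a small script can do it. The one below is a convenience sketch (it deduplicates on the trace_id/role pair so that both halves of a conversation turn survive), not part of the CLI.
#!/usr/bin/env python3
"""Merge Reservoir export files, keeping one copy of each trace_id/role pair."""
import json
import sys

def merge(paths):
    seen, merged = set(), []
    for path in paths:
        with open(path) as f:
            for msg in json.load(f):                  # each export file is a JSON array
                key = (msg["trace_id"], msg["role"])  # a trace_id is shared by a request/response pair
                if key not in seen:
                    seen.add(key)
                    merged.append(msg)
    return merged

if __name__ == "__main__":
    # usage: merge_exports.py backup_a.json backup_b.json > merged.json
    json.dump(merge(sys.argv[1:]), sys.stdout, indent=2)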
Migration Workflows
Complete System Migration:
# On source system
reservoir export > full_backup.json
# Transfer file to new system
scp full_backup.json user@newserver:/path/to/reservoir/
# On destination system
reservoir import full_backup.json
# Verify migration
reservoir view 10
reservoir search --semantic "test query"
Selective Migration:
# Export from specific partition
reservoir export | jq '[.[] | select(.partition=="alice")]' > alice_messages.json
# Importing to a different partition requires editing the JSON first
# (see the retagging sketch below), then import
reservoir import alice_messages.json
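One way to do that processing is a short script that rewrites the partition (and optionally instance) field of every message in the export; the sketch below is only a convenience helper, not a Reservoir command.
#!/usr/bin/env python3
"""Retag an export file with a new partition/instance before importing it."""
import json
import sys

def retag(path, new_partition, new_instance=None):
    with open(path) as f:
        messages = json.load(f)                  # export format: JSON array of MessageNodes
    for msg in messages:
        msg["partition"] = new_partition
        if new_instance is not None:
            msg["instance"] = new_instance
    return messages

if __name__ == "__main__":
    # usage: retag_export.py alice_messages.json bob > bob_messages.json
    json.dump(retag(sys.argv[1], sys.argv[2]), sys.stdout, indent=2)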
Message Management
Manual Message Ingestion
Add messages manually for testing, note-taking, or data entry:
# Add a user message
echo "How do I configure Neo4j for production?" | reservoir ingest
# Add to specific partition/instance
echo "Remember to update dependencies" | reservoir ingest --partition alice --instance notes
# Add assistant message
echo "Here's the production Neo4j configuration..." | reservoir ingest --role assistant
# Ingest from file
cat meeting_notes.txt | reservoir ingest --partition team --instance meetings
Use Cases:
- Documentation: Add important information manually
- Testing: Create test scenarios with known data
- Migration: Import data from other systems
- Notes: Add personal reminders or observations
Viewing Recent Data
Monitor recent activity and verify data integrity:
# View last 10 messages
reservoir view 10
# View from specific partition
reservoir view --partition alice 15
# View from specific instance
reservoir view --partition alice --instance coding 20
# Pipe to other tools for analysis
reservoir view 50 | grep -i "error" | wc -l
Partitioning Strategy
Organizational Structure
Reservoir uses a two-level organizational hierarchy:
- Partition: High-level boundary (user, project, team)
- Instance: Sub-boundary within partition (topic, session, category)
default/
├── default/ # General conversations
├── coding/ # Programming discussions
└── research/ # Research and analysis
alice/
├── personal/ # Personal conversations
├── work/ # Work-related discussions
└── learning/ # Educational content
team/
├── meetings/ # Team meeting notes
├── planning/ # Project planning
└── retrospectives/ # Review sessions
Partition Management
Creating Partitions: Partitions are created automatically when first used:
# Create new partition by using it
echo "Starting new project discussions" | reservoir ingest --partition newproject
# Create instance within partition
echo "Technical architecture discussion" | reservoir ingest --partition newproject --instance architecture
Partition Benefits:
- Isolation: Keep different contexts separate
- Search Scoping: Limit searches to relevant content
- Access Control: Enable future access restrictions
- Organization: Maintain clean separation of concerns
Data Isolation
Partitions provide logical isolation:
- Context Enrichment: Only includes messages from same partition/instance
- Search: Can be scoped to specific partitions
- Export: Can filter by partition (with additional tooling)
- Privacy: Enables separation of personal/professional content
Data Integrity
Backup Strategies
Daily Backups:
#!/bin/bash
# Daily backup script
BACKUP_DIR="/backup/reservoir"
DATE=$(date +%Y%m%d)
TIMESTAMP=$(date +%H%M%S)
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Export data
reservoir export > "$BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json"
# Compress older backups
find "$BACKUP_DIR" -name "*.json" -mtime +7 -exec gzip {} \;
# Clean old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.json.gz" -mtime +30 -delete
# Log backup
echo "$(date): Backup completed - $BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json" >> /var/log/reservoir_backup.log
Incremental Exports:
# Export recent messages (last 24 hours)
reservoir export | jq '[.[] | select(.timestamp > "'$(date -d '1 day ago' -Iseconds)'")]' > incremental_backup.json
Data Validation
Verify Data Integrity:
# Check message count
TOTAL_MESSAGES=$(reservoir export | jq length)
echo "Total messages: $TOTAL_MESSAGES"
# Verify embeddings
EMBEDDED_COUNT=$(reservoir export | jq '[.[] | select(.embedding != null)] | length')
echo "Messages with embeddings: $EMBEDDED_COUNT"
# Check partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c
Recovery Procedures
Restore from Backup:
# Stop Reservoir (if running as service)
systemctl stop reservoir
# Clear existing data (WARNING: destructive)
# This requires manual Neo4j database clearing
# Import backup
reservoir import /backup/reservoir/20240115/reservoir_full.json
# Verify restoration
reservoir view 10
reservoir search --semantic "test"
# Restart service
systemctl start reservoir
Advanced Data Operations
Data Analysis
Export for Analysis:
# Export specific fields for analysis
reservoir export | jq -r '.[] | [.timestamp, .partition, .role, (.content | length)] | @csv' > message_stats.csv
# Analyze conversation patterns
reservoir export | jq -r '.[] | .partition' | sort | uniq -c | sort -nr
# Find most active time periods
reservoir export | jq -r '.[] | .timestamp[0:10]' | sort | uniq -c | sort -nr
Data Transformation
Format Conversion:
# Convert to CSV format
reservoir export | jq -r '.[] | [.timestamp, .partition, .instance, .role, .content] | @csv' > conversations.csv
# Extract just message content
reservoir export | jq -r '.[] | .content' > all_messages.txt
# Create markdown format
reservoir export | jq -r '.[] | "## " + .timestamp + " (" + .role + ")\n\n" + .content + "\n"' > conversations.md
Embedding Management
Replay Embeddings: When embedding models change or for data recovery:
# Replay embeddings for all messages
reservoir replay
# Replay for specific model/partition
reservoir replay bge-large-en-v15
# Monitor embedding progress
# (Check logs for embedding generation status)
tail -f /var/log/reservoir.log | grep -i embedding
Best Practices
Regular Maintenance
- Schedule Regular Backups: Daily exports with compression
- Monitor Disk Usage: Embeddings require significant storage
- Validate Data Integrity: Regular checks for missing embeddings
- Clean Old Logs: Rotate and archive log files
- Test Recovery: Periodically test backup restoration
Storage Optimization
- Compress Backups: Use gzip for long-term storage
- Archive Old Data: Move historical data to cold storage
- Monitor Neo4j Storage: Regular database maintenance
- Embedding Efficiency: Consider embedding model size vs. quality
Security Considerations
- Encrypt Backups: Sensitive conversation data should be encrypted
- Access Controls: Limit access to export/import capabilities
- Audit Trails: Log all data management operations
- Data Retention: Define policies for data lifecycle management
Data management in Reservoir is designed to be straightforward while providing enterprise-grade capabilities for backup, migration, and organization of your conversation data.
Command Line Interface
Reservoir provides a comprehensive command-line interface for managing your conversation data, searching through message history, and configuring the system. This section covers all available commands and their usage.
Overview
Reservoir's CLI allows you to:
- Start the proxy server
- Search through conversations
- Import and export conversation data
- View recent messages
- Ingest new messages manually
- Configure system settings
- Replay embeddings for existing data
Available Commands
reservoir start
Start the Reservoir proxy server.
reservoir start [OPTIONS]
Options:
-o, --ollama - Ollama mode: listens on Ollama's default port (11434), useful as a drop-in proxy for clients that don't support setting a custom URL
-h, --help - Print help
-V, --version - Print version
Examples:
# Start in normal mode
reservoir start
# Start in Ollama mode (uses port 11434)
reservoir start --ollama
reservoir search
Search messages by keyword or semantic similarity.
reservoir search [OPTIONS] <TERM>
Arguments:
<TERM> - The search term (keyword or semantic)
Options:
--semantic - Use semantic search instead of keyword search
-p, --partition <PARTITION> - Partition to search (defaults to "default")
-i, --instance <INSTANCE> - Instance to search (defaults to partition)
-l, --link - Use the same search strategy as RAG does when injecting into the model
-d, --deduplicate - Deduplicate first similarity results
-h, --help - Print help
-V, --version - Print version
Examples:
# Keyword search
reservoir search "python programming"
# Semantic search
reservoir search --semantic "machine learning concepts"
# Search in specific partition/instance
reservoir search --partition alice --instance coding "neural networks"
# Use RAG search strategy
reservoir search --link --semantic "database design"
# Deduplicate results
reservoir search --deduplicate --semantic "API design"
reservoir export
Export all message nodes as JSON.
reservoir export
Options:
-h, --help - Print help
Examples:
# Export all messages to stdout
reservoir export > my_conversations.json
# Export and view
reservoir export | jq '.[0]'
reservoir import
Import message nodes from a JSON file.
reservoir import <FILE>
Arguments:
<FILE> - Path to the JSON file to import
Options:
-h, --help - Print help
-V, --version - Print version
Examples:
# Import from a file
reservoir import my_conversations.json
# Import from a backup
reservoir import backup_2024_01_15.json
reservoir view
View last x messages in the default partition/instance.
reservoir view [OPTIONS] <COUNT>
Arguments:
<COUNT> - Number of messages to display
Options:
-p, --partition <PARTITION> - Partition to view (defaults to "default")
-i, --instance <INSTANCE> - Instance to view (defaults to partition)
-h, --help - Print help
-V, --version - Print version
Examples:
# View last 10 messages
reservoir view 10
# View messages from specific partition
reservoir view --partition alice 5
# View messages from specific instance
reservoir view --partition alice --instance coding 15
reservoir ingest
Ingest a message from stdin as a user MessageNode.
reservoir ingest [OPTIONS]
Options:
-p, --partition <PARTITION> - Partition to save the message in (defaults to "default")
-i, --instance <INSTANCE> - Instance to save the message in (defaults to partition)
--role <ROLE> - Role to assign to the message (defaults to "user")
-h, --help - Print help
-V, --version - Print version
Examples:
# Ingest a user message
echo "How do I implement a binary search tree?" | reservoir ingest
# Ingest to specific partition/instance
echo "What are design patterns?" | reservoir ingest --partition alice --instance coding
# Ingest as assistant message
echo "Here's how to implement a BST..." | reservoir ingest --role assistant
# Ingest from file
cat question.txt | reservoir ingest --partition research --instance ai
reservoir config
Set or get default configuration values with your config.toml.
reservoir config [OPTIONS]
Options:
-s, --set <SET> - Set a configuration value. Use the format key=value. Example: reservoir config --set model=gpt-4-turbo
-g, --get <GET> - Get your current configuration value. Example: reservoir config --get model
-h, --help - Print help
-V, --version - Print version
Examples:
# View current configuration
reservoir config --get semantic_context_size
# Set configuration value
reservoir config --set semantic_context_size=20
# Set Neo4j connection
reservoir config --set neo4j_uri=bolt://localhost:7687
reservoir replay
Replay embeddings process.
reservoir replay [MODEL]
Arguments:
[MODEL] - Model to replay (defaults to "default")
Options:
-h, --help - Print help
-V, --version - Print version
Examples:
# Replay embeddings for default model
reservoir replay
# Replay for specific model
reservoir replay bge-large-en-v15
Common Workflows
Daily Usage
# Start the server
reservoir start
# View recent conversations
reservoir view 10
# Search for specific topics
reservoir search --semantic "machine learning"
# Add a note or question
echo "Remember to implement error handling" | reservoir ingest
Data Management
# Export all data for backup
reservoir export > backup_$(date +%Y%m%d).json
# Import previous backup
reservoir import backup_20240115.json
# View configuration
reservoir config --get semantic_context_size
Development and Testing
# Start in Ollama mode for local testing
reservoir start --ollama
# Search with debugging
reservoir search --link --deduplicate --semantic "API design"
# Replay embeddings after model changes
reservoir replay bge-large-en-v15
Configuration
The CLI respects configuration from:
- Command-line arguments (highest priority)
- Configuration file (~/.config/reservoir/reservoir.toml)
- Environment variables
- Default values (lowest priority)
See Environment Variables for detailed configuration options.
Error Handling
The CLI provides helpful error messages for common issues:
- Connection errors: Check if Neo4j is running
- Permission errors: Verify file permissions for import/export
- Invalid arguments: Use --help for correct syntax
- Configuration errors: Verify config file format
Integration with Scripts
The CLI is designed to work well in scripts and automation:
#!/bin/bash
# Backup and restart script
# Export current data
reservoir export > "backup_$(date +%Y%m%d_%H%M%S).json"
# Restart with fresh embeddings
reservoir replay
# Start the server
reservoir start
System Architecture
Reservoir is designed as a transparent proxy for OpenAI-compatible APIs, with a focus on capturing and enriching AI conversations. This section provides an overview of the system architecture and how components interact.
Request Processing Sequence
Reservoir intercepts your API calls, enriches them with relevant history, manages token limits, and then forwards them to the actual Language Model service. Here's the detailed sequence:
sequenceDiagram
    participant App
    participant Reservoir
    participant Neo4j
    participant LLM as OpenAI/Ollama
    App->>Reservoir: Request (e.g. /v1/chat/completions/$USER/my-application)
    Reservoir->>Reservoir: Check if last message exceeds token limit (Return error if true)
    Reservoir->>Reservoir: Tag with Trace ID + Partition
    Reservoir->>Neo4j: Store original request message(s)
    %% --- Context Enrichment Steps ---
    Reservoir->>Neo4j: Query for similar & recent messages
    Neo4j-->>Reservoir: Return relevant context messages
    Reservoir->>Reservoir: Inject context messages into request payload
    %% --- End Enrichment Steps ---
    Reservoir->>Reservoir: Check total token count & truncate if needed (preserving system/last messages)
    Reservoir->>LLM: Forward enriched & potentially truncated request
    LLM->>Reservoir: Return LLM response
    Reservoir->>Neo4j: Store LLM response message
    Reservoir->>App: Return LLM response
High-Level Architecture
flowchart TB
    Client(["Client App"]) -->|API Request| HTTPServer{{HTTP Server}}
    HTTPServer -->|Process Request| Handler[Request Handler]
    subgraph Handler Logic
        direction LR
        Handler_Start(Start) --> CheckInputTokens(Check Input Tokens)
        CheckInputTokens -- OK --> StoreRequest(Store Request)
        CheckInputTokens -- Too Long --> ReturnError(Return Error Response)
        StoreRequest --> QueryContext(Query Neo4j for Context)
        QueryContext --> InjectContext(Inject Context)
        InjectContext --> CheckTotalTokens(Check/Truncate Total Tokens)
        CheckTotalTokens --> ForwardRequest(Forward to LLM)
    end
    Handler -->|Store/Query| Neo4j[(Neo4j Database)]
    Handler -->|Forward/Receive| OpenAI([OpenAI/Ollama API])
    OpenAI --> Handler
    Handler -->|Return Response| HTTPServer
    HTTPServer -->|API Response| Client
    Config[/Env Vars/] --> HTTPServer
    Config --> Handler
    Config --> Neo4j
Core Components
1. Client Application
Your application making API calls to Reservoir. This could be:
- A web application using the OpenAI JavaScript library
- A Python script using the OpenAI Python library
- A command-line tool like curl
- Any application that can make HTTP requests
2. HTTP Server (Hyper/Tokio)
The HTTP server built on Rust's async ecosystem:
- Receives requests on the configured port (default: 3017)
- Routes based on URL path following the pattern /v1/partition/{partition}/instance/{instance}/chat/completions
- Handles CORS for web applications
- Manages concurrent requests efficiently using Tokio's async runtime
3. Request Handler
The core logic that processes each request:
Input Validation
- Token size checking: Validates that the last message doesn't exceed token limits
- Request format validation: Ensures the request follows OpenAI's API structure
- Authentication: Forwards API keys to the appropriate provider
Context Management
- Trace ID assignment: Each request gets a unique identifier for tracking
- Partition/Instance extraction: Pulls organization parameters from the URL path
- Message storage: Stores incoming messages in Neo4j with proper tagging
Context Enrichment
- Historical context query: Searches Neo4j for relevant past conversations
- Similarity matching: Uses vector embeddings to find semantically similar messages
- Recency filtering: Includes recent messages from the same partition/instance
- Context injection: Adds relevant context to the request payload
Token Management
- Total token calculation: Counts tokens in the enriched message list
- Smart truncation: Removes older context while preserving system prompts and latest messages
- Provider-specific limits: Respects different token limits for different models
Request Forwarding
- Provider routing: Automatically routes to the correct provider based on model name
- Request forwarding: Sends the enriched request to the upstream LLM
- Response handling: Processes and stores the LLM's response
Relationship Building
- Synapse connections: Links semantically similar messages using vector similarity
- Weak connection removal: Removes relationships with similarity scores below 0.85
- Conversation threading: Maintains coherent conversation threads over time
4. Neo4j Database
The graph database that stores all conversation data:
Data Storage
- MessageNode entities: Each message is stored as a node with properties
- Partition/Instance tagging: Messages are tagged for proper organization
- Vector embeddings: Semantic representations for similarity search
- Temporal information: Timestamps for recency-based queries
Graph Relationships
- Synapse relationships: Connect related messages across conversations
- Conversation threads: Maintain sequential flow of discussions
- Similarity scores: Weighted relationships based on semantic similarity
Query Capabilities
- Vector similarity search: Find semantically similar messages
- Temporal queries: Retrieve recent messages within time windows
- Graph traversal: Navigate conversation relationships
- Partition/Instance filtering: Scope queries to specific contexts
5. External LLM Services
Reservoir supports multiple AI providers:
- OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo, and specialized models
- Ollama: Local models like Llama, Gemma, and custom models
- Mistral AI: Mistral's cloud-hosted models
- Google Gemini: Google's AI models
- Custom providers: Any OpenAI-compatible API endpoint
6. Configuration Management
Environment-based configuration:
- Database connection: Neo4j URI, credentials, and connection pooling
- Server settings: Port, host, CORS configuration
- API keys: Credentials for various AI providers
- Provider endpoints: Custom URLs for different services
- Token limits: Configurable limits for different models
Request Processing Flow
- Request Arrival: Client sends a request to Reservoir's endpoint
- URL Parsing: Extract partition and instance from the URL path
- Input Validation: Check message format and token limits
- Message Storage: Store the user's message in Neo4j
- Context Retrieval: Query for relevant historical context
- Context Enrichment: Inject relevant messages into the request
- Token Management: Ensure the enriched request fits within limits
- Provider Routing: Determine which AI provider to use
- Request Forwarding: Send the enriched request to the AI provider
- Response Processing: Receive and process the AI's response
- Response Storage: Store the AI's response in Neo4j
- Relationship Building: Create or update message relationships
- Response Return: Send the response back to the client
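As a compact summary, the same flow can be written as pseudo-Python; every call below is a named placeholder for the corresponding step, not Reservoir's real (Rust) API.
def handle_request(request, partition, instance, deps):
    # Illustrative walk-through of the steps above; `deps` bundles hypothetical collaborators.
    trace_id = deps.new_trace_id()                                    # tag the request
    deps.validate(request)                                            # input validation
    user_message = request.last_user_message()
    deps.store(user_message, partition, instance, trace_id)           # message storage
    context = deps.query_context(user_message, partition, instance)   # context retrieval
    enriched = deps.inject_context(request, context)                  # context enrichment
    enriched = deps.truncate_to_token_limit(enriched)                 # token management
    provider = deps.route(enriched.model)                             # provider routing
    response = provider.send(enriched)                                # request forwarding
    deps.store(response.message(), partition, instance, trace_id)     # response storage
    deps.build_synapses(partition, instance)                          # relationship building
    return response                                                   # response return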
Scalability Considerations
Horizontal Scaling
- Stateless design: Each request is independent
- Database connection pooling: Efficient resource utilization
- Async processing: Non-blocking I/O for high concurrency
Vertical Scaling
- Memory management: Efficient vector storage and retrieval
- CPU optimization: Fast similarity calculations
- Disk I/O: Optimized database queries and indexing
Performance Optimizations
- Vector indexing: Fast similarity search in Neo4j
- Connection pooling: Reuse database connections
- Caching strategies: Cache frequently accessed data
- Batching: Efficient bulk operations where possible
Security Architecture
Authentication
- API key forwarding: Secure handling of provider credentials
- No key storage: Reservoir doesn't store AI provider keys
- Environment-based secrets: Secure configuration management
Data Privacy
- Local storage: All conversation data stays on your infrastructure
- No external logging: Conversation content never leaves your network
- Configurable retention: Control how long data is stored
Access Control
- Partition isolation: Conversations are isolated by partition/instance
- URL-based permissions: Access control through URL structure
- Network security: Configurable CORS and network policies
Monitoring and Observability
Logging
- Request tracing: Unique trace IDs for each request
- Error logging: Detailed error information for debugging
- Performance metrics: Request timing and processing statistics
Health Checks
- Database connectivity: Monitor Neo4j connection health
- Provider availability: Check AI service availability
- Resource utilization: Memory and CPU monitoring
This architecture provides a robust, scalable foundation for AI conversation management while maintaining transparency and compatibility with existing applications.
Data Model
Reservoir uses Neo4j as its graph database to store conversations and their relationships. This section provides a detailed overview of the data model, including nodes, relationships, and how they work together to enable intelligent conversation management.
Overview
The data model is designed around the concept of messages as nodes in a graph, with relationships that capture both the conversational flow and semantic similarities. This approach enables powerful querying capabilities for context enrichment and conversation analysis.
Nodes
MessageNode
Represents a single message in a conversation, whether from a user or an LLM assistant.
| Property | Type | Description |
|---|---|---|
| trace_id | String | Unique identifier per request/response pair |
| partition | String | Logical namespace from URL, typically the system username ($USER) |
| instance | String | Specific context within partition, typically the application name |
| role | String | Role of the message (user or assistant) |
| content | String | The text content of the message |
| timestamp | DateTime | When the message was created (ISO 8601 format) |
| embedding | Vector | Vector representation of the message for similarity search |
| url | String | Optional URL associated with the message |
Example MessageNode
CREATE (m:MessageNode {
trace_id: "abc123-def456-ghi789",
partition: "alice",
instance: "code-assistant",
role: "user",
content: "How do I implement a binary search tree?",
timestamp: "2024-01-15T10:30:00Z",
embedding: [0.1, -0.2, 0.3, ...],
url: null
})
Relationships
The data model uses two types of relationships to capture different aspects of conversation structure:
RESPONDED_WITH
Links a user message to its corresponding assistant response, preserving the original conversation flow.
Properties:
- Direction:
(User Message)-[:RESPONDED_WITH]->(Assistant Message)
- Cardinality: One-to-one (each user message has exactly one assistant response)
- Mutability: Immutable once created
Purpose:
- Maintains conversation integrity
- Enables reconstruction of original conversation threads
- Provides audit trail for request/response pairs
SYNAPSE
Links semantically similar messages based on vector similarity, enabling cross-conversation context discovery.
Properties:
- Direction: Bidirectional (similarity is symmetric)
- Score: Float value representing similarity strength (0.0 to 1.0)
- Threshold: Minimum score of 0.85 required for synapse creation
- Mutability: Dynamic (can be created, updated, or removed)
Creation Rules:
- Sequential Synapses: Initially created between consecutive messages in a conversation
- Similarity Synapses: Created between messages with high semantic similarity (≥ 0.85)
- Cross-Conversation: Can link messages from different conversations within the same partition/instance
- Pruning: Synapses with scores below threshold are automatically removed
Example Synapse
(m1:MessageNode)-[:SYNAPSE {score: 0.92}]-(m2:MessageNode)
Graph Structure Example
┌─────────────────┐ RESPONDED_WITH ┌─────────────────┐
│ User Message │────────────────────→│Assistant Message│
│ "Explain BST" │ │ "A binary..." │
└─────────────────┘ └─────────────────┘
│ │
│ SYNAPSE │ SYNAPSE
│ {score: 0.91} │ {score: 0.87}
▼ ▼
┌─────────────────┐ RESPONDED_WITH ┌─────────────────┐
│ User Message │────────────────────→│Assistant Message│
│ "How to code │ │ "Here's how..." │
│ tree search?" │ │ │
└─────────────────┘ └─────────────────┘
Real Conversation Graph Visualization
Here's an example of how conversations and their threads appear in practice, showing the synapse relationships that connect semantically related messages across different conversation flows:
This visualization shows:
- Message nodes representing individual user and assistant messages
- RESPONDED_WITH relationships (direct conversation flow)
- SYNAPSE relationships connecting semantically similar messages
- Conversation threads formed by chains of related messages
- Cross-conversation connections where topics are discussed in multiple conversations
The graph structure enables Reservoir to find relevant context from past conversations when enriching new requests, creating a rich conversational memory that spans multiple sessions and topics.
Vector Index
Reservoir maintains a vector index called messageEmbeddings in Neo4j for efficient similarity searches.
Index Configuration
CREATE VECTOR INDEX messageEmbeddings
FOR (m:MessageNode) ON (m.embedding)
OPTIONS {indexConfig: {
`vector.dimensions`: 1536,
`vector.similarity_function`: 'cosine'
}}
Similarity Search
The vector index enables fast cosine similarity searches:
CALL db.index.vector.queryNodes('messageEmbeddings', 10, $queryEmbedding)
YIELD node, score
WHERE node.partition = $partition AND node.instance = $instance
RETURN node, score
ORDER BY score DESC
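From an external script, the same kind of query can be issued with the official Neo4j Python driver. The connection URI, credentials, and returned fields below are assumptions about a typical deployment, used here only to show the query pattern.
# pip install neo4j
from neo4j import GraphDatabase

QUERY = """
CALL db.index.vector.queryNodes('messageEmbeddings', 10, $queryEmbedding)
YIELD node, score
WHERE node.partition = $partition AND node.instance = $instance
RETURN node.content AS content, score
ORDER BY score DESC
"""

def similar_messages(query_embedding, partition, instance):
    # bolt URI and credentials are placeholders for your own Neo4j instance
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    try:
        with driver.session() as session:
            result = session.run(QUERY, queryEmbedding=query_embedding,
                                 partition=partition, instance=instance)
            return [(record["content"], record["score"]) for record in result]
    finally:
        driver.close()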
Partitioning Strategy
Partition
- Purpose: Top-level organization boundary
- Typical Value: System username ($USER)
- Scope: All messages for a specific user
- Isolation: Messages from different partitions never interact
Instance
- Purpose: Application-specific context within a partition
- Typical Value: Application name (e.g., "code-assistant", "chat-app")
- Scope: Specific use case or application context
- Organization: Multiple instances can exist within a partition
Example Organization
Partition: "alice"
├── Instance: "code-assistant"
│ ├── Programming questions
│ └── Code review discussions
├── Instance: "research-helper"
│ ├── Literature reviews
│ └── Data analysis questions
└── Instance: "personal-chat"
├── General conversations
└── Daily planning
Relationship Types: Fixed vs. Dynamic
Fixed Relationships
Characteristics:
- Immutable once created
- Preserve data integrity
- Represent factual conversation structure
Examples:
- MessageNode properties (once created, content doesn't change)
- RESPONDED_WITH relationships (permanent conversation pairs)
Dynamic Relationships
Characteristics:
- Mutable and adaptive
- Support learning and optimization
- Reflect current understanding of semantic relationships
Examples:
- SYNAPSE relationships (can be created, updated, or removed)
- Similarity scores (can be recalculated as algorithms improve)
Query Patterns
Context Enrichment Query
// Find recent and similar messages for context
MATCH (m:MessageNode)
WHERE m.partition = $partition
AND m.instance = $instance
AND m.timestamp > $recentThreshold
WITH m
ORDER BY m.timestamp DESC
LIMIT 10
UNION
CALL db.index.vector.queryNodes('messageEmbeddings', 5, $queryEmbedding)
YIELD node, score
WHERE node.partition = $partition
AND node.instance = $instance
AND score > 0.85
RETURN node, score
ORDER BY score DESC
Conversation Thread Reconstruction
// Reconstruct a conversation thread
MATCH (user:MessageNode {role: 'user'})-[:RESPONDED_WITH]->(assistant:MessageNode)
WHERE user.trace_id = $traceId
RETURN user, assistant
ORDER BY user.timestamp
Synapse Network Analysis
// Find highly connected messages (conversation hubs)
MATCH (m:MessageNode)-[s:SYNAPSE]-(related:MessageNode)
WHERE m.partition = $partition AND m.instance = $instance
WITH m, count(s) as connectionCount, avg(s.score) as avgScore
WHERE connectionCount > 3
RETURN m, connectionCount, avgScore
ORDER BY connectionCount DESC, avgScore DESC
Data Lifecycle
Message Storage
- Ingestion: New messages are stored with embeddings
- Indexing: Vector embeddings are indexed for similarity search
- Relationship Creation: RESPONDED_WITH links are established
- Synapse Building: Similar messages are connected via SYNAPSE relationships
Synapse Evolution
- Initial Creation: Sequential synapses between consecutive messages
- Similarity Detection: Cross-conversation synapses based on semantic similarity
- Threshold Enforcement: Weak synapses (score < 0.85) are removed
- Continuous Optimization: Relationships are updated as new messages arrive
Cleanup and Maintenance
- Orphaned Relationships: Periodic cleanup of broken relationships
- Index Optimization: Regular vector index maintenance
- Storage Optimization: Archival of old messages based on retention policies
Performance Considerations
Indexing Strategy
- Vector Index: Primary index for similarity searches
- Partition/Instance Index: Composite index for scoped queries
- Timestamp Index: Range queries for recent messages
- Role Index: Fast filtering by message role
Query Optimization
- Parameterized Queries: Use query parameters to enable plan caching
- Result Limiting: Always limit result sets for performance
- Selective Filtering: Apply partition/instance filters early
- Vector Search Tuning: Optimize similarity thresholds and result counts
Scaling Considerations
- Horizontal Partitioning: Distribute data across multiple Neo4j instances
- Read Replicas: Use read replicas for query-heavy workloads
- Connection Pooling: Efficient database connection management
- Batch Operations: Use batch writes for bulk data operations
This data model provides a robust foundation for conversation storage and retrieval while maintaining flexibility for future enhancements and optimizations.
Context Enrichment
Context enrichment is Reservoir's core mechanism for providing intelligent, memory-aware LLM conversations. By automatically injecting relevant historical context and recent conversation history into each request, Reservoir gives LLM models a persistent memory that improves response quality and maintains conversational continuity across sessions.
Overview
When you send a message to Reservoir, the system automatically enhances your request with:
- Semantically similar messages from past conversations (using vector similarity search)
- Recent conversation history from the same partition/instance
- Connected conversation threads through synapse relationships
This enriched context is injected into your request before forwarding it to the LLM provider, making the LLM aware of relevant past discussions.
Context Enrichment Process
1. Message Reception and Initial Processing
pub async fn handle_with_partition(
partition: &str,
instance: &str,
whole_body: Bytes,
) -> Result<Bytes, Error> {
let json_string = String::from_utf8_lossy(&whole_body).to_string();
let chat_request_model = ChatRequest::from_json(json_string.as_str()).expect("Valid JSON");
let model_info = ModelInfo::new(chat_request_model.model.clone());
let trace_id = Uuid::new_v4().to_string();
let service = ChatRequestService::new();
When a request arrives:
- A unique trace ID is generated for tracking
- The request is parsed and validated
- Model information is extracted to determine token limits
2. Embedding Generation
let last_message = get_last_message_in_chat_request(&chat_request_model)?;
let search_term = last_message.content.as_str();
info!("Using search term: {}", search_term);
let embedding_info = EmbeddingInfo::with_fastembed("bge-large-en-v15");
let embeddings = get_embeddings_for_txt(search_term, embedding_info.clone()).await?;
let context_size = config::get_context_size();
The last user message is used as the search term to generate vector embeddings using the BGE-Large-EN-v1.5 model. This embedding represents the semantic meaning of the current query.
3. Semantic Context Retrieval
let similar = get_related_messages_with_strategy(
embeddings,
&embedding_info,
partition,
instance,
context_size,
)
.await?;
Using the generated embedding, Reservoir searches for semantically similar messages from past conversations within the same partition/instance. The search strategy includes:
Vector Similarity Search
let query_string = format!(
r#"
CALL db.index.vector.queryNodes(
'{}',
$topKExtended,
$embedding
) YIELD node, score
WITH node, score
WHERE node.partition = $partition
AND node.instance = $instance
RETURN node.partition AS partition,
node.instance AS instance,
node.embedding AS embedding,
node.model AS model,
id(node) AS id,
score
ORDER BY score DESC
"#,
embedding_info.get_index_name()
);
Synapse Expansion
pub async fn get_related_messages_with_strategy(
embedding: Vec<f32>,
embedding_info: &EmbeddingInfo,
partition: &str,
instance: &str,
top_k: usize,
) -> Result<Vec<MessageNode>, Error> {
let similar_messages =
get_most_similar_messages(embedding, embedding_info, partition, instance, top_k).await?;
let mut found_messages = vec![];
for message in similar_messages.clone() {
let mut connected = get_nodes_connected_by_synapses(connect, &message).await?;
if found_messages.len() > top_k * 3 {
break;
}
if connected.len() > 2 {
found_messages.append(connected.as_mut());
}
found_messages = deduplicate_message_nodes(found_messages);
}
Ok(found_messages.into_iter().take(top_k).collect())
}
The system expands the context by following synapse relationships - connections between messages that are semantically similar (cosine similarity > 0.85).
4. Recent History Retrieval
let last_messages = get_last_messages_for_partition_and_instance(
connect,
partition.to_string(),
instance.to_string(),
LAST_MESSAGES_LIMIT,
)
.await
.unwrap_or_else(|e| {
error!("Error finding last messages: {}", e);
Vec::new()
});
Retrieves the most recent 15 messages from the same partition/instance to provide immediate conversational context.
5. Context Injection
let mut enriched_chat_request =
enrich_chat_request(similar, last_messages, &chat_request_model);
truncate_messages_if_needed(&mut enriched_chat_request.messages, model_info.input_tokens);
The enrich_chat_request function combines all context sources:
pub fn enrich_chat_request(
similar_messages: Vec<MessageNode>,
mut last_messages: Vec<MessageNode>, // mutable so it can be sorted chronologically below
chat_request: &ChatRequest,
) -> ChatRequest {
let mut chat_request = chat_request.clone();
let semantic_prompt = r#"The following is the result of a semantic search
of the most related messages by cosine similarity to previous
conversations"#;
let recent_prompt = r#"The following are the most recent messages in the
conversation in chronological order"#;
last_messages.sort_by(|a, b| a.timestamp.cmp(&b.timestamp));
let mut enrichment_block = Vec::new();
enrichment_block.push(Message {
role: "system".to_string(),
content: semantic_prompt.to_string(),
});
enrichment_block.extend(similar_messages.iter().map(MessageNode::to_message));
enrichment_block.push(Message {
role: "system".to_string(),
content: recent_prompt.to_string(),
});
enrichment_block.extend(last_messages.iter().map(MessageNode::to_message));
enrichment_block.retain(|m| !m.content.is_empty());
let insert_index = if chat_request
.messages
.first()
.is_some_and(|m| m.role == "system")
{
1
} else {
0
};
// Insert enrichment block
chat_request
.messages
.splice(insert_index..insert_index, enrichment_block);
chat_request
}
The enrichment process:
- Creates descriptive system prompts to explain the context
- Adds semantically similar messages with explanation
- Adds recent chronological history with explanation
- Inserts the enrichment block after any existing system message
- Filters out empty messages
6. Token Management and Truncation
truncate_messages_if_needed(&mut enriched_chat_request.messages, model_info.input_tokens);
The enriched request may exceed the model's token limits. The truncation algorithm:
pub fn truncate_messages_if_needed(messages: &mut Vec<Message>, limit: usize) {
let mut current_tokens = count_chat_tokens(messages);
info!("Current token count: {}", current_tokens);
if current_tokens <= limit {
return; // No truncation needed
}
info!(
"Token count ({}) exceeds limit ({}), truncating...",
current_tokens, limit
);
// Identify indices of system messages and the last message
let system_message_indices: HashSet<usize> = messages
.iter()
.enumerate()
.filter(|(_, m)| m.role == "system")
.map(|(i, _)| i)
.collect();
let last_message_index = messages.len().saturating_sub(1); // Index of the last message
// Start checking for removal from the first message
let mut current_index = 0;
while current_tokens > limit && current_index < messages.len() {
// Check if the current index is a system message or the last message
if system_message_indices.contains(&current_index) || current_index == last_message_index {
// Skip this message, move to the next index
current_index += 1;
continue;
}
// If it's safe to remove (not system, not the last message)
if messages.len() > 1 {
// Ensure we don't remove the only message left (shouldn't happen here)
info!(
"Removing message at index {}: Role='{}', Content='{}...'",
current_index,
messages[current_index].role,
messages[current_index]
.content
.chars()
.take(30)
.collect::<String>()
);
messages.remove(current_index);
// Don't increment current_index, as removing shifts subsequent elements down.
// Recalculate tokens and update system/last indices if needed (though less efficient)
// For simplicity here, we just recalculate tokens. A more optimized approach
// might update indices, but given the context size, recalculating tokens is okay.
current_tokens = count_chat_tokens(messages);
// Re-evaluate system_message_indices and last_message_index is safer if indices change significantly,
// but let's stick to the simpler approach for now. If performance becomes an issue, optimize this.
} else {
// Safety break: Should not be able to remove the last message due to the check above.
error!("Warning: Truncation stopped unexpectedly.");
break;
}
}
info!("Truncated token count: {}", current_tokens);
}
The truncation algorithm preserves:
- All system messages (including enrichment context)
- The user's current/last message
- Removes older context messages if needed
7. Response Storage and Synapse Building
After receiving the LLM's response:
let message_node = chat_response.choices.first().unwrap().message.clone();
let embedding =
get_embeddings_for_txt(message_node.content.as_str(), embedding_info.clone()).await?;
let message_node = MessageNode::from_message(
&message_node,
trace_id.as_str(),
partition,
instance,
embedding,
);
save_message_node(connect, &message_node, &embedding_info)
.await
.expect("Failed to save message node");
connect_synapses(connect)
.await
.expect("Failed to connect synapses");
- The LLM's response is stored with its own embedding
- Synapses (semantic connections) are built between messages
- The system continuously builds a knowledge graph of related conversations
Context Architecture Flow
flowchart TD
    A["User Request Arrives"] --> B["Generate Trace ID & Parse Request"]
    B --> C["Extract Last User Message"]
    C --> D["Generate Embedding<br/>(BGE-Large-EN-v1.5)"]
    %% Parallel context retrieval
    D --> E["Semantic Search"]
    D --> F["Recent History Query"]
    E --> E1["Vector Similarity Search<br/>(Neo4j Index)"]
    E1 --> E2["Expand via Synapses<br/>(Related Conversations)"]
    E2 --> E3["Deduplicate Messages"]
    F --> F1["Get Last 15 Messages<br/>(Same Partition/Instance)"]
    F1 --> F2["Sort by Timestamp"]
    %% Context assembly
    E3 --> G["Assemble Context Block"]
    F2 --> G
    G --> G1["Add Semantic Context<br/>'The following is semantic search...'"]
    G1 --> G2["Add Similar Messages"]
    G2 --> G3["Add Recent Context<br/>'The following are recent messages...'"]
    G3 --> G4["Add Recent Messages"]
    %% Context injection
    G4 --> H["Inject Context into Request"]
    H --> H1{"Check if System Message Exists"}
    H1 -->|Yes| H2["Insert after System Message"]
    H1 -->|No| H3["Insert at Beginning"]
    H2 --> I["Token Management"]
    H3 --> I
    %% Token management
    I --> I1["Count Total Tokens"]
    I1 --> I2{"Exceeds Token Limit?"}
    I2 -->|No| J["Send to AI Provider"]
    I2 -->|Yes| I3["Smart Truncation"]
    I3 --> I4["Preserve System Messages"]
    I4 --> I5["Preserve Last User Message"]
    I5 --> I6["Remove Older Context"]
    I6 --> I7["Recalculate Tokens"]
    I7 --> I2
    %% AI interaction
    J --> K["AI Provider Response"]
    K --> L["Store Response"]
    %% Post-processing
    L --> L1["Generate Response Embedding"]
    L1 --> L2["Save to Neo4j with Trace ID"]
    L2 --> L3["Link User-Assistant Messages"]
    L3 --> M["Build Synapses"]
    M --> M1["Calculate Similarity Scores<br/>(Cosine Similarity)"]
    M1 --> M2["Create SYNAPSE Relationships<br/>(Score > 0.85)"]
    M2 --> M3["Remove Weak Synapses<br/>(Score < 0.85)"]
    M3 --> N["Return Enriched Response"]
    %% Styling
    classDef inputStep fill:#e1f5fe
    classDef processStep fill:#f3e5f5
    classDef storageStep fill:#e8f5e8
    classDef aiStep fill:#fff3e0
    classDef outputStep fill:#fce4ec
    class A,C inputStep
    class B,D,E,E1,E2,E3,F,F1,F2,G,G1,G2,G3,G4,H,H1,H2,H3,I,I1,I2,I3,I4,I5,I6,I7 processStep
    class L,L1,L2,L3,M,M1,M2,M3 storageStep
    class J,K aiStep
    class N outputStep
Key Configuration Parameters
Context Size
pub fn get_context_size() -> usize {
get_config().semantic_context_size.unwrap_or(15)
}
The semantic context size (default: 15) determines how many semantically similar messages are retrieved and potentially included in the context.
Recent Messages Limit
const LAST_MESSAGES_LIMIT: usize = 15;
The system retrieves up to 15 most recent messages from the same partition/instance for chronological context.
Embedding Model
let embedding_info = EmbeddingInfo::with_fastembed("bge-large-en-v15");
By default, Reservoir uses a local instance of BGE-Large-EN-v1.5 to generate embeddings, providing high-quality semantic representations.
Synapse Threshold
MATCH (m1:MessageNode)-[r:SYNAPSE]->(m2:MessageNode)
WHERE r.score < 0.85
DELETE r
Only relationships with cosine similarity scores above 0.85 are maintained as synapses, ensuring high-quality semantic connections.
Key Concepts
Partitions and Instances
Context is scoped to specific partition/instance combinations, allowing for:
- Organizational separation: Different teams or projects can have isolated contexts
- Application isolation: Multiple applications can use the same Reservoir instance without cross-contamination
- User-specific contexts: Individual users can maintain separate conversation histories
Synapses
Synapses are semantic relationships between messages that:
- Connect related conversations across different sessions
- Build over time as the system learns from interactions
- Self-organize the knowledge graph based on content similarity
- Get pruned automatically when relationships are too weak (< 0.85 similarity)
Trace IDs
Every request gets a unique trace ID that:
- Links user messages to LLM responses within the same conversation turn
- Enables conversation threading and relationship building
- Provides audit trails for debugging and analysis
- Supports parallel processing of multiple simultaneous requests
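A trace ID is simply a UUID shared by both halves of a conversation turn. The snippet below illustrates the idea; the dictionary shape mirrors the export format rather than Reservoir's internal representation.
import uuid
from datetime import datetime, timezone

trace_id = str(uuid.uuid4())
now = datetime.now(timezone.utc).isoformat()

user_msg = {"trace_id": trace_id, "role": "user",
            "content": "How do I handle database timeouts?", "timestamp": now}
assistant_msg = {"trace_id": trace_id, "role": "assistant",
                 "content": "Implement retry logic with exponential backoff...", "timestamp": now}

# Both messages share the same trace_id, which is what links a request to its response.
assert user_msg["trace_id"] == assistant_msg["trace_id"]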
System Context Compression
pub fn compress_system_context(messages: &[Message]) -> Vec<Message> {
let first_index = messages.iter().position(|m| m.role == "system");
let last_index = messages.iter().rposition(|m| m.role == "system");
if let (Some(first), Some(last)) = (first_index, last_index) {
if first != 0 || first == last {
return messages.to_vec();
}
let mut compressed = vec![messages[0].clone()];
for item in messages.iter().take(last + 1).skip(first + 1) {
compressed[0].content += &format!("\n{}", message_to_string(item));
}
compressed.extend_from_slice(&messages[last + 1..]);
compressed
} else {
messages.to_vec()
}
}
Multiple system messages (including enrichment context) are compressed into a single system message to optimize token usage while preserving all contextual information.
Benefits of Context Enrichment
- Conversational Continuity: LLM maintains awareness of past discussions across sessions
- Semantic Understanding: Related topics are automatically surfaced even when not explicitly mentioned
- Multi-Session Learning: Knowledge accumulates over time, improving response quality
- Cross-Model Memory: Context persists when switching between different LLM providers
- Intelligent Prioritization: Most relevant historical context is prioritized while respecting token limits
- Automatic Organization: The system builds its own knowledge graph without manual intervention
Performance Considerations
- Vector Indexing: Neo4j's vector indices provide sub-second similarity search even with large conversation histories
- Parallel Processing: Semantic search and recent history retrieval happen concurrently
- Smart Truncation: Context is intelligently trimmed to fit model limits while preserving essential information
- Synapse Pruning: Weak connections are automatically removed to maintain graph quality
- Token Optimization: System messages are compressed to maximize available context within token limits
Conversation Threads (Synapses)
Synapses are Reservoir's intelligent connection system that links semantically related messages across different conversations. Unlike traditional conversation threads that follow chronological order, synapses create a web of connections based on semantic similarity, enabling cross-conversation context discovery and knowledge building.
What are Synapses?
Synapses are bidirectional relationships between MessageNodes that represent semantic similarity. They enable Reservoir to:
- Connect related discussions across different conversations
- Build knowledge networks from accumulated conversations
- Enable context jumping between related topics
- Create conversational memory that spans sessions
How Synapses Work
Similarity Calculation
Synapses are created based on vector similarity between message embeddings:
- Embedding Generation: Each message is converted to a vector using BGE-Large-EN-v1.5
- Similarity Scoring: Cosine similarity is calculated between message vectors
- Threshold Filtering: Only connections with similarity ≥ 0.85 become synapses
- Bidirectional Links: Synapses work in both directions (A ↔ B)
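As a minimal illustration of this scoring step (a Python sketch, not Reservoir's Rust implementation; the 0.85 threshold is the one used throughout this chapter):
import math

SYNAPSE_THRESHOLD = 0.85  # minimum cosine similarity for a synapse

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors of equal length
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def should_create_synapse(embedding_a, embedding_b):
    # Returns (create?, score) for a candidate message pair
    score = cosine_similarity(embedding_a, embedding_b)
    return score >= SYNAPSE_THRESHOLD, score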
Synapse Creation Process
flowchart TD
    A["New Message Arrives"] --> B["Generate Embedding"]
    B --> C["Find Similar Messages"]
    C --> D["Calculate Similarity Scores"]
    D --> E{"Score ≥ 0.85?"}
    E -->|Yes| F["Create SYNAPSE Relationship"]
    E -->|No| G["Skip Connection"]
    F --> H["Store Score and Model Info"]
    H --> I["Enable Cross-Conversation Context"]
    G --> I
Sequential vs. Semantic Synapses
Sequential Synapses: Connect consecutive messages in the same conversation
// Messages in same conversation thread
(msg1)-[:SYNAPSE {score: 0.95, model: "embedding1536"}]-(msg2)
Semantic Synapses: Connect similar messages from different conversations
// Messages from different conversations with similar content
(msg_python_q1)-[:SYNAPSE {score: 0.88, model: "embedding1536"}]-(msg_python_q2)
Synapse Properties
Score
Represents the semantic similarity strength between two messages:
- Range: 0.0 to 1.0 (higher is more similar)
- Threshold: Minimum 0.85 for synapse creation
- Calculation: Cosine similarity between embedding vectors
- Update: Can be recalculated as models improve
Model
Indicates which embedding model was used for similarity calculation:
- Current Default: "embedding1536" (BGE-Large-EN-v1.5)
- Purpose: Enables model-specific synapse management
- Future-Proofing: Supports multiple embedding models
Example Synapse Relationship
(message1:MessageNode)-[:SYNAPSE {
score: 0.92,
model: "embedding1536"
}]-(message2:MessageNode)
Synapse Network Examples
Programming Discussion Network
"How do I handle errors in Python?"
↓ SYNAPSE (0.91)
"What's the best way to catch exceptions?"
↓ SYNAPSE (0.87)
"Try/except blocks best practices"
↓ SYNAPSE (0.89)
"Error handling in async functions"
Cross-Topic Connections
"Database optimization techniques"
↓ SYNAPSE (0.86)
"Slow query performance issues"
↓ SYNAPSE (0.88)
"Index design for better performance"
Synapse Management
Automatic Creation
Synapses are created automatically during conversation processing:
// Simplified creation logic
if similarity_score >= 0.85 {
create_synapse(message1, message2, similarity_score, "embedding1536");
}
Pruning Low-Quality Synapses
Weak connections are automatically removed to maintain network quality:
// Remove synapses below threshold
MATCH (m1:MessageNode)-[r:SYNAPSE]->(m2:MessageNode)
WHERE r.score < 0.85
DELETE r
Synapse Evolution
Synapses can be updated as the system learns:
- Score Updates: Recalculate similarity with improved models
- Model Migration: Update synapses when switching embedding models
- Network Optimization: Remove redundant or weak connections
Using Synapses for Context
RAG Strategy with Synapses
When using the --link search strategy, Reservoir leverages synapses:
# Use synapse network for enhanced search
reservoir search --link --semantic "error handling"
Process:
- Find semantically similar messages
- Follow SYNAPSE relationships to connected messages
- Explore conversation threads via synapse networks
- Deduplicate and rank results
- Return most relevant connected discussions
Context Enrichment
Synapses enable intelligent context building:
// Context enrichment query using synapses
MATCH (query_msg:MessageNode)-[:SYNAPSE*1..3]-(related:MessageNode)
WHERE query_msg.content CONTAINS "database"
AND related.partition = $partition
AND related.instance = $instance
RETURN related
ORDER BY related.timestamp DESC
LIMIT 10
Synapse Network Analysis
Finding Conversation Hubs
Identify messages that are highly connected (conversation hubs):
# CLI command to export and analyze
reservoir export | jq -r '.[].content' > messages.txt
# Or via Neo4j query
MATCH (m:MessageNode)-[s:SYNAPSE]-(related:MessageNode)
WITH m, count(s) as connectionCount, avg(s.score) as avgScore
WHERE connectionCount > 5
RETURN m.content, connectionCount, avgScore
ORDER BY connectionCount DESC
Topic Clustering
Synapses naturally create topic clusters:
Cluster 1: Web Development
├── "React component best practices" (8 connections)
├── "JavaScript async patterns" (6 connections)
└── "CSS flexbox layouts" (4 connections)
Cluster 2: Database Design
├── "SQL query optimization" (7 connections)
├── "Database normalization" (5 connections)
└── "Index strategy for performance" (3 connections)
Performance Considerations
Synapse Creation Overhead
- Computation: Vector similarity calculation for each new message
- Storage: Additional relationships in Neo4j graph
- Indexing: Maintenance of vector indices
Optimization Strategies
- Batch Processing: Create synapses in batches during low-usage periods
- Threshold Tuning: Adjust similarity threshold based on use case
- Network Pruning: Regular cleanup of weak or obsolete synapses
- Model Efficiency: Balance embedding quality vs. computation cost
Advanced Synapse Features
Multi-Hop Connections
Synapses enable multi-hop context discovery:
// Find messages connected within 3 hops
MATCH path=(start:MessageNode)-[:SYNAPSE*1..3]-(end:MessageNode)
WHERE start.content CONTAINS "machine learning"
RETURN path, length(path)
ORDER BY length(path)
Conversation Path Finding
Discover how topics connect across conversations:
// Find shortest path between two topics
MATCH path=shortestPath(
(topic1:MessageNode {content: "Python async"})-[:SYNAPSE*]-(topic2:MessageNode {content: "Error handling"})
)
RETURN path
Synapse-Based Recommendations
Use synapse networks to suggest related topics:
# Find related discussions
reservoir search --link --semantic "current topic"
# Or get synapse-connected messages directly
echo "What related topics should I explore?" | reservoir ingest
# Context will include synapse-connected discussions
Troubleshooting Synapses
Common Issues
- Too Many Synapses: Lower the similarity threshold
- Too Few Synapses: Check embedding quality and threshold
- Irrelevant Connections: Review similarity calculation method
- Performance Issues: Implement batch processing
Diagnostic Commands
# Count stored user messages as a quick sanity check
reservoir export | jq '[.[] | select(.role=="user")] | length'
# Check similarity scores distribution
# (Requires Neo4j query access)
Synapse Replay
Rebuild synapse network when needed:
# Replay embeddings and rebuild synapses
reservoir replay
# This will:
# 1. Recalculate embeddings for all messages
# 2. Rebuild synapse relationships
# 3. Update similarity scores
# 4. Prune weak connections
Future Enhancements
Planned Features
- Weighted Synapses: Consider recency and conversation importance
- Topic-Aware Synapses: Enhanced similarity based on topic detection
- Hierarchical Synapses: Multi-level relationship strengths
- Synapse Analytics: Dashboard for network visualization
Customization Options
- Custom Similarity Functions: Beyond cosine similarity
- Domain-Specific Models: Specialized embeddings for specific fields
- User-Defined Thresholds: Per-partition similarity thresholds
- Manual Synapse Management: User-controlled connection creation
Synapses transform Reservoir from a simple conversation store into an intelligent knowledge network that grows more valuable with each interaction, creating a personalized LLM assistant with genuine conversational memory.
Multi-Provider Support
Reservoir supports multiple AI providers through its flexible routing system. This allows you to use different AI models seamlessly while maintaining conversation context and history across all providers.
Supported Providers
OpenAI
- Models: GPT-4, GPT-4o, GPT-4o-mini, GPT-3.5-turbo, GPT-4o-search-preview
- API Key Required: Yes (OPENAI_API_KEY)
- Endpoint: https://api.openai.com/v1/chat/completions
- Features: Full feature support, web search capabilities
Ollama
- Models: llama3.2, gemma3, and any locally installed models
- API Key Required: No
- Endpoint: http://localhost:11434/v1/chat/completions
- Features: Local inference, privacy-focused, custom model support
Mistral AI
- Models: mistral-large-2402, mistral-medium, mistral-small
- API Key Required: Yes (MISTRAL_API_KEY)
- Endpoint: https://api.mistral.ai/v1/chat/completions
- Features: European AI provider, competitive performance
Google Gemini
- Models: gemini-2.0-flash, gemini-2.5-flash-preview-05-20
- API Key Required: Yes (GEMINI_API_KEY)
- Endpoint: Custom Google AI endpoint
- Features: Google's latest AI models, multimodal capabilities
Custom Providers
- Models: Any model name not explicitly configured
- Default Routing: Routes to Ollama by default
- Configuration: Set custom endpoints via environment variables
Automatic Model Routing
Reservoir automatically determines which provider to use based on the model name in your request:
{
"model": "gpt-4", // → Routes to OpenAI
"model": "llama3.2", // → Routes to Ollama
"model": "mistral-large", // → Routes to Mistral
"model": "gemini-2.0-flash" // → Routes to Google
}
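A rough sketch of this routing idea (illustrative Python; the actual routing table and fallback logic live in Reservoir's Rust code, and the prefix matching shown here is an assumption):
def route_provider(model: str) -> str:
    # Unknown model names fall back to Ollama, matching the default routing above
    if model.startswith("gpt-"):
        return "openai"
    if model.startswith("mistral"):
        return "mistral"
    if model.startswith("gemini"):
        return "gemini"
    return "ollama"

assert route_provider("gpt-4") == "openai"
assert route_provider("llama3.2") == "ollama"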
Configuration
Environment Variables
Set provider endpoints and API keys:
# API Keys
export OPENAI_API_KEY="sk-your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"
# Custom Endpoints (optional)
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"
Provider-Specific Features
OpenAI Features
- Web Search: Available with gpt-4o-search-preview
- Function Calling: Supported on compatible models
- Vision: GPT-4o supports image inputs
- JSON Mode: Structured output support
Example with web search:
curl "http://localhost:3017/partition/$USER/instance/research/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [{"role": "user", "content": "Latest AI developments"}],
"web_search_options": {
"enabled": true,
"max_results": 5
}
}'
Ollama Features
- Local Models: No API key required
- Privacy: Data never leaves your machine
- Custom Models: Load any compatible model
- Performance: Direct local inference
Example with local model:
curl "http://localhost:3017/partition/$USER/instance/local/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2",
"messages": [{"role": "user", "content": "Explain quantum computing"}]
}'
Multi-Provider Workflows
Seamless Model Switching
You can switch between providers within the same conversation while maintaining context:
import os
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:3017/v1/partition/myuser/instance/research",
api_key=os.environ.get("OPENAI_API_KEY")
)
# Start with OpenAI
response1 = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "Explain neural networks"}]
)
# Continue with Ollama (context is preserved)
response2 = client.chat.completions.create(
model="llama3.2",
messages=[{"role": "user", "content": "What did we just discuss?"}]
)
# Switch to Mistral (still has context)
response3 = client.chat.completions.create(
model="mistral-large-2402",
messages=[{"role": "user", "content": "How does this relate to AI safety?"}]
)
Provider-Specific Use Cases
Development Workflow
# Use Ollama for quick local testing
curl -d '{"model": "llama3.2", "messages": [...]}' localhost:3017/...
# Use OpenAI for production queries
curl -d '{"model": "gpt-4", "messages": [...]}' localhost:3017/...
# Use Mistral for European compliance
curl -d '{"model": "mistral-large", "messages": [...]}' localhost:3017/...
Error Handling
Reservoir provides consistent error handling across all providers:
Common Error Responses
{
"error": {
"type": "invalid_request_error",
"message": "Invalid model specified",
"code": "model_not_found"
}
}
Provider-Specific Errors
- OpenAI: Rate limits, quota exceeded, invalid API key
- Ollama: Model not found, service unavailable
- Mistral: Authentication errors, model access restrictions
- Gemini: API quota limits, geographic restrictions
Performance Considerations
Provider Comparison
Provider | Latency | Cost | Privacy | Features |
---|---|---|---|---|
OpenAI | Medium | High | Cloud | Most comprehensive |
Ollama | Low | Free | Local | Basic, customizable |
Mistral | Medium | Medium | Cloud | European focus |
Gemini | Medium | Medium | Cloud | Google integration |
Optimization Tips
- Use Ollama for development: Faster iteration, no API costs
- Use OpenAI for production: Most reliable, feature-rich
- Use Mistral for compliance: European data residency
- Cache responses: Reduce API calls and costs
Custom Provider Integration
To add a new OpenAI-compatible provider:
- Set the endpoint URL:
export RSV_CUSTOM_BASE_URL="https://api.custom-provider.com/v1/chat/completions"
- Configure model routing (if needed):
// In your configuration
match model_name {
    "custom-model" => "custom-provider",
    _ => "default-provider"
}
- Test the integration:
curl "http://localhost:3017/partition/$USER/instance/test/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CUSTOM_API_KEY" \
  -d '{"model": "custom-model", "messages": [...]}'
Future Enhancements
Planned improvements for multi-provider support:
- Load Balancing: Distribute requests across multiple providers
- Failover: Automatic fallback to backup providers
- Cost Optimization: Route to cheapest provider based on request
- Model Capabilities: Automatic routing based on required features
- Custom Routing Rules: User-defined routing logic
Troubleshooting
Provider Connection Issues
Check provider availability:
# OpenAI
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"
# Ollama
curl http://localhost:11434/api/tags
# Mistral
curl https://api.mistral.ai/v1/models -H "Authorization: Bearer $MISTRAL_API_KEY"
Common solutions:
- Verify API keys are correctly set
- Check network connectivity
- Ensure provider services are running
- Validate model names and availability
Multi-provider support makes Reservoir a flexible foundation for AI applications, allowing you to choose the best provider for each use case while maintaining conversation continuity.
Token Management
Reservoir intelligently manages token limits to ensure optimal context enrichment while staying within model constraints. The system automatically calculates token usage, prioritizes the most relevant context, and truncates content when necessary to fit within API limits.
Context Token Management
Automatic Context Sizing
Reservoir dynamically adjusts context size based on:
- Model Token Limits: Respects each model's maximum context window
- Content Priority: Prioritizes most relevant and recent context
- Message Truncation: Intelligently cuts content when limits are exceeded
- Reserve Allocation: Maintains buffer for user input and model response
Token Calculation
The system estimates token usage using standard approximations:
- English Text: ~4 characters per token
- Code Content: ~3 characters per token (more tokens due to syntax)
- Special Characters: Variable token usage
- Embeddings: Not included in context token count
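A rough estimator built on these approximations (illustrative Python; Reservoir's internal accounting may differ):
def estimate_tokens(text: str) -> int:
    # ~4 characters per token for English prose
    return max(1, len(text) // 4)

def available_context_tokens(context_window: int, reserve: int, user_message: str) -> int:
    # Tokens left for injected context after the reserve buffer and the user's message
    return context_window - reserve - estimate_tokens(user_message)

# Example: GPT-4 with an 8,192-token window and a 2,048-token reserve
print(available_context_tokens(8192, 2048, "How should I structure error handling in async Rust?"))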
Context Building Strategy
flowchart TD
    A["User Message Arrives"] --> B["Calculate Available Tokens"]
    B --> C["Get Semantic Context"]
    C --> D["Get Recent History"]
    D --> E["Combine Context Sources"]
    E --> F{"Within Token Limit?"}
    F -->|Yes| G["Use Full Context"]
    F -->|No| H["Prioritize and Truncate"]
    H --> I["Recent Messages Priority"]
    I --> J["High Similarity Priority"]
    J --> K["Truncate Oldest/Lowest Score"]
    K --> G
    G --> L["Send to Model"]
Token Limits by Model
OpenAI Models
Model | Context Window | Reservoir Reserve | Available for Context |
---|---|---|---|
GPT-3.5-turbo | 4,096 tokens | 1,024 tokens | ~3,000 tokens |
GPT-4 | 8,192 tokens | 2,048 tokens | ~6,000 tokens |
GPT-4-turbo | 128,000 tokens | 8,000 tokens | ~120,000 tokens |
GPT-4o | 128,000 tokens | 8,000 tokens | ~120,000 tokens |
Local Models (Ollama)
Model | Context Window | Reservoir Reserve | Available for Context |
---|---|---|---|
Llama 3.1 8B | 32,768 tokens | 2,048 tokens | ~30,000 tokens |
Llama 3.1 70B | 32,768 tokens | 2,048 tokens | ~30,000 tokens |
Mistral 7B | 32,768 tokens | 2,048 tokens | ~30,000 tokens |
CodeLlama | 16,384 tokens | 1,024 tokens | ~15,000 tokens |
Context Prioritization
Priority Order
When token limits are exceeded, Reservoir prioritizes context in this order:
- User's Current Message: Always included (highest priority)
- Recent History: Last 15 messages from same partition/instance
- High Similarity Matches: Messages with similarity score > 0.85
- Synapse Connections: Messages connected via SYNAPSE relationships
- Older Context: Historical messages (first to be truncated)
Similarity-Based Prioritization
Context is ranked by relevance:
Priority Score = (Similarity Score × 0.7) + (Recency Score × 0.3)
Where:
- Similarity Score: 0.0-1.0 from semantic search
- Recency Score: 0.0-1.0 based on message age
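The same formula as a small Python helper (the 0.7/0.3 weights come from the formula above; how recency is normalized into 0.0-1.0 is an implementation detail not specified here):
def priority_score(similarity: float, recency: float) -> float:
    # Both inputs are expected in the range 0.0-1.0
    return similarity * 0.7 + recency * 0.3

# A highly similar but old message vs. a recent, loosely related one
print(priority_score(similarity=0.92, recency=0.10))  # 0.674
print(priority_score(similarity=0.60, recency=0.95))  # 0.705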
Truncation Strategy
When content must be truncated:
- Message-Level Truncation: Remove entire messages (preserves coherence)
- LIFO for Semantic: Last-In-First-Out for semantic matches
- FIFO for Recent: First-In-First-Out for chronological history
- Preserve Pairs: Keep user/assistant pairs together when possible
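A simplified sketch of message-level truncation under these rules (illustrative Python, assuming messages are (role, content) tuples ending with the user's current message; the real implementation also keeps user/assistant pairs together):
def estimate(text: str) -> int:
    return max(1, len(text) // 4)

def truncate_messages(messages, limit):
    # System messages and the final user message are always kept;
    # other context is dropped oldest-first (FIFO) until the estimate fits.
    system = [m for m in messages if m[0] == "system"]
    last_user = messages[-1]
    context = [m for m in messages[:-1] if m[0] != "system"]

    def total(msgs):
        return sum(estimate(content) for _, content in msgs)

    while context and total(system) + total(context) + estimate(last_user[1]) > limit:
        context.pop(0)

    # Simplified reassembly: system messages, surviving context, current message
    return system + context + [last_user]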
Configuration Options
Context Size Limits
Configure via environment variables or config file:
# Set maximum semantic context messages
reservoir config --set semantic_context_size=20
# Set recent history limit
reservoir config --set recent_context_size=15
# Set token reserve buffer
reservoir config --set token_reserve=2048
Model-Specific Overrides
# In reservoir.toml
[models.gpt-4-turbo]
max_context_tokens = 120000
reserve_tokens = 8000
semantic_context_size = 50
[models.gpt-3.5-turbo]
max_context_tokens = 4096
reserve_tokens = 1024
semantic_context_size = 10
Token Usage Monitoring
Built-in Monitoring
Reservoir automatically tracks:
- Input Tokens: Context + user message tokens
- Reserve Usage: How much buffer is being used
- Truncation Events: When content is cut due to limits
- Model Utilization: Percentage of context window used
Usage Examples
# View recent messages with estimated token usage
reservoir view 10 | while read -r line; do
echo "$line (est. tokens: $((${#line}/4)))"
done
# Estimate total context size
TOTAL_CHARS=$(reservoir view 15 | wc -c)
echo "Estimated tokens: $((TOTAL_CHARS/4))"
# Check if context might be truncated for a model
CONTEXT_SIZE=$(($(reservoir view 15 | wc -c) / 4))
echo "Context tokens: $CONTEXT_SIZE"
echo "Fits in GPT-3.5: $([ $CONTEXT_SIZE -lt 3000 ] && echo 'Yes' || echo 'No')"
Optimization Strategies
Reduce Context Size
Adjust Semantic Context
# Reduce semantic matches
reservoir config --set semantic_context_size=10
# Increase similarity threshold (fewer matches)
# Note: This requires code modification currently
Limit Recent History
# Reduce recent message count
reservoir config --set recent_context_size=8
Improve Context Quality
Use Higher Similarity Threshold
- Fewer but more relevant semantic matches
- Better context quality with less noise
- Requires code-level configuration changes
Partition Strategy
- Use specific partitions for focused contexts
- Separate unrelated discussions
- Improves relevance within token limits
# Focused partition for coding discussions
echo "Python async/await question" | reservoir ingest --partition alice --instance coding
# Separate partition for general chat
echo "Weather discussion" | reservoir ingest --partition alice --instance general
Model-Specific Considerations
Small Context Models (GPT-3.5)
Optimization Strategy:
- Prioritize recent messages heavily
- Limit semantic context to top 5-10 matches
- Use aggressive truncation
- Consider shorter message summaries
# Configuration for small context models
reservoir config --set semantic_context_size=5
reservoir config --set recent_context_size=8
Large Context Models (GPT-4-turbo)
Utilization Strategy:
- Include extensive semantic context
- Preserve longer conversation history
- Enable deeper synapse exploration
- Allow for more comprehensive context
# Configuration for large context models
reservoir config --set semantic_context_size=30
reservoir config --set recent_context_size=25
Advanced Token Management
Dynamic Context Adjustment
Reservoir can adjust context based on content type:
- Code-Heavy Contexts: Reduce character-to-token ratio assumption
- Natural Language: Use standard ratios
- Mixed Content: Apply weighted calculations
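One way such a weighted calculation could look (illustrative Python sketch; the exact ratios Reservoir applies per content type are not specified here):
CHARS_PER_TOKEN = {"code": 3.0, "text": 4.0}

def estimate_tokens_weighted(segments):
    # segments is a list of (kind, text) pairs, kind being "code" or "text"
    return int(sum(len(text) / CHARS_PER_TOKEN[kind] for kind, text in segments))

print(estimate_tokens_weighted([
    ("text", "Here is the fix for the bug we discussed:"),
    ("code", "fn main() { println!(\"hello\"); }"),
]))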
Future Enhancements
Planned Features:
- Semantic Summarization: Summarize older context instead of truncating
- Token-Aware Similarity: Consider token cost in similarity ranking
- Model-Aware Optimization: Automatic settings per model
- Context Compression: Compress historical context intelligently
Custom Token Strategies
Per-Partition Settings
# Different strategies for different use cases
reservoir config --set partitions.coding.semantic_context_size=20
reservoir config --set partitions.research.recent_context_size=30
Content-Type Awareness
# Adjust for code vs text heavy partitions
reservoir config --set partitions.coding.token_multiplier=1.3
reservoir config --set partitions.writing.token_multiplier=0.9
Troubleshooting Token Issues
Common Problems
Context Too Large
# Symptoms: API errors about token limits
# Solution: Reduce context sizes
reservoir config --set semantic_context_size=10
reservoir config --set recent_context_size=5
Context Too Small
# Symptoms: Poor context quality, missing relevant information
# Solution: Increase context sizes (if model supports it)
reservoir config --set semantic_context_size=25
reservoir config --set recent_context_size=20
Frequent Truncation
# Symptoms: Important context being cut off
# Solution: Use larger context model or adjust priorities
Diagnostic Commands
# Estimate current context size
SEMANTIC_SIZE=$(reservoir search --semantic "test" | wc -c)
RECENT_SIZE=$(reservoir view 15 | wc -c)
TOTAL_SIZE=$((SEMANTIC_SIZE + RECENT_SIZE))
echo "Total context estimate: $((TOTAL_SIZE/4)) tokens"
# Check truncation frequency
# (This would require log analysis)
grep -i "truncat" /var/log/reservoir.log | wc -l
Token management in Reservoir ensures optimal AI performance by providing the right amount of relevant context while respecting model limitations, creating an intelligent balance between comprehensive memory and computational efficiency.
Partitioning & Organization
Reservoir uses a flexible partitioning system to organize your conversations and data. This two-level hierarchy enables you to separate different contexts, users, projects, or topics while maintaining intelligent context enrichment within each boundary.
Partitioning Concepts
Two-Level Hierarchy
Reservoir organizes data using two levels:
- Partition: The top-level organizational boundary
- Instance: The sub-level within each partition
partition_name/
├── instance_1/
├── instance_2/
└── instance_3/
Default Organization
When no partition is specified, Reservoir uses:
- Partition: "default"
- Instance: "default"
# These are equivalent
reservoir view 10
reservoir view --partition default --instance default 10
Partition Use Cases
User Separation
Separate different users or personas:
alice/
├── personal/ # Personal conversations
├── work/ # Work-related discussions
└── research/ # Research and learning
bob/
├── coding/ # Programming discussions
├── writing/ # Content creation
└── planning/ # Project planning
Usage Examples:
# Alice's personal conversations
echo "What's the weather like?" | reservoir ingest --partition alice --instance personal
# Bob's coding discussions
echo "How do I implement OAuth2?" | reservoir ingest --partition bob --instance coding
# View Alice's work conversations
reservoir view --partition alice --instance work 15
# Search Bob's coding history
reservoir search --partition bob --instance coding --semantic "database optimization"
Project Organization
Organize by projects or domains:
webapp_project/
├── backend/ # Backend development
├── frontend/ # Frontend development
├── database/ # Database design
└── deployment/ # DevOps and deployment
mobile_app/
├── ios/ # iOS development
├── android/ # Android development
├── api/ # API integration
└── testing/ # QA and testing
Usage Examples:
# Backend development discussions
echo "Should we use microservices or monolith?" | reservoir ingest --partition webapp_project --instance backend
# Mobile API integration
echo "API authentication best practices" | reservoir ingest --partition mobile_app --instance api
# Search across web project
reservoir search --partition webapp_project --semantic "authentication"
# View mobile testing discussions
reservoir view --partition mobile_app --instance testing 20
Team Collaboration
Organize by teams or functional areas:
engineering/
├── architecture/ # System architecture
├── reviews/ # Code reviews
├── planning/ # Sprint planning
└── incidents/ # Incident response
product/
├── requirements/ # Requirements gathering
├── research/ # User research
├── roadmap/ # Product roadmap
└── metrics/ # Analytics and metrics
Usage Examples:
# Architecture discussions
echo "Microservices vs serverless trade-offs" | reservoir ingest --partition engineering --instance architecture
# Product research notes
echo "User feedback on new feature" | reservoir ingest --partition product --instance research
# Search engineering incidents
reservoir search --partition engineering --instance incidents "database"
# View product roadmap discussions
reservoir view --partition product --instance roadmap 10
Context Isolation
How Partitioning Affects Context
Reservoir's context enrichment respects partition boundaries:
- Same Partition/Instance: Full context sharing
- Same Partition, Different Instance: Limited context sharing
- Different Partition: Complete isolation
Context Rules:
# These will share context with each other
reservoir ingest --partition alice --instance coding "How do I use async/await?"
reservoir ingest --partition alice --instance coding "What about error handling?"
# This will have separate context
reservoir ingest --partition alice --instance personal "What should I cook for dinner?"
# This will be completely isolated
reservoir ingest --partition bob --instance coding "How do I use async/await?"
Privacy and Separation
Partitions provide data privacy:
- Search Isolation: Searches are scoped to partitions
- Context Isolation: AI responses don't leak across partitions
- Export Control: Can selectively export partition data
- Access Control: Enables future per-partition access controls
Partition Management
Creating Partitions
Partitions are created automatically when first used:
# Creates "newproject" partition with "planning" instance
echo "Project kickoff meeting notes" | reservoir ingest --partition newproject --instance planning
Viewing Partition Data
# View messages from specific partition/instance
reservoir view --partition alice --instance coding 15
# View without specifying instance (shows from all instances in partition)
reservoir view --partition alice 25
# Search within partition
reservoir search --partition engineering --semantic "deployment strategy"
# Search within specific instance
reservoir search --partition engineering --instance architecture --semantic "microservices"
Partition Listing
Currently, there's no direct command to list all partitions, but you can discover them through data export and analysis:
# Export and analyze partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c | sort -nr
# Find all instances within a partition
reservoir export | jq -r '.[] | select(.partition=="alice") | .instance' | sort | uniq -c
Advanced Partitioning Strategies
Time-Based Partitioning
Organize by time periods:
conversations_2024/
├── january/
├── february/
└── march/
conversations_2023/
├── q1/
├── q2/
├── q3/
└── q4/
# Current month's discussions
MONTH=$(date +%B | tr '[:upper:]' '[:lower:]')
echo "Today's important insight" | reservoir ingest --partition conversations_2024 --instance $MONTH
Topic-Based Partitioning
Organize by subject matter:
machine_learning/
├── theory/ # Theoretical discussions
├── implementation/ # Code and implementation
├── papers/ # Research papers
└── experiments/ # Experimental results
web_development/
├── frontend/ # Frontend technologies
├── backend/ # Backend systems
├── databases/ # Database design
└── devops/ # Operations and deployment
Environment-Based Partitioning
Separate by environment or context:
development/
├── local/ # Local development
├── testing/ # Testing environment
├── staging/ # Staging discussions
└── production/ # Production issues
personal/
├── learning/ # Educational content
├── projects/ # Personal projects
├── notes/ # General notes
└── ideas/ # Ideas and brainstorming
Best Practices
Naming Conventions
- Use Lowercase: Partition and instance names should be lowercase
- Use Underscores: Separate words with underscores: machine_learning
- Be Descriptive: Choose clear, meaningful names
- Keep Consistent: Maintain consistent naming across partitions
# Good naming
reservoir ingest --partition web_development --instance frontend
reservoir ingest --partition machine_learning --instance deep_learning
# Avoid these patterns
reservoir ingest --partition WebDev --instance FE # Mixed case, abbreviated
reservoir ingest --partition "web development" --instance "front end" # Spaces
Partition Strategy
- Plan Your Structure: Design partition hierarchy before heavy usage
- Balance Granularity: Too many partitions reduce context benefits
- Consider Growth: Design for future expansion
- Document Structure: Keep a record of partition purposes
Migration Between Partitions
Currently, partition migration requires export/import workflow:
# Export messages from one partition
reservoir export | jq '.[] | select(.partition=="old_partition")' > old_partition.json
# Edit JSON to change partition/instance names
sed 's/"partition":"old_partition"/"partition":"new_partition"/g' old_partition.json > new_partition.json
# Import to new structure
reservoir import new_partition.json
# Verify migration
reservoir view --partition new_partition 10
Integration with Other Features
Search Scoping
All search operations can be scoped to partitions:
# Search across all data
reservoir search --semantic "error handling"
# Search within partition
reservoir search --partition engineering --semantic "error handling"
# Search within specific instance
reservoir search --partition engineering --instance backend --semantic "error handling"
Data Export
Partitioning enables selective data export:
# Export everything
reservoir export > all_data.json
# Export specific partition (requires jq processing)
reservoir export | jq '.[] | select(.partition=="alice")' > alice_data.json
# Export specific instance
reservoir export | jq '.[] | select(.partition=="alice" and .instance=="coding")' > alice_coding.json
Context Enrichment
Partitioning directly affects how context is built:
- Semantic Search: Limited to same partition/instance
- Recent History: Limited to same partition/instance
- Synapse Relationships: Respect partition boundaries
- Token Limits: Applied per partition context
This partitioning system makes Reservoir suitable for multi-user environments, project-based work, and any scenario where logical separation of conversation contexts is beneficial.
Web Search Integration
Reservoir supports web search integration for models that provide this capability, enabling AI assistants to access real-time information from the internet while maintaining the benefits of conversational memory and context enrichment.
Overview
Web search integration allows AI models to:
- Access Current Information: Get up-to-date data not in training sets
- Verify Facts: Cross-reference stored conversations with current sources
- Expand Context: Combine web results with Reservoir's semantic memory
- Enhanced Research: Build knowledge from both conversation history and web sources
Supported Models
OpenAI Models with Web Search
- gpt-4o-search-preview: OpenAI's experimental web search model
- Future Models: Additional web-enabled models as they become available
Local Models
Web search capability depends on the underlying model's features:
- Some Ollama models may support web search plugins
- Custom implementations can be integrated via the API
Usage
Basic Web Search Request
curl -X POST "http://localhost:3017/v1/partition/research/instance/current_events/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "What are the latest developments in renewable energy technology?"
}
],
"web_search_options": {
"enabled": true
}
}'
Web Search with Context
{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "Based on our previous discussion about solar panels, what are the newest efficiency improvements announced this month?"
}
],
"web_search_options": {
"enabled": true,
"max_results": 5,
"search_depth": "recent"
}
}
Web Search Options
Configuration Parameters
{
"web_search_options": {
"enabled": true,
"max_results": 10,
"search_depth": "comprehensive",
"time_range": "recent",
"include_sources": true,
"filter_domains": ["example.com", "trusted-source.org"]
}
}
Parameter Details
Parameter | Type | Description | Default |
---|---|---|---|
enabled | boolean | Enable/disable web search | false |
max_results | integer | Maximum search results to consider | 5 |
search_depth | string | "quick" , "standard" , "comprehensive" | "standard" |
time_range | string | "recent" , "week" , "month" , "any" | "any" |
include_sources | boolean | Include source URLs in response | true |
filter_domains | array | Restrict to specific domains | [] |
How Web Search Works with Reservoir
Enhanced Context Flow
flowchart TD
    A["User Query Arrives"] --> B["Extract Search Terms"]
    B --> C["Reservoir Context Enrichment"]
    C --> D["Semantic Search (Local)"]
    C --> E["Recent History (Local)"]
    D --> F["Combine Local Context"]
    E --> F
    F --> G["Web Search (if enabled)"]
    G --> H["Merge Web Results with Context"]
    H --> I["Send Enriched Request to AI"]
    I --> J["AI Response with Web Sources"]
    J --> K["Store Response in Reservoir"]
Context Prioritization
When web search is enabled, context is prioritized:
- User's Current Message: Always highest priority
- Web Search Results: Real-time information
- Semantic Context: Relevant past conversations
- Recent History: Chronological conversation flow
- Additional Context: Synapse connections
Example Workflows
Research Assistant
import openai
openai.api_base = "http://localhost:3017/v1/partition/research/instance/ai_trends"
# Initial research query
response = openai.ChatCompletion.create(
model="gpt-4o-search-preview",
messages=[
{
"role": "user",
"content": "What are the latest breakthroughs in large language models?"
}
],
web_search_options={
"enabled": True,
"time_range": "recent",
"max_results": 8
}
)
print(response.choices[0].message.content)
# Follow-up question (benefits from both web search and conversation history)
response = openai.ChatCompletion.create(
model="gpt-4o-search-preview",
messages=[
{
"role": "user",
"content": "How do these breakthroughs compare to what we discussed last week about model efficiency?"
}
],
web_search_options={"enabled": True}
)
News Analysis
# Get latest information
curl -X POST "http://localhost:3017/v1/partition/news/instance/tech/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "Summarize today'\''s major technology news"
}
],
"web_search_options": {
"enabled": true,
"time_range": "recent",
"max_results": 10
}
}'
# Follow up with context
curl -X POST "http://localhost:3017/v1/partition/news/instance/tech/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "How does this relate to the trends we'\''ve been tracking this month?"
}
],
"web_search_options": {
"enabled": true
}
}'
Response Format
With Web Sources
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Based on recent reports and our previous discussions about solar technology, here are the latest efficiency improvements:\n\n## Recent Developments\n\n1. **Perovskite-Silicon Tandem Cells**: New research published this week shows efficiency rates reaching 33.7%...\n\n2. **Quantum Dot Technology**: Scientists have achieved 15% efficiency improvements...\n\nThese developments build on your earlier questions about cost-effectiveness, and the new efficiency gains should address the concerns you raised about ROI timelines.\n\n### Sources:\n- Nature Energy, December 2024\n- MIT Technology Review, December 2024\n- Previous conversation: Solar panel efficiency discussion"
},
"finish_reason": "stop",
"index": 0
}
],
"web_sources": [
{
"title": "Breakthrough in Perovskite Solar Cell Efficiency",
"url": "https://www.nature.com/articles/...",
"snippet": "Researchers achieve record-breaking 33.7% efficiency...",
"date": "2024-12-15"
}
]
}
Configuration
Environment Variables
# Enable web search by default
export RSV_WEB_SEARCH_ENABLED=true
# Configure search limits
export RSV_WEB_SEARCH_MAX_RESULTS=5
export RSV_WEB_SEARCH_TIME_RANGE=recent
# API keys for search providers (if needed)
export SEARCH_API_KEY="your-search-api-key"
Per-Request Configuration
Web search can be enabled/disabled per request:
# Enable for research
response = openai.ChatCompletion.create(
model="gpt-4o-search-preview",
messages=[{"role": "user", "content": "Current AI research trends"}],
web_search_options={"enabled": True}
)
# Disable for private discussions
response = openai.ChatCompletion.create(
model="gpt-4o-search-preview",
messages=[{"role": "user", "content": "Help me plan my personal project"}],
web_search_options={"enabled": False}
)
Use Cases
When to Enable Web Search
✅ Good Use Cases:
- Current events and news
- Latest research and publications
- Real-time data (stock prices, weather, etc.)
- Technical documentation updates
- Recent product releases or updates
❌ Avoid Web Search For:
- Personal conversations
- Private project discussions
- Creative writing tasks
- Code debugging (unless looking for new solutions)
- Historical analysis (where training data is sufficient)
Partition Strategies
# News and current events
/v1/partition/news/instance/tech/chat/completions
# Research and academic work
/v1/partition/research/instance/ai_papers/chat/completions
# Market analysis
/v1/partition/business/instance/market_intel/chat/completions
# Personal assistant (web search disabled)
/v1/partition/personal/instance/planning/chat/completions
Performance Considerations
Latency Impact
- Web Search Enabled: +1-3 seconds for search and processing
- Web Search Disabled: Standard Reservoir latency (200-500ms)
- Caching: Some web results may be cached for performance
Cost Implications
- Web search may incur additional API costs
- Consider rate limiting for high-volume applications
- Balance between information freshness and cost
Token Usage
Web search results count toward token limits:
- Search results are included in context token calculation
- May reduce available space for conversation history
- Automatic truncation applies when limits are exceeded
Troubleshooting
Web Search Not Working
# Check model support
reservoir config --get web_search_enabled
# Verify API keys
echo $OPENAI_API_KEY | wc -c # Should be > 0
# Test with minimal request
curl -X POST "http://localhost:3017/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4o-search-preview",
"messages": [{"role": "user", "content": "What is today'\''s date?"}],
"web_search_options": {"enabled": true}
}'
Search Quality Issues
- Refine Search Terms: Use more specific queries
- Adjust Time Range: Narrow to recent results for current topics
- Filter Domains: Restrict to authoritative sources
- Combine with Context: Let Reservoir's memory provide additional context
Future Enhancements
Planned Features
- Custom Search Providers: Integration with different search APIs
- Search Result Caching: Store web results for reuse
- Source Ranking: Prioritize trusted sources
- Search History: Track and learn from search patterns
Integration Possibilities
- Domain-Specific Search: Academic papers, patents, documentation
- Real-Time Data: APIs for live information
- Multi-Modal Search: Images, videos, and documents
- Knowledge Graphs: Structured information integration
Web search integration transforms Reservoir from a conversational memory system into a comprehensive knowledge assistant that combines the depth of accumulated conversations with the breadth of current web information.
Import/Export
Reservoir provides comprehensive import and export capabilities for backing up your conversation data, migrating between systems, and integrating with external tools. The system exports data in JSON format, preserving all message metadata, embeddings, and relationships.
Export Functionality
Basic Export
Export all conversation data to JSON format:
# Export to stdout
reservoir export
# Save to file
reservoir export > conversations.json
# Export with timestamp
reservoir export > backup_$(date +%Y%m%d_%H%M%S).json
Export Format
Each exported message includes complete metadata:
[
{
"id": null,
"trace_id": "550e8400-e29b-41d4-a716-446655440000",
"partition": "alice",
"instance": "coding",
"content": "How do I implement error handling in async functions?",
"role": "user",
"embedding": [0.123, -0.456, 0.789, ...],
"url": null,
"timestamp": 1705315800000
},
{
"id": null,
"trace_id": "550e8400-e29b-41d4-a716-446655440001",
"partition": "alice",
"instance": "coding",
"content": "Here are several approaches to error handling in async functions...",
"role": "assistant",
"embedding": [0.234, -0.567, 0.890, ...],
"url": null,
"timestamp": 1705315815000
}
]
What's Included in Export
- Complete Message Data: All message content and metadata
- Vector Embeddings: Full embedding vectors for similarity search
- Partition Organization: Partition and instance information
- Conversation Structure: Trace IDs linking user/assistant pairs
- Timestamps: Precise timing information
- Roles: User, assistant, and system message roles
Export Use Cases
Data Backup
# Daily backup
reservoir export > "backup_$(date +%Y%m%d).json"
# Compressed backup
reservoir export | gzip > "backup_$(date +%Y%m%d).json.gz"
Migration
# Export from source system
reservoir export > migration_data.json
# Transfer to new system
scp migration_data.json user@newserver:/path/to/reservoir/
Analysis
# Export for external analysis
reservoir export | jq '.[] | select(.role=="user")' > user_messages.json
# Export specific time range
reservoir export | jq '.[] | select(.timestamp > 1705315800000)' > recent_messages.json
Import Functionality
Basic Import
Import conversation data from JSON files:
# Import from file
reservoir import conversations.json
# Import from compressed backup
gunzip -c backup_20240115.json.gz | reservoir import /dev/stdin
Import Behavior
Data Validation
- Validates JSON format and structure
- Checks required fields (trace_id, partition, instance, role, content)
- Verifies embedding vector format and dimensions
Duplicate Handling
- Skips messages with duplicate trace_id and role combinations
- Preserves existing data integrity
- Logs skipped duplicates for review
Relationship Reconstruction
- Automatically rebuilds RESPONDED_WITH relationships
- Recreates HAS_EMBEDDING connections
- Maintains partition/instance boundaries
Import Process
- File Reading: Load and parse JSON data
- Validation: Check data format and completeness
- Message Creation: Create MessageNode entries
- Embedding Processing: Store vector embeddings
- Relationship Building: Establish graph relationships
- Index Updates: Update vector indices
Import Examples
Complete System Restore
# Stop Reservoir service
systemctl stop reservoir
# Clear existing data (if needed)
# WARNING: This is destructive!
# Import backup
reservoir import full_backup_20240115.json
# Verify import
reservoir view 10
Selective Import
# Import specific partition data
cat full_backup.json | jq '.[] | select(.partition=="alice")' > alice_data.json
reservoir import alice_data.json
# Import recent messages only
cat backup.json | jq '.[] | select(.timestamp > 1705315800000)' > recent.json
reservoir import recent.json
Advanced Export/Import
Filtering Exports
By Partition
# Export specific user's data
reservoir export | jq '.[] | select(.partition=="alice")' > alice_conversations.json
By Time Range
# Export last 24 hours
YESTERDAY=$(date -d '1 day ago' +%s)000
reservoir export | jq ".[] | select(.timestamp > $YESTERDAY)" > recent_conversations.json
By Role
# Export only user messages
reservoir export | jq '.[] | select(.role=="user")' > user_questions.json
# Export only assistant responses
reservoir export | jq '.[] | select(.role=="assistant")' > ai_responses.json
By Content
# Export messages containing specific terms
reservoir export | jq '.[] | select(.content | test("python|programming"; "i"))' > programming_discussions.json
Data Transformation
Convert to CSV
reservoir export | jq -r '.[] | [.timestamp, .partition, .instance, .role, .content] | @csv' > conversations.csv
Extract Text Only
reservoir export | jq -r '.[] | .content' > all_messages.txt
Create Markdown Format
reservoir export | jq -r '.[] | "## " + (.timestamp | tostring) + " (" + .role + ")\n\n" + .content + "\n"' > conversations.md
Batch Operations
Multiple File Import
# Import multiple backup files
for file in backup_*.json; do
echo "Importing $file..."
reservoir import "$file"
done
Incremental Backup Strategy
#!/bin/bash
# Incremental backup script
BACKUP_DIR="/backup/reservoir"
LAST_BACKUP_TIME=$(cat "$BACKUP_DIR/.last_backup" 2>/dev/null || echo "0")
CURRENT_TIME=$(date +%s)000
# Export messages since last backup
reservoir export | jq ".[] | select(.timestamp > $LAST_BACKUP_TIME)" > "$BACKUP_DIR/incremental_$(date +%Y%m%d_%H%M%S).json"
# Update last backup time
echo "$CURRENT_TIME" > "$BACKUP_DIR/.last_backup"
Data Migration Workflows
System Migration
Complete Migration
# Source system
reservoir export > complete_migration.json
# Target system
reservoir import complete_migration.json
# Verify migration
SOURCE_COUNT=$(jq length complete_migration.json)
TARGET_COUNT=$(reservoir export | jq length)
echo "Source: $SOURCE_COUNT messages, Target: $TARGET_COUNT messages"
Partition Migration
# Migrate specific user to new system
reservoir export | jq '.[] | select(.partition=="alice")' > alice_migration.json
# On target system
reservoir import alice_migration.json
# Verify partition migration
reservoir view --partition alice 10
Cross-System Integration
Export for External Processing
# Export for machine learning analysis
reservoir export | jq '.[] | {content: .content, embedding: .embedding}' > ml_dataset.json
# Export conversation pairs for training
reservoir export | jq -r 'group_by(.trace_id) | .[] | select(length == 2) | {user: .[0].content, assistant: .[1].content}' > conversation_pairs.json
Import from External Sources
Convert external data to Reservoir format:
{
"trace_id": "external-001",
"partition": "imported",
"instance": "external_system",
"content": "Question from external system",
"role": "user",
"embedding": [], // Will be generated if empty
"url": null,
"timestamp": 1705315800000
}
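A small conversion sketch (illustrative Python; the field names follow the export format shown earlier, while the input shape, generated trace_id, and file name are assumptions):
import json
import time
import uuid

def to_reservoir_message(content, role="user", partition="imported", instance="external_system"):
    # Build one record in the format above; the empty embedding is
    # left for Reservoir to generate, and the trace_id is a placeholder.
    return {
        "id": None,
        "trace_id": str(uuid.uuid4()),
        "partition": partition,
        "instance": instance,
        "content": content,
        "role": role,
        "embedding": [],
        "url": None,
        "timestamp": int(time.time() * 1000),
    }

# Write a file that `reservoir import` can read
with open("external_import.json", "w") as f:
    json.dump([to_reservoir_message("Question from external system")], f, indent=2)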
Data Integrity and Verification
Export Verification
# Check export completeness
EXPORTED_COUNT=$(reservoir export | jq length)
echo "Exported $EXPORTED_COUNT messages"
# Verify embeddings
EMBEDDED_COUNT=$(reservoir export | jq '[.[] | select(.embedding | length > 0)] | length')
echo "$EMBEDDED_COUNT messages have embeddings"
# Check partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c
Import Validation
# Validate JSON format before import
jq . backup.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON"
# Check required fields
jq '[.[] | select(.trace_id and .partition and .instance and .role and .content)] | length' backup.json
# Verify import success
reservoir view 10
reservoir search --semantic "test query"
Performance Considerations
Large Dataset Handling
Streaming Export
# For very large datasets, process in chunks
reservoir export | jq -c '.[]' | split -l 1000 - chunk_
# Import chunks
for chunk in chunk_*; do
jq -s '.' "$chunk" | reservoir import /dev/stdin
done
Compression
# Compress exports to save space
reservoir export | gzip > backup.json.gz
# Decompress for import
gunzip -c backup.json.gz | reservoir import /dev/stdin
Network Transfer
Efficient Transfer
# Direct transfer without intermediate files
ssh source_server 'reservoir export' | reservoir import /dev/stdin
# Compressed transfer
ssh source_server 'reservoir export | gzip' | gunzip | reservoir import /dev/stdin
Troubleshooting
Common Issues
Import Failures
# Check JSON validity
jq . import_file.json
# Verify required fields
jq '.[] | keys' import_file.json | head -5
# Check for duplicate trace_ids
jq -r '.[] | .trace_id' import_file.json | sort | uniq -d
Missing Embeddings
# Check embedding status
reservoir export | jq '[.[] | select(.embedding | length == 0)] | length'
# Regenerate embeddings if needed
reservoir replay
Partition Issues
# Check partition consistency
reservoir export | jq -r '.[] | "\(.partition)/\(.instance)"' | sort | uniq -c
# View messages in specific partition
reservoir view --partition problematic_partition 10
Recovery Procedures
Partial Import Recovery
# If import fails partway through, check what was imported
IMPORTED_COUNT=$(reservoir export | jq length)
TOTAL_COUNT=$(jq length backup.json)
echo "Imported $IMPORTED_COUNT of $TOTAL_COUNT messages"
# Import remaining messages (requires identifying what's missing)
Data Corruption Recovery
# Export current state
reservoir export > current_state.json
# Restore from known good backup
reservoir import good_backup.json
# Compare and merge if needed
The import/export system provides a robust foundation for data management, enabling seamless backup, migration, and integration workflows while maintaining complete data fidelity and system integrity.
Local Deployment
This guide covers setting up Reservoir for local development and production use on your local machine.
Prerequisites
Before deploying Reservoir locally, ensure you have the following installed:
- Rust (latest stable version)
- Docker (for Neo4j database)
- Git for version control
Quick Setup
Step 1: Clone the Repository
git clone https://github.com/divanvisagie/reservoir.git
cd reservoir
Step 2: Start Neo4j Database
You have several options for running Neo4j locally:
Option A: Docker Compose (Recommended)
docker-compose up -d
This starts Neo4j on the default bolt://localhost:7687 with the credentials defined in the docker-compose file.
Option B: Docker Manual Setup
docker run \
--name neo4j \
-p7474:7474 -p7687:7687 \
-d \
-v $HOME/neo4j/data:/data \
-v $HOME/neo4j/logs:/logs \
-v $HOME/neo4j/import:/var/lib/neo4j/import \
-v $HOME/neo4j/plugins:/plugins \
--env NEO4J_AUTH=neo4j/password \
neo4j:latest
Option C: Homebrew (macOS Service)
If you prefer to run Neo4j as a permanent background service:
brew install neo4j
brew services start neo4j
This will start Neo4j on bolt://localhost:7687 and ensure it runs automatically when your computer boots.
Step 3: Configure Environment Variables
Create a .env file in the project root or export the following environment variables:
# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1
# Database Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
# API Keys (required for respective providers)
OPENAI_API_KEY=sk-your-openai-key-here
MISTRAL_API_KEY=your-mistral-key-here
GEMINI_API_KEY=your-gemini-key-here
# Custom Provider URLs (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions
RSV_MISTRAL_BASE_URL=https://api.mistral.ai/v1/chat/completions
Note: Most environment variables have sensible defaults. Only the API keys for your chosen providers are required.
Step 4: Build and Run
Manual Execution
# Build the project
cargo build --release
# Run Reservoir
cargo run -- start
Using Make Commands
# Build the release binary
make main
# Run for development (with auto-reload)
make dev
# Run normally
make run
Reservoir will now be available at http://localhost:3017.
Service Installation (macOS)
For a more permanent setup, you can install Reservoir as a macOS LaunchAgent service.
Install the Service
make install-service
This command:
- Copies the LaunchAgent plist to ~/Library/LaunchAgents/
- Loads the service using launchctl
- Starts Reservoir automatically in the background
Service Management
Check service status:
launchctl list | grep reservoir
View service logs:
tail -f /tmp/reservoir.log
tail -f /tmp/reservoir.err
Manually start/stop the service:
# Start
launchctl start com.sectorflabs.reservoir
# Stop
launchctl stop com.sectorflabs.reservoir
Uninstall the Service
make uninstall-service
This removes the service and cleans up all related files.
Verification
Test the Installation
- Check if Reservoir is running:
curl http://localhost:3017/health
- Test with a simple API call:
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "Hello, Reservoir!" }
    ]
  }'
- Run the test suite:
./hurl/test.sh
Check Neo4j Connection
Verify that Neo4j is accessible:
# Check Neo4j web interface
open http://localhost:7474
# Test connection with curl
curl -u neo4j:password http://localhost:7474/db/data/
Configuration Options
Database Configuration
Variable | Default | Description |
---|---|---|
NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
NEO4J_USERNAME | neo4j | Database username |
NEO4J_PASSWORD | password | Database password |
Server Configuration
Variable | Default | Description |
---|---|---|
RESERVOIR_PORT | 3017 | HTTP server port |
RESERVOIR_HOST | 127.0.0.1 | HTTP server host |
Provider Configuration
Variable | Default | Description |
---|---|---|
RSV_OPENAI_BASE_URL | https://api.openai.com/v1/chat/completions | OpenAI API endpoint |
RSV_OLLAMA_BASE_URL | http://localhost:11434/v1/chat/completions | Ollama API endpoint |
RSV_MISTRAL_BASE_URL | https://api.mistral.ai/v1/chat/completions | Mistral API endpoint |
Troubleshooting
Common Issues
Port Already in Use:
# Check what's using port 3017
lsof -i :3017
# Use a different port
export RESERVOIR_PORT=3018
Neo4j Connection Failed:
# Check if Neo4j is running
docker ps | grep neo4j
# Check Neo4j logs
docker logs neo4j
Permission Issues (macOS Service):
# Ensure the binary path is correct in the plist
ls -la ~/.cargo/bin/reservoir
# Update the path in scripts/com.sectorflabs.reservoir.plist if needed
API Key Issues:
# Verify your API key is set
echo $OPENAI_API_KEY
# Test the key directly with OpenAI
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"
Performance Tuning
For better performance in local deployment:
- Increase Neo4j memory allocation:
# In docker-compose.yml, add:
NEO4J_dbms_memory_heap_initial__size=512m
NEO4J_dbms_memory_heap_max__size=2G
- Use SSD storage for Neo4j data:
# Mount Neo4j data on fast storage
-v /path/to/fast/storage:/data
- Optimize connection pooling:
# Add to .env
NEO4J_MAX_CONNECTIONS=20
NEO4J_CONNECTION_TIMEOUT=30s
Next Steps
After successful local deployment:
- Configure Environment Variables
- Set up Production Deployment
- Learn about API Usage
- Explore Chat Gipitty Integration
Your Reservoir instance is now ready for local development and testing!
Common Issues
This page covers the most common issues you might encounter when using Reservoir and how to solve them.
Server Issues
Server Not Starting
Symptoms:
- Cannot connect to http://localhost:3017
- Connection refused errors
- Server fails to start
Solutions:
Check Neo4j
Ensure Neo4j is running and accessible:
# Check if Neo4j is running
systemctl status neo4j # Linux
brew services list | grep neo4j # macOS
# Start Neo4j if not running
systemctl start neo4j # Linux
brew services start neo4j # macOS
Port Conflicts
Default port 3017 might be in use:
# Check what's using port 3017
lsof -i :3017
# Use a different port
RESERVOIR_PORT=3018 cargo run -- start
Environment Variables
If using direnv, make sure it's loaded:
# Check if direnv is working
direnv status
# Allow direnv for current directory
direnv allow
Server Starts But Returns Errors
Check Server Logs
Look at the server output for detailed error messages:
# Start with verbose logging
RUST_LOG=debug cargo run -- start
Test Basic Connectivity
# Test if server is responding
curl http://localhost:3017/health
# If health endpoint doesn't exist, try a simple request
curl "http://localhost:3017/partition/test/instance/basic/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "gemma3", "messages": [{"role": "user", "content": "hello"}]}'
API and Model Issues
"Internal Server Error" Responses
Symptoms:
- HTTP 500 errors
- Generic error messages
- Requests failing unexpectedly
Solutions:
Verify API Keys
Check that your API keys are set correctly:
echo $OPENAI_API_KEY
echo $MISTRAL_API_KEY
echo $GEMINI_API_KEY
If not set:
export OPENAI_API_KEY="your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"
Check Model Names
Ensure you're using supported model names:
| Model | Provider | API Key Required |
| --- | --- | --- |
| gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo | OpenAI | Yes (OPENAI_API_KEY) |
| gpt-4o-search-preview | OpenAI | Yes (OPENAI_API_KEY) |
| llama3.2, gemma3, or any custom name | Ollama | No |
| mistral-large-2402 | Mistral | Yes (MISTRAL_API_KEY) |
| gemini-2.0-flash, gemini-2.5-flash-preview-05-20 | Google Gemini | Yes (GEMINI_API_KEY) |
Verify Ollama (for local models)
If using Ollama models, verify Ollama is running:
# Check Ollama status
ollama list
# If not running, start it
ollama serve
# Test Ollama directly
curl http://localhost:11434/api/tags
Deserialization Errors
Symptoms:
- JSON parsing errors
- "Failed to deserialize" messages
- Malformed request errors
Solutions:
Check JSON Format
Ensure your JSON request is properly formatted:
# Good format
curl "http://localhost:3017/partition/$USER/instance/test/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Hello"
}
]
}'
Content-Type Header
Always use the correct content type:
# Always include this header
-H "Content-Type: application/json"
Optional Fields
Remember that fields like web_search_options are optional and can be omitted:
# This is valid without web_search_options
{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}
Connection Issues
Symptoms:
- Timeout errors
- Network unreachable
- DNS resolution failures
Solutions:
Check Provider URLs
Verify that custom provider URLs are accessible:
# Test OpenAI endpoint
curl -I https://api.openai.com/v1/chat/completions
# Test custom endpoint (if configured)
curl -I $RSV_OPENAI_BASE_URL
Verify Internet Connectivity
For cloud providers, ensure internet connectivity:
# Test internet connection
ping google.com
# Test specific provider
ping api.openai.com
Check Firewall Settings
Ensure no firewall is blocking outbound requests:
# Check if ports are blocked
telnet api.openai.com 443
telnet localhost 11434 # For Ollama
Database Issues
Neo4j Connection Problems
Symptoms:
- "Failed to connect to Neo4j" errors
- Database timeout errors
- Authentication failures
Solutions:
Check Neo4j Status
# Check if Neo4j is running
systemctl status neo4j # Linux
brew services list | grep neo4j # macOS
# Check Neo4j logs
journalctl -u neo4j # Linux
tail -f /usr/local/var/log/neo4j/neo4j.log # macOS
Verify Connection Details
Check your Neo4j connection settings:
# Default connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
Test Neo4j Directly
# Test with cypher-shell
cypher-shell -a bolt://localhost:7687 -u neo4j -p your-password
# Or use Neo4j Browser
# Navigate to http://localhost:7474
Vector Index Issues
Symptoms:
- Slow semantic search
- "Index not found" errors
- Context enrichment not working
Solutions:
Recreate Vector Index
# Stop Reservoir
# Connect to Neo4j and run:
DROP INDEX embedding_index IF EXISTS;
CREATE VECTOR INDEX embedding_index FOR (n:EmbeddingNode) ON (n.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}};
Check Index Status
SHOW INDEXES;
Memory and Performance Issues
High Memory Usage
Symptoms:
- System running out of memory
- Slow responses
- Process killed by system
Solutions:
Monitor Resource Usage
# Check Reservoir process
ps aux | grep reservoir
# Monitor system resources
htop
# or
top
Use Smaller Models
Switch to smaller models if using Ollama:
# Instead of large models, pull a smaller variant
ollama pull gemma3:1b  # 1B parameters instead of a larger variant
Limit Conversation History
The system automatically manages token limits, but you can monitor:
# View recent conversations to check size
cargo run -- view 10 --partition $USER --instance your-instance
Slow Responses
Symptoms:
- Long wait times for responses
- Timeouts
- Poor performance
Solutions:
Check Model Performance
Different models have different performance characteristics:
- Fastest: Smaller Ollama models (2B-7B parameters)
- Medium: Cloud models like GPT-3.5-turbo
- Slowest: Large local models (13B+ parameters)
Optimize Ollama
# Ollama uses GPU acceleration automatically when a supported GPU is available
# Check which models are loaded and whether they are running on GPU or CPU
ollama ps
Network Optimization
For cloud models:
# Test network speed to provider
curl -w "@curl-format.txt" -o /dev/null -s "https://api.openai.com/v1/models"
Testing and Debugging
Systematic Troubleshooting
Step 1: Test Basic Setup
# Test Reservoir is running
curl http://localhost:3017/health
# Test with simplest possible request
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "gemma3", "messages": [{"role": "user", "content": "hi"}]}'
Step 2: Test with Different Models
# Test Ollama model (no API key)
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "gemma3", "messages": [{"role": "user", "content": "test"}]}'
# Test OpenAI model (requires API key)
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "test"}]}'
Step 3: Check Logs
# Run with debug logging
RUST_LOG=debug cargo run -- start
# Check for specific error patterns
grep -i error reservoir.log
grep -i "failed" reservoir.log
Using the Included Tests
Reservoir includes hurl tests that you can use to verify your setup:
# Test all endpoints
./hurl/test.sh
# Test specific endpoints
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl
hurl --variable USER="$USER" hurl/reservoir-view.hurl
hurl --variable USER="$USER" hurl/reservoir-search.hurl
# Test Ollama mode
hurl hurl/ollama_mode.hurl
Getting Help
If you encounter issues not covered here:
- Check the server logs for detailed error messages
- Verify your environment variables are set correctly
- Test with a simple curl request first
- Try the included hurl tests to isolate the problem
- Check the FAQ for additional solutions
- Review the debugging guide for advanced troubleshooting
Environment Variable Reference
For quick reference, here are the key environment variables:
# Provider endpoints
RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"
# API keys
OPENAI_API_KEY="your-openai-key"
MISTRAL_API_KEY="your-mistral-key"
GEMINI_API_KEY="your-gemini-key"
# Reservoir settings
RESERVOIR_PORT="3017"
# Neo4j settings
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="your-password"
Frequently Asked Questions
This section addresses common questions and issues you might encounter while using Reservoir.
General Questions
What is Reservoir?
Reservoir is a memory system for LLM conversations that acts as a smart proxy between your applications and OpenAI-compatible APIs. It automatically stores conversation history and enriches new requests with relevant context from past conversations.
Does Reservoir support streaming responses?
No, streaming responses are not currently supported. All requests are handled in a non-streaming manner. The response is returned once the complete message is received from the LLM provider.
Can I use Reservoir with clients other than the OpenAI Python library?
Yes, Reservoir is designed to be fully OpenAI-compatible. It has been tested with:
- curl command line tool
- OpenAI Python library
- Chat Gipitty
- Any application that can make HTTP requests to OpenAI-compatible endpoints
However, compatibility with some specialized clients may vary. If you encounter issues with a specific client, please report it as an issue.
What LLM providers does Reservoir support?
Reservoir supports multiple LLM providers:
- OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo, and specialized models
- Ollama: Local models like Llama, Gemma, and any custom models
- Mistral AI: Cloud-hosted Mistral models
- Google Gemini: Google's AI models
- Custom providers: Any OpenAI-compatible API endpoint
How does Reservoir organize conversations?
Reservoir uses a two-level organization system:
- Partition: Top-level grouping (typically your username)
- Instance: Application-specific context within a partition
This allows you to keep conversations from different applications separate while maintaining context within each application.
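To make this concrete, here is a hedged sketch (not taken from the Reservoir codebase) of how two applications might point the OpenAI Python client at separate instances. The partition and instance names alex, chat-app, and notes-app are made up for illustration, and the base URL form mirrors the Python example shown in the Integration Questions section.
import os
from openai import OpenAI

# Two clients for the same user ("alex"), kept in separate instances so
# their conversation histories never mix.
chat_app = OpenAI(
    base_url="http://localhost:3017/v1/partition/alex/instance/chat-app",
    api_key=os.environ.get("OPENAI_API_KEY"),
)
notes_app = OpenAI(
    base_url="http://localhost:3017/v1/partition/alex/instance/notes-app",
    api_key=os.environ.get("OPENAI_API_KEY"),
)

# Each request is enriched only with context stored under its own instance.
chat_app.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Remind me what we planned yesterday."}],
)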
Is my data private?
Yes, absolutely. All conversation data is stored locally in your Neo4j database and never leaves your infrastructure. Reservoir only forwards your requests to the LLM providers you choose to use.
Technical Questions
What database does Reservoir use?
Reservoir uses Neo4j as its graph database. Neo4j provides:
- Vector similarity search for semantic matching
- Graph relationships for conversation threading
- Efficient querying for context enrichment
- Scalable storage for large conversation histories
How does context enrichment work?
When you send a message, Reservoir runs through the following steps (a small sketch of the similarity check follows the list):
- Stores your message in the database
- Searches for semantically similar past messages
- Retrieves recent conversation history
- Injects relevant context into your request
- Sends the enriched request to the LLM provider
- Stores the response for future context
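As a rough illustration of the similarity step, the sketch below compares a new message's embedding against stored ones using cosine similarity and the 0.85 threshold described in this documentation. It is not Reservoir's internal code, and the helper names are made up; in the real system the comparison runs against Neo4j's vector index rather than in client code.
import math

SIMILARITY_THRESHOLD = 0.85  # threshold described in the Reservoir docs

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def select_related(new_embedding, stored_messages):
    # stored_messages: list of (text, embedding) pairs, standing in for data
    # Reservoir would read from Neo4j.
    return [
        text
        for text, embedding in stored_messages
        if cosine_similarity(new_embedding, embedding) >= SIMILARITY_THRESHOLD
    ]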
What are the token limits?
Reservoir respects the token limits of the underlying LLM models:
- GPT-4: 8,192 tokens (context window)
- GPT-4-32k: 32,768 tokens
- GPT-3.5-turbo: 4,096 tokens
- Local models: Varies by model
Reservoir automatically truncates context to fit within these limits while preserving system prompts and your latest message.
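The truncation rule can be pictured with the following sketch. It is illustrative only (Reservoir's actual logic is implemented in Rust); count_tokens is an assumed helper, and the behaviour shown, dropping the oldest non-system context first while always keeping system prompts and the newest message, is the rule described above.
def truncate_messages(messages, max_tokens, count_tokens):
    # messages: OpenAI-style list of {"role": ..., "content": ...} dicts.
    # count_tokens: assumed callable returning the token count of one message.
    newest = messages[-1]
    system = [m for m in messages[:-1] if m["role"] == "system"]
    context = [m for m in messages[:-1] if m["role"] != "system"]

    def total(msgs):
        return sum(count_tokens(m) for m in msgs)

    # Drop the oldest context message until the request fits the model's limit.
    while context and total(system + context + [newest]) > max_tokens:
        context.pop(0)

    return system + context + [newest]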
Can I run multiple Reservoir instances?
Yes, you can run multiple instances by:
- Using different ports (RESERVOIR_PORT)
- Using different Neo4j databases
- Using different partition/instance combinations
Troubleshooting
Neo4j Connection Issues
Problem: Unable to connect to Neo4j.
Solutions:
- Ensure Neo4j is running:
docker ps | grep neo4j
- Check your connection details in .env:
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
- Test the connection manually:
curl -u neo4j:password http://localhost:7474/db/data/
OpenAI API Key Issues
Problem: Requests fail due to missing or invalid API key.
Solutions:
- Verify your API key is set:
echo $OPENAI_API_KEY
- Test the key directly with OpenAI:
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
- Ensure there are no extra spaces or quotes in your environment variable.
Token Limit Errors
Problem: Requests fail due to exceeding the token limit.
Solutions:
- Reduce the size of your input message
- Clear old conversation history for the partition/instance
- Use a model with a larger context window (e.g., GPT-4-32k)
- Check if context enrichment is adding too much historical data
Port Already in Use
Problem: Reservoir fails to start because port 3017 is already in use.
Solutions:
- Check what's using the port:
lsof -i :3017
- Use a different port:
export RESERVOIR_PORT=3018
- Kill the process using the port (if safe to do so):
kill -9 $(lsof -ti:3017)
Permission Denied (macOS Service)
Problem: Service fails to start due to permission issues.
Solutions:
- Check the binary path in the plist file:
cat ~/Library/LaunchAgents/com.sectorflabs.reservoir.plist
- Ensure the binary exists and is executable:
ls -la ~/.cargo/bin/reservoir
- Update the path in the plist if necessary
Slow Performance
Problem: Reservoir responses are slow.
Solutions:
- Check Neo4j memory allocation
- Ensure Neo4j data is on fast storage (SSD)
- Optimize vector index settings
- Reduce the number of context messages retrieved
- Check network connectivity to LLM providers
Installation Questions
Do I need to install Neo4j separately?
No, the recommended approach is to use Docker Compose, which automatically sets up Neo4j for you:
docker-compose up -d
Can I use an existing Neo4j instance?
Yes, you can connect to any Neo4j instance by setting the appropriate environment variables:
NEO4J_URI=bolt://your-neo4j-host:7687
NEO4J_USERNAME=your-username
NEO4J_PASSWORD=your-password
What Rust version do I need?
Reservoir requires the latest stable version of Rust. You can install it with:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Integration Questions
How do I integrate with Chat Gipitty?
See the dedicated Chat Gipitty Integration guide for detailed setup instructions.
Can I use Reservoir with my existing Python scripts?
Yes, simply change the base URL in your OpenAI client:
import os
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:3017/v1/partition/myuser/instance/myapp",
api_key=os.environ.get("OPENAI_API_KEY")
)
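Requests are then made exactly as with the standard OpenAI endpoint; the call below is ordinary OpenAI SDK usage reusing the client above, and Reservoir stores the exchange and uses it to enrich later requests.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What did we discuss last time?"}],
)
print(response.choices[0].message.content)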
How do I migrate my existing conversation data?
Reservoir provides import/export functionality:
# Export from another system (if supported)
reservoir export > conversations.json
# Import into Reservoir
reservoir import conversations.json
Advanced Usage
Can I customize the similarity threshold for context matching?
Currently, the similarity threshold (0.85) is hardcoded, but this may become configurable in future versions.
How do I backup my conversation data?
Use the export command to create backups:
reservoir export > backup-$(date +%Y%m%d).json
Can I run Reservoir in production?
Reservoir is currently designed for local development use. For production deployment, consider:
- Securing the Neo4j database
- Setting up proper authentication
- Configuring appropriate firewall rules
- Using HTTPS for external access
Getting Help
If your question isn't answered here:
- Check the Common Issues section
- Review the API Documentation
- Look at existing GitHub issues
- Create a new issue with details about your problem
Remember to include:
- Your operating system
- Rust version (rustc --version)
- Neo4j version
- Relevant log output
- Steps to reproduce the issue
Contributing to Reservoir
Thank you for your interest in contributing to Reservoir! This guide will help you get started with development and contributing to the project.
Development Setup
Prerequisites
Before you begin, ensure you have the following installed:
- Rust (latest stable version):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
- Docker (for Neo4j database)
- Git for version control
Step 1: Fork and Clone
- Fork the repository on GitHub
- Clone your fork locally:
git clone https://github.com/yourusername/reservoir.git
cd reservoir
Step 2: Start the Database
Start Neo4j using Docker Compose:
docker-compose up -d
This starts Neo4j on bolt://localhost:7687 with the default credentials.
Step 3: Environment Configuration
Create a .env file in the project root or export environment variables:
# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1
# Database Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password
# API Keys (add as needed)
OPENAI_API_KEY=sk-your-key-here
MISTRAL_API_KEY=your-mistral-key
GEMINI_API_KEY=your-gemini-key
# Provider URLs (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions
Step 4: Build and Run
# Build the project
cargo build
# Run in development mode with auto-reload
make dev
# Or run directly
cargo run -- start
Reservoir will be available at http://localhost:3017.
Development Workflow
Making Changes
- Create a feature branch:
git checkout -b feature/your-feature-name
- Make your changes following the coding standards below
- Test your changes:
# Run tests
cargo test
# Run API tests
./hurl/test.sh
# Test specific functionality
make run
- Update documentation if needed:
# Build documentation
make book
# Serve locally to preview
make serve-book
Code Standards
Rust Code Style
- Use rustfmt for formatting:
cargo fmt
- Use clippy for linting:
cargo clippy
- Follow Rust naming conventions:
  - snake_case for functions and variables
  - PascalCase for types and structs
  - SCREAMING_SNAKE_CASE for constants
Documentation
- Document all public APIs with rustdoc comments
- Include examples in documentation where helpful
- Update the book documentation for user-facing changes
Testing
- Write unit tests for new functionality
- Add integration tests for API endpoints
- Ensure existing tests pass before submitting
Commit Guidelines
Use conventional commit messages:
type(scope): description
[optional body]
[optional footer]
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
Examples:
feat(api): add web search options support
fix(db): resolve connection pooling issue
docs(book): update installation guide
Testing
Unit Tests
Run unit tests:
cargo test
Integration Tests
Test the API endpoints:
# Test all endpoints
./hurl/test.sh
# Test specific endpoint
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl
Manual Testing
- Start Reservoir:
make run
- Test with curl:
curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
Documentation
Building the Book
The documentation is built with mdBook:
# Build documentation
make book
# Serve with live reload
make serve-book
# Clean generated docs
make clean-book
Writing Documentation
- Use clear, concise language
- Include code examples
- Test all code examples
- Link related sections
- Consider the user's journey
Submitting Changes
Pull Request Process
- Ensure your code is ready:
  - Tests pass (cargo test)
  - Code is formatted (cargo fmt)
  - No clippy warnings (cargo clippy)
  - Documentation updated if needed
- Create a pull request:
  - Use a descriptive title
  - Explain what changes you made and why
  - Reference any related issues
  - Include screenshots for UI changes
- Respond to feedback:
  - Address review comments promptly
  - Ask questions if feedback is unclear
  - Update your branch as needed
Pull Request Template
When creating a PR, include:
## Description
Brief description of changes
## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update
## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes (or documented)
Release Process
For maintainers:
- Version Bump: Update version in Cargo.toml
- Changelog: Update CHANGELOG.md with changes
- Tag Release: Create and push a git tag
- Build Documentation: Ensure docs are up to date
- Publish: Publish to crates.io if applicable
Getting Help
- Issues: Check existing issues or create a new one
- Discussions: Use GitHub Discussions for questions
- Documentation: Refer to the full documentation at sectorflabs.com/reservoir
Code of Conduct
Please be respectful and constructive in all interactions. We're here to build something useful together!
Architecture Overview
Before making significant changes, familiarize yourself with the Architecture Overview and Data Model sections of this documentation.
Common Development Tasks
Adding a New API Endpoint
- Define the endpoint in src/api/
- Add routing logic
- Implement request/response handling
- Add tests
- Update documentation
Adding a New AI Provider
- Implement provider trait
- Add configuration options
- Update model routing logic
- Add tests with mock responses
- Document the new provider
Database Schema Changes
- Create migration script in migrations/
- Update data model documentation
- Test migration on sample data
- Ensure backward compatibility
Thank you for contributing to Reservoir!