Introduction

Reservoir is first and foremost a memory system for interactions with large language models: it builds a Retrieval-Augmented Generation (RAG) database of useful context from those interactions over time. It maintains conversation history in a Neo4j graph database and automatically injects relevant context into requests based on semantic similarity and recency. Reservoir can also act as an optional stateful proxy server for OpenAI-compatible Chat Completions APIs.

Problem Statement

By default, Language Model interactions are stateless. Each request must include the complete conversation history for the model to maintain context. This creates several technical challenges:

  1. Manual conversation state management: Applications must implement their own conversation storage and retrieval systems
  2. Token limit constraints: As conversations grow, they exceed model token limits
  3. Inability to reference semantically related conversations: Previous relevant discussions cannot be automatically incorporated
  4. No persistent storage: Conversation data is lost when applications terminate

Technical Solution

Reservoir addresses these limitations by acting as an intermediary layer that:

  • Stores all messages in a Neo4j graph database with full conversation history
  • Computes embeddings using BGE-Large-EN-v1.5 for semantic similarity calculation
  • Creates semantic relationships (synapses) between messages when cosine similarity exceeds 0.85 (illustrated in the sketch after this list)
  • Automatically injects relevant context into new requests based on similarity and recency
  • Manages token limits through intelligent truncation while preserving system and user messages
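
The synapse rule above boils down to a cosine-similarity comparison against the 0.85 threshold. The following is a minimal Python sketch of that check; the vectors are toy stand-ins for BGE-Large-EN-v1.5 embeddings (1024 dimensions in reality), and the function names are illustrative rather than Reservoir's internals.

import math

SYNAPSE_THRESHOLD = 0.85  # the threshold described above

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def should_link(embedding_a, embedding_b):
    """True when two messages are similar enough to be connected by a synapse."""
    return cosine_similarity(embedding_a, embedding_b) > SYNAPSE_THRESHOLD

# Toy three-dimensional vectors, purely for demonstration
print(should_link([0.1, 0.9, 0.2], [0.12, 0.88, 0.25]))  # True: nearly parallel vectors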

Architecture Overview

Reservoir is a command line tool that intercepts API calls, enriches them with relevant context, and forwards requests to the target Language Model provider. It can also run as an HTTP proxy, acting as an intermediary between clients and API endpoints. All conversation data remains local to the deployment environment.

Data Model

Conversations are stored as a graph structure (a minimal Cypher sketch follows this list):

  • MessageNode: Individual messages with metadata and embeddings
  • EmbeddingNode: Vector representations for semantic search operations
  • SYNAPSE: Relationships between semantically similar messages
  • RESPONDED_WITH: Sequential conversation flow relationships
  • HAS_EMBEDDING: Message-to-embedding associations
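
To make the data model concrete, here is a minimal sketch that writes a single exchange using the official Python neo4j driver. The node labels and relationship types come from the list above; the property names, the Cypher statement, and the connection credentials are illustrative assumptions, not Reservoir's internal schema.

from neo4j import GraphDatabase

# Connection details match the defaults used in the installation guide
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # One user message, its embedding, and the assistant's reply
    session.run(
        """
        CREATE (m:MessageNode {role: 'user', content: $content})
        CREATE (e:EmbeddingNode {vector: $vector})
        CREATE (m)-[:HAS_EMBEDDING]->(e)
        CREATE (r:MessageNode {role: 'assistant', content: $reply})
        CREATE (m)-[:RESPONDED_WITH]->(r)
        """,
        content="Hello, Reservoir!",
        vector=[0.1, 0.9, 0.2],  # stand-in for a real 1024-dimensional embedding
        reply="Hi! How can I help?",
    )

driver.close()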

Supported Providers (Proxy Mode)

The system supports multiple Language Model providers through a unified interface:

  • OpenAI (gpt-4, gpt-4o, gpt-3.5-turbo)
  • Ollama (local model execution)
  • Mistral AI
  • Google Gemini
  • Any OpenAI-compatible endpoint

Implementation Details

The server initializes a vector index in Neo4j for efficient semantic search and listens on a configurable port (default: 3017). Conversations are organized using a partition/instance hierarchy, enabling multi-tenant isolation.

Conversation Graph View

Use Cases

  • Stateful chat applications: Eliminate manual conversation state management
  • Cross-session context: Maintain context across application restarts
  • Semantic search: Retrieve relevant historical conversations
  • Multi-provider workflows: Maintain context when switching between Language Model providers
  • Research and development: Build persistent knowledge bases from Language Model interactions

For implementation details, see the Quick Start guide.

Getting Started

Welcome to Reservoir! This section will guide you through everything you need to get up and running with Reservoir quickly and efficiently.

What You'll Learn

In this section, you'll learn how to:

  • Install Reservoir - Set up Reservoir on your system with all prerequisites
  • Quick Start - Get Reservoir running in minutes with basic configuration
  • Your First Chat - Send your first LLM conversation through Reservoir

Prerequisites

Before you begin, make sure you have:

  • Neo4j database running (local or remote)
  • Rust toolchain installed (for building from source)
  • API keys for your preferred LLM providers (OpenAI, Mistral, etc.)

Getting Help

If you run into any issues during setup, check out our Help & Support section for troubleshooting guides and frequently asked questions.

Let's get started!

Installation

This guide will walk you through installing and setting up Reservoir on your system.

Prerequisites

Before installing Reservoir, make sure you have the following dependencies installed:

Required Dependencies

  1. Rust and Cargo (latest stable version)

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source ~/.cargo/env
    
  2. Neo4j Database (version 4.4 or later)

    Option A: Using Docker (Recommended)

    docker run \
        --name neo4j \
        -p7474:7474 -p7687:7687 \
        -d \
        -v $HOME/neo4j/data:/data \
        -v $HOME/neo4j/logs:/logs \
        -v $HOME/neo4j/import:/var/lib/neo4j/import \
        -v $HOME/neo4j/plugins:/plugins \
        --env NEO4J_AUTH=neo4j/password \
        neo4j:latest
    

    Option B: Native Installation

Optional Dependencies

  1. mdBook (for building documentation)

    cargo install mdbook
    
  2. Hurl (for running API tests)

    # macOS
    brew install hurl
    
    # Linux
    curl --location --remote-name https://github.com/Orange-OpenSource/hurl/releases/latest/download/hurl_amd64.deb
    sudo dpkg -i hurl_amd64.deb
    

Installing Reservoir

  1. Clone the repository

    git clone https://github.com/divanvisagie/reservoir.git
    cd reservoir
    
  2. Build the project

    cargo build --release
    
  3. Install the binary (optional)

    cargo install --path .
    

Using Cargo Install

Once Reservoir is published to crates.io, you'll be able to install it directly:

cargo install reservoir

Configuration

Environment Variables

Create a .env file in your project directory or set these environment variables:

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password

# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1

# API Keys (set as needed)
OPENAI_API_KEY=your-openai-key-here
MISTRAL_API_KEY=your-mistral-key-here
GEMINI_API_KEY=your-gemini-key-here

# Custom Provider Endpoints (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions
RSV_MISTRAL_BASE_URL=https://api.mistral.ai/v1/chat/completions

If you're using direnv, you can create a .envrc file:

# .envrc
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USERNAME=neo4j
export NEO4J_PASSWORD=password
export RESERVOIR_PORT=3017
export OPENAI_API_KEY=your-openai-key-here

Then activate it:

direnv allow

Verification

1. Check Neo4j Connection

Make sure Neo4j is running and accessible:

# If using Docker
docker ps | grep neo4j

# Test connection (replace with your credentials)
curl -u neo4j:password http://localhost:7474/db/data/

2. Start Reservoir

# From the repository directory
cargo run -- start

# Or if you installed the binary
reservoir start

You should see output similar to:

2024-01-01T12:00:00Z [INFO] Initializing vector index in Neo4j...
2024-01-01T12:00:01Z [INFO] Server starting on http://127.0.0.1:3017

3. Test the Installation

Run the included tests to verify everything is working:

# Test all endpoints
./hurl/test.sh

# Or test individual endpoints
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl

4. Simple API Test

Test with a basic curl request:

curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Hello, Reservoir!"
            }
        ]
    }'

Troubleshooting Installation

Common Issues

Neo4j Connection Failed

  • Verify Neo4j is running: docker ps or check your local Neo4j service
  • Check credentials in your environment variables
  • Ensure ports 7474 and 7687 are not blocked

Cargo Build Fails

  • Update Rust: rustup update
  • Clear cargo cache: cargo clean
  • Check for system dependency issues

Port Already in Use

  • Change the port: export RESERVOIR_PORT=3018
  • Kill existing processes: lsof -ti:3017 | xargs kill

API Key Issues

  • Verify your API keys are set correctly: echo $OPENAI_API_KEY
  • Check for extra whitespace or quotes in environment variables

Getting Help

If you encounter issues:

  1. Check the Troubleshooting section
  2. Review the server logs for detailed error messages
  3. Verify all prerequisites are properly installed
  4. Test with the simplest possible configuration first

Next Steps

Once Reservoir is installed and running:

  1. Follow the Getting Started guide
  2. Try the Chat Gipitty Integration
  3. Explore the API Reference
  4. Check out Usage Examples

Quick Start

This guide will get you up and running with Reservoir in just a few minutes.

Before You Begin

Make sure you have:

  • Reservoir installed (see Installation)
  • Neo4j running locally
  • At least one API key configured (OpenAI, Mistral, or Gemini)

Step 1: Start the Server

Open a terminal and start Reservoir:

cargo run -- start

You should see:

[INFO] Initializing vector index in Neo4j for semantic search
[INFO] Server starting on http://127.0.0.1:3017

Keep this terminal open - Reservoir is now running and ready to handle requests.

Step 2: Your First Chat Request

Open a new terminal and send your first chat request:

curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Hello! What is Reservoir?"
            }
        ]
    }'

The response will look like a standard OpenAI API response, but Reservoir has:

  • Stored your message and the LLM's response
  • Tagged them with your username and "quickstart" instance
  • Made them available for future context enrichment

Step 3: See the Memory in Action

Send a follow-up question that references your previous conversation:

curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Can you elaborate on what you just told me?"
            }
        ]
    }'

Notice how the LLM understands "what you just told me" - that's Reservoir automatically injecting the previous conversation context!

Step 4: View Your Conversation History

Check what Reservoir has stored:

cargo run -- view 5 --partition "$USER" --instance quickstart

You'll see output like:

2024-01-01T12:00:00+00:00 [abc123] user: Hello! What is Reservoir?
2024-01-01T12:00:01+00:00 [abc123] assistant: Reservoir is a memory system for AI conversations...
2024-01-01T12:01:00+00:00 [def456] user: Can you elaborate on what you just told me?
2024-01-01T12:01:01+00:00 [def456] assistant: Certainly! Let me expand on Reservoir's capabilities...

Step 5: Try Different Models

Reservoir supports multiple providers. Try Ollama (no API key needed):

curl "http://127.0.0.1:3017/partition/$USER/instance/quickstart/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "user",
                "content": "What did we discuss earlier about Reservoir?"
            }
        ]
    }'

Even though you're using a different model (Ollama instead of OpenAI), Reservoir still provides the conversation context!

Understanding the URL Structure

The Reservoir API endpoint follows this pattern:

http://localhost:3017/partition/{partition}/instance/{instance}/v1/chat/completions

  • Partition: Organizes conversations (typically your username)
  • Instance: Sub-organizes within a partition (like "quickstart", "work", "personal")
  • This keeps different contexts separate while allowing context sharing within each space (see the sketch below)
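
If you prefer scripting the quick start over curl, the same endpoint can be called from Python. The sketch below uses the requests library and the URL pattern shown above; the partition and instance values are just examples.

import os
import requests

partition = os.getenv("USER", "demo")
instance = "quickstart"
url = f"http://localhost:3017/partition/{partition}/instance/{instance}/v1/chat/completions"

response = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello again, Reservoir!"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])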

What Just Happened?

  1. Storage: Every message (yours and the LLM's) was stored in Neo4j
  2. Context Enrichment: Reservoir automatically found relevant past messages and included them in requests
  3. Multi-Provider: You used both OpenAI and Ollama with the same conversation history
  4. Organization: Your conversations were organized by partition and instance

Next Steps

Now that you've seen Reservoir in action, explore:

Quick Reference

Common Commands

# Start the server
cargo run -- start

# View recent messages
cargo run -- view 10 --partition $USER --instance myapp

# Export conversations
cargo run -- export > backup.json

# Import conversations
cargo run -- import backup.json

# Search conversations
cargo run -- search "your query" --partition $USER

Environment Variables

export RESERVOIR_PORT=3017                    # Server port
export NEO4J_URI=bolt://localhost:7687        # Neo4j connection
export OPENAI_API_KEY=your-key-here          # OpenAI API key
export MISTRAL_API_KEY=your-key-here         # Mistral API key

Ready to dive deeper? Check out the Usage Examples or learn about Chat Gipitty Integration!

Your First Chat

This guide will walk you through sending your first message through Reservoir and demonstrate how its memory and context features work.

Prerequisites

Before starting, make sure you have:

  • Reservoir server running (cargo run -- start)
  • Neo4j database accessible
  • API keys set up (if using cloud providers)

Example 1: Testing with Ollama (Local)

Let's start with a local Ollama model since it doesn't require API keys:

Step 1: Send your first message

curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Hello! My name is Alice and I love programming in Python."
            }
        ]
    }'

Step 2: Ask a follow-up question

Now ask something that requires memory of the previous conversation:

curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "What programming language do I like?"
            }
        ]
    }'

Magic! The Language Model will remember that you like Python, even though you didn't include the previous conversation in your request. Reservoir handled that automatically!

Step 3: Continue the conversation

curl "http://127.0.0.1:3017/partition/$USER/instance/first-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Can you suggest a Python project for someone at my skill level?"
            }
        ]
    }'

The Language Model will make suggestions based on knowing you're Alice who loves Python programming!

Example 2: Using OpenAI Models

If you have an OpenAI API key set up:

Step 1: Introduction with GPT-4

curl "http://127.0.0.1:3017/partition/$USER/instance/gpt-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Hi! I am working on a machine learning project about image classification."
            }
        ]
    }'

Step 2: Ask for specific help

curl "http://127.0.0.1:3017/partition/$USER/instance/gpt-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "What neural network architecture would you recommend for my project?"
            }
        ]
    }'

GPT-4 will remember you're working on image classification and provide relevant recommendations!

Example 3: Cross-Model Conversations

One of Reservoir's unique features is that conversation context can span multiple models:

Step 1: Start with Ollama

curl "http://127.0.0.1:3017/partition/$USER/instance/cross-model/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "I am learning about quantum computing basics."
            }
        ]
    }'

Step 2: Switch to GPT-4

curl "http://127.0.0.1:3017/partition/$USER/instance/cross-model/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Can you explain quantum superposition in more detail?"
            }
        ]
    }'

GPT-4 will know you're learning quantum computing and provide an explanation appropriate to your level!

Understanding the Results

What Reservoir Does Behind the Scenes

When you send a message, Reservoir:

  1. Stores your message in Neo4j with embeddings
  2. Searches for relevant context from previous conversations
  3. Injects relevant history into your request automatically
  4. Forwards the enriched request to the Language Model provider
  5. Stores the Language Model's response for future context

Viewing Your Conversation History

You can see your stored conversations using the CLI:

# View last 5 messages in the first-chat instance
cargo run -- view 5 --partition $USER --instance first-chat

Sample output:

2025-06-21T09:10:01+00:00 [abc123] user: Hello! My name is Alice and I love programming in Python.
2025-06-21T09:10:02+00:00 [abc123] assistant: Hello Alice! It's great to meet a fellow Python enthusiast...
2025-06-21T09:11:10+00:00 [def456] user: What programming language do I like?
2025-06-21T09:11:12+00:00 [def456] assistant: You mentioned that you love programming in Python!
2025-06-21T09:12:00+00:00 [ghi789] user: Can you suggest a Python project for someone at my skill level?

Testing Different Scenarios

Scenario 1: Different Partitions

Try organizing conversations by topic using different partitions:

# Work-related conversations
curl "http://127.0.0.1:3017/partition/work/instance/coding/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "I need help debugging a React component."}]}'

# Personal learning
curl "http://127.0.0.1:3017/partition/personal/instance/learning/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "I want to learn guitar playing."}]}'

Each partition maintains separate conversation history!

Scenario 2: Web Search Integration

If using a model that supports web search:

curl "http://127.0.0.1:3017/partition/$USER/instance/research/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4o-search-preview",
        "messages": [{"role": "user", "content": "What are the latest trends in AI development?"}],
        "web_search_options": {"enabled": true, "max_results": 5}
    }'

Common Issues and Solutions

Server Not Responding

# Check if Reservoir is running
curl http://127.0.0.1:3017/health

# If not running, start it
cargo run -- start

"Model not found" Error

  • For Ollama models: Make sure Ollama is running and the model is installed
  • For cloud models: Check your API keys are set correctly

Empty Responses

  • Check your internet connection for cloud providers
  • Verify the model name is spelled correctly
  • Ensure your API key has sufficient credits

Next Steps

Now that you've sent your first chat, explore these features:

Congratulations! You've successfully used Reservoir to have a conversation with persistent memory. The Language Model now remembers everything from your conversation and can reference it in future chats!

Usage & Integration

Reservoir is designed to work seamlessly with your existing AI workflows and tools. This section covers various ways to integrate and use Reservoir in your projects.

Integration Options

Chat Applications

Direct API Usage

Common Use Cases

  • Multi-session conversations - Maintain context across different chat sessions
  • Cross-application memory - Share conversation history between different tools
  • Local AI workflows - Keep conversations private while using local models
  • Research and development - Build applications that learn from past interactions

Choosing Your Integration

Each integration method maintains the same core benefits: persistent memory, context enrichment, and seamless AI conversations.

Ollama Client Integration

You can use reservoir as a memory system for the Ollama command line client by integrating it with a simple bash script.

You can place the following function in your ~/.bashrc or ~/.zshrc file and it will use reservoir to:

  • Ingest your query into Reservoir
  • Fetch relevant context (semantic search results and recent history) from Reservoir
  • Prepend that context to your query
  • Send the request to the model
  • Save the model's response back into Reservoir

function contextual_ollama_with_ingest() {
    local user_query="$1"

    # Validate input
    if [ -z "$user_query" ]; then
        echo "Usage: contextual_ollama_with_ingest 'Your question goes here'" >&2
        return 1
    fi

    # Ingest the user's query into Reservoir
    echo "$user_query" | reservoir ingest

    # Generate dynamic system prompt with context
    local system_prompt_content=$(
        echo "the following is info from semantic search based on your query:"
        reservoir search "$user_query" --semantic --link
        echo "the following is recent history:"
        reservoir view 10
    )

    local full_prompt_content=$(
        echo "You are a helpful assistant. Use the following context to answer the user's question."
        echo "$system_prompt_content"
        echo "User's question: ${user_query}"
    )

    # Call ollama with enriched context
    local assistant_response=$(ollama run gemma3 "$full_prompt_content")
    
    # Store the assistant's response
    echo "$assistant_response" | reservoir ingest --role assistant

    # Display the response
    echo "$assistant_response"
}

# Create a convenient alias
alias olm='contextual_ollama_with_ingest'

By adhering to POSIX conventions, reservoir becomes the semantic memory for any shell interaction with a language model.

Chat Gipitty Integration

Reservoir was originally designed as a memory system for Chat Gipitty. This integration gives your cgip conversations persistent memory, context awareness, and the ability to search through your LLM interaction history.

What You Get

When you integrate Reservoir with Chat Gipitty, you get:

  • Persistent Memory: Your conversations are remembered across sessions
  • Semantic Search: Find relevant past discussions automatically
  • Context Enrichment: Each response is informed by your conversation history
  • Multi-Model Support: Switch between different LLM providers while maintaining context

Setup

Prerequisites

  • Chat Gipitty installed and working
  • Reservoir installed and running (see Installation)
  • Your shell configured (bash or zsh)

Installation

Add this function to your ~/.bashrc or ~/.zshrc file:

function contextual_cgip_with_ingest() {
    local user_query="$1"

    # Validate input
    if [ -z "$user_query" ]; then
        echo "Usage: contextual_cgip_with_ingest 'Your question goes here'" >&2
        return 1
    fi

    # Ingest the user's query into Reservoir
    echo "$user_query" | reservoir ingest

    # Generate dynamic system prompt with context
    local system_prompt_content=$(
        echo "the following is info from semantic search based on your query:"
        reservoir search "$user_query" --semantic --link
        echo "the following is recent history:"
        reservoir view 10
    )

    # Call cgip with enriched context
    local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
    
    # Store the assistant's response
    echo "$assistant_response" | reservoir ingest --role assistant

    # Display the response
    echo "$assistant_response"
}

# Create a convenient alias
alias gpty='contextual_cgip_with_ingest'

After adding this to your shell configuration, reload it:

# For bash
source ~/.bashrc

# For zsh
source ~/.zshrc

Usage

Basic Usage

Use the function directly:

contextual_cgip_with_ingest "Explain quantum computing in simple terms"

Or use the convenient alias:

gpty "What is machine learning?"

Follow-up Questions

The magic happens with follow-up questions:

gpty "Explain neural networks"
# ... LLM responds with explanation ...

gpty "How do they relate to what we discussed about machine learning earlier?"
# ... LLM responds with context from the previous conversation ...

Different Topics

Start a new topic, and Reservoir will find relevant context:

gpty "I'm learning Rust programming"
# ... later in a different session ...

gpty "Show me some advanced Rust patterns"
# Reservoir will remember you're learning Rust and provide appropriate context

How It Works

Here's what happens when you use the integrated function:

  1. Query Ingestion: Your question is stored in Reservoir
  2. Context Gathering: Reservoir searches for:
    • Semantically similar past conversations
    • Recent conversation history
  3. Context Injection: This context is provided to cgip as a system prompt
  4. Enhanced Response: cgip responds with awareness of your history
  5. Response Storage: The LLM's response is stored for future context

Advanced Configuration

Custom Search Parameters

You can modify the function to customize how context is gathered:

function contextual_cgip_with_ingest() {
    local user_query="$1"
    
    if [ -z "$user_query" ]; then
        echo "Usage: contextual_cgip_with_ingest 'Your question goes here'" >&2
        return 1
    fi

    echo "$user_query" | reservoir ingest

    # Customize these parameters
    local system_prompt_content=$(
        echo "=== Relevant Context ==="
        reservoir search "$user_query" --semantic --link --limit 5
        echo ""
        echo "=== Recent History ==="
        reservoir view 15 --partition "$USER" --instance "cgip"
    )

    local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
    
    echo "$assistant_response" | reservoir ingest --role assistant
    echo "$assistant_response"
}

Partitioned Conversations

Organize your conversations by topic or project:

function gpty_work() {
    local user_query="$1"
    if [ -z "$user_query" ]; then
        echo "Usage: gpty_work 'Your work-related question'" >&2
        return 1
    fi

    echo "$user_query" | reservoir ingest --partition "$USER" --instance "work"
    
    local system_prompt_content=$(
        echo "Context from work conversations:"
        reservoir search "$user_query" --semantic --partition "$USER" --instance "work"
        echo "Recent work discussion:"
        reservoir view 10 --partition "$USER" --instance "work"
    )

    local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}")
    echo "$assistant_response" | reservoir ingest --role assistant --partition "$USER" --instance "work"
    echo "$assistant_response"
}

function gpty_personal() {
    # Similar function for personal conversations
    # ... implement similarly with --instance "personal"
    :  # no-op placeholder so this stub is valid shell
}

Model Selection

Use different models while maintaining context:

function gpty_creative() {
    local user_query="$1"
    echo "$user_query" | reservoir ingest
    
    local system_prompt_content=$(
        reservoir search "$user_query" --semantic --link
        reservoir view 5
    )

    # Use a creative model via cgip configuration
    local assistant_response=$(cgip "${user_query}" --system-prompt="${system_prompt_content}" --model gpt-4)
    
    echo "$assistant_response" | reservoir ingest --role assistant
    echo "$assistant_response"
}

Benefits of This Integration

Continuous Learning

  • Your LLM assistant learns from every interaction
  • Context builds up over time, making responses more personalized
  • No need to re-explain your projects or preferences

Cross-Session Memory

  • Resume conversations from days or weeks ago
  • Reference past decisions and discussions
  • Build on previous explanations and examples

Semantic Understanding

  • Ask "What did we discuss about X?" and get relevant results
  • Similar topics are automatically connected
  • Context is found even if you use different wording

Privacy

  • All your conversation history stays local
  • No data sent to external services beyond the LLM API calls
  • You control your data completely

Troubleshooting

Function Not Found

Make sure you've sourced your shell configuration:

source ~/.bashrc  # or ~/.zshrc

No Context Being Added

Check that Reservoir is running:

# Should show Reservoir process
ps aux | grep reservoir

# Start if not running
cargo run -- start

Empty Search Results

Build up some conversation history first:

gpty "Tell me about artificial intelligence"
gpty "What are neural networks?"
gpty "How does machine learning work?"

# Now try a search
gpty "What did we discuss about AI?"

Permission Issues

Make sure the function has access to reservoir commands:

# Test individual commands
echo "test" | reservoir ingest
reservoir view 5
reservoir search "test"

Next Steps

The Chat Gipitty integration transforms your LLM interactions from isolated conversations into a connected, searchable knowledge base that grows smarter with every interaction.

Python Integration

Reservoir works seamlessly with the popular OpenAI Python library. You simply point the client to your Reservoir instance instead of directly to OpenAI, and Reservoir handles all the memory and context management automatically.

Setup

First, install the OpenAI Python library if you haven't already:

pip install openai

Basic Configuration

import os
from openai import OpenAI

INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"

OpenAI Models

Basic Usage with OpenAI

import os
from openai import OpenAI

INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"

client = OpenAI(
    base_url=RESERVOIR_BASE_URL,
    api_key=os.environ.get("OPENAI_API_KEY")
)

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a curious robot."
        }
    ]
)
print(completion.choices[0].message.content)

With Web Search Options

For models that support web search (like gpt-4o-search-preview), you can enable web search capabilities:

completion = client.chat.completions.create(
    model="gpt-4o-search-preview",
    messages=[
        {
            "role": "user",
            "content": "What are the latest trends in machine learning?"
        }
    ],
    extra_body={
        "web_search_options": {
            "enabled": True,
            "max_results": 5
        }
    }
)

Ollama Models (Local)

Using Ollama (No API Key Required)

import os
from openai import OpenAI

INSTANCE = "my-application"
PARTITION = os.getenv("USER")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"

client = OpenAI(
    base_url=RESERVOIR_BASE_URL,
    api_key="not-needed-for-ollama"  # Ollama doesn't require API keys
)

completion = client.chat.completions.create(
    model="llama3.2",  # or "gemma3", or any Ollama model
    messages=[
        {
            "role": "user",
            "content": "Explain the concept of recursion with a simple example."
        }
    ]
)
print(completion.choices[0].message.content)

Supported Models

Reservoir automatically routes requests to the appropriate provider based on the model name:

Model                                            | Provider | API Key Required
-------------------------------------------------|----------|----------------------
gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo        | OpenAI   | Yes (OPENAI_API_KEY)
gpt-4o-search-preview                            | OpenAI   | Yes (OPENAI_API_KEY)
llama3.2, gemma3, or any custom name             | Ollama   | No
mistral-large-2402                               | Mistral  | Yes (MISTRAL_API_KEY)
gemini-2.0-flash, gemini-2.5-flash-preview-05-20 | Google   | Yes (GEMINI_API_KEY)

Note: Any model name not explicitly configured will default to using Ollama.
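
Because routing is decided purely by the model name, a single client can talk to several providers while sharing one conversation history. Here is a short sketch along the lines of the examples above; the routing-demo instance name is arbitrary.

import os
from openai import OpenAI

base_url = (
    f"http://localhost:{os.getenv('RESERVOIR_PORT', '3017')}"
    f"/v1/partition/{os.getenv('USER', 'default')}/instance/routing-demo"
)
client = OpenAI(base_url=base_url, api_key=os.environ.get("OPENAI_API_KEY", "not-needed-for-ollama"))

# Routed to OpenAI because the model name starts with "gpt-"
cloud = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Give me one fun fact about rivers."}],
)

# Not a configured cloud model, so Reservoir routes it to Ollama
local = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Rephrase that fact as a haiku."}],
)

print(cloud.choices[0].message.content)
print(local.choices[0].message.content)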

Environment Variables

You can customize provider endpoints and set API keys using environment variables:

import os

# Set environment variables (or use .env file)
os.environ['OPENAI_API_KEY'] = 'your-openai-key'
os.environ['MISTRAL_API_KEY'] = 'your-mistral-key'
os.environ['GEMINI_API_KEY'] = 'your-gemini-key'

# Custom provider endpoints (optional)
os.environ['RSV_OPENAI_BASE_URL'] = 'https://api.openai.com/v1/chat/completions'
os.environ['RSV_OLLAMA_BASE_URL'] = 'http://localhost:11434/v1/chat/completions'
os.environ['RSV_MISTRAL_BASE_URL'] = 'https://api.mistral.ai/v1/chat/completions'

Complete Example

Here's a complete example that demonstrates Reservoir's memory capabilities:

import os
from openai import OpenAI

def setup_reservoir_client():
    """Setup Reservoir client with proper configuration"""
    instance = "chat-example"
    partition = os.getenv("USER", "default")
    port = os.getenv('RESERVOIR_PORT', '3017')
    base_url = f"http://localhost:{port}/v1/partition/{partition}/instance/{instance}"
    
    return OpenAI(
        base_url=base_url,
        api_key=os.environ.get("OPENAI_API_KEY", "not-needed-for-ollama")
    )

def chat_with_memory(message, model="gpt-4"):
    """Send a message through Reservoir with automatic memory"""
    client = setup_reservoir_client()
    
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": message
            }
        ]
    )
    
    return completion.choices[0].message.content

# Example conversation that builds context
if __name__ == "__main__":
    # First message
    response1 = chat_with_memory("My name is Alice and I love Python programming.")
    print("Assistant:", response1)
    
    # Second message - Reservoir will automatically include context
    response2 = chat_with_memory("What programming language do I like?")
    print("Assistant:", response2)  # Will know you like Python!
    
    # Third message - Even more context
    response3 = chat_with_memory("Can you suggest a project for me?")
    print("Assistant:", response3)  # Will suggest Python projects for Alice!

Benefits of Using Reservoir

When you use Reservoir with the OpenAI library, you get:

  • Automatic Context: Previous conversations are automatically included
  • Cross-Session Memory: Conversations persist across different Python sessions
  • Smart Token Management: Reservoir handles token limits automatically
  • Multi-Provider Support: Switch between different LLM providers seamlessly
  • Local Storage: All your conversation data stays on your device

Next Steps

Curl Examples

This page provides comprehensive examples of using Reservoir with curl commands. These examples are perfect for testing, scripting, or understanding the API structure.

Basic URL Structure

Instead of calling the provider directly, you call Reservoir with this URL pattern:

  • Direct Provider: https://api.openai.com/v1/chat/completions
  • Through Reservoir: http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions

Where:

  • $USER is your system username (acts as the partition)
  • reservoir is the instance name (you can use any name)

OpenAI Models

Basic GPT-4 Example

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "user",
                "content": "Write a one-sentence bedtime story about a brave little toaster."
            }
        ]
    }'

GPT-4 with System Message

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant that explains complex topics in simple terms."
            },
            {
                "role": "user",
                "content": "Explain quantum computing to a 10-year-old."
            }
        ]
    }'

Web Search Integration

For models that support web search (like gpt-4o-search-preview):

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4o-search-preview",
        "messages": [
            {
                "role": "user",
                "content": "What are the latest developments in AI?"
            }
        ],
        "web_search_options": {
            "enabled": true,
            "max_results": 5
        }
    }'

Ollama Models (Local)

Basic Ollama Example

No API key needed for Ollama models:

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Explain quantum computing in simple terms."
            }
        ]
    }'

Using Llama Models

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function to calculate fibonacci numbers."
            }
        ]
    }'

Other Providers

Mistral AI

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $MISTRAL_API_KEY" \
    -d '{
        "model": "mistral-large-2402",
        "messages": [
            {
                "role": "user",
                "content": "Explain the differences between functional and object-oriented programming."
            }
        ]
    }'

Google Gemini

curl "http://127.0.0.1:3017/partition/$USER/instance/reservoir/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GEMINI_API_KEY" \
    -d '{
        "model": "gemini-2.0-flash",
        "messages": [
            {
                "role": "user",
                "content": "Compare different sorting algorithms and their time complexities."
            }
        ]
    }'

Partitioning Examples

Using Different Partitions

You can organize conversations by using different partition names:

# Work conversations
curl "http://127.0.0.1:3017/partition/work/instance/coding/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Review this code for security issues"}]
    }'

# Personal conversations
curl "http://127.0.0.1:3017/partition/personal/instance/creative/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Help me write a short story"}]
    }'

Using Different Instances

Different instances within the same partition:

# Development instance
curl "http://127.0.0.1:3017/partition/$USER/instance/development/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Debug this Python error"}]
    }'

# Research instance
curl "http://127.0.0.1:3017/partition/$USER/instance/research/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Explain machine learning concepts"}]
    }'

Testing Scenarios

Test Basic Connectivity

# Simple test with Ollama (no API key needed)
curl "http://127.0.0.1:3017/partition/test/instance/basic/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "Hello, can you hear me?"}]
    }'

Test Memory Functionality

Send multiple requests to see memory in action:

# First message
curl "http://127.0.0.1:3017/partition/test/instance/memory/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "My favorite color is blue."}]
    }'

# Second message - should remember the color
curl "http://127.0.0.1:3017/partition/test/instance/memory/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3", 
        "messages": [{"role": "user", "content": "What is my favorite color?"}]
    }'

Error Handling

Invalid Model

curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "nonexistent-model",
        "messages": [{"role": "user", "content": "Hello"}]
    }'

Missing API Key

curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}]
    }'
# Will return error because OPENAI_API_KEY is required for GPT-4

Environment Variables

Set up your environment for easier testing:

export OPENAI_API_KEY="your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"
export RESERVOIR_URL="http://127.0.0.1:3017"
export USER_PARTITION="$USER"

Then use in requests:

curl "$RESERVOIR_URL/partition/$USER_PARTITION/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello from the environment!"}]
    }'

Debugging Tips

Pretty Print JSON Response

Add | jq to format the JSON response:

curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "Hello"}]
    }' | jq

Verbose Output

Use -v flag to see request/response headers:

curl -v "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "Hello"}]
    }'

Save Response

Save the response to a file:

curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [{"role": "user", "content": "Hello"}]
    }' -o response.json

Next Steps

Ollama Integration

Reservoir works seamlessly with Ollama, allowing you to use local AI models with persistent memory and context enrichment. This is perfect for privacy-focused workflows where you want to keep all your conversations completely local.

What is Ollama?

Ollama is a tool that makes it easy to run large language models locally on your machine. It supports popular models like Llama, Gemma, and many others, all running entirely on your hardware.

Benefits of Using Ollama with Reservoir

  • Complete Privacy: All conversations stay on your device
  • No API Keys: No need for cloud service API keys
  • Offline Capable: Works without internet connection
  • Cost Effective: No usage-based charges
  • Full Control: Choose exactly which models to use

Setting Up Ollama

Step 1: Install Ollama

First, install Ollama from ollama.ai:

# On macOS
brew install ollama

# On Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or download from https://ollama.ai/download

Step 2: Start Ollama Service

ollama serve

This starts the Ollama service on http://localhost:11434.

Step 3: Download Models

Download the models you want to use:

# Download Gemma 3 (Google's model)
ollama pull gemma3

# Download Llama 3.2 (Meta's model)
ollama pull llama3.2

# Download Mistral (Mistral AI's model)
ollama pull mistral

# See all available models
ollama list

Using Ollama with Reservoir

Regular Mode

By default, Reservoir routes any unrecognized model names to Ollama:

curl "http://127.0.0.1:3017/partition/$USER/instance/ollama-chat/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Explain machine learning in simple terms."
            }
        ]
    }'

No API key required!

Ollama Mode

Reservoir also provides a special "Ollama mode" that makes it a drop-in replacement for Ollama's API:

# Start Reservoir in Ollama mode
cargo run -- start --ollama

In Ollama mode, Reservoir:

  • Uses the same API endpoints as Ollama
  • Provides the same response format
  • Adds memory and context enrichment automatically
  • Makes existing Ollama clients work with persistent memory
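
In practice this means a tool that already speaks the OpenAI-style /v1/chat/completions endpoint can be pointed at Reservoir in Ollama mode rather than at Ollama directly. A minimal sketch with the OpenAI Python client; the api_key value is a placeholder since no key is needed locally.

from openai import OpenAI

# Point an existing OpenAI-compatible client at Reservoir in Ollama mode
client = OpenAI(base_url="http://127.0.0.1:3017/v1", api_key="not-needed-for-ollama")

completion = client.chat.completions.create(
    model="gemma3",
    messages=[{"role": "user", "content": "Do you remember what we talked about yesterday?"}],
)
print(completion.choices[0].message.content)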

Testing Ollama Mode

# Test with the standard Ollama endpoint format
curl "http://127.0.0.1:3017/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Hello, can you remember our previous conversations?"
            }
        ]
    }'

Gemma 3 (Google)

Excellent for general conversation and coding:

curl "http://127.0.0.1:3017/partition/$USER/instance/coding/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function to sort a list of dictionaries by a specific key."
            }
        ]
    }'

Llama 3.2 (Meta)

Great for reasoning and complex tasks:

curl "http://127.0.0.1:3017/partition/$USER/instance/reasoning/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "user",
                "content": "Solve this logic puzzle: If all roses are flowers, and some flowers are red, can we conclude that some roses are red?"
            }
        ]
    }'

Mistral 7B

Efficient and good for general tasks:

curl "http://127.0.0.1:3017/partition/$USER/instance/general/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mistral",
        "messages": [
            {
                "role": "user",
                "content": "Summarize the key points of quantum computing for a beginner."
            }
        ]
    }'

Python Integration with Ollama

Using the OpenAI library with local Ollama models:

import os
from openai import OpenAI

# Setup for Ollama through Reservoir
INSTANCE = "ollama-python"
PARTITION = os.getenv("USER", "default")
RESERVOIR_PORT = os.getenv('RESERVOIR_PORT', '3017')
RESERVOIR_BASE_URL = f"http://localhost:{RESERVOIR_PORT}/v1/partition/{PARTITION}/instance/{INSTANCE}"

client = OpenAI(
    base_url=RESERVOIR_BASE_URL,
    api_key="not-needed-for-ollama"  # Ollama doesn't require API keys
)

# Chat with memory using local model
completion = client.chat.completions.create(
    model="gemma3",
    messages=[
        {
            "role": "user",
            "content": "My favorite hobby is gardening. What plants would you recommend for a beginner?"
        }
    ]
)

print(completion.choices[0].message.content)

# Ask a follow-up that requires memory
follow_up = client.chat.completions.create(
    model="gemma3",
    messages=[
        {
            "role": "user", 
            "content": "What tools do I need to get started with my hobby?"
        }
    ]
)

print(follow_up.choices[0].message.content)
# Will remember you're interested in gardening!

Environment Configuration

You can customize the Ollama endpoint if needed:

# Default Ollama endpoint
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"

# Custom endpoint (if running Ollama on different port/host)
export RSV_OLLAMA_BASE_URL="http://192.168.1.100:11434/v1/chat/completions"

Performance Tips

Model Selection

  • gemma3: Good balance of speed and quality
  • llama3.2: Higher quality but slower
  • mistral: Fast and efficient
  • smaller models (7B parameters): Faster on limited hardware
  • larger models (13B+): Better quality but require more resources

Hardware Considerations

  • RAM: 8GB minimum, 16GB+ recommended for larger models
  • GPU: Optional but significantly speeds up inference
  • Storage: Models range from 4GB to 40GB+ each

Optimizing Performance

# Use GPU acceleration if available
ollama run gemma3 --gpu

# Monitor resource usage
ollama ps

Troubleshooting Ollama

Common Issues

Ollama Not Found

# Check if Ollama is running
curl http://localhost:11434/api/tags

# If not running, start it
ollama serve

Model Not Available

# List installed models
ollama list

# Pull missing model
ollama pull gemma3

Performance Issues

# Check system resources
ollama ps

# Try a smaller model
ollama pull gemma3:2b  # 2B parameter version

Error Messages

  • "connection refused": Ollama service isn't running
  • "model not found": Model needs to be pulled with ollama pull
  • "out of memory": Try a smaller model or close other applications

Combining Local and Cloud Models

One of Reservoir's strengths is seamlessly switching between local and cloud models:

import os
from openai import OpenAI

# Same client setup as in the earlier examples
RESERVOIR_BASE_URL = f"http://localhost:{os.getenv('RESERVOIR_PORT', '3017')}/v1/partition/{os.getenv('USER', 'default')}/instance/mixed-models"
client = OpenAI(base_url=RESERVOIR_BASE_URL, api_key=os.environ.get("OPENAI_API_KEY", ""))

# Start with local model for initial draft
local_response = client.chat.completions.create(
    model="gemma3",  # Local Ollama model
    messages=[{"role": "user", "content": "Write a draft email about project updates"}]
)

# Refine with cloud model for better quality
cloud_response = client.chat.completions.create(
    model="gpt-4",  # Cloud OpenAI model
    messages=[{"role": "user", "content": "Please improve the writing quality and make it more professional"}]
)

Both responses will have access to the same conversation context!

Next Steps

Ready to go private? 🔒 With Ollama and Reservoir, you have a completely local AI assistant with persistent memory!

API Overview

Reservoir provides an OpenAI-compatible API endpoint that acts as a smart proxy between your application and language model providers. This section covers the core API structure and basic usage patterns.

URL Structure

The Reservoir API follows this pattern:

/v1/partition/{partition}/instance/{instance}/chat/completions

Parameters

  • {partition}: A broad category for organizing conversations (e.g., project name, application name, username)
  • {instance}: A specific context within the partition (e.g., user ID, session ID, specific feature)

This structure allows you to organize conversations hierarchically and scope context enrichment appropriately.

Example URL Transformation

  • Instead of:
    https://api.openai.com/v1/chat/completions
    
  • Use:
    http://localhost:3017/v1/partition/$USER/instance/my-application/chat/completions
    

Here, $USER is your system username, and my-application is your application instance. All context enrichment and history retrieval are scoped to this specific partition/instance combination.
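
In most clients the transformation is just a change of base URL. Here is a minimal sketch using the OpenAI Python library that mirrors the pattern above; my-application is the example instance name from the text.

import os
from openai import OpenAI

partition = os.getenv("USER", "default")
instance = "my-application"

client = OpenAI(
    base_url=f"http://localhost:3017/v1/partition/{partition}/instance/{instance}",
    api_key=os.environ.get("OPENAI_API_KEY"),
)

completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What did we decide about the release plan?"}],
)
print(completion.choices[0].message.content)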

Basic Request Structure

Reservoir maintains full compatibility with the OpenAI Chat Completions API. You can use the same request structure, headers, and parameters you would use with OpenAI directly.

Required Headers

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

The request body follows the same format as OpenAI's Chat Completions API:

{
    "model": "gpt-4",
    "messages": [
        {
            "role": "user",
            "content": "Your message here"
        }
    ]
}

What Happens Behind the Scenes

When you make a request to Reservoir:

  1. Message Storage: Your message is stored with the specified partition/instance
  2. Context Enrichment: Reservoir finds relevant past conversations and recent history
  3. Token Management: The enriched context is checked against token limits
  4. Request Forwarding: The enriched request is forwarded to the appropriate LLM provider
  5. Response Storage: The LLM's response is stored for future context
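
The five steps can be pictured as one request/response cycle. The sketch below is purely conceptual: the function and variable names are hypothetical stand-ins for the stages listed above, not Reservoir's actual internals.

def handle_chat_request(store, user_message, forward_to_provider):
    """Conceptual outline of the request cycle described above (not Reservoir's code)."""
    # 1. Message storage
    store.append(user_message)

    # 2. Context enrichment: recent history (semantic search results would be merged in too)
    context = store[-15:]

    # 3. Token management would trim `context` against the model's limit (omitted here)

    # 4. Request forwarding with the enriched message list
    reply = forward_to_provider(context)

    # 5. Response storage for future context
    store.append(reply)
    return reply

# Toy usage with an echoing "provider"
def echo_provider(msgs):
    return {"role": "assistant", "content": f"echo: {msgs[-1]['content']}"}

history = []
print(handle_chat_request(history, {"role": "user", "content": "Hi"}, echo_provider))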

Response Format

Responses maintain the same format as the underlying LLM provider (OpenAI, Ollama, etc.), so your existing code will work without modification.

Next Steps

Chat Completions Endpoint

The Chat Completions endpoint is Reservoir's core API, providing full OpenAI API compatibility with intelligent context enrichment. This endpoint automatically enhances your conversations with relevant historical context while maintaining the same request/response format as OpenAI's Chat Completions API.

Endpoint URL

POST /v1/partition/{partition}/instance/{instance}/chat/completions

URL Parameters

Parameter | Description                        | Example
----------|------------------------------------|-------------------------------
partition | Top-level organization boundary    | alice, project_name, $USER
instance  | Specific context within partition  | coding, research, session_123

Example URLs

# User-specific coding assistant
POST /v1/partition/alice/instance/coding/chat/completions

# Project-specific documentation bot  
POST /v1/partition/docs_project/instance/support/chat/completions

# Personal research assistant
POST /v1/partition/$USER/instance/research/chat/completions

# Default partition/instance (if not specified)
POST /v1/chat/completions  # Uses partition=default, instance=default

Request Format

Headers

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

Reservoir accepts the standard OpenAI Chat Completions request format:

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user", 
      "content": "How do I implement error handling in async functions?"
    }
  ]
}

Supported Models

OpenAI Models:

  • gpt-4.1
  • gpt-4-turbo
  • gpt-4o
  • gpt-4o-mini
  • gpt-3.5-turbo
  • gpt-4o-search-preview

Local Models (via Ollama):

  • llama3.1:8b
  • llama3.1:70b
  • mistral:7b
  • codellama:latest
  • Any Ollama-supported model

Message Roles

Role      | Description          | Usage
----------|----------------------|-----------------------------------------
user      | User input messages  | Questions, requests, instructions
assistant | LLM responses        | Previous LLM responses in conversation
system    | System instructions  | Behavior modification, context setting

Context Enrichment Process

When you send a request, Reservoir automatically enhances it with relevant context:

1. Message Analysis

// Your original request
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "How do I handle database timeouts?"
    }
  ]
}

2. Context Discovery

Reservoir finds relevant context through:

  • Semantic Search: Messages similar to "database timeouts"
  • Recent History: Last 15 messages from same partition/instance
  • Synapse Connections: Related discussions via SYNAPSE relationships

3. Context Injection

// Enriched request sent to the Language Model
{
  "model": "gpt-4", 
  "messages": [
    {
      "role": "system",
      "content": "The following is the result of a semantic search of the most related messages by cosine similarity to previous conversations"
    },
    {
      "role": "user",
      "content": "What's the best way to configure database connection pools?"
    },
    {
      "role": "assistant", 
      "content": "For database connection pools, consider these settings..."
    },
    {
      "role": "system",
      "content": "The following are the most recent messages in the conversation in chronological order"
    },
    {
      "role": "user",
      "content": "I'm working on optimizing database queries"
    },
    {
      "role": "assistant",
      "content": "Here are some query optimization techniques..."
    },
    {
      "role": "user",
      "content": "How do I handle database timeouts?"  // Your original message
    }
  ]
}

Response Format

Reservoir returns responses in the standard OpenAI Chat Completions format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion", 
  "created": 1677858242,
  "model": "gpt-4",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "To handle database timeouts, you should implement retry logic with exponential backoff..."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Configuration and Model Selection

Environment Variables

Configure different LLM providers:

# OpenAI (default)
export OPENAI_API_KEY="your-openai-api-key"
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"

# Ollama (local)
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"

# Mistral
export MISTRAL_API_KEY="your-mistral-api-key"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"

# Gemini
export GEMINI_API_KEY="your-gemini-api-key"

Model Detection

Reservoir automatically routes requests based on model name:

  • OpenAI models: gpt-* → OpenAI API
  • Local models: llama*, mistral*, etc. → Ollama API
  • Mistral models: mistral-* → Mistral API
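
For illustration, the Python sketch below shows the kind of prefix-based routing described above. The helper name route_model and the exact prefix rules are assumptions for this example, not Reservoir's actual (Rust) implementation.

# Hedged sketch of prefix-based model routing; real rules may differ.
def route_model(model: str) -> str:
    """Pick an upstream endpoint based on the model name."""
    if model.startswith("gpt-"):
        return "https://api.openai.com/v1/chat/completions"
    if model.startswith("mistral-"):
        return "https://api.mistral.ai/v1/chat/completions"
    # Local tags such as llama3.1:8b, mistral:7b, codellama:latest go to Ollama
    return "http://localhost:11434/v1/chat/completions"

assert route_model("gpt-4o").startswith("https://api.openai.com")
assert route_model("llama3.1:8b").startswith("http://localhost:11434")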

Error Handling

Token Limit Errors

If your message exceeds model token limits:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Your last message is too long. It contains approximately 5000 tokens, which exceeds the maximum limit of 4096. Please shorten your message."
      },
      "finish_reason": "length",
      "index": 0
    }
  ]
}

API Connection Errors

{
  "error": {
    "message": "Failed to connect to OpenAI API: Connection timeout. Check your API key and network connection. Using model 'gpt-4' at 'https://api.openai.com/v1/chat/completions'"
  }
}

Invalid Model Errors

{
  "error": {
    "message": "Invalid OpenAI model name: 'gpt-5'. Valid models are: ['gpt-4.1', 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo', 'gpt-4o-search-preview']"
  }
}
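
Because all three error shapes use standard JSON, client code can branch on them directly. The following Python sketch (using the requests library; the URL, partition, and payload are illustrative) shows one way to do that:

import os
import requests

url = "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions"
payload = {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}
headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

body = requests.post(url, json=payload, headers=headers).json()

if "error" in body:
    # Connection and invalid-model errors arrive under an "error" key
    print("Request failed:", body["error"]["message"])
else:
    choice = body["choices"][0]
    if choice.get("finish_reason") == "length":
        # Token-limit responses still use the chat completion shape
        print("Message too long:", choice["message"]["content"])
    else:
        print(choice["message"]["content"])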

Usage Examples

Basic Request

curl -X POST "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Explain async/await in Python"
      }
    ]
  }'

With System Message

curl -X POST "http://localhost:3017/v1/partition/docs/instance/writing/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a technical documentation expert. Provide clear, concise explanations."
      },
      {
        "role": "user",
        "content": "How should I document API endpoints?"
      }
    ]
  }'

Local Model (Ollama)

curl -X POST "http://localhost:3017/v1/partition/alice/instance/local/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {
        "role": "user",
        "content": "What are the benefits of using local LLMs?"
      }
    ]
  }'

Integration Examples

Python with OpenAI Library

import openai

# Configure to use Reservoir instead of OpenAI directly
openai.api_base = "http://localhost:3017/v1/partition/alice/instance/coding"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "How do I optimize this database query?"}
    ]
)

print(response.choices[0].message.content)

JavaScript/Node.js

const OpenAI = require('openai');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://localhost:3017/v1/partition/myapp/instance/support'
});

async function chat(message) {
  const completion = await openai.chat.completions.create({
    messages: [{ role: 'user', content: message }],
    model: 'gpt-4',
  });

  return completion.choices[0].message.content;
}

Streaming Responses

Reservoir supports streaming responses when the underlying model supports it:

import openai

openai.api_base = "http://localhost:3017/v1/partition/alice/instance/chat"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain machine learning"}],
    stream=True
)

for chunk in response:
    print(chunk.choices[0].delta.content, end="")

Advanced Features

Web Search Integration

Some models support web search capabilities:

{
  "model": "gpt-4o-search-preview",
  "messages": [
    {
      "role": "user",
      "content": "What are the latest developments in AI?"
    }
  ],
  "web_search_options": {
    "enabled": true
  }
}

Message Storage

All messages (user and assistant) are automatically stored with:

  • Embeddings: For semantic search and context enrichment
  • Timestamps: For chronological ordering
  • Partition/Instance: For data organization
  • Trace IDs: For linking request/response pairs

Context Control

Control context enrichment via configuration:

# Adjust context size
reservoir config --set semantic_context_size=20
reservoir config --set recent_context_size=15

# View current settings
reservoir config --get semantic_context_size

Performance Considerations

Token Management

  • Reservoir automatically manages token limits for each model
  • Context is intelligently truncated when necessary
  • Priority given to most relevant and recent content

Caching

  • Embeddings are cached to avoid recomputation
  • Vector indices are optimized for fast similarity search
  • Connection pooling for database efficiency

Latency

  • Typical latency: 200-500ms for context enrichment
  • Parallel processing of semantic search and recent history
  • Optimized Neo4j queries for fast retrieval

The Chat Completions endpoint provides the full power of Reservoir's context enrichment while maintaining complete compatibility with existing OpenAI-based applications, making it easy to add conversational memory to any LLM application.

Search & Retrieval

Reservoir provides powerful search capabilities for finding relevant conversations and messages across your entire conversation history. The search system supports both keyword-based and semantic similarity searches, enabling you to discover related discussions even when they use different terminology.

Search Methods

Keyword Search

Traditional text-based search that finds exact or partial matches within message content.

CLI Usage:

# Basic keyword search
reservoir search "python programming"

# Search in specific partition
reservoir search --partition alice "machine learning"

Characteristics:

  • Fast and precise for exact term matches
  • Case-insensitive matching
  • Supports partial word matching
  • Best for finding specific technical terms or names

Semantic Search

Vector-based similarity search that finds conceptually related messages even when they use different words.

CLI Usage:

# Semantic search
reservoir search --semantic "machine learning concepts"

# Use RAG strategy (same as context enrichment)
reservoir search --link --semantic "database design"

Characteristics:

  • Finds conceptually similar content
  • Works across different terminology
  • Uses BGE-Large-EN-v1.5 embeddings
  • Powers Reservoir's context enrichment system

Search Options

Partitioning

Scope your search to specific organizational boundaries:

# Search in specific partition
reservoir search --partition alice "neural networks"

# Search in specific instance within partition
reservoir search --partition alice --instance coding "API design"

Deduplication

Remove duplicate or highly similar results:

# Remove duplicate results
reservoir search --deduplicate --semantic "error handling"

RAG Strategy

Use the same search strategy that powers context enrichment:

# Use advanced search with synapse expansion
reservoir search --link --semantic "software architecture"

The --link option:

  • Searches for semantically similar messages
  • Expands results using synapse relationships
  • Follows conversation threads
  • Deduplicates automatically
  • Limits results to most relevant matches

Search Implementation

Vector Similarity

Reservoir uses cosine similarity to find related messages:

  1. Query Embedding: Your search term is converted to a vector using BGE-Large-EN-v1.5
  2. Index Search: Neo4j's vector index finds similar message embeddings
  3. Scoring: Results are ranked by similarity score (0.0 to 1.0)
  4. Filtering: Results are filtered by partition/instance boundaries
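
As a quick illustration of steps 2 and 3, the self-contained Python sketch below ranks toy embeddings by cosine similarity. The vectors and labels are made up for the example; real BGE-Large-EN-v1.5 embeddings have far more dimensions.

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.2, 0.1, 0.9]  # toy query embedding
candidates = {
    "connection pooling tips": [0.25, 0.05, 0.85],
    "weekend hiking plans":    [0.90, 0.10, 0.05],
}

# Rank candidates by similarity score, highest first
ranked = sorted(
    ((text, cosine_similarity(query, emb)) for text, emb in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for text, score in ranked:
    print(f"{score:.3f}  {text}")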

Synapse Expansion

When using --link, the search expands beyond direct similarity:

  1. Initial Search: Find semantically similar messages
  2. Synapse Following: Explore connected messages via SYNAPSE relationships
  3. Thread Discovery: Follow conversation threads and related discussions
  4. Relevance Scoring: Combine similarity scores with relationship strength
  5. Result Limiting: Return top matches within context limits

Example Queries

Finding Programming Discussions

# Find all Python-related conversations
reservoir search --semantic "python programming"

# Find specific error discussions
reservoir search "TypeError" 

# Find design pattern conversations
reservoir search --link --semantic "software design patterns"

Research and Analysis

# Find machine learning discussions
reservoir search --semantic "neural networks deep learning"

# Find database-related conversations
reservoir search --partition research --semantic "database optimization"

# Find recent discussions on a topic
reservoir view 50 | grep -i "kubernetes"

Cross-Conversation Discovery

# Find related discussions across all conversations
reservoir search --link --semantic "microservices architecture"

# Discover connections between topics
reservoir search --deduplicate --semantic "testing strategies"

Search Results Format

CLI Output

Search results include:

  • Timestamp: When the message was created
  • Partition/Instance: Organizational context
  • Role: User or assistant message
  • Content: The actual message text
  • Score: Similarity score (for semantic search)

JSON Format

When exported, search results follow the MessageNode structure:

{
  "trace_id": "abc123-def456",
  "partition": "alice",
  "instance": "coding",
  "role": "user",
  "content": "How do I implement error handling in async functions?",
  "timestamp": "2024-01-15T10:30:00Z",
  "embedding": [0.1, -0.2, 0.3, ...],
  "url": null
}

Integration with Context Enrichment

The search system directly powers Reservoir's context enrichment:

  1. Automatic Search: Every user message triggers a semantic search
  2. Context Building: Search results become conversation context
  3. Relevance Filtering: Only high-quality matches (>0.85 similarity) are used
  4. Token Management: Results are truncated to fit model token limits

Performance Considerations

Vector Index

Reservoir maintains optimized vector indices for fast search:

CREATE VECTOR INDEX embedding1536 
FOR (n:Embedding1536) ON (n.embedding) 
OPTIONS {
  indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
  }
}

Search Strategies

  • Keyword Search: Fastest for exact matches
  • Basic Semantic: Good balance of speed and relevance
  • RAG Strategy (--link): Most comprehensive but slower
  • Deduplication: Adds processing time but improves result quality

Optimization Tips

  1. Use Specific Partitions: Reduces search space
  2. Keyword for Exact Terms: Faster than semantic for specific names
  3. Semantic for Concepts: Better for finding related ideas
  4. Limit Result Count: Implicit in CLI, configurable in API

Advanced Usage

Combining with Other Commands

# Search and then view context
reservoir search --semantic "error handling" | head -5
reservoir view 10

# Search and ingest related information
echo "Related to error handling discussion" | reservoir ingest

# Export search results for analysis
reservoir search --semantic "API design" > api_discussions.txt

Scripting and Automation

#!/bin/bash
# Find and analyze topic discussions

TOPIC="$1"
echo "Searching for discussions about: $TOPIC"

# Semantic search with RAG strategy
reservoir search --link --semantic "$TOPIC" > "search_results_$TOPIC.txt"

# Count total discussions
TOTAL=$(wc -l < "search_results_$TOPIC.txt")
echo "Found $TOTAL related messages"

# Show recent activity
echo "Recent activity:"
reservoir view 20 | grep -i "$TOPIC" | head -3

The search system is designed to make your conversation history searchable and discoverable, turning your accumulated AI interactions into a valuable knowledge base that grows more useful over time.

Data Management

Reservoir provides comprehensive data management capabilities for backing up, migrating, and organizing your conversation data. The system supports full data export/import, individual message management, and flexible partitioning strategies.

Export and Import

Export All Data

Export your entire conversation history as JSON for backup or migration:

# Export all messages to stdout
reservoir export

# Save to file with timestamp
reservoir export > backup_$(date +%Y%m%d_%H%M%S).json

# Export and compress for storage
reservoir export | gzip > reservoir_backup.json.gz

Export Format: Each message is exported as a complete MessageNode with all metadata:

[
  {
    "trace_id": "550e8400-e29b-41d4-a716-446655440000",
    "partition": "default",
    "instance": "default", 
    "role": "user",
    "content": "How do I implement error handling in async functions?",
    "timestamp": "2024-01-15T10:30:00.000Z",
    "embedding": [0.123, -0.456, 0.789, ...],
    "url": null
  },
  {
    "trace_id": "550e8400-e29b-41d4-a716-446655440001",
    "partition": "default",
    "instance": "default",
    "role": "assistant", 
    "content": "Here are several approaches to error handling in async functions...",
    "timestamp": "2024-01-15T10:30:15.000Z",
    "embedding": [0.234, -0.567, 0.890, ...],
    "url": null
  }
]

Import Data

Import message data from JSON files:

# Import from a backup file
reservoir import backup_20240115.json

# Import from another Reservoir instance
reservoir import exported_conversations.json

# Import compressed backup
gunzip -c reservoir_backup.json.gz | reservoir import /dev/stdin

Import Behavior:

  • Validates JSON format and MessageNode structure
  • Preserves all metadata including timestamps and embeddings
  • Maintains partition/instance organization
  • Skips duplicate messages (based on trace_id)
  • Rebuilds relationships and synapses
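
If you want to check a file before importing it, a small script can verify that every entry carries the required MessageNode fields. This Python sketch is illustrative only; the field list mirrors the export format shown above, and the script itself is not part of Reservoir.

import json
import sys

# Usage: python check_export.py backup_20240115.json
REQUIRED = {"trace_id", "partition", "instance", "role", "content", "timestamp"}

with open(sys.argv[1]) as f:
    messages = json.load(f)

problems = [
    (i, REQUIRED - msg.keys())
    for i, msg in enumerate(messages)
    if REQUIRED - msg.keys()
]

if problems:
    for index, missing in problems:
        print(f"message {index} is missing fields: {sorted(missing)}")
    sys.exit(1)

print(f"{len(messages)} messages look importable")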

Migration Workflows

Complete System Migration:

# On source system
reservoir export > full_backup.json

# Transfer file to new system
scp full_backup.json user@newserver:/path/to/reservoir/

# On destination system
reservoir import full_backup.json

# Verify migration
reservoir view 10
reservoir search --semantic "test query"

Selective Migration:

# Export from specific partition
reservoir export | jq '[.[] | select(.partition=="alice")]' > alice_messages.json

# Import to different partition (requires manual editing or processing)
# Edit JSON to change partition names, then import
reservoir import alice_messages.json
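
The "manual editing" step above can be scripted. The following Python sketch (the script name and arguments are hypothetical) rewrites the partition on every exported message before re-importing it:

import json
import sys

# Usage: python remap_partition.py alice_messages.json alice bob > remapped.json
src_file, old_partition, new_partition = sys.argv[1:4]

with open(src_file) as f:
    messages = json.load(f)

for msg in messages:
    if msg.get("partition") == old_partition:
        msg["partition"] = new_partition

json.dump(messages, sys.stdout, indent=2)

After inspecting the output, import it with reservoir import remapped.json.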

Message Management

Manual Message Ingestion

Add messages manually for testing, note-taking, or data entry:

# Add a user message
echo "How do I configure Neo4j for production?" | reservoir ingest

# Add to specific partition/instance
echo "Remember to update dependencies" | reservoir ingest --partition alice --instance notes

# Add assistant message
echo "Here's the production Neo4j configuration..." | reservoir ingest --role assistant

# Ingest from file
cat meeting_notes.txt | reservoir ingest --partition team --instance meetings

Use Cases:

  • Documentation: Add important information manually
  • Testing: Create test scenarios with known data
  • Migration: Import data from other systems
  • Notes: Add personal reminders or observations

Viewing Recent Data

Monitor recent activity and verify data integrity:

# View last 10 messages
reservoir view 10

# View from specific partition
reservoir view --partition alice 15

# View from specific instance
reservoir view --partition alice --instance coding 20

# Pipe to other tools for analysis
reservoir view 50 | grep -i "error" | wc -l

Partitioning Strategy

Organizational Structure

Reservoir uses a two-level organizational hierarchy:

  1. Partition: High-level boundary (user, project, team)
  2. Instance: Sub-boundary within partition (topic, session, category)

default/
├── default/          # General conversations
├── coding/           # Programming discussions  
└── research/         # Research and analysis

alice/
├── personal/         # Personal conversations
├── work/            # Work-related discussions
└── learning/        # Educational content

team/
├── meetings/        # Team meeting notes
├── planning/        # Project planning
└── retrospectives/  # Review sessions

Partition Management

Creating Partitions: Partitions are created automatically when first used:

# Create new partition by using it
echo "Starting new project discussions" | reservoir ingest --partition newproject

# Create instance within partition
echo "Technical architecture discussion" | reservoir ingest --partition newproject --instance architecture

Partition Benefits:

  • Isolation: Keep different contexts separate
  • Search Scoping: Limit searches to relevant content
  • Access Control: Enable future access restrictions
  • Organization: Maintain clean separation of concerns

Data Isolation

Partitions provide logical isolation:

  • Context Enrichment: Only includes messages from same partition/instance
  • Search: Can be scoped to specific partitions
  • Export: Can filter by partition (with additional tooling)
  • Privacy: Enables separation of personal/professional content

Data Integrity

Backup Strategies

Daily Backups:

#!/bin/bash
# Daily backup script

BACKUP_DIR="/backup/reservoir"
DATE=$(date +%Y%m%d)
TIMESTAMP=$(date +%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"

# Export data
reservoir export > "$BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json"

# Compress older backups
find "$BACKUP_DIR" -name "*.json" -mtime +7 -exec gzip {} \;

# Clean old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.json.gz" -mtime +30 -delete

# Log backup
echo "$(date): Backup completed - $BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json" >> /var/log/reservoir_backup.log

Incremental Exports:

# Export recent messages (last 24 hours)
reservoir export | jq '[.[] | select(.timestamp > "'"$(date -d '1 day ago' -Iseconds)"'")]' > incremental_backup.json

Data Validation

Verify Data Integrity:

# Check message count
TOTAL_MESSAGES=$(reservoir export | jq length)
echo "Total messages: $TOTAL_MESSAGES"

# Verify embeddings
EMBEDDED_COUNT=$(reservoir export | jq '[.[] | select(.embedding != null)] | length')
echo "Messages with embeddings: $EMBEDDED_COUNT"

# Check partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c

Recovery Procedures

Restore from Backup:

# Stop Reservoir (if running as service)
systemctl stop reservoir

# Clear existing data (WARNING: destructive)
# This requires manual Neo4j database clearing

# Import backup
reservoir import /backup/reservoir/20240115/reservoir_full.json

# Verify restoration
reservoir view 10
reservoir search --semantic "test"

# Restart service
systemctl start reservoir

Advanced Data Operations

Data Analysis

Export for Analysis:

# Export specific fields for analysis
reservoir export | jq -r '.[] | [.timestamp, .partition, .role, (.content | length)] | @csv' > message_stats.csv

# Analyze conversation patterns
reservoir export | jq -r '.[] | .partition' | sort | uniq -c | sort -nr

# Find most active time periods
reservoir export | jq -r '.[] | .timestamp[0:10]' | sort | uniq -c | sort -nr

Data Transformation

Format Conversion:

# Convert to CSV format
reservoir export | jq -r '.[] | [.timestamp, .partition, .instance, .role, .content] | @csv' > conversations.csv

# Extract just message content
reservoir export | jq -r '.[] | .content' > all_messages.txt

# Create markdown format
reservoir export | jq -r '.[] | "## " + .timestamp + " (" + .role + ")\n\n" + .content + "\n"' > conversations.md

Embedding Management

Replay Embeddings: When embedding models change or for data recovery:

# Replay embeddings for all messages
reservoir replay

# Replay for specific model/partition
reservoir replay bge-large-en-v15

# Monitor embedding progress
# (Check logs for embedding generation status)
tail -f /var/log/reservoir.log | grep -i embedding

Best Practices

Regular Maintenance

  1. Schedule Regular Backups: Daily exports with compression
  2. Monitor Disk Usage: Embeddings require significant storage
  3. Validate Data Integrity: Regular checks for missing embeddings
  4. Clean Old Logs: Rotate and archive log files
  5. Test Recovery: Periodically test backup restoration

Storage Optimization

  1. Compress Backups: Use gzip for long-term storage
  2. Archive Old Data: Move historical data to cold storage
  3. Monitor Neo4j Storage: Regular database maintenance
  4. Embedding Efficiency: Consider embedding model size vs. quality

Security Considerations

  1. Encrypt Backups: Sensitive conversation data should be encrypted
  2. Access Controls: Limit access to export/import capabilities
  3. Audit Trails: Log all data management operations
  4. Data Retention: Define policies for data lifecycle management

Data management in Reservoir is designed to be straightforward while providing enterprise-grade capabilities for backup, migration, and organization of your conversation data.

Command Line Interface

Reservoir provides a comprehensive command-line interface for managing your conversation data, searching through message history, and configuring the system. This section covers all available commands and their usage.

Overview

Reservoir's CLI allows you to:

  • Start the proxy server
  • Search through conversations
  • Import and export conversation data
  • View recent messages
  • Ingest new messages manually
  • Configure system settings
  • Replay embeddings for existing data

Available Commands

reservoir start

Start the Reservoir proxy server.

reservoir start [OPTIONS]

Options:

  • -o, --ollama - Ollama mode: listens on Ollama's default port (11434), useful as a drop-in proxy for clients that don't support setting a custom URL
  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# Start in normal mode
reservoir start

# Start in Ollama mode (uses port 11434)
reservoir start --ollama

reservoir search

Search messages by keyword or semantic similarity.

reservoir search [OPTIONS] <TERM>

Arguments:

  • <TERM> - The search term (keyword or semantic)

Options:

  • --semantic - Use semantic search instead of keyword search
  • -p, --partition <PARTITION> - Partition to search (defaults to "default")
  • -i, --instance <INSTANCE> - Instance to search (defaults to partition)
  • -l, --link - Use the same search strategy as RAG does when injecting into the model
  • -d, --deduplicate - Deduplicate first similarity results
  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# Keyword search
reservoir search "python programming"

# Semantic search
reservoir search --semantic "machine learning concepts"

# Search in specific partition/instance
reservoir search --partition alice --instance coding "neural networks"

# Use RAG search strategy
reservoir search --link --semantic "database design"

# Deduplicate results
reservoir search --deduplicate --semantic "API design"

reservoir export

Export all message nodes as JSON.

reservoir export

Options:

  • -h, --help - Print help

Examples:

# Export all messages to stdout
reservoir export > my_conversations.json

# Export and view
reservoir export | jq '.[0]'

reservoir import

Import message nodes from a JSON file.

reservoir import <FILE>

Arguments:

  • <FILE> - Path to the JSON file to import

Options:

  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# Import from a file
reservoir import my_conversations.json

# Import from a backup
reservoir import backup_2024_01_15.json

reservoir view

View the last N messages in a partition/instance (defaults to the default partition and instance).

reservoir view [OPTIONS] <COUNT>

Arguments:

  • <COUNT> - Number of messages to display

Options:

  • -p, --partition <PARTITION> - Partition to view (defaults to "default")
  • -i, --instance <INSTANCE> - Instance to view (defaults to partition)
  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# View last 10 messages
reservoir view 10

# View messages from specific partition
reservoir view --partition alice 5

# View messages from specific instance
reservoir view --partition alice --instance coding 15

reservoir ingest

Ingest a message from stdin as a user MessageNode.

reservoir ingest [OPTIONS]

Options:

  • -p, --partition <PARTITION> - Partition to save the message in (defaults to "default")
  • -i, --instance <INSTANCE> - Instance to save the message in (defaults to partition)
  • --role <ROLE> - Role to assign to the message (defaults to "user")
  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# Ingest a user message
echo "How do I implement a binary search tree?" | reservoir ingest

# Ingest to specific partition/instance
echo "What are design patterns?" | reservoir ingest --partition alice --instance coding

# Ingest as assistant message
echo "Here's how to implement a BST..." | reservoir ingest --role assistant

# Ingest from file
cat question.txt | reservoir ingest --partition research --instance ai

reservoir config

Set or get default configuration values in your config.toml.

reservoir config [OPTIONS]

Options:

  • -s, --set <SET> - Set a configuration value using the format key=value, e.g. reservoir config --set model=gpt-4-turbo
  • -g, --get <GET> - Get the current value of a configuration key, e.g. reservoir config --get model
  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# View current configuration
reservoir config --get semantic_context_size

# Set configuration value
reservoir config --set semantic_context_size=20

# Set Neo4j connection
reservoir config --set neo4j_uri=bolt://localhost:7687

reservoir replay

Replay the embedding generation process for stored messages.

reservoir replay [MODEL]

Arguments:

  • [MODEL] - Embedding model to replay embeddings for (defaults to "default")

Options:

  • -h, --help - Print help
  • -V, --version - Print version

Examples:

# Replay embeddings for default model
reservoir replay

# Replay for specific model
reservoir replay bge-large-en-v15

Common Workflows

Daily Usage

# Start the server
reservoir start

# View recent conversations
reservoir view 10

# Search for specific topics
reservoir search --semantic "machine learning"

# Add a note or question
echo "Remember to implement error handling" | reservoir ingest

Data Management

# Export all data for backup
reservoir export > backup_$(date +%Y%m%d).json

# Import previous backup
reservoir import backup_20240115.json

# View configuration
reservoir config --get semantic_context_size

Development and Testing

# Start in Ollama mode for local testing
reservoir start --ollama

# Search with debugging
reservoir search --link --deduplicate --semantic "API design"

# Replay embeddings after model changes
reservoir replay bge-large-en-v15

Configuration

The CLI respects configuration from:

  1. Command-line arguments (highest priority)
  2. Configuration file (~/.config/reservoir/reservoir.toml)
  3. Environment variables
  4. Default values (lowest priority)

See Environment Variables for detailed configuration options.

Error Handling

The CLI provides helpful error messages for common issues:

  • Connection errors: Check if Neo4j is running
  • Permission errors: Verify file permissions for import/export
  • Invalid arguments: Use --help for correct syntax
  • Configuration errors: Verify config file format

Integration with Scripts

The CLI is designed to work well in scripts and automation:

#!/bin/bash
# Backup and restart script

# Export current data
reservoir export > "backup_$(date +%Y%m%d_%H%M%S).json"

# Restart with fresh embeddings
reservoir replay

# Start the server
reservoir start

System Architecture

Reservoir is designed as a transparent proxy for OpenAI-compatible APIs, with a focus on capturing and enriching AI conversations. This section provides an overview of the system architecture and how components interact.

Request Processing Sequence

Reservoir intercepts your API calls, enriches them with relevant history, manages token limits, and then forwards them to the actual Language Model service. Here's the detailed sequence:

sequenceDiagram
    participant App
    participant Reservoir
    participant Neo4j
    participant LLM as OpenAI/Ollama

    App->>Reservoir: Request (e.g. /v1/chat/completions/$USER/my-application)
    Reservoir->>Reservoir: Check if last message exceeds token limit (Return error if true)
    Reservoir->>Reservoir: Tag with Trace ID + Partition
    Reservoir->>Neo4j: Store original request message(s)

    %% --- Context Enrichment Steps ---
    Reservoir->>Neo4j: Query for similar & recent messages
    Neo4j-->>Reservoir: Return relevant context messages
    Reservoir->>Reservoir: Inject context messages into request payload
    %% --- End Enrichment Steps ---

    Reservoir->>Reservoir: Check total token count & truncate if needed (preserving system/last messages)

    Reservoir->>LLM: Forward enriched & potentially truncated request
    LLM->>Reservoir: Return LLM response
    Reservoir->>Neo4j: Store LLM response message
    Reservoir->>App: Return LLM response

High-Level Architecture

flowchart TB
    Client(["Client App"]) -->|API Request| HTTPServer{{HTTP Server}}
    HTTPServer -->|Process Request| Handler[Request Handler]

    subgraph Handler Logic
        direction LR
        Handler_Start(Start) --> CheckInputTokens(Check Input Tokens)
        CheckInputTokens -- OK --> StoreRequest(Store Request)
        CheckInputTokens -- Too Long --> ReturnError(Return Error Response)
        StoreRequest --> QueryContext(Query Neo4j for Context)
        QueryContext --> InjectContext(Inject Context)
        InjectContext --> CheckTotalTokens(Check/Truncate Total Tokens)
        CheckTotalTokens --> ForwardRequest(Forward to LLM)
    end

    Handler -->|Store/Query| Neo4j[(Neo4j Database)]
    Handler -->|Forward/Receive| OpenAI([OpenAI/Ollama API])
    OpenAI --> Handler
    Handler -->|Return Response| HTTPServer
    HTTPServer -->|API Response| Client

    Config[/Env Vars/] --> HTTPServer
    Config --> Handler
    Config --> Neo4j

Core Components

1. Client Application

Your application making API calls to Reservoir. This could be:

  • A web application using the OpenAI JavaScript library
  • A Python script using the OpenAI Python library
  • A command-line tool like curl
  • Any application that can make HTTP requests

2. HTTP Server (Hyper/Tokio)

The HTTP server built on Rust's async ecosystem:

  • Receives requests on the configured port (default: 3017)
  • Routes based on URL path following the pattern /v1/partition/{partition}/instance/{instance}/chat/completions
  • Handles CORS for web applications
  • Manages concurrent requests efficiently using Tokio's async runtime
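
To make the routing concrete, here is a small Python sketch of how partition and instance might be pulled from the request path. The regex and the default fallback are assumptions for illustration; the actual parsing happens inside Reservoir's Rust handler.

import re

# Matches /v1/partition/{partition}/instance/{instance}/chat/completions
# and the bare /v1/chat/completions default route
PATTERN = re.compile(
    r"^/v1(?:/partition/(?P<partition>[^/]+)/instance/(?P<instance>[^/]+))?/chat/completions$"
)

def parse_path(path: str) -> tuple[str, str]:
    """Extract (partition, instance), falling back to the defaults."""
    m = PATTERN.match(path)
    if m is None:
        raise ValueError(f"unrecognised path: {path}")
    return m.group("partition") or "default", m.group("instance") or "default"

print(parse_path("/v1/partition/alice/instance/coding/chat/completions"))  # ('alice', 'coding')
print(parse_path("/v1/chat/completions"))                                  # ('default', 'default')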

3. Request Handler

The core logic that processes each request:

Input Validation

  • Token size checking: Validates that the last message doesn't exceed token limits
  • Request format validation: Ensures the request follows OpenAI's API structure
  • Authentication: Forwards API keys to the appropriate provider

Context Management

  • Trace ID assignment: Each request gets a unique identifier for tracking
  • Partition/Instance extraction: Pulls organization parameters from the URL path
  • Message storage: Stores incoming messages in Neo4j with proper tagging

Context Enrichment

  • Historical context query: Searches Neo4j for relevant past conversations
  • Similarity matching: Uses vector embeddings to find semantically similar messages
  • Recency filtering: Includes recent messages from the same partition/instance
  • Context injection: Adds relevant context to the request payload

Token Management

  • Total token calculation: Counts tokens in the enriched message list
  • Smart truncation: Removes older context while preserving system prompts and latest messages
  • Provider-specific limits: Respects different token limits for different models

Request Forwarding

  • Provider routing: Automatically routes to the correct provider based on model name
  • Request forwarding: Sends the enriched request to the upstream LLM
  • Response handling: Processes and stores the LLM's response

Relationship Building

  • Synapse connections: Links semantically similar messages using vector similarity
  • Weak connection removal: Removes relationships with similarity scores below 0.85
  • Conversation threading: Maintains coherent conversation threads over time

4. Neo4j Database

The graph database that stores all conversation data:

Data Storage

  • MessageNode entities: Each message is stored as a node with properties
  • Partition/Instance tagging: Messages are tagged for proper organization
  • Vector embeddings: Semantic representations for similarity search
  • Temporal information: Timestamps for recency-based queries

Graph Relationships

  • Synapse relationships: Connect related messages across conversations
  • Conversation threads: Maintain sequential flow of discussions
  • Similarity scores: Weighted relationships based on semantic similarity

Query Capabilities

  • Vector similarity search: Find semantically similar messages
  • Temporal queries: Retrieve recent messages within time windows
  • Graph traversal: Navigate conversation relationships
  • Partition/Instance filtering: Scope queries to specific contexts

5. External LLM Services

Reservoir supports multiple AI providers:

  • OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo, and specialized models
  • Ollama: Local models like Llama, Gemma, and custom models
  • Mistral AI: Mistral's cloud-hosted models
  • Google Gemini: Google's AI models
  • Custom providers: Any OpenAI-compatible API endpoint

6. Configuration Management

Environment-based configuration:

  • Database connection: Neo4j URI, credentials, and connection pooling
  • Server settings: Port, host, CORS configuration
  • API keys: Credentials for various AI providers
  • Provider endpoints: Custom URLs for different services
  • Token limits: Configurable limits for different models

Request Processing Flow

  1. Request Arrival: Client sends a request to Reservoir's endpoint
  2. URL Parsing: Extract partition and instance from the URL path
  3. Input Validation: Check message format and token limits
  4. Message Storage: Store the user's message in Neo4j
  5. Context Retrieval: Query for relevant historical context
  6. Context Enrichment: Inject relevant messages into the request
  7. Token Management: Ensure the enriched request fits within limits
  8. Provider Routing: Determine which AI provider to use
  9. Request Forwarding: Send the enriched request to the AI provider
  10. Response Processing: Receive and process the AI's response
  11. Response Storage: Store the AI's response in Neo4j
  12. Relationship Building: Create or update message relationships
  13. Response Return: Send the response back to the client

Scalability Considerations

Horizontal Scaling

  • Stateless design: Each request is independent
  • Database connection pooling: Efficient resource utilization
  • Async processing: Non-blocking I/O for high concurrency

Vertical Scaling

  • Memory management: Efficient vector storage and retrieval
  • CPU optimization: Fast similarity calculations
  • Disk I/O: Optimized database queries and indexing

Performance Optimizations

  • Vector indexing: Fast similarity search in Neo4j
  • Connection pooling: Reuse database connections
  • Caching strategies: Cache frequently accessed data
  • Batching: Efficient bulk operations where possible

Security Architecture

Authentication

  • API key forwarding: Secure handling of provider credentials
  • No key storage: Reservoir doesn't store AI provider keys
  • Environment-based secrets: Secure configuration management

Data Privacy

  • Local storage: All conversation data stays on your infrastructure
  • No external logging: Conversation content never leaves your network
  • Configurable retention: Control how long data is stored

Access Control

  • Partition isolation: Conversations are isolated by partition/instance
  • URL-based permissions: Access control through URL structure
  • Network security: Configurable CORS and network policies

Monitoring and Observability

Logging

  • Request tracing: Unique trace IDs for each request
  • Error logging: Detailed error information for debugging
  • Performance metrics: Request timing and processing statistics

Health Checks

  • Database connectivity: Monitor Neo4j connection health
  • Provider availability: Check AI service availability
  • Resource utilization: Memory and CPU monitoring

This architecture provides a robust, scalable foundation for AI conversation management while maintaining transparency and compatibility with existing applications.

Data Model

Reservoir uses Neo4j as its graph database to store conversations and their relationships. This section provides a detailed overview of the data model, including nodes, relationships, and how they work together to enable intelligent conversation management.

Overview

The data model is designed around the concept of messages as nodes in a graph, with relationships that capture both the conversational flow and semantic similarities. This approach enables powerful querying capabilities for context enrichment and conversation analysis.

Nodes

MessageNode

Represents a single message in a conversation, whether from a user or an LLM assistant.

Property    Type       Description
trace_id    String     Unique identifier per request/response pair
partition   String     Logical namespace from the URL, typically the system username ($USER)
instance    String     Specific context within the partition, typically the application name
role        String     Role of the message (user or assistant)
content     String     The text content of the message
timestamp   DateTime   When the message was created (ISO 8601 format)
embedding   Vector     Vector representation of the message for similarity search
url         String     Optional URL associated with the message

Example MessageNode

CREATE (m:MessageNode {
    trace_id: "abc123-def456-ghi789",
    partition: "alice",
    instance: "code-assistant",
    role: "user",
    content: "How do I implement a binary search tree?",
    timestamp: "2024-01-15T10:30:00Z",
    embedding: [0.1, -0.2, 0.3, ...],
    url: null
})

Relationships

The data model uses two types of relationships to capture different aspects of conversation structure:

RESPONDED_WITH

Links a user message to its corresponding assistant response, preserving the original conversation flow.

Properties:

  • Direction: (User Message)-[:RESPONDED_WITH]->(Assistant Message)
  • Cardinality: One-to-one (each user message has exactly one assistant response)
  • Mutability: Immutable once created

Purpose:

  • Maintains conversation integrity
  • Enables reconstruction of original conversation threads
  • Provides audit trail for request/response pairs

SYNAPSE

Links semantically similar messages based on vector similarity, enabling cross-conversation context discovery.

Properties:

  • Direction: Bidirectional (similarity is symmetric)
  • Score: Float value representing similarity strength (0.0 to 1.0)
  • Threshold: Minimum score of 0.85 required for synapse creation
  • Mutability: Dynamic (can be created, updated, or removed)

Creation Rules:

  1. Sequential Synapses: Initially created between consecutive messages in a conversation
  2. Similarity Synapses: Created between messages with high semantic similarity (≥ 0.85)
  3. Cross-Conversation: Can link messages from different conversations within the same partition/instance
  4. Pruning: Synapses with scores below threshold are automatically removed

Example Synapse

(m1:MessageNode)-[:SYNAPSE {score: 0.92}]-(m2:MessageNode)

Graph Structure Example

┌─────────────────┐    RESPONDED_WITH   ┌─────────────────┐
│  User Message   │────────────────────→│Assistant Message│
│ "Explain BST"   │                     │ "A binary..."   │
└─────────────────┘                     └─────────────────┘
         │                                       │
         │ SYNAPSE                               │ SYNAPSE
         │ {score: 0.91}                         │ {score: 0.87}
         ▼                                       ▼
┌─────────────────┐    RESPONDED_WITH   ┌─────────────────┐
│  User Message   │────────────────────→│Assistant Message│
│ "How to code    │                     │ "Here's how..." │
│  tree search?"  │                     │                 │
└─────────────────┘                     └─────────────────┘

Real Conversation Graph Visualization

Here's an example of how conversations and their threads appear in practice, showing the synapse relationships that connect semantically related messages across different conversation flows:

Conversation Graph View

This visualization shows:

  • Message nodes representing individual user and assistant messages
  • RESPONDED_WITH relationships (direct conversation flow)
  • SYNAPSE relationships connecting semantically similar messages
  • Conversation threads formed by chains of related messages
  • Cross-conversation connections where topics are discussed in multiple conversations

The graph structure enables Reservoir to find relevant context from past conversations when enriching new requests, creating a rich conversational memory that spans multiple sessions and topics.

Vector Index

Reservoir maintains a vector index called messageEmbeddings in Neo4j for efficient similarity searches.

Index Configuration

CREATE VECTOR INDEX messageEmbeddings 
FOR (m:MessageNode) ON (m.embedding) 
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}

The vector index enables fast cosine similarity searches:

CALL db.index.vector.queryNodes('messageEmbeddings', 10, $queryEmbedding)
YIELD node, score
WHERE node.partition = $partition AND node.instance = $instance
RETURN node, score
ORDER BY score DESC

Partitioning Strategy

Partition

  • Purpose: Top-level organization boundary
  • Typical Value: System username ($USER)
  • Scope: All messages for a specific user
  • Isolation: Messages from different partitions never interact

Instance

  • Purpose: Application-specific context within a partition
  • Typical Value: Application name (e.g., "code-assistant", "chat-app")
  • Scope: Specific use case or application context
  • Organization: Multiple instances can exist within a partition

Example Organization

Partition: "alice"
├── Instance: "code-assistant"
│   ├── Programming questions
│   └── Code review discussions
├── Instance: "research-helper"
│   ├── Literature reviews
│   └── Data analysis questions
└── Instance: "personal-chat"
    ├── General conversations
    └── Daily planning

Relationship Types: Fixed vs. Dynamic

Fixed Relationships

Characteristics:

  • Immutable once created
  • Preserve data integrity
  • Represent factual conversation structure

Examples:

  • MessageNode properties (once created, content doesn't change)
  • RESPONDED_WITH relationships (permanent conversation pairs)

Dynamic Relationships

Characteristics:

  • Mutable and adaptive
  • Support learning and optimization
  • Reflect current understanding of semantic relationships

Examples:

  • SYNAPSE relationships (can be created, updated, or removed)
  • Similarity scores (can be recalculated as algorithms improve)

Query Patterns

Context Enrichment Query

// Find recent and similar messages for context
MATCH (m:MessageNode)
WHERE m.partition = $partition 
  AND m.instance = $instance
  AND m.timestamp > $recentThreshold
WITH m
ORDER BY m.timestamp DESC
LIMIT 10

UNION

CALL db.index.vector.queryNodes('messageEmbeddings', 5, $queryEmbedding)
YIELD node, score
WHERE node.partition = $partition 
  AND node.instance = $instance
  AND score > 0.85
RETURN node, score
ORDER BY score DESC

Conversation Thread Reconstruction

// Reconstruct a conversation thread
MATCH (user:MessageNode {role: 'user'})-[:RESPONDED_WITH]->(assistant:MessageNode)
WHERE user.trace_id = $traceId
RETURN user, assistant
ORDER BY user.timestamp

Synapse Network Analysis

// Find highly connected messages (conversation hubs)
MATCH (m:MessageNode)-[s:SYNAPSE]-(related:MessageNode)
WHERE m.partition = $partition AND m.instance = $instance
WITH m, count(s) as connectionCount, avg(s.score) as avgScore
WHERE connectionCount > 3
RETURN m, connectionCount, avgScore
ORDER BY connectionCount DESC, avgScore DESC

Data Lifecycle

Message Storage

  1. Ingestion: New messages are stored with embeddings
  2. Indexing: Vector embeddings are indexed for similarity search
  3. Relationship Creation: RESPONDED_WITH links are established
  4. Synapse Building: Similar messages are connected via SYNAPSE relationships

Synapse Evolution

  1. Initial Creation: Sequential synapses between consecutive messages
  2. Similarity Detection: Cross-conversation synapses based on semantic similarity
  3. Threshold Enforcement: Weak synapses (score < 0.85) are removed
  4. Continuous Optimization: Relationships are updated as new messages arrive

Cleanup and Maintenance

  • Orphaned Relationships: Periodic cleanup of broken relationships
  • Index Optimization: Regular vector index maintenance
  • Storage Optimization: Archival of old messages based on retention policies

Performance Considerations

Indexing Strategy

  • Vector Index: Primary index for similarity searches
  • Partition/Instance Index: Composite index for scoped queries
  • Timestamp Index: Range queries for recent messages
  • Role Index: Fast filtering by message role

Query Optimization

  • Parameterized Queries: Use query parameters to enable plan caching
  • Result Limiting: Always limit result sets for performance
  • Selective Filtering: Apply partition/instance filters early
  • Vector Search Tuning: Optimize similarity thresholds and result counts

Scaling Considerations

  • Horizontal Partitioning: Distribute data across multiple Neo4j instances
  • Read Replicas: Use read replicas for query-heavy workloads
  • Connection Pooling: Efficient database connection management
  • Batch Operations: Use batch writes for bulk data operations

This data model provides a robust foundation for conversation storage and retrieval while maintaining flexibility for future enhancements and optimizations.

Context Enrichment

Context enrichment is Reservoir's core mechanism for providing intelligent, memory-aware LLM conversations. By automatically injecting relevant historical context and recent conversation history into each request, Reservoir gives LLM models a persistent memory that improves response quality and maintains conversational continuity across sessions.

Overview

When you send a message to Reservoir, the system automatically enhances your request with:

  • Semantically similar messages from past conversations (using vector similarity search)
  • Recent conversation history from the same partition/instance
  • Connected conversation threads through synapse relationships

This enriched context is injected into your request before forwarding it to the LLM provider, making the LLM aware of relevant past discussions.

Context Enrichment Process

1. Message Reception and Initial Processing

pub async fn handle_with_partition(
    partition: &str,
    instance: &str,
    whole_body: Bytes,
) -> Result<Bytes, Error> {
    let json_string = String::from_utf8_lossy(&whole_body).to_string();
    let chat_request_model = ChatRequest::from_json(json_string.as_str()).expect("Valid JSON");
    let model_info = ModelInfo::new(chat_request_model.model.clone());

    let trace_id = Uuid::new_v4().to_string();
    let service = ChatRequestService::new();

When a request arrives:

  • A unique trace ID is generated for tracking
  • The request is parsed and validated
  • Model information is extracted to determine token limits

2. Embedding Generation

    let last_message = get_last_message_in_chat_request(&chat_request_model)?;
    let search_term = last_message.content.as_str();

    info!("Using search term: {}", search_term);
    let embedding_info = EmbeddingInfo::with_fastembed("bge-large-en-v15");
    let embeddings = get_embeddings_for_txt(search_term, embedding_info.clone()).await?;

    let context_size = config::get_context_size();

The last user message is used as the search term to generate vector embeddings using the BGE-Large-EN-v1.5 model. This embedding represents the semantic meaning of the current query.

3. Semantic Context Retrieval

    let similar = get_related_messages_with_strategy(
        embeddings,
        &embedding_info,
        partition,
        instance,
        context_size,
    )
    .await?;

Using the generated embedding, Reservoir searches for semantically similar messages from past conversations within the same partition/instance. The search strategy includes:

    let query_string = format!(
        r#"
                CALL db.index.vector.queryNodes(
                    '{}',
                    $topKExtended,
                    $embedding
                ) YIELD node, score
                WITH node, score
                WHERE node.partition = $partition
                  AND node.instance = $instance
                RETURN node.partition AS partition,
                       node.instance AS instance,
                       node.embedding AS embedding,
                       node.model AS model,
                       id(node) AS id,
                       score
                ORDER BY score DESC
                "#,
        embedding_info.get_index_name()
    );

Synapse Expansion

pub async fn get_related_messages_with_strategy(
    embedding: Vec<f32>,
    embedding_info: &EmbeddingInfo,
    partition: &str,
    instance: &str,
    top_k: usize,
) -> Result<Vec<MessageNode>, Error> {
    let similar_messages =
        get_most_similar_messages(embedding, embedding_info, partition, instance, top_k).await?;
    let mut found_messages = vec![];
    for message in similar_messages.clone() {
        let mut connected = get_nodes_connected_by_synapses(connect, &message).await?;
        if found_messages.len() > top_k * 3 {
            break;
        }
        if connected.len() > 2 {
            found_messages.append(connected.as_mut());
        }
        found_messages = deduplicate_message_nodes(found_messages);
    }

    Ok(found_messages.into_iter().take(top_k).collect())
}

The system expands the context by following synapse relationships - connections between messages that are semantically similar (cosine similarity > 0.85).

4. Recent History Retrieval

    let last_messages = get_last_messages_for_partition_and_instance(
        connect,
        partition.to_string(),
        instance.to_string(),
        LAST_MESSAGES_LIMIT,
    )
    .await
    .unwrap_or_else(|e| {
        error!("Error finding last messages: {}", e);
        Vec::new()
    });

Retrieves the most recent 15 messages from the same partition/instance to provide immediate conversational context.

5. Context Injection

let mut enriched_chat_request =
        enrich_chat_request(similar, last_messages, &chat_request_model);
truncate_messages_if_needed(&mut enriched_chat_request.messages, model_info.input_tokens);

The enrich_chat_request function combines all context sources:

pub fn enrich_chat_request(
    similar_messages: Vec<MessageNode>,
    mut last_messages: Vec<MessageNode>, // Add `mut` here
    chat_request: &ChatRequest,
) -> ChatRequest {
    let mut chat_request = chat_request.clone();

    let semantic_prompt = r#"The following is the result of a semantic search 
        of the most related messages by cosine similarity to previous 
        conversations"#;
    let recent_prompt = r#"The following are the most recent messages in the 
        conversation in chronological order"#;

    last_messages.sort_by(|a, b| a.timestamp.cmp(&b.timestamp));

    let mut enrichment_block = Vec::new();

    enrichment_block.push(Message {
        role: "system".to_string(),
        content: semantic_prompt.to_string(),
    });
    enrichment_block.extend(similar_messages.iter().map(MessageNode::to_message));
    enrichment_block.push(Message {
        role: "system".to_string(),
        content: recent_prompt.to_string(),
    });
    enrichment_block.extend(last_messages.iter().map(MessageNode::to_message));

    enrichment_block.retain(|m| !m.content.is_empty());

    let insert_index = if chat_request
        .messages
        .first()
        .is_some_and(|m| m.role == "system")
    {
        1
    } else {
        0
    };

    // Insert enrichment block
    chat_request
        .messages
        .splice(insert_index..insert_index, enrichment_block);
    chat_request
}

The enrichment process:

  1. Creates descriptive system prompts to explain the context
  2. Adds semantically similar messages with explanation
  3. Adds recent chronological history with explanation
  4. Inserts the enrichment block after any existing system message
  5. Filters out empty messages

6. Token Management and Truncation

truncate_messages_if_needed(&mut enriched_chat_request.messages, model_info.input_tokens);

The enriched request may exceed the model's token limits. The truncation algorithm:

pub fn truncate_messages_if_needed(messages: &mut Vec<Message>, limit: usize) {
    let mut current_tokens = count_chat_tokens(messages);
    info!("Current token count: {}", current_tokens);

    if current_tokens <= limit {
        return; // No truncation needed
    }

    info!(
        "Token count ({}) exceeds limit ({}), truncating...",
        current_tokens, limit
    );

    // Identify indices of system messages and the last message
    let system_message_indices: HashSet<usize> = messages
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == "system")
        .map(|(i, _)| i)
        .collect();
    let last_message_index = messages.len().saturating_sub(1); // Index of the last message

    // Start checking for removal from the first message
    let mut current_index = 0;

    while current_tokens > limit && current_index < messages.len() {
        // Check if the current index is a system message or the last message
        if system_message_indices.contains(&current_index) || current_index == last_message_index {
            // Skip this message, move to the next index
            current_index += 1;
            continue;
        }

        // If it's safe to remove (not system, not the last message)
        if messages.len() > 1 {
            // Ensure we don't remove the only message left (shouldn't happen here)
            info!(
                "Removing message at index {}: Role='{}', Content='{}...'",
                current_index,
                messages[current_index].role,
                messages[current_index]
                    .content
                    .chars()
                    .take(30)
                    .collect::<String>()
            );
            messages.remove(current_index);
            // Don't increment current_index, as removing shifts subsequent elements down.
            // Recalculate tokens and update system/last indices if needed (though less efficient)
            // For simplicity here, we just recalculate tokens. A more optimized approach
            // might update indices, but given the context size, recalculating tokens is okay.
            current_tokens = count_chat_tokens(messages);
            // Re-evaluate system_message_indices and last_message_index is safer if indices change significantly,
            // but let's stick to the simpler approach for now. If performance becomes an issue, optimize this.
        } else {
            // Safety break: Should not be able to remove the last message due to the check above.
            error!("Warning: Truncation stopped unexpectedly.");
            break;
        }
    }

    info!("Truncated token count: {}", current_tokens);
}

The truncation algorithm preserves:

  • All system messages (including enrichment context)
  • The user's current/last message

Older context messages are removed, starting from the front of the message list, until the request fits the limit.

7. Response Storage and Synapse Building

After receiving the LLM's response:

let message_node = chat_response.choices.first().unwrap().message.clone();
let embedding =
    get_embeddings_for_txt(message_node.content.as_str(), embedding_info.clone()).await?;
let message_node = MessageNode::from_message(
    &message_node,
    trace_id.as_str(),
    partition,
    instance,
    embedding,
);
save_message_node(connect, &message_node, &embedding_info)
    .await
    .expect("Failed to save message node");

connect_synapses(connect)
    .await
    .expect("Failed to connect synapses");

  1. The LLM's response is stored with its own embedding
  2. Synapses (semantic connections) are built between messages
  3. The system continuously builds a knowledge graph of related conversations

Context Architecture Flow

flowchart TD
    A["User Request Arrives"] --> B["Generate Trace ID & Parse Request"]
    B --> C["Extract Last User Message"]
    C --> D["Generate Embedding<br/>(BGE-Large-EN-v1.5)"]
    
    %% Parallel context retrieval
    D --> E["Semantic Search"]
    D --> F["Recent History Query"]
    
    E --> E1["Vector Similarity Search<br/>(Neo4j Index)"]
    E1 --> E2["Expand via Synapses<br/>(Related Conversations)"]
    E2 --> E3["Deduplicate Messages"]
    
    F --> F1["Get Last 15 Messages<br/>(Same Partition/Instance)"]
    F1 --> F2["Sort by Timestamp"]
    
    %% Context assembly
    E3 --> G["Assemble Context Block"]
    F2 --> G
    
    G --> G1["Add Semantic Context<br/>'The following is semantic search...'"]
    G1 --> G2["Add Similar Messages"]
    G2 --> G3["Add Recent Context<br/>'The following are recent messages...'"]
    G3 --> G4["Add Recent Messages"]
    
    %% Context injection
    G4 --> H["Inject Context into Request"]
    H --> H1{"Check if System Message Exists"}
    H1 -->|Yes| H2["Insert after System Message"]
    H1 -->|No| H3["Insert at Beginning"]
    
    H2 --> I["Token Management"]
    H3 --> I
    
    %% Token management
    I --> I1["Count Total Tokens"]
    I1 --> I2{"Exceeds Token Limit?"}
    I2 -->|No| J["Send to AI Provider"]
    I2 -->|Yes| I3["Smart Truncation"]
    
    I3 --> I4["Preserve System Messages"]
    I4 --> I5["Preserve Last User Message"]
    I5 --> I6["Remove Older Context"]
    I6 --> I7["Recalculate Tokens"]
    I7 --> I2
    
    %% AI interaction
    J --> K["AI Provider Response"]
    K --> L["Store Response"]
    
    %% Post-processing
    L --> L1["Generate Response Embedding"]
    L1 --> L2["Save to Neo4j with Trace ID"]
    L2 --> L3["Link User-Assistant Messages"]
    L3 --> M["Build Synapses"]
    
    M --> M1["Calculate Similarity Scores<br/>(Cosine Similarity)"]
    M1 --> M2["Create SYNAPSE Relationships<br/>(Score > 0.85)"]
    M2 --> M3["Remove Weak Synapses<br/>(Score < 0.85)"]
    
    M3 --> N["Return Enriched Response"]
    
    %% Styling
    classDef inputStep fill:#e1f5fe
    classDef processStep fill:#f3e5f5
    classDef storageStep fill:#e8f5e8
    classDef aiStep fill:#fff3e0
    classDef outputStep fill:#fce4ec
    
    class A,C inputStep
    class B,D,E,E1,E2,E3,F,F1,F2,G,G1,G2,G3,G4,H,H1,H2,H3,I,I1,I2,I3,I4,I5,I6,I7 processStep
    class L,L1,L2,L3,M,M1,M2,M3 storageStep
    class J,K aiStep
    class N outputStep

Key Configuration Parameters

Context Size

pub fn get_context_size() -> usize {
    get_config().semantic_context_size.unwrap_or(15)
}

The semantic context size (default: 15) determines how many semantically similar messages are retrieved and potentially included in the context.

Recent Messages Limit

const LAST_MESSAGES_LIMIT: usize = 15;

The system retrieves up to 15 most recent messages from the same partition/instance for chronological context.

Embedding Model

let embedding_info = EmbeddingInfo::with_fastembed("bge-large-en-v15");

By default, Reservoir uses a local instance of BGE-Large-EN-v1.5 to generate embeddings, providing high-quality semantic representations.

Synapse Threshold

MATCH (m1:MessageNode)-[r:SYNAPSE]->(m2:MessageNode)
WHERE r.score < 0.85
DELETE r

Only relationships with cosine similarity scores above 0.85 are maintained as synapses, ensuring high-quality semantic connections.

Key Concepts

Partitions and Instances

Context is scoped to specific partition/instance combinations, allowing for:

  • Organizational separation: Different teams or projects can have isolated contexts
  • Application isolation: Multiple applications can use the same Reservoir instance without cross-contamination
  • User-specific contexts: Individual users can maintain separate conversation histories
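
As a minimal sketch (not Reservoir's actual types), a partition/instance scope can be pictured as a simple key that every stored message carries:

// Minimal sketch of a partition/instance scope; in the stored graph these are
// plain `partition` and `instance` properties on each MessageNode.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct Scope {
    partition: String,
    instance: String,
}

impl Scope {
    fn default_scope() -> Self {
        Scope { partition: "default".into(), instance: "default".into() }
    }
}

fn main() {
    let alice_coding = Scope { partition: "alice".into(), instance: "coding".into() };
    // Context retrieval only considers messages whose scope matches the request.
    assert_ne!(alice_coding, Scope::default_scope());
    println!("{:?}", alice_coding);
}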

Synapses

Synapses are semantic relationships between messages that:

  • Connect related conversations across different sessions
  • Build over time as the system learns from interactions
  • Self-organize the knowledge graph based on content similarity
  • Get pruned automatically when relationships are too weak (< 0.85 similarity)

Trace IDs

Every request gets a unique trace ID that:

  • Links user messages to LLM responses within the same conversation turn
  • Enables conversation threading and relationship building
  • Provides audit trails for debugging and analysis
  • Supports parallel processing of multiple simultaneous requests
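
A rough sketch of how a trace ID ties the two halves of a conversation turn together, assuming the uuid crate (v4 feature); the export format later in this document shows UUID-style trace_id values:

// Sketch only: one trace ID shared by the user message and the assistant reply.
use uuid::Uuid;

struct StoredMessage {
    trace_id: String,
    role: String,
    content: String,
}

fn main() {
    let trace_id = Uuid::new_v4().to_string();
    let turn = vec![
        StoredMessage { trace_id: trace_id.clone(), role: "user".into(), content: "Hello".into() },
        StoredMessage { trace_id, role: "assistant".into(), content: "Hi there!".into() },
    ];
    for m in &turn {
        println!("[{}] {}: {}", m.trace_id, m.role, m.content);
    }
}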

System Context Compression

pub fn compress_system_context(messages: &[Message]) -> Vec<Message> {
    let first_index = messages.iter().position(|m| m.role == "system");
    let last_index = messages.iter().rposition(|m| m.role == "system");

    if let (Some(first), Some(last)) = (first_index, last_index) {
        if first != 0 || first == last {
            return messages.to_vec();
        }

        let mut compressed = vec![messages[0].clone()];

        for item in messages.iter().take(last + 1).skip(first + 1) {
            compressed[0].content += &format!("\n{}", message_to_string(item));
        }

        compressed.extend_from_slice(&messages[last + 1..]);
        compressed
    } else {
        messages.to_vec()
    }
}

Multiple system messages (including enrichment context) are compressed into a single system message to optimize token usage while preserving all contextual information.
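
As a usage sketch (assuming message_to_string renders a message roughly as "role: content"):

// Hypothetical usage of compress_system_context shown above.
let messages = vec![
    Message { role: "system".to_string(), content: "You are a helpful assistant.".to_string() },
    Message { role: "system".to_string(), content: "The following is semantic search context...".to_string() },
    Message { role: "user".to_string(), content: "What did we discuss about Neo4j?".to_string() },
];
let compressed = compress_system_context(&messages);
// Both system messages are merged into one; the user message is left untouched.
assert_eq!(compressed.len(), 2);
assert_eq!(compressed[1].role, "user");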

Benefits of Context Enrichment

  1. Conversational Continuity: LLM maintains awareness of past discussions across sessions
  2. Semantic Understanding: Related topics are automatically surfaced even when not explicitly mentioned
  3. Multi-Session Learning: Knowledge accumulates over time, improving response quality
  4. Cross-Model Memory: Context persists when switching between different LLM providers
  5. Intelligent Prioritization: Most relevant historical context is prioritized while respecting token limits
  6. Automatic Organization: The system builds its own knowledge graph without manual intervention

Performance Considerations

  • Vector Indexing: Neo4j's vector indices provide sub-second similarity search even with large conversation histories
  • Parallel Processing: Semantic search and recent history retrieval happen concurrently
  • Smart Truncation: Context is intelligently trimmed to fit model limits while preserving essential information
  • Synapse Pruning: Weak connections are automatically removed to maintain graph quality
  • Token Optimization: System messages are compressed to maximize available context within token limits

Conversation Threads (Synapses)

Synapses are Reservoir's intelligent connection system that links semantically related messages across different conversations. Unlike traditional conversation threads that follow chronological order, synapses create a web of connections based on semantic similarity, enabling cross-conversation context discovery and knowledge building.

What are Synapses?

Synapses are bidirectional relationships between MessageNodes that represent semantic similarity. They enable Reservoir to:

  • Connect related discussions across different conversations
  • Build knowledge networks from accumulated conversations
  • Enable context jumping between related topics
  • Create conversational memory that spans sessions

How Synapses Work

Similarity Calculation

Synapses are created based on vector similarity between message embeddings:

  1. Embedding Generation: Each message is converted to a vector using BGE-Large-EN-v1.5
  2. Similarity Scoring: Cosine similarity is calculated between message vectors
  3. Threshold Filtering: Only connections with similarity ≥ 0.85 become synapses
  4. Bidirectional Links: Synapses work in both directions (A ↔ B)
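
A minimal sketch of the similarity check, assuming embeddings are plain float vectors; the 0.85 threshold matches the pruning query shown earlier:

// Cosine similarity between two embedding vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn should_connect(a: &[f32], b: &[f32]) -> bool {
    // Matches the 0.85 threshold used for SYNAPSE creation and pruning.
    cosine_similarity(a, b) >= 0.85
}

fn main() {
    let a = [0.1_f32, 0.9, 0.2];
    let b = [0.1_f32, 0.8, 0.3];
    println!("connect: {}", should_connect(&a, &b));
}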

Synapse Creation Process

flowchart TD
    A["New Message Arrives"] --> B["Generate Embedding"]
    B --> C["Find Similar Messages"]
    C --> D["Calculate Similarity Scores"]
    D --> E{"Score ≥ 0.85?"}
    E -->|Yes| F["Create SYNAPSE Relationship"]
    E -->|No| G["Skip Connection"]
    F --> H["Store Score and Model Info"]
    H --> I["Enable Cross-Conversation Context"]
    G --> I

Sequential vs. Semantic Synapses

Sequential Synapses: Connect consecutive messages in the same conversation

// Messages in same conversation thread
(msg1)-[:SYNAPSE {score: 0.95, model: "embedding1536"}]-(msg2)

Semantic Synapses: Connect similar messages from different conversations

// Messages from different conversations with similar content
(msg_python_q1)-[:SYNAPSE {score: 0.88, model: "embedding1536"}]-(msg_python_q2)

Synapse Properties

Score

Represents the semantic similarity strength between two messages:

  • Range: 0.0 to 1.0 (higher is more similar)
  • Threshold: Minimum 0.85 for synapse creation
  • Calculation: Cosine similarity between embedding vectors
  • Update: Can be recalculated as models improve

Model

Indicates which embedding model was used for similarity calculation:

  • Current Default: "embedding1536" (BGE-Large-EN-v1.5)
  • Purpose: Enables model-specific synapse management
  • Future-Proofing: Supports multiple embedding models

Example Synapse Relationship

(message1:MessageNode)-[:SYNAPSE {
    score: 0.92,
    model: "embedding1536"
}]-(message2:MessageNode)

Synapse Network Examples

Programming Discussion Network

"How do I handle errors in Python?"
    ↓ SYNAPSE (0.91)
"What's the best way to catch exceptions?"
    ↓ SYNAPSE (0.87)
"Try/except blocks best practices"
    ↓ SYNAPSE (0.89)
"Error handling in async functions"

Cross-Topic Connections

"Database optimization techniques"
    ↓ SYNAPSE (0.86)
"Slow query performance issues"
    ↓ SYNAPSE (0.88)
"Index design for better performance"

Synapse Management

Automatic Creation

Synapses are created automatically during conversation processing:

// Simplified creation logic
if similarity_score >= 0.85 {
    create_synapse(message1, message2, similarity_score, "embedding1536");
}

Pruning Low-Quality Synapses

Weak connections are automatically removed to maintain network quality:

// Remove synapses below threshold
MATCH (m1:MessageNode)-[r:SYNAPSE]->(m2:MessageNode)
WHERE r.score < 0.85
DELETE r

Synapse Evolution

Synapses can be updated as the system learns:

  1. Score Updates: Recalculate similarity with improved models
  2. Model Migration: Update synapses when switching embedding models
  3. Network Optimization: Remove redundant or weak connections

Using Synapses for Context

RAG Strategy with Synapses

When using --link search strategy, Reservoir leverages synapses:

# Use synapse network for enhanced search
reservoir search --link --semantic "error handling"

Process:

  1. Find semantically similar messages
  2. Follow SYNAPSE relationships to connected messages
  3. Explore conversation threads via synapse networks
  4. Deduplicate and rank results
  5. Return most relevant connected discussions

Context Enrichment

Synapses enable intelligent context building:

// Context enrichment query using synapses
MATCH (query_msg:MessageNode)-[:SYNAPSE*1..3]-(related:MessageNode)
WHERE query_msg.content CONTAINS "database"
  AND related.partition = $partition
  AND related.instance = $instance
RETURN related
ORDER BY related.timestamp DESC
LIMIT 10

Synapse Network Analysis

Finding Conversation Hubs

Identify messages that are highly connected (conversation hubs):

# CLI command to export and analyze
reservoir export | jq -r '.[].content' > messages.txt

# Or via Neo4j query
MATCH (m:MessageNode)-[s:SYNAPSE]-(related:MessageNode)
WITH m, count(s) as connectionCount, avg(s.score) as avgScore
WHERE connectionCount > 5
RETURN m.content, connectionCount, avgScore
ORDER BY connectionCount DESC

Topic Clustering

Synapses naturally create topic clusters:

Cluster 1: Web Development
├── "React component best practices" (8 connections)
├── "JavaScript async patterns" (6 connections)
└── "CSS flexbox layouts" (4 connections)

Cluster 2: Database Design
├── "SQL query optimization" (7 connections)
├── "Database normalization" (5 connections)
└── "Index strategy for performance" (3 connections)

Performance Considerations

Synapse Creation Overhead

  • Computation: Vector similarity calculation for each new message
  • Storage: Additional relationships in Neo4j graph
  • Indexing: Maintenance of vector indices

Optimization Strategies

  1. Batch Processing: Create synapses in batches during low-usage periods
  2. Threshold Tuning: Adjust similarity threshold based on use case
  3. Network Pruning: Regular cleanup of weak or obsolete synapses
  4. Model Efficiency: Balance embedding quality vs. computation cost

Advanced Synapse Features

Multi-Hop Connections

Synapses enable multi-hop context discovery:

// Find messages connected within 3 hops
MATCH path=(start:MessageNode)-[:SYNAPSE*1..3]-(end:MessageNode)
WHERE start.content CONTAINS "machine learning"
RETURN path, length(path)
ORDER BY length(path)

Conversation Path Finding

Discover how topics connect across conversations:

// Find shortest path between two topics
MATCH path=shortestPath(
    (topic1:MessageNode {content: "Python async"})-[:SYNAPSE*]-(topic2:MessageNode {content: "Error handling"})
)
RETURN path

Synapse-Based Recommendations

Use synapse networks to suggest related topics:

# Find related discussions
reservoir search --link --semantic "current topic"

# Or get synapse-connected messages directly
echo "What related topics should I explore?" | reservoir ingest
# Context will include synapse-connected discussions

Troubleshooting Synapses

Common Issues

  1. Too Many Synapses: Lower the similarity threshold
  2. Too Few Synapses: Check embedding quality and threshold
  3. Irrelevant Connections: Review similarity calculation method
  4. Performance Issues: Implement batch processing

Diagnostic Commands

# Count stored user messages (rough proxy for graph size; exact synapse counts require a Neo4j query)
reservoir export | jq '[.[] | select(.role=="user")] | length'

# Check similarity scores distribution
# (Requires Neo4j query access)

Synapse Replay

Rebuild synapse network when needed:

# Replay embeddings and rebuild synapses
reservoir replay

# This will:
# 1. Recalculate embeddings for all messages
# 2. Rebuild synapse relationships
# 3. Update similarity scores
# 4. Prune weak connections

Future Enhancements

Planned Features

  1. Weighted Synapses: Consider recency and conversation importance
  2. Topic-Aware Synapses: Enhanced similarity based on topic detection
  3. Hierarchical Synapses: Multi-level relationship strengths
  4. Synapse Analytics: Dashboard for network visualization

Customization Options

  1. Custom Similarity Functions: Beyond cosine similarity
  2. Domain-Specific Models: Specialized embeddings for specific fields
  3. User-Defined Thresholds: Per-partition similarity thresholds
  4. Manual Synapse Management: User-controlled connection creation

Synapses transform Reservoir from a simple conversation store into an intelligent knowledge network that grows more valuable with each interaction, creating a personalized LLM assistant with genuine conversational memory.

Multi-Provider Support

Reservoir supports multiple AI providers through its flexible routing system. This allows you to use different AI models seamlessly while maintaining conversation context and history across all providers.

Supported Providers

OpenAI

  • Models: GPT-4, GPT-4o, GPT-4o-mini, GPT-3.5-turbo, GPT-4o-search-preview
  • API Key Required: Yes (OPENAI_API_KEY)
  • Endpoint: https://api.openai.com/v1/chat/completions
  • Features: Full feature support, web search capabilities

Ollama

  • Models: llama3.2, gemma3, and any locally installed models
  • API Key Required: No
  • Endpoint: http://localhost:11434/v1/chat/completions
  • Features: Local inference, privacy-focused, custom model support

Mistral AI

  • Models: mistral-large-2402, mistral-medium, mistral-small
  • API Key Required: Yes (MISTRAL_API_KEY)
  • Endpoint: https://api.mistral.ai/v1/chat/completions
  • Features: European AI provider, competitive performance

Google Gemini

  • Models: gemini-2.0-flash, gemini-2.5-flash-preview-05-20
  • API Key Required: Yes (GEMINI_API_KEY)
  • Endpoint: Custom Google AI endpoint
  • Features: Google's latest AI models, multimodal capabilities

Custom Providers

  • Models: Any model name not explicitly configured
  • Default Routing: Routes to Ollama by default
  • Configuration: Set custom endpoints via environment variables

Automatic Model Routing

Reservoir automatically determines which provider to use based on the model name in your request:

{
  "model": "gpt-4",           // → Routes to OpenAI
  "model": "llama3.2",        // → Routes to Ollama
  "model": "mistral-large",   // → Routes to Mistral
  "model": "gemini-2.0-flash" // → Routes to Google
}
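
A hypothetical sketch of this routing logic; the actual provider table lives in Reservoir's configuration and may differ:

// Illustrative model-name-based routing, mirroring the mapping shown above.
fn route_provider(model: &str) -> &'static str {
    if model.starts_with("gpt-") {
        "openai"
    } else if model.starts_with("mistral") {
        "mistral"
    } else if model.starts_with("gemini") {
        "gemini"
    } else {
        // Unknown models fall back to the local Ollama endpoint.
        "ollama"
    }
}

fn main() {
    for model in ["gpt-4", "llama3.2", "mistral-large", "gemini-2.0-flash"] {
        println!("{} -> {}", model, route_provider(model));
    }
}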

Configuration

Environment Variables

Set provider endpoints and API keys:

# API Keys
export OPENAI_API_KEY="sk-your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"

# Custom Endpoints (optional)
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"

Provider-Specific Features

OpenAI Features

  • Web Search: Available with gpt-4o-search-preview
  • Function Calling: Supported on compatible models
  • Vision: GPT-4o supports image inputs
  • JSON Mode: Structured output support

Example with web search:

curl "http://localhost:3017/partition/$USER/instance/research/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4o-search-preview",
        "messages": [{"role": "user", "content": "Latest AI developments"}],
        "web_search_options": {
            "enabled": true,
            "max_results": 5
        }
    }'

Ollama Features

  • Local Models: No API key required
  • Privacy: Data never leaves your machine
  • Custom Models: Load any compatible model
  • Performance: Direct local inference

Example with local model:

curl "http://localhost:3017/partition/$USER/instance/local/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Explain quantum computing"}]
    }'

Multi-Provider Workflows

Seamless Model Switching

You can switch between providers within the same conversation while maintaining context:

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3017/v1/partition/myuser/instance/research",
    api_key=os.environ.get("OPENAI_API_KEY")
)

# Start with OpenAI
response1 = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain neural networks"}]
)

# Continue with Ollama (context is preserved)
response2 = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "What did we just discuss?"}]
)

# Switch to Mistral (still has context)
response3 = client.chat.completions.create(
    model="mistral-large-2402",
    messages=[{"role": "user", "content": "How does this relate to AI safety?"}]
)

Provider-Specific Use Cases

Development Workflow

# Use Ollama for quick local testing
curl -d '{"model": "llama3.2", "messages": [...]}' localhost:3017/...

# Use OpenAI for production queries
curl -d '{"model": "gpt-4", "messages": [...]}' localhost:3017/...

# Use Mistral for European compliance
curl -d '{"model": "mistral-large", "messages": [...]}' localhost:3017/...

Error Handling

Reservoir provides consistent error handling across all providers:

Common Error Responses

{
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid model specified",
    "code": "model_not_found"
  }
}

Provider-Specific Errors

  • OpenAI: Rate limits, quota exceeded, invalid API key
  • Ollama: Model not found, service unavailable
  • Mistral: Authentication errors, model access restrictions
  • Gemini: API quota limits, geographic restrictions

Performance Considerations

Provider Comparison

Provider  Latency  Cost    Privacy  Features
OpenAI    Medium   High    Cloud    Most comprehensive
Ollama    Low      Free    Local    Basic, customizable
Mistral   Medium   Medium  Cloud    European focus
Gemini    Medium   Medium  Cloud    Google integration

Optimization Tips

  1. Use Ollama for development: Faster iteration, no API costs
  2. Use OpenAI for production: Most reliable, feature-rich
  3. Use Mistral for compliance: European data residency
  4. Cache responses: Reduce API calls and costs

Custom Provider Integration

To add a new OpenAI-compatible provider:

  1. Set the endpoint URL:

    export RSV_CUSTOM_BASE_URL="https://api.custom-provider.com/v1/chat/completions"
    
  2. Configure model routing (if needed):

    #![allow(unused)]
    fn main() {
    // In your configuration
    match model_name {
        "custom-model" => "custom-provider",
        _ => "default-provider"
    }
    }
  3. Test the integration:

    curl "http://localhost:3017/partition/$USER/instance/test/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $CUSTOM_API_KEY" \
        -d '{"model": "custom-model", "messages": [...]}'
    

Future Enhancements

Planned improvements for multi-provider support:

  • Load Balancing: Distribute requests across multiple providers
  • Failover: Automatic fallback to backup providers
  • Cost Optimization: Route to cheapest provider based on request
  • Model Capabilities: Automatic routing based on required features
  • Custom Routing Rules: User-defined routing logic

Troubleshooting

Provider Connection Issues

Check provider availability:

# OpenAI
curl https://api.openai.com/v1/models -H "Authorization: Bearer $OPENAI_API_KEY"

# Ollama
curl http://localhost:11434/api/tags

# Mistral
curl https://api.mistral.ai/v1/models -H "Authorization: Bearer $MISTRAL_API_KEY"

Common solutions:

  • Verify API keys are correctly set
  • Check network connectivity
  • Ensure provider services are running
  • Validate model names and availability

Multi-provider support makes Reservoir a flexible foundation for AI applications, allowing you to choose the best provider for each use case while maintaining conversation continuity.

Token Management

Reservoir intelligently manages token limits to ensure optimal context enrichment while staying within model constraints. The system automatically calculates token usage, prioritizes the most relevant context, and truncates content when necessary to fit within API limits.

Context Token Management

Automatic Context Sizing

Reservoir dynamically adjusts context size based on:

  • Model Token Limits: Respects each model's maximum context window
  • Content Priority: Prioritizes most relevant and recent context
  • Message Truncation: Intelligently cuts content when limits are exceeded
  • Reserve Allocation: Maintains buffer for user input and model response

Token Calculation

The system estimates token usage using standard approximations:

  • English Text: ~4 characters per token
  • Code Content: ~3 characters per token (more tokens due to syntax)
  • Special Characters: Variable token usage
  • Embeddings: Not included in context token count
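
A rough sketch of these approximations; the helper below is illustrative, not Reservoir's actual token counter:

// Character-based token estimate following the ratios listed above.
fn estimate_tokens(text: &str, is_code: bool) -> usize {
    let chars_per_token = if is_code { 3 } else { 4 };
    (text.chars().count() + chars_per_token - 1) / chars_per_token
}

fn main() {
    let prose = "Reservoir enriches each request with relevant history.";
    let code = "fn add(a: i32, b: i32) -> i32 { a + b }";
    println!("prose ~{} tokens", estimate_tokens(prose, false));
    println!("code  ~{} tokens", estimate_tokens(code, true));
}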

Context Building Strategy

flowchart TD
    A["User Message Arrives"] --> B["Calculate Available Tokens"]
    B --> C["Get Semantic Context"]
    C --> D["Get Recent History"]
    D --> E["Combine Context Sources"]
    E --> F{"Within Token Limit?"}
    F -->|Yes| G["Use Full Context"]
    F -->|No| H["Prioritize and Truncate"]
    H --> I["Recent Messages Priority"]
    I --> J["High Similarity Priority"]  
    J --> K["Truncate Oldest/Lowest Score"]
    K --> G
    G --> L["Send to Model"]

Token Limits by Model

OpenAI Models

Model          Context Window   Reservoir Reserve   Available for Context
GPT-3.5-turbo  4,096 tokens     1,024 tokens        ~3,000 tokens
GPT-4          8,192 tokens     2,048 tokens        ~6,000 tokens
GPT-4-turbo    128,000 tokens   8,000 tokens        ~120,000 tokens
GPT-4o         128,000 tokens   8,000 tokens        ~120,000 tokens

Local Models (Ollama)

Model          Context Window   Reservoir Reserve   Available for Context
Llama 3.1 8B   32,768 tokens    2,048 tokens        ~30,000 tokens
Llama 3.1 70B  32,768 tokens    2,048 tokens        ~30,000 tokens
Mistral 7B     32,768 tokens    2,048 tokens        ~30,000 tokens
CodeLlama      16,384 tokens    1,024 tokens        ~15,000 tokens
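
The budget implied by these tables is simply the model's context window minus Reservoir's reserve; a trivial sketch:

// Context available for enrichment = window size minus the reserve buffer.
fn available_context_tokens(context_window: usize, reserve: usize) -> usize {
    context_window.saturating_sub(reserve)
}

fn main() {
    // GPT-4 row from the table: 8,192 - 2,048 leaves roughly 6,000 tokens for context.
    println!("{}", available_context_tokens(8_192, 2_048));
}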

Context Prioritization

Priority Order

When token limits are exceeded, Reservoir prioritizes context in this order:

  1. User's Current Message: Always included (highest priority)
  2. Recent History: Last 15 messages from same partition/instance
  3. High Similarity Matches: Messages with similarity score > 0.85
  4. Synapse Connections: Messages connected via SYNAPSE relationships
  5. Older Context: Historical messages (first to be truncated)

Similarity-Based Prioritization

Context is ranked by relevance:

Priority Score = (Similarity Score × 0.7) + (Recency Score × 0.3)

Where:
- Similarity Score: 0.0-1.0 from semantic search
- Recency Score: 0.0-1.0 based on message age
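
A sketch of how this weighted ranking could be applied; the struct and sort below are illustrative only, using the 0.7/0.3 weights from the formula above:

// Rank context candidates by weighted similarity and recency.
struct Candidate {
    similarity: f64, // 0.0..=1.0 from semantic search
    recency: f64,    // 0.0..=1.0, where 1.0 is the newest message
}

fn priority(c: &Candidate) -> f64 {
    c.similarity * 0.7 + c.recency * 0.3
}

fn main() {
    let mut candidates = vec![
        Candidate { similarity: 0.91, recency: 0.2 },
        Candidate { similarity: 0.86, recency: 0.9 },
    ];
    // Highest priority first: these survive truncation the longest.
    candidates.sort_by(|a, b| priority(b).partial_cmp(&priority(a)).unwrap());
    for c in &candidates {
        println!("sim={:.2} rec={:.2} -> priority={:.2}", c.similarity, c.recency, priority(c));
    }
}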

Truncation Strategy

When content must be truncated:

  1. Message-Level Truncation: Remove entire messages (preserves coherence)
  2. LIFO for Semantic: Last-In-First-Out for semantic matches
  3. FIFO for Recent: First-In-First-Out for chronological history
  4. Preserve Pairs: Keep user/assistant pairs together when possible

Configuration Options

Context Size Limits

Configure via environment variables or config file:

# Set maximum semantic context messages
reservoir config --set semantic_context_size=20

# Set recent history limit
reservoir config --set recent_context_size=15

# Set token reserve buffer
reservoir config --set token_reserve=2048

Model-Specific Overrides

# In reservoir.toml
[models.gpt-4-turbo]
max_context_tokens = 120000
reserve_tokens = 8000
semantic_context_size = 50

[models.gpt-3.5-turbo]
max_context_tokens = 4096
reserve_tokens = 1024
semantic_context_size = 10

Token Usage Monitoring

Built-in Monitoring

Reservoir automatically tracks:

  • Input Tokens: Context + user message tokens
  • Reserve Usage: How much buffer is being used
  • Truncation Events: When content is cut due to limits
  • Model Utilization: Percentage of context window used

Usage Examples

# View recent messages with estimated token usage
reservoir view 10 | while read -r line; do
    echo "$line (est. tokens: $((${#line}/4)))"
done

# Estimate total context size
TOTAL_CHARS=$(reservoir view 15 | wc -c)
echo "Estimated tokens: $((TOTAL_CHARS/4))"

# Check if context might be truncated for a model
CONTEXT_SIZE=$(($(reservoir view 15 | wc -c) / 4))
echo "Context tokens: $CONTEXT_SIZE"
echo "Fits in GPT-3.5: $([ $CONTEXT_SIZE -lt 3000 ] && echo 'Yes' || echo 'No')"

Optimization Strategies

Reduce Context Size

Adjust Semantic Context

# Reduce semantic matches
reservoir config --set semantic_context_size=10

# Increase similarity threshold (fewer matches)
# Note: This requires code modification currently

Limit Recent History

# Reduce recent message count
reservoir config --set recent_context_size=8

Improve Context Quality

Use Higher Similarity Threshold

  • Fewer but more relevant semantic matches
  • Better context quality with less noise
  • Requires code-level configuration changes

Partition Strategy

  • Use specific partitions for focused contexts
  • Separate unrelated discussions
  • Improves relevance within token limits

# Focused partition for coding discussions
echo "Python async/await question" | reservoir ingest --partition alice --instance coding

# Separate partition for general chat
echo "Weather discussion" | reservoir ingest --partition alice --instance general

Model-Specific Considerations

Small Context Models (GPT-3.5)

Optimization Strategy:

  • Prioritize recent messages heavily
  • Limit semantic context to top 5-10 matches
  • Use aggressive truncation
  • Consider shorter message summaries

# Configuration for small context models
reservoir config --set semantic_context_size=5
reservoir config --set recent_context_size=8

Large Context Models (GPT-4-turbo)

Utilization Strategy:

  • Include extensive semantic context
  • Preserve longer conversation history
  • Enable deeper synapse exploration
  • Allow for more comprehensive context

# Configuration for large context models
reservoir config --set semantic_context_size=30
reservoir config --set recent_context_size=25

Advanced Token Management

Dynamic Context Adjustment

Reservoir can adjust context based on content type:

  • Code-Heavy Contexts: Reduce the character-to-token ratio assumption
  • Natural Language: Use standard ratios
  • Mixed Content: Apply weighted calculations

Future Enhancements

Planned Features:

  1. Semantic Summarization: Summarize older context instead of truncating
  2. Token-Aware Similarity: Consider token cost in similarity ranking
  3. Model-Aware Optimization: Automatic settings per model
  4. Context Compression: Compress historical context intelligently

Custom Token Strategies

Per-Partition Settings

# Different strategies for different use cases
reservoir config --set partitions.coding.semantic_context_size=20
reservoir config --set partitions.research.recent_context_size=30

Content-Type Awareness

# Adjust for code vs text heavy partitions
reservoir config --set partitions.coding.token_multiplier=1.3
reservoir config --set partitions.writing.token_multiplier=0.9

Troubleshooting Token Issues

Common Problems

Context Too Large

# Symptoms: API errors about token limits
# Solution: Reduce context sizes
reservoir config --set semantic_context_size=10
reservoir config --set recent_context_size=5

Context Too Small

# Symptoms: Poor context quality, missing relevant information
# Solution: Increase context sizes (if model supports it)
reservoir config --set semantic_context_size=25
reservoir config --set recent_context_size=20

Frequent Truncation

# Symptoms: Important context being cut off
# Solution: Use larger context model or adjust priorities

Diagnostic Commands

# Estimate current context size
SEMANTIC_SIZE=$(reservoir search --semantic "test" | wc -c)
RECENT_SIZE=$(reservoir view 15 | wc -c)
TOTAL_SIZE=$((SEMANTIC_SIZE + RECENT_SIZE))
echo "Total context estimate: $((TOTAL_SIZE/4)) tokens"

# Check truncation frequency
# (This would require log analysis)
grep -i "truncat" /var/log/reservoir.log | wc -l

Token management in Reservoir ensures optimal AI performance by providing the right amount of relevant context while respecting model limitations, creating an intelligent balance between comprehensive memory and computational efficiency.

Partitioning & Organization

Reservoir uses a flexible partitioning system to organize your conversations and data. This two-level hierarchy enables you to separate different contexts, users, projects, or topics while maintaining intelligent context enrichment within each boundary.

Partitioning Concepts

Two-Level Hierarchy

Reservoir organizes data using two levels:

  1. Partition: The top-level organizational boundary
  2. Instance: The sub-level within each partition

partition_name/
├── instance_1/
├── instance_2/
└── instance_3/

Default Organization

When no partition is specified, Reservoir uses:

  • Partition: "default"
  • Instance: "default"

# These are equivalent
reservoir view 10
reservoir view --partition default --instance default 10

Partition Use Cases

User Separation

Separate different users or personas:

alice/
├── personal/      # Personal conversations
├── work/         # Work-related discussions  
└── research/     # Research and learning

bob/
├── coding/       # Programming discussions
├── writing/      # Content creation
└── planning/     # Project planning

Usage Examples:

# Alice's personal conversations
echo "What's the weather like?" | reservoir ingest --partition alice --instance personal

# Bob's coding discussions
echo "How do I implement OAuth2?" | reservoir ingest --partition bob --instance coding

# View Alice's work conversations
reservoir view --partition alice --instance work 15

# Search Bob's coding history
reservoir search --partition bob --instance coding --semantic "database optimization"

Project Organization

Organize by projects or domains:

webapp_project/
├── backend/      # Backend development
├── frontend/     # Frontend development
├── database/     # Database design
└── deployment/   # DevOps and deployment

mobile_app/
├── ios/          # iOS development
├── android/      # Android development
├── api/          # API integration
└── testing/      # QA and testing

Usage Examples:

# Backend development discussions
echo "Should we use microservices or monolith?" | reservoir ingest --partition webapp_project --instance backend

# Mobile API integration
echo "API authentication best practices" | reservoir ingest --partition mobile_app --instance api

# Search across web project
reservoir search --partition webapp_project --semantic "authentication"

# View mobile testing discussions
reservoir view --partition mobile_app --instance testing 20

Team Collaboration

Organize by teams or functional areas:

engineering/
├── architecture/  # System architecture
├── reviews/      # Code reviews
├── planning/     # Sprint planning
└── incidents/    # Incident response

product/
├── requirements/ # Requirements gathering
├── research/     # User research
├── roadmap/      # Product roadmap
└── metrics/      # Analytics and metrics

Usage Examples:

# Architecture discussions
echo "Microservices vs serverless trade-offs" | reservoir ingest --partition engineering --instance architecture

# Product research notes
echo "User feedback on new feature" | reservoir ingest --partition product --instance research

# Search engineering incidents
reservoir search --partition engineering --instance incidents "database"

# View product roadmap discussions
reservoir view --partition product --instance roadmap 10

Context Isolation

How Partitioning Affects Context

Reservoir's context enrichment respects partition boundaries:

  1. Same Partition/Instance: Full context sharing
  2. Same Partition, Different Instance: Limited context sharing
  3. Different Partition: Complete isolation

Context Rules:

# These will share context with each other
reservoir ingest --partition alice --instance coding "How do I use async/await?"
reservoir ingest --partition alice --instance coding "What about error handling?"

# This will have separate context
reservoir ingest --partition alice --instance personal "What should I cook for dinner?"

# This will be completely isolated
reservoir ingest --partition bob --instance coding "How do I use async/await?"

Privacy and Separation

Partitions provide data privacy:

  • Search Isolation: Searches are scoped to partitions
  • Context Isolation: AI responses don't leak across partitions
  • Export Control: Can selectively export partition data
  • Access Control: Enables future per-partition access controls

Partition Management

Creating Partitions

Partitions are created automatically when first used:

# Creates "newproject" partition with "planning" instance
echo "Project kickoff meeting notes" | reservoir ingest --partition newproject --instance planning

Viewing Partition Data

# View messages from specific partition/instance
reservoir view --partition alice --instance coding 15

# View without specifying instance (shows from all instances in partition)
reservoir view --partition alice 25

# Search within partition
reservoir search --partition engineering --semantic "deployment strategy"

# Search within specific instance
reservoir search --partition engineering --instance architecture --semantic "microservices"

Partition Listing

Currently, there's no direct command to list all partitions, but you can discover them through data export and analysis:

# Export and analyze partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c | sort -nr

# Find all instances within a partition
reservoir export | jq -r '.[] | select(.partition=="alice") | .instance' | sort | uniq -c

Advanced Partitioning Strategies

Time-Based Partitioning

Organize by time periods:

conversations_2024/
├── january/
├── february/
└── march/

conversations_2023/
├── q1/
├── q2/
├── q3/
└── q4/

# Current month's discussions
MONTH=$(date +%B | tr '[:upper:]' '[:lower:]')
echo "Today's important insight" | reservoir ingest --partition conversations_2024 --instance $MONTH

Topic-Based Partitioning

Organize by subject matter:

machine_learning/
├── theory/       # Theoretical discussions
├── implementation/ # Code and implementation
├── papers/       # Research papers
└── experiments/  # Experimental results

web_development/
├── frontend/     # Frontend technologies
├── backend/      # Backend systems
├── databases/    # Database design
└── devops/       # Operations and deployment

Environment-Based Partitioning

Separate by environment or context:

development/
├── local/        # Local development
├── testing/      # Testing environment
├── staging/      # Staging discussions
└── production/   # Production issues

personal/
├── learning/     # Educational content
├── projects/     # Personal projects
├── notes/        # General notes
└── ideas/        # Ideas and brainstorming

Best Practices

Naming Conventions

  1. Use Lowercase: Partition and instance names should be lowercase
  2. Use Underscores: Separate words with underscores: machine_learning
  3. Be Descriptive: Choose clear, meaningful names
  4. Keep Consistent: Maintain consistent naming across partitions

# Good naming
reservoir ingest --partition web_development --instance frontend
reservoir ingest --partition machine_learning --instance deep_learning

# Avoid these patterns
reservoir ingest --partition WebDev --instance FE  # Mixed case, abbreviated
reservoir ingest --partition "web development" --instance "front end"  # Spaces

Partition Strategy

  1. Plan Your Structure: Design partition hierarchy before heavy usage
  2. Balance Granularity: Too many partitions reduce context benefits
  3. Consider Growth: Design for future expansion
  4. Document Structure: Keep a record of partition purposes

Migration Between Partitions

Currently, partition migration requires export/import workflow:

# Export messages from one partition
reservoir export | jq '.[] | select(.partition=="old_partition")' > old_partition.json

# Edit JSON to change partition/instance names
sed 's/"partition":"old_partition"/"partition":"new_partition"/g' old_partition.json > new_partition.json

# Import to new structure
reservoir import new_partition.json

# Verify migration
reservoir view --partition new_partition 10

Integration with Other Features

Search Scoping

All search operations can be scoped to partitions:

# Search across all data
reservoir search --semantic "error handling"

# Search within partition  
reservoir search --partition engineering --semantic "error handling"

# Search within specific instance
reservoir search --partition engineering --instance backend --semantic "error handling"

Data Export

Partitioning enables selective data export:

# Export everything
reservoir export > all_data.json

# Export specific partition (requires jq processing)
reservoir export | jq '.[] | select(.partition=="alice")' > alice_data.json

# Export specific instance
reservoir export | jq '.[] | select(.partition=="alice" and .instance=="coding")' > alice_coding.json

Context Enrichment

Partitioning directly affects how context is built:

  • Semantic Search: Limited to same partition/instance
  • Recent History: Limited to same partition/instance
  • Synapse Relationships: Respect partition boundaries
  • Token Limits: Applied per partition context

This partitioning system makes Reservoir suitable for multi-user environments, project-based work, and any scenario where logical separation of conversation contexts is beneficial.

Web Search Integration

Reservoir supports web search integration for models that provide this capability, enabling AI assistants to access real-time information from the internet while maintaining the benefits of conversational memory and context enrichment.

Overview

Web search integration allows AI models to:

  • Access Current Information: Get up-to-date data not in training sets
  • Verify Facts: Cross-reference stored conversations with current sources
  • Expand Context: Combine web results with Reservoir's semantic memory
  • Enhanced Research: Build knowledge from both conversation history and web sources

Supported Models

  • gpt-4o-search-preview: OpenAI's experimental web search model
  • Future Models: Additional web-enabled models as they become available

Local Models

Web search capability depends on the underlying model's features:

  • Some Ollama models may support web search plugins
  • Custom implementations can be integrated via the API

Usage

Basic Web Search Request

curl -X POST "http://localhost:3017/v1/partition/research/instance/current_events/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-search-preview",
    "messages": [
      {
        "role": "user",
        "content": "What are the latest developments in renewable energy technology?"
      }
    ],
    "web_search_options": {
      "enabled": true
    }
  }'

Web Search with Context

{
  "model": "gpt-4o-search-preview",
  "messages": [
    {
      "role": "user",
      "content": "Based on our previous discussion about solar panels, what are the newest efficiency improvements announced this month?"
    }
  ],
  "web_search_options": {
    "enabled": true,
    "max_results": 5,
    "search_depth": "recent"
  }
}

Web Search Options

Configuration Parameters

{
  "web_search_options": {
    "enabled": true,
    "max_results": 10,
    "search_depth": "comprehensive",
    "time_range": "recent",
    "include_sources": true,
    "filter_domains": ["example.com", "trusted-source.org"]
  }
}

Parameter Details

Parameter        Type     Description                            Default
enabled          boolean  Enable/disable web search              false
max_results      integer  Maximum search results to consider     5
search_depth     string   "quick", "standard", "comprehensive"   "standard"
time_range       string   "recent", "week", "month", "any"       "any"
include_sources  boolean  Include source URLs in response        true
filter_domains   array    Restrict to specific domains           []

How Web Search Works with Reservoir

Enhanced Context Flow

flowchart TD
    A["User Query Arrives"] --> B["Extract Search Terms"]
    B --> C["Reservoir Context Enrichment"]
    C --> D["Semantic Search (Local)"]
    C --> E["Recent History (Local)"]
    D --> F["Combine Local Context"]
    E --> F
    F --> G["Web Search (if enabled)"]
    G --> H["Merge Web Results with Context"]
    H --> I["Send Enriched Request to AI"]
    I --> J["AI Response with Web Sources"]
    J --> K["Store Response in Reservoir"]

Context Prioritization

When web search is enabled, context is prioritized:

  1. User's Current Message: Always highest priority
  2. Web Search Results: Real-time information
  3. Semantic Context: Relevant past conversations
  4. Recent History: Chronological conversation flow
  5. Additional Context: Synapse connections

Example Workflows

Research Assistant

import openai

openai.api_base = "http://localhost:3017/v1/partition/research/instance/ai_trends"

# Initial research query
response = openai.ChatCompletion.create(
    model="gpt-4o-search-preview",
    messages=[
        {
            "role": "user", 
            "content": "What are the latest breakthroughs in large language models?"
        }
    ],
    web_search_options={
        "enabled": True,
        "time_range": "recent",
        "max_results": 8
    }
)

print(response.choices[0].message.content)

# Follow-up question (benefits from both web search and conversation history)
response = openai.ChatCompletion.create(
    model="gpt-4o-search-preview",
    messages=[
        {
            "role": "user",
            "content": "How do these breakthroughs compare to what we discussed last week about model efficiency?"
        }
    ],
    web_search_options={"enabled": True}
)

News Analysis

# Get latest information
curl -X POST "http://localhost:3017/v1/partition/news/instance/tech/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-search-preview",
    "messages": [
      {
        "role": "user",
        "content": "Summarize today'\''s major technology news"
      }
    ],
    "web_search_options": {
      "enabled": true,
      "time_range": "recent",
      "max_results": 10
    }
  }'

# Follow up with context
curl -X POST "http://localhost:3017/v1/partition/news/instance/tech/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-search-preview", 
    "messages": [
      {
        "role": "user",
        "content": "How does this relate to the trends we'\''ve been tracking this month?"
      }
    ],
    "web_search_options": {
      "enabled": true
    }
  }'

Response Format

With Web Sources

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on recent reports and our previous discussions about solar technology, here are the latest efficiency improvements:\n\n## Recent Developments\n\n1. **Perovskite-Silicon Tandem Cells**: New research published this week shows efficiency rates reaching 33.7%...\n\n2. **Quantum Dot Technology**: Scientists have achieved 15% efficiency improvements...\n\nThese developments build on your earlier questions about cost-effectiveness, and the new efficiency gains should address the concerns you raised about ROI timelines.\n\n### Sources:\n- Nature Energy, December 2024\n- MIT Technology Review, December 2024\n- Previous conversation: Solar panel efficiency discussion"
      },
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "web_sources": [
    {
      "title": "Breakthrough in Perovskite Solar Cell Efficiency",
      "url": "https://www.nature.com/articles/...",
      "snippet": "Researchers achieve record-breaking 33.7% efficiency...",
      "date": "2024-12-15"
    }
  ]
}

Configuration

Environment Variables

# Enable web search by default
export RSV_WEB_SEARCH_ENABLED=true

# Configure search limits
export RSV_WEB_SEARCH_MAX_RESULTS=5
export RSV_WEB_SEARCH_TIME_RANGE=recent

# API keys for search providers (if needed)
export SEARCH_API_KEY="your-search-api-key"

Per-Request Configuration

Web search can be enabled/disabled per request:

# Enable for research
response = openai.ChatCompletion.create(
    model="gpt-4o-search-preview",
    messages=[{"role": "user", "content": "Current AI research trends"}],
    web_search_options={"enabled": True}
)

# Disable for private discussions
response = openai.ChatCompletion.create(
    model="gpt-4o-search-preview", 
    messages=[{"role": "user", "content": "Help me plan my personal project"}],
    web_search_options={"enabled": False}
)

Use Cases

✅ Good Use Cases:

  • Current events and news
  • Latest research and publications
  • Real-time data (stock prices, weather, etc.)
  • Technical documentation updates
  • Recent product releases or updates

❌ Avoid Web Search For:

  • Personal conversations
  • Private project discussions
  • Creative writing tasks
  • Code debugging (unless looking for new solutions)
  • Historical analysis (where training data is sufficient)

Partition Strategies

# News and current events
/v1/partition/news/instance/tech/chat/completions

# Research and academic work
/v1/partition/research/instance/ai_papers/chat/completions

# Market analysis
/v1/partition/business/instance/market_intel/chat/completions

# Personal assistant (web search disabled)
/v1/partition/personal/instance/planning/chat/completions

Performance Considerations

Latency Impact

  • Web Search Enabled: +1-3 seconds for search and processing
  • Web Search Disabled: Standard Reservoir latency (200-500ms)
  • Caching: Some web results may be cached for performance

Cost Implications

  • Web search may incur additional API costs
  • Consider rate limiting for high-volume applications
  • Balance between information freshness and cost

Token Usage

Web search results count toward token limits:

  • Search results are included in context token calculation
  • May reduce available space for conversation history
  • Automatic truncation applies when limits are exceeded

Troubleshooting

Web Search Not Working

# Check model support
reservoir config --get web_search_enabled

# Verify API keys
echo -n "$OPENAI_API_KEY" | wc -c  # Should be > 0

# Test with minimal request
curl -X POST "http://localhost:3017/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o-search-preview",
    "messages": [{"role": "user", "content": "What is today'\''s date?"}],
    "web_search_options": {"enabled": true}
  }'

Search Quality Issues

  1. Refine Search Terms: Use more specific queries
  2. Adjust Time Range: Narrow to recent results for current topics
  3. Filter Domains: Restrict to authoritative sources
  4. Combine with Context: Let Reservoir's memory provide additional context

Future Enhancements

Planned Features

  1. Custom Search Providers: Integration with different search APIs
  2. Search Result Caching: Store web results for reuse
  3. Source Ranking: Prioritize trusted sources
  4. Search History: Track and learn from search patterns

Integration Possibilities

  1. Domain-Specific Search: Academic papers, patents, documentation
  2. Real-Time Data: APIs for live information
  3. Multi-Modal Search: Images, videos, and documents
  4. Knowledge Graphs: Structured information integration

Web search integration transforms Reservoir from a conversational memory system into a comprehensive knowledge assistant that combines the depth of accumulated conversations with the breadth of current web information.

Import/Export

Reservoir provides comprehensive import and export capabilities for backing up your conversation data, migrating between systems, and integrating with external tools. The system exports data in JSON format, preserving all message metadata, embeddings, and relationships.

Export Functionality

Basic Export

Export all conversation data to JSON format:

# Export to stdout
reservoir export

# Save to file
reservoir export > conversations.json

# Export with timestamp
reservoir export > backup_$(date +%Y%m%d_%H%M%S).json

Export Format

Each exported message includes complete metadata:

[
  {
    "id": null,
    "trace_id": "550e8400-e29b-41d4-a716-446655440000",
    "partition": "alice",
    "instance": "coding",
    "content": "How do I implement error handling in async functions?",
    "role": "user",
    "embedding": [0.123, -0.456, 0.789, ...],
    "url": null,
    "timestamp": 1705315800000
  },
  {
    "id": null,
    "trace_id": "550e8400-e29b-41d4-a716-446655440001",
    "partition": "alice",
    "instance": "coding",
    "content": "Here are several approaches to error handling in async functions...",
    "role": "assistant",
    "embedding": [0.234, -0.567, 0.890, ...],
    "url": null,
    "timestamp": 1705315815000
  }
]

What's Included in Export

  • Complete Message Data: All message content and metadata
  • Vector Embeddings: Full embedding vectors for similarity search
  • Partition Organization: Partition and instance information
  • Conversation Structure: Trace IDs linking user/assistant pairs
  • Timestamps: Precise timing information
  • Roles: User, assistant, and system message roles

Export Use Cases

Data Backup

# Daily backup
reservoir export > "backup_$(date +%Y%m%d).json"

# Compressed backup
reservoir export | gzip > "backup_$(date +%Y%m%d).json.gz"

Migration

# Export from source system
reservoir export > migration_data.json

# Transfer to new system
scp migration_data.json user@newserver:/path/to/reservoir/

Analysis

# Export for external analysis
reservoir export | jq '.[] | select(.role=="user")' > user_messages.json

# Export specific time range
reservoir export | jq '.[] | select(.timestamp > 1705315800000)' > recent_messages.json

Import Functionality

Basic Import

Import conversation data from JSON files:

# Import from file
reservoir import conversations.json

# Import from compressed backup
gunzip -c backup_20240115.json.gz | reservoir import /dev/stdin

Import Behavior

Data Validation

  • Validates JSON format and structure
  • Checks required fields (trace_id, partition, instance, role, content)
  • Verifies embedding vector format and dimensions
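
A hedged sketch of this kind of per-record field check using the serde_json crate; the importer's actual validation rules may differ:

// Check that a record carries the required fields listed above.
use serde_json::Value;

fn has_required_fields(record: &Value) -> bool {
    ["trace_id", "partition", "instance", "role", "content"]
        .iter()
        .all(|key| record.get(*key).map_or(false, |v| !v.is_null()))
}

fn main() {
    let record: Value = serde_json::from_str(
        r#"{"trace_id":"t1","partition":"alice","instance":"coding","role":"user","content":"hi"}"#,
    )
    .unwrap();
    assert!(has_required_fields(&record));
}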

Duplicate Handling

  • Skips messages with duplicate trace_id and role combinations
  • Preserves existing data integrity
  • Logs skipped duplicates for review

Relationship Reconstruction

  • Automatically rebuilds RESPONDED_WITH relationships
  • Recreates HAS_EMBEDDING connections
  • Maintains partition/instance boundaries

Import Process

  1. File Reading: Load and parse JSON data
  2. Validation: Check data format and completeness
  3. Message Creation: Create MessageNode entries
  4. Embedding Processing: Store vector embeddings
  5. Relationship Building: Establish graph relationships
  6. Index Updates: Update vector indices

Import Examples

Complete System Restore

# Stop Reservoir service
systemctl stop reservoir

# Clear existing data (if needed)
# WARNING: This is destructive!

# Import backup
reservoir import full_backup_20240115.json

# Verify import
reservoir view 10

Selective Import

# Import specific partition data
cat full_backup.json | jq '.[] | select(.partition=="alice")' > alice_data.json
reservoir import alice_data.json

# Import recent messages only
cat backup.json | jq '.[] | select(.timestamp > 1705315800000)' > recent.json
reservoir import recent.json

Advanced Export/Import

Filtering Exports

By Partition

# Export specific user's data
reservoir export | jq '.[] | select(.partition=="alice")' > alice_conversations.json

By Time Range

# Export last 24 hours
YESTERDAY=$(date -d '1 day ago' +%s)000
reservoir export | jq ".[] | select(.timestamp > $YESTERDAY)" > recent_conversations.json

By Role

# Export only user messages
reservoir export | jq '.[] | select(.role=="user")' > user_questions.json

# Export only assistant responses
reservoir export | jq '.[] | select(.role=="assistant")' > ai_responses.json

By Content

# Export messages containing specific terms
reservoir export | jq '.[] | select(.content | test("python|programming"; "i"))' > programming_discussions.json

Data Transformation

Convert to CSV

reservoir export | jq -r '.[] | [.timestamp, .partition, .instance, .role, .content] | @csv' > conversations.csv

Extract Text Only

reservoir export | jq -r '.[] | .content' > all_messages.txt

Create Markdown Format

reservoir export | jq -r '.[] | "## " + (.timestamp | tostring) + " (" + .role + ")\n\n" + .content + "\n"' > conversations.md

Batch Operations

Multiple File Import

# Import multiple backup files
for file in backup_*.json; do
    echo "Importing $file..."
    reservoir import "$file"
done

Incremental Backup Strategy

#!/bin/bash
# Incremental backup script

BACKUP_DIR="/backup/reservoir"
LAST_BACKUP_TIME=$(cat "$BACKUP_DIR/.last_backup" 2>/dev/null || echo "0")
CURRENT_TIME=$(date +%s)000

# Export messages since last backup
reservoir export | jq ".[] | select(.timestamp > $LAST_BACKUP_TIME)" > "$BACKUP_DIR/incremental_$(date +%Y%m%d_%H%M%S).json"

# Update last backup time
echo "$CURRENT_TIME" > "$BACKUP_DIR/.last_backup"

Data Migration Workflows

System Migration

Complete Migration

# Source system
reservoir export > complete_migration.json

# Target system  
reservoir import complete_migration.json

# Verify migration
SOURCE_COUNT=$(jq length complete_migration.json)
TARGET_COUNT=$(reservoir export | jq length)
echo "Source: $SOURCE_COUNT messages, Target: $TARGET_COUNT messages"

Partition Migration

# Migrate specific user to new system
reservoir export | jq '.[] | select(.partition=="alice")' > alice_migration.json

# On target system
reservoir import alice_migration.json

# Verify partition migration
reservoir view --partition alice 10

Cross-System Integration

Export for External Processing

# Export for machine learning analysis
reservoir export | jq '.[] | {content: .content, embedding: .embedding}' > ml_dataset.json

# Export conversation pairs for training
reservoir export | jq -c 'group_by(.trace_id)[] | select(length == 2) | {user: map(select(.role=="user"))[0].content, assistant: map(select(.role=="assistant"))[0].content}' > conversation_pairs.json

Import from External Sources

Convert external data to Reservoir format:

{
  "trace_id": "external-001",
  "partition": "imported",
  "instance": "external_system",
  "content": "Question from external system",
  "role": "user",
  "embedding": [], // Will be generated if empty
  "url": null,
  "timestamp": 1705315800000
}
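
As a sketch of such a conversion, the snippet below maps a hypothetical external export (an array of objects with id, question, and asked_at fields; these field names are invented for illustration) onto the Reservoir fields shown above:

# Convert a hypothetical external export into Reservoir's import format
jq '[.[] | {
  trace_id: ("external-" + (.id | tostring)),
  partition: "imported",
  instance: "external_system",
  content: .question,
  role: "user",
  embedding: [],
  url: null,
  timestamp: .asked_at
}]' external.json > external_reservoir.json

reservoir import external_reservoir.json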

Data Integrity and Verification

Export Verification

# Check export completeness
EXPORTED_COUNT=$(reservoir export | jq length)
echo "Exported $EXPORTED_COUNT messages"

# Verify embeddings
EMBEDDED_COUNT=$(reservoir export | jq '[.[] | select(.embedding | length > 0)] | length')
echo "$EMBEDDED_COUNT messages have embeddings"

# Check partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c

Import Validation

# Validate JSON format before import
jq . backup.json > /dev/null && echo "Valid JSON" || echo "Invalid JSON"

# Check required fields
jq '[.[] | select(.trace_id and .partition and .instance and .role and .content)] | length' backup.json

# Verify import success
reservoir view 10
reservoir search --semantic "test query"

Performance Considerations

Large Dataset Handling

Streaming Export

# For very large datasets, process in chunks
reservoir export | jq -c '.[]' | split -l 1000 - chunk_

# Import chunks
for chunk in chunk_*; do
    jq -s '.' "$chunk" | reservoir import /dev/stdin
done

Compression

# Compress exports to save space
reservoir export | gzip > backup.json.gz

# Decompress for import
gunzip -c backup.json.gz | reservoir import /dev/stdin

Network Transfer

Efficient Transfer

# Direct transfer without intermediate files
ssh source_server 'reservoir export' | reservoir import /dev/stdin

# Compressed transfer
ssh source_server 'reservoir export | gzip' | gunzip | reservoir import /dev/stdin

Troubleshooting

Common Issues

Import Failures

# Check JSON validity
jq . import_file.json

# Verify required fields
jq '.[] | keys' import_file.json | head -5

# Check for duplicate trace_ids
jq -r '.[] | .trace_id' import_file.json | sort | uniq -d

Missing Embeddings

# Check embedding status
reservoir export | jq '[.[] | select(.embedding | length == 0)] | length'

# Regenerate embeddings if needed
reservoir replay

Partition Issues

# Check partition consistency
reservoir export | jq -r '.[] | "\(.partition)/\(.instance)"' | sort | uniq -c

# View messages in specific partition
reservoir view --partition problematic_partition 10

Recovery Procedures

Partial Import Recovery

# If import fails partway through, check what was imported
IMPORTED_COUNT=$(reservoir export | jq length)
TOTAL_COUNT=$(jq length backup.json)
echo "Imported $IMPORTED_COUNT of $TOTAL_COUNT messages"

# Import remaining messages (requires identifying what's missing)

Data Corruption Recovery

# Export current state
reservoir export > current_state.json

# Restore from known good backup
reservoir import good_backup.json

# Compare and merge if needed

The import/export system gives you a reliable foundation for data management, supporting backup, migration, and integration workflows while preserving data fidelity.

Local Deployment

This guide covers setting up Reservoir for local development and production use on your local machine.

Prerequisites

Before deploying Reservoir locally, ensure you have the following installed:

  • Rust (latest stable version)
  • Docker (for Neo4j database)
  • Git for version control

Quick Setup

Step 1: Clone the Repository

git clone https://github.com/divanvisagie/reservoir.git
cd reservoir

Step 2: Start Neo4j Database

You have several options for running Neo4j locally:

Option A: Docker Compose (Recommended)

docker-compose up -d

This starts Neo4j on the default bolt://localhost:7687 with the credentials defined in the docker-compose file.

Option B: Docker Manual Setup

docker run \
    --name neo4j \
    -p7474:7474 -p7687:7687 \
    -d \
    -v $HOME/neo4j/data:/data \
    -v $HOME/neo4j/logs:/logs \
    -v $HOME/neo4j/import:/var/lib/neo4j/import \
    -v $HOME/neo4j/plugins:/plugins \
    --env NEO4J_AUTH=neo4j/password \
    neo4j:latest

Option C: Homebrew (macOS Service)

If you prefer to run Neo4j as a permanent background service:

brew install neo4j
brew services start neo4j

This will start Neo4j on bolt://localhost:7687 and ensure it runs automatically when your computer boots.

Step 3: Configure Environment Variables

Create a .env file in the project root or export the following environment variables:

# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1

# Database Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password

# API Keys (required for respective providers)
OPENAI_API_KEY=sk-your-openai-key-here
MISTRAL_API_KEY=your-mistral-key-here
GEMINI_API_KEY=your-gemini-key-here

# Custom Provider URLs (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions
RSV_MISTRAL_BASE_URL=https://api.mistral.ai/v1/chat/completions

Note: Most environment variables have sensible defaults. Only the API keys for your chosen providers are required.

Step 4: Build and Run

Manual Execution

# Build the project
cargo build --release

# Run Reservoir
cargo run -- start

Using Make Commands

# Build the release binary
make main

# Run for development (with auto-reload)
make dev

# Run normally
make run

Reservoir will now be available at http://localhost:3017.

Service Installation (macOS)

For a more permanent setup, you can install Reservoir as a macOS LaunchAgent service.

Install the Service

make install-service

This command:

  • Copies the LaunchAgent plist to ~/Library/LaunchAgents/
  • Loads the service using launchctl
  • Starts Reservoir automatically in the background
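
Under the hood this is roughly equivalent to the following (a sketch based on the file paths mentioned in this guide, not the exact Makefile contents):

# Approximate equivalent of `make install-service`
cp scripts/com.sectorflabs.reservoir.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.sectorflabs.reservoir.plist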

Service Management

Check service status:

launchctl list | grep reservoir

View service logs:

tail -f /tmp/reservoir.log
tail -f /tmp/reservoir.err

Manually start/stop the service:

# Start
launchctl start com.sectorflabs.reservoir

# Stop
launchctl stop com.sectorflabs.reservoir

Uninstall the Service

make uninstall-service

This removes the service and cleans up all related files.

Verification

Test the Installation

  1. Check if Reservoir is running:

    curl http://localhost:3017/health
    
  2. Test with a simple API call:

    curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -d '{
            "model": "gpt-4",
            "messages": [
                {
                    "role": "user",
                    "content": "Hello, Reservoir!"
                }
            ]
        }'
    
  3. Run the test suite:

    ./hurl/test.sh
    

Check Neo4j Connection

Verify that Neo4j is accessible:

# Check Neo4j web interface
open http://localhost:7474

# Test connection with curl (HTTP discovery endpoint)
curl -u neo4j:password http://localhost:7474/

Configuration Options

Database Configuration

Variable        | Default               | Description
NEO4J_URI       | bolt://localhost:7687 | Neo4j connection URI
NEO4J_USERNAME  | neo4j                 | Database username
NEO4J_PASSWORD  | password              | Database password

Server Configuration

Variable        | Default   | Description
RESERVOIR_PORT  | 3017      | HTTP server port
RESERVOIR_HOST  | 127.0.0.1 | HTTP server host

Provider Configuration

Variable             | Default                                    | Description
RSV_OPENAI_BASE_URL  | https://api.openai.com/v1/chat/completions | OpenAI API endpoint
RSV_OLLAMA_BASE_URL  | http://localhost:11434/v1/chat/completions | Ollama API endpoint
RSV_MISTRAL_BASE_URL | https://api.mistral.ai/v1/chat/completions | Mistral API endpoint

Troubleshooting

Common Issues

Port Already in Use:

# Check what's using port 3017
lsof -i :3017

# Use a different port
export RESERVOIR_PORT=3018

Neo4j Connection Failed:

# Check if Neo4j is running
docker ps | grep neo4j

# Check Neo4j logs
docker logs neo4j

Permission Issues (macOS Service):

# Ensure the binary path is correct in the plist
ls -la ~/.cargo/bin/reservoir

# Update the path in scripts/com.sectorflabs.reservoir.plist if needed

API Key Issues:

# Verify your API key is set
echo $OPENAI_API_KEY

# Test the key directly with OpenAI
curl https://api.openai.com/v1/models \
    -H "Authorization: Bearer $OPENAI_API_KEY"

Performance Tuning

For better performance in local deployment:

  1. Increase Neo4j memory allocation:

    # In docker-compose.yml, under the neo4j service's environment section:
    - NEO4J_dbms_memory_heap_initial__size=512m
    - NEO4J_dbms_memory_heap_max__size=2G
    
  2. Use SSD storage for Neo4j data:

    # Mount Neo4j data on fast storage
    -v /path/to/fast/storage:/data
    
  3. Optimize connection pooling:

    # Add to .env
    NEO4J_MAX_CONNECTIONS=20
    NEO4J_CONNECTION_TIMEOUT=30s
    

Next Steps

After a successful local deployment, your Reservoir instance is ready for development and testing!

Common Issues

This page covers the most common issues you might encounter when using Reservoir and how to solve them.

Server Issues

Server Not Starting

Symptoms:

  • Cannot connect to http://localhost:3017
  • Connection refused errors
  • Server fails to start

Solutions:

Check Neo4j

Ensure Neo4j is running and accessible:

# Check if Neo4j is running
systemctl status neo4j  # Linux
brew services list | grep neo4j  # macOS

# Start Neo4j if not running
systemctl start neo4j  # Linux
brew services start neo4j  # macOS

Port Conflicts

Default port 3017 might be in use:

# Check what's using port 3017
lsof -i :3017

# Use a different port
RESERVOIR_PORT=3018 cargo run -- start

Environment Variables

If using direnv, make sure it's loaded:

# Check if direnv is working
direnv status

# Allow direnv for current directory
direnv allow

Server Starts But Returns Errors

Check Server Logs

Look at the server output for detailed error messages:

# Start with verbose logging
RUST_LOG=debug cargo run -- start

Test Basic Connectivity

# Test if server is responding
curl http://localhost:3017/health

# If health endpoint doesn't exist, try a simple request
curl "http://localhost:3017/partition/test/instance/basic/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "hello"}]}'

API and Model Issues

"Internal Server Error" Responses

Symptoms:

  • HTTP 500 errors
  • Generic error messages
  • Requests failing unexpectedly

Solutions:

Verify API Keys

Check that your API keys are set correctly:

echo $OPENAI_API_KEY
echo $MISTRAL_API_KEY
echo $GEMINI_API_KEY

If not set:

export OPENAI_API_KEY="your-openai-key"
export MISTRAL_API_KEY="your-mistral-key"
export GEMINI_API_KEY="your-gemini-key"

Check Model Names

Ensure you're using supported model names:

Model                                            | Provider | API Key Required
gpt-4, gpt-4o, gpt-4o-mini, gpt-3.5-turbo        | OpenAI   | Yes (OPENAI_API_KEY)
gpt-4o-search-preview                            | OpenAI   | Yes (OPENAI_API_KEY)
llama3.2, gemma3, or any custom name             | Ollama   | No
mistral-large-2402                               | Mistral  | Yes (MISTRAL_API_KEY)
gemini-2.0-flash, gemini-2.5-flash-preview-05-20 | Google   | Yes (GEMINI_API_KEY)

Verify Ollama (for local models)

If using Ollama models, verify Ollama is running:

# Check Ollama status
ollama list

# If not running, start it
ollama serve

# Test Ollama directly
curl http://localhost:11434/api/tags

Deserialization Errors

Symptoms:

  • JSON parsing errors
  • "Failed to deserialize" messages
  • Malformed request errors

Solutions:

Check JSON Format

Ensure your JSON request is properly formatted:

# Good format
curl "http://localhost:3017/partition/$USER/instance/test/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gemma3",
        "messages": [
            {
                "role": "user",
                "content": "Hello"
            }
        ]
    }'

Content-Type Header

Always use the correct content type:

# Always include this header
-H "Content-Type: application/json"

Optional Fields

Remember that fields like web_search_options are optional and can be omitted:

# This is valid without web_search_options
{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
}

Connection Issues

Symptoms:

  • Timeout errors
  • Network unreachable
  • DNS resolution failures

Solutions:

Check Provider URLs

Verify that custom provider URLs are accessible:

# Test OpenAI endpoint
curl -I https://api.openai.com/v1/chat/completions

# Test custom endpoint (if configured)
curl -I $RSV_OPENAI_BASE_URL

Verify Internet Connectivity

For cloud providers, ensure internet connectivity:

# Test internet connection
ping google.com

# Test specific provider
ping api.openai.com

Check Firewall Settings

Ensure no firewall is blocking outbound requests:

# Check if ports are blocked
telnet api.openai.com 443
telnet localhost 11434  # For Ollama

Database Issues

Neo4j Connection Problems

Symptoms:

  • "Failed to connect to Neo4j" errors
  • Database timeout errors
  • Authentication failures

Solutions:

Check Neo4j Status

# Check if Neo4j is running
systemctl status neo4j  # Linux
brew services list | grep neo4j  # macOS

# Check Neo4j logs
journalctl -u neo4j  # Linux
tail -f /usr/local/var/log/neo4j/neo4j.log  # macOS

Verify Connection Details

Check your Neo4j connection settings:

# Default connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password

Test Neo4j Directly

# Test with cypher-shell
cypher-shell -a bolt://localhost:7687 -u neo4j -p your-password

# Or use Neo4j Browser
# Navigate to http://localhost:7474

Vector Index Issues

Symptoms:

  • Slow semantic search
  • "Index not found" errors
  • Context enrichment not working

Solutions:

Recreate Vector Index

# Stop Reservoir
# Connect to Neo4j and run:
DROP INDEX embedding_index IF EXISTS;
// BGE-Large-EN-v1.5 embeddings are 1024-dimensional
CREATE VECTOR INDEX embedding_index FOR (n:EmbeddingNode) ON (n.embedding) OPTIONS {indexConfig: {`vector.dimensions`: 1024, `vector.similarity_function`: 'cosine'}};

Check Index Status

SHOW INDEXES;

Memory and Performance Issues

High Memory Usage

Symptoms:

  • System running out of memory
  • Slow responses
  • Process killed by system

Solutions:

Monitor Resource Usage

# Check Reservoir process
ps aux | grep reservoir

# Monitor system resources
htop
# or
top

Use Smaller Models

Switch to smaller models if using Ollama:

# Instead of large models, use smaller ones
ollama pull gemma3:1b  # 1B parameters instead of a larger variant

Limit Conversation History

The system automatically manages token limits, but you can monitor:

# View recent conversations to check size
cargo run -- view 10 --partition $USER --instance your-instance

Slow Responses

Symptoms:

  • Long wait times for responses
  • Timeouts
  • Poor performance

Solutions:

Check Model Performance

Different models have different performance characteristics:

  • Fastest: Smaller Ollama models (2B-7B parameters)
  • Medium: Cloud models like GPT-3.5-turbo
  • Slowest: Large local models (13B+ parameters)

Optimize Ollama

# Ollama uses GPU acceleration automatically when a supported GPU is available
ollama run gemma3

# Check Ollama performance
ollama ps

Network Optimization

For cloud models:

# Test network speed to provider
curl -w "@curl-format.txt" -o /dev/null -s "https://api.openai.com/v1/models"

Testing and Debugging

Systematic Troubleshooting

Step 1: Test Basic Setup

# Test Reservoir is running
curl http://localhost:3017/health

# Test with simplest possible request
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "hi"}]}'

Step 2: Test with Different Models

# Test Ollama model (no API key)
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "test"}]}'

# Test OpenAI model (requires API key)
curl "http://localhost:3017/partition/test/instance/debug/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "test"}]}'

Step 3: Check Logs

# Run with debug logging
RUST_LOG=debug cargo run -- start

# Check for specific error patterns
grep -i error reservoir.log
grep -i "failed" reservoir.log

Using the Included Tests

Reservoir includes hurl tests that you can use to verify your setup:

# Test all endpoints
./hurl/test.sh

# Test specific endpoints
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl
hurl --variable USER="$USER" hurl/reservoir-view.hurl
hurl --variable USER="$USER" hurl/reservoir-search.hurl

# Test Ollama mode
hurl hurl/ollama_mode.hurl

Getting Help

If you encounter issues not covered here:

  1. Check the server logs for detailed error messages
  2. Verify your environment variables are set correctly
  3. Test with a simple curl request first
  4. Try the included hurl tests to isolate the problem
  5. Check the FAQ for additional solutions
  6. Review the debugging guide for advanced troubleshooting

Environment Variable Reference

For quick reference, here are the key environment variables:

# Provider endpoints
RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"

# API keys
OPENAI_API_KEY="your-openai-key"
MISTRAL_API_KEY="your-mistral-key"
GEMINI_API_KEY="your-gemini-key"

# Reservoir settings
RESERVOIR_PORT="3017"

# Neo4j settings
NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="your-password"

Frequently Asked Questions

This section addresses common questions and issues you might encounter while using Reservoir.

General Questions

What is Reservoir?

Reservoir is a memory system for LLM conversations that acts as a smart proxy between your applications and OpenAI-compatible APIs. It automatically stores conversation history and enriches new requests with relevant context from past conversations.

Does Reservoir support streaming responses?

No, streaming responses are not currently supported; the response is returned only once the complete message has been received from the LLM provider.

Can I use Reservoir with clients other than the OpenAI Python library?

Yes, Reservoir is designed to be fully OpenAI-compatible. It has been tested with:

  • curl command line tool
  • OpenAI Python library
  • Chat Gipitty
  • Any application that can make HTTP requests to OpenAI-compatible endpoints

However, compatibility with some specialized clients may vary. If you encounter issues with a specific client, please report it as an issue.

What LLM providers does Reservoir support?

Reservoir supports multiple LLM providers:

  • OpenAI: GPT-4, GPT-4o, GPT-3.5-turbo, and specialized models
  • Ollama: Local models like Llama, Gemma, and any custom models
  • Mistral AI: Cloud-hosted Mistral models
  • Google Gemini: Google's AI models
  • Custom providers: Any OpenAI-compatible API endpoint

How does Reservoir organize conversations?

Reservoir uses a two-level organization system:

  • Partition: Top-level grouping (typically your username)
  • Instance: Application-specific context within a partition

This allows you to keep conversations from different applications separate while maintaining context within each application.
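
For example, the same partition can keep two applications isolated simply by using different instance segments in the URL (instance names here are illustrative):

# Conversations stored under partition $USER, instance "chat-app"
curl "http://localhost:3017/partition/$USER/instance/chat-app/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "Hello from the chat app"}]}'

# Same partition, separate context under instance "code-helper"
curl "http://localhost:3017/partition/$USER/instance/code-helper/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "Hello from the code helper"}]}'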

Is my data private?

Yes, absolutely. All conversation data is stored locally in your Neo4j database and never leaves your infrastructure. Reservoir only forwards your requests to the LLM providers you choose to use.

Technical Questions

What database does Reservoir use?

Reservoir uses Neo4j as its graph database. Neo4j provides:

  • Vector similarity search for semantic matching
  • Graph relationships for conversation threading
  • Efficient querying for context enrichment
  • Scalable storage for large conversation histories

How does context enrichment work?

When you send a message, Reservoir:

  1. Stores your message in the database
  2. Searches for semantically similar past messages
  3. Retrieves recent conversation history
  4. Injects relevant context into your request
  5. Sends the enriched request to the LLM provider
  6. Stores the response for future context
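
In practice this means a client only needs to send the new message; a minimal sketch (model and instance names are illustrative):

# First request: the exchange is stored by Reservoir
curl "http://localhost:3017/partition/$USER/instance/faq-demo/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "My project codename is Bluebird."}]}'

# Later request: only the new message is sent; Reservoir injects the earlier context
curl "http://localhost:3017/partition/$USER/instance/faq-demo/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma3", "messages": [{"role": "user", "content": "What is my project codename?"}]}'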

What are the token limits?

Reservoir respects the token limits of the underlying LLM models:

  • GPT-4: 8,192 tokens (context window)
  • GPT-4-32k: 32,768 tokens
  • GPT-3.5-turbo: 4,096 tokens
  • Local models: Varies by model

Reservoir automatically truncates context to fit within these limits while preserving system prompts and your latest message.

Can I run multiple Reservoir instances?

Yes, you can run multiple instances by:

  • Using different ports (RESERVOIR_PORT)
  • Using different Neo4j databases
  • Using different partition/instance combinations
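
For example, a second server process can be started on another port (a minimal sketch; keep the data sets apart with a separate Neo4j database or partition):

# First instance on the default port
RESERVOIR_PORT=3017 cargo run -- start

# Second instance on a different port (run from another terminal)
RESERVOIR_PORT=3018 cargo run -- start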

Troubleshooting

Neo4j Connection Issues

Problem: Unable to connect to Neo4j.

Solutions:

  1. Ensure Neo4j is running:

    docker ps | grep neo4j
    
  2. Check your connection details in .env:

    NEO4J_URI=bolt://localhost:7687
    NEO4J_USERNAME=neo4j
    NEO4J_PASSWORD=password
    
  3. Test the connection manually:

    curl -u neo4j:password http://localhost:7474/
    

OpenAI API Key Issues

Problem: Requests fail due to missing or invalid API key.

Solutions:

  1. Verify your API key is set:

    echo $OPENAI_API_KEY
    
  2. Test the key directly with OpenAI:

    curl https://api.openai.com/v1/models \
        -H "Authorization: Bearer $OPENAI_API_KEY"
    
  3. Ensure there are no extra spaces or quotes in your environment variable.

Token Limit Errors

Problem: Requests fail due to exceeding the token limit.

Solutions:

  1. Reduce the size of your input message
  2. Clear old conversation history for the partition/instance
  3. Use a model with a larger context window (e.g., GPT-4-32k)
  4. Check if context enrichment is adding too much historical data

Port Already in Use

Problem: Reservoir fails to start because port 3017 is already in use.

Solutions:

  1. Check what's using the port:

    lsof -i :3017
    
  2. Use a different port:

    export RESERVOIR_PORT=3018
    
  3. Kill the process using the port (if safe to do so):

    kill -9 $(lsof -ti:3017)
    

Permission Denied (macOS Service)

Problem: Service fails to start due to permission issues.

Solutions:

  1. Check the binary path in the plist file:

    cat ~/Library/LaunchAgents/com.sectorflabs.reservoir.plist
    
  2. Ensure the binary exists and is executable:

    ls -la ~/.cargo/bin/reservoir
    
  3. Update the path in the plist if necessary

Slow Performance

Problem: Reservoir responses are slow.

Solutions:

  1. Check Neo4j memory allocation
  2. Ensure Neo4j data is on fast storage (SSD)
  3. Optimize vector index settings
  4. Reduce the number of context messages retrieved
  5. Check network connectivity to LLM providers

Installation Questions

Do I need to install Neo4j separately?

No, the recommended approach is to use Docker Compose, which automatically sets up Neo4j for you:

docker-compose up -d

Can I use an existing Neo4j instance?

Yes, you can connect to any Neo4j instance by setting the appropriate environment variables:

NEO4J_URI=bolt://your-neo4j-host:7687
NEO4J_USERNAME=your-username
NEO4J_PASSWORD=your-password

What Rust version do I need?

Reservoir requires the latest stable version of Rust. You can install it with:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Integration Questions

How do I integrate with Chat Gipitty?

See the dedicated Chat Gipitty Integration guide for detailed setup instructions.

Can I use Reservoir with my existing Python scripts?

Yes, simply change the base URL in your OpenAI client:

import os

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3017/partition/myuser/instance/myapp/v1",
    api_key=os.environ.get("OPENAI_API_KEY")
)

How do I migrate my existing conversation data?

Reservoir provides import/export functionality:

# Export from another system (if supported)
reservoir export > conversations.json

# Import into Reservoir
reservoir import conversations.json

Advanced Usage

Can I customize the similarity threshold for context matching?

Currently, the similarity threshold (0.85) is hardcoded, but this may become configurable in future versions.

How do I backup my conversation data?

Use the export command to create backups:

reservoir export > backup-$(date +%Y%m%d).json

Can I run Reservoir in production?

Reservoir is currently designed for local development use. For production deployment, consider:

  • Securing the Neo4j database
  • Setting up proper authentication
  • Configuring appropriate firewall rules
  • Using HTTPS for external access
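
Production hardening is beyond the scope of this FAQ, but one low-effort pattern is to keep Reservoir bound to localhost and reach it remotely over an SSH tunnel (hostnames below are placeholders):

# On the server: keep Reservoir listening on localhost only
RESERVOIR_HOST=127.0.0.1 RESERVOIR_PORT=3017 cargo run -- start

# On the client: forward a local port to the remote Reservoir over SSH
ssh -N -L 3017:localhost:3017 user@reservoir-host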

Getting Help

If your question isn't answered here:

  1. Check the Common Issues section
  2. Review the API Documentation
  3. Look at existing GitHub issues
  4. Create a new issue with details about your problem

Remember to include:

  • Your operating system
  • Rust version (rustc --version)
  • Neo4j version
  • Relevant log output
  • Steps to reproduce the issue
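
The commands below gather most of this information in one pass (the log file name is just an example):

# Collect environment details for a bug report
uname -a
rustc --version
docker ps | grep neo4j            # or: brew services list | grep neo4j

# Reproduce the issue with debug logging and keep the output
RUST_LOG=debug cargo run -- start 2>&1 | tee reservoir-debug.log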

Contributing to Reservoir

Thank you for your interest in contributing to Reservoir! This guide will help you get started with development and contributing to the project.

Development Setup

Prerequisites

Before you begin, ensure you have the following installed:

  • Rust (latest stable version)

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source ~/.cargo/env
    
  • Docker (for Neo4j database)

  • Git for version control

Step 1: Fork and Clone

  1. Fork the repository on GitHub
  2. Clone your fork locally:
git clone https://github.com/yourusername/reservoir.git
cd reservoir

Step 2: Start the Database

Start Neo4j using Docker Compose:

docker-compose up -d

This starts Neo4j on bolt://localhost:7687 with the default credentials.

Step 3: Environment Configuration

Create a .env file in the project root or export environment variables:

# Server Configuration
RESERVOIR_PORT=3017
RESERVOIR_HOST=127.0.0.1

# Database Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=password

# API Keys (add as needed)
OPENAI_API_KEY=sk-your-key-here
MISTRAL_API_KEY=your-mistral-key
GEMINI_API_KEY=your-gemini-key

# Provider URLs (optional)
RSV_OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
RSV_OLLAMA_BASE_URL=http://localhost:11434/v1/chat/completions

Step 4: Build and Run

# Build the project
cargo build

# Run in development mode with auto-reload
make dev

# Or run directly
cargo run -- start

Reservoir will be available at http://localhost:3017.

Development Workflow

Making Changes

  1. Create a feature branch:

    git checkout -b feature/your-feature-name
    
  2. Make your changes following the coding standards below

  3. Test your changes:

    # Run tests
    cargo test
    
    # Run API tests
    ./hurl/test.sh
    
    # Test specific functionality
    make run
    
  4. Update documentation if needed:

    # Build documentation
    make book
    
    # Serve locally to preview
    make serve-book
    

Code Standards

Rust Code Style

  • Use rustfmt for formatting:

    cargo fmt
    
  • Use clippy for linting:

    cargo clippy
    
  • Follow Rust naming conventions:

    • snake_case for functions and variables
    • PascalCase for types and structs
    • SCREAMING_SNAKE_CASE for constants

Documentation

  • Document all public APIs with rustdoc comments
  • Include examples in documentation where helpful
  • Update the book documentation for user-facing changes

Testing

  • Write unit tests for new functionality
  • Add integration tests for API endpoints
  • Ensure existing tests pass before submitting

Commit Guidelines

Use conventional commit messages:

type(scope): description

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, etc.)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks

Examples:

feat(api): add web search options support
fix(db): resolve connection pooling issue
docs(book): update installation guide

Testing

Unit Tests

Run unit tests:

cargo test

Integration Tests

Test the API endpoints:

# Test all endpoints
./hurl/test.sh

# Test specific endpoint
hurl --variable USER="$USER" --variable OPENAI_API_KEY="$OPENAI_API_KEY" hurl/chat_completion.hurl

Manual Testing

  1. Start Reservoir:

    make run
    
  2. Test with curl:

    curl "http://127.0.0.1:3017/partition/$USER/instance/test/v1/chat/completions" \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'
    

Documentation

Building the Book

The documentation is built with mdBook:

# Build documentation
make book

# Serve with live reload
make serve-book

# Clean generated docs
make clean-book

Writing Documentation

  • Use clear, concise language
  • Include code examples
  • Test all code examples
  • Link related sections
  • Consider the user's journey

Submitting Changes

Pull Request Process

  1. Ensure your code is ready:

    • Tests pass (cargo test)
    • Code is formatted (cargo fmt)
    • No clippy warnings (cargo clippy)
    • Documentation updated if needed
  2. Create a pull request:

    • Use a descriptive title
    • Explain what changes you made and why
    • Reference any related issues
    • Include screenshots for UI changes
  3. Respond to feedback:

    • Address review comments promptly
    • Ask questions if feedback is unclear
    • Update your branch as needed

Pull Request Template

When creating a PR, include:

## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing completed

## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No breaking changes (or documented)

Release Process

For maintainers:

  1. Version Bump: Update version in Cargo.toml
  2. Changelog: Update CHANGELOG.md with changes
  3. Tag Release: Create and push a git tag
  4. Build Documentation: Ensure docs are up to date
  5. Publish: Publish to crates.io if applicable
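
A typical tag-and-push sequence might look like this (the version number is illustrative):

# After bumping the version in Cargo.toml and updating CHANGELOG.md
git add Cargo.toml CHANGELOG.md
git commit -m "chore(release): v0.2.0"
git tag v0.2.0
git push origin main --tags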

Getting Help

  • Issues: Check existing issues or create a new one
  • Discussions: Use GitHub Discussions for questions
  • Documentation: Refer to the full documentation at sectorflabs.com/reservoir

Code of Conduct

Please be respectful and constructive in all interactions. We're here to build something useful together!

Architecture Overview

Before making significant changes, familiarize yourself with the system architecture and data model described earlier in this book.

Common Development Tasks

Adding a New API Endpoint

  1. Define the endpoint in src/api/
  2. Add routing logic
  3. Implement request/response handling
  4. Add tests
  5. Update documentation

Adding a New AI Provider

  1. Implement provider trait
  2. Add configuration options
  3. Update model routing logic
  4. Add tests with mock responses
  5. Document the new provider

Database Schema Changes

  1. Create migration script in migrations/
  2. Update data model documentation
  3. Test migration on sample data
  4. Ensure backward compatibility

Thank you for contributing to Reservoir!