Chat Completions Endpoint
The Chat Completions endpoint is Reservoir's core API, providing full OpenAI API compatibility with intelligent context enrichment. This endpoint automatically enhances your conversations with relevant historical context while maintaining the same request/response format as OpenAI's Chat Completions API.
Endpoint URL
POST /v1/partition/{partition}/instance/{instance}/chat/completions
URL Parameters
| Parameter | Description | Example |
|---|---|---|
| partition | Top-level organization boundary | alice, project_name, $USER |
| instance | Specific context within a partition | coding, research, session_123 |
Example URLs
# User-specific coding assistant
POST /v1/partition/alice/instance/coding/chat/completions
# Project-specific documentation bot
POST /v1/partition/docs_project/instance/support/chat/completions
# Personal research assistant
POST /v1/partition/$USER/instance/research/chat/completions
# Default partition/instance (if not specified)
POST /v1/chat/completions # Uses partition=default, instance=default
Request Format
Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Body
Reservoir accepts the standard OpenAI Chat Completions request format:
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "How do I implement error handling in async functions?"
}
]
}
Supported Models
OpenAI Models:
- gpt-4.1
- gpt-4-turbo
- gpt-4o
- gpt-4o-mini
- gpt-3.5-turbo
- gpt-4o-search-preview

Local Models (via Ollama):
- llama3.1:8b
- llama3.1:70b
- mistral:7b
- codellama:latest
- Any Ollama-supported model
Message Roles
| Role | Description | Usage |
|---|---|---|
| user | User input messages | Questions, requests, instructions |
| assistant | LLM responses | Previous LLM responses in the conversation |
| system | System instructions | Behavior modification, context setting |
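All three roles can appear in a single request; per the table above, assistant entries carry previous LLM replies that you include as conversation history. A small illustrative sketch (the prompt text is made up):
# Illustrative only: a single request body can mix all three roles.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},     # behavior/context
    {"role": "user", "content": "How do I read a file line by line in Python?"}, # earlier question
    {"role": "assistant", "content": "Open the file and iterate over it..."},    # earlier model reply
    {"role": "user", "content": "And how do I handle a missing file?"},          # new question
]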
Context Enrichment Process
When you send a request, Reservoir automatically enhances it with relevant context:
1. Message Analysis
// Your original request
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "How do I handle database timeouts?"
}
]
}
2. Context Discovery
Reservoir finds relevant context through:
- Semantic Search: Messages similar to "database timeouts"
- Recent History: The last 15 messages from the same partition/instance
- Synapse Connections: Related discussions via SYNAPSE relationships
3. Context Injection
// Enriched request sent to the Language Model
{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "The following is the result of a semantic search of the most related messages by cosine similarity to previous conversations"
},
{
"role": "user",
"content": "What's the best way to configure database connection pools?"
},
{
"role": "assistant",
"content": "For database connection pools, consider these settings..."
},
{
"role": "system",
"content": "The following are the most recent messages in the conversation in chronological order"
},
{
"role": "user",
"content": "I'm working on optimizing database queries"
},
{
"role": "assistant",
"content": "Here are some query optimization techniques..."
},
{
"role": "user",
"content": "How do I handle database timeouts?" // Your original message
}
]
}
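Reservoir's actual prompt assembly is internal; purely as an illustration of the ordering shown above (semantic matches first, then recent history, then your original message last), here is a sketch. The function and parameter names are hypothetical, not Reservoir APIs:
# Illustrative only: how the enriched message list is ordered.
# `semantic_matches` and `recent_history` stand in for what Reservoir retrieves.
def build_enriched_messages(semantic_matches, recent_history, new_message):
    return (
        [{"role": "system",
          "content": "The following is the result of a semantic search of the most related "
                     "messages by cosine similarity to previous conversations"}]
        + semantic_matches
        + [{"role": "system",
            "content": "The following are the most recent messages in the conversation "
                       "in chronological order"}]
        + recent_history
        + [new_message]   # your original message always comes last
    )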
Response Format
Reservoir returns responses in the standard OpenAI Chat Completions format:
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1677858242,
"model": "gpt-4",
"usage": {
"prompt_tokens": 13,
"completion_tokens": 7,
"total_tokens": 20
},
"choices": [
{
"message": {
"role": "assistant",
"content": "To handle database timeouts, you should implement retry logic with exponential backoff..."
},
"finish_reason": "stop",
"index": 0
}
]
}
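Because the shape matches OpenAI's, the reply can be consumed with any standard JSON handling. A minimal sketch using the requests library (the URL, partition/instance, and prompt are just examples):
import os
import requests

resp = requests.post(
    "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={"model": "gpt-4",
          "messages": [{"role": "user", "content": "How do I handle database timeouts?"}]},
)
body = resp.json()
print(body["choices"][0]["message"]["content"])   # assistant reply
print(body["usage"]["total_tokens"])              # token accounting, as in the example above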
Configuration and Model Selection
Environment Variables
Configure different LLM providers:
# OpenAI (default)
export OPENAI_API_KEY="your-openai-api-key"
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"
# Ollama (local)
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"
# Mistral
export MISTRAL_API_KEY="your-mistral-api-key"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"
# Gemini
export GEMINI_API_KEY="your-gemini-api-key"
Model Detection
Reservoir automatically routes requests based on model name:
- OpenAI models: gpt-* → OpenAI API
- Local models: llama*, mistral*, etc. → Ollama API
- Mistral models: mistral-* → Mistral API
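A rough sketch of this prefix-based routing (illustrative only, not Reservoir's actual code; in particular, the precedence between the mistral* Ollama pattern and the mistral-* hosted pattern is an assumption here):
# Illustrative sketch of prefix-based routing.
def route_for_model(model: str) -> str:
    if model.startswith("mistral-"):
        return "mistral"   # hosted Mistral API
    if model.startswith("gpt-"):
        return "openai"    # OpenAI API
    return "ollama"        # local models: llama3.1:8b, mistral:7b, codellama, ...

assert route_for_model("gpt-4o") == "openai"
assert route_for_model("mistral-large-latest") == "mistral"
assert route_for_model("llama3.1:8b") == "ollama"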
Error Handling
Token Limit Errors
If your message exceeds model token limits:
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Your last message is too long. It contains approximately 5000 tokens, which exceeds the maximum limit of 4096. Please shorten your message."
},
"finish_reason": "length",
"index": 0
}
]
}
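Client code can watch for this condition via finish_reason. A small sketch (the helper name and the raise-and-retry strategy are just one option, not part of Reservoir):
# Illustrative handling of the token-limit reply described above.
def handle_completion(body: dict) -> str:
    choice = body["choices"][0]
    if choice.get("finish_reason") == "length":
        # Reservoir returned the "message too long" notice instead of a model reply;
        # shorten or split the prompt and resend.
        raise ValueError(choice["message"]["content"])
    return choice["message"]["content"]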
API Connection Errors
{
"error": {
"message": "Failed to connect to OpenAI API: Connection timeout. Check your API key and network connection. Using model 'gpt-4' at 'https://api.openai.com/v1/chat/completions'"
}
}
Invalid Model Errors
{
"error": {
"message": "Invalid OpenAI model name: 'gpt-5'. Valid models are: ['gpt-4.1', 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo', 'gpt-4o-search-preview']"
}
}
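Connection and invalid-model failures come back as an error object rather than choices, so a caller can branch on that key. A hedged sketch (helper name is hypothetical):
# Illustrative: distinguish an error payload from a normal completion.
def extract_reply(body: dict) -> str:
    if "error" in body:
        # e.g. connection failures or an unrecognized model name, as shown above
        raise RuntimeError(body["error"]["message"])
    return body["choices"][0]["message"]["content"]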
Usage Examples
Basic Request
curl -X POST "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Explain async/await in Python"
}
]
}'
With System Message
curl -X POST "http://localhost:3017/v1/partition/docs/instance/writing/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a technical documentation expert. Provide clear, concise explanations."
},
{
"role": "user",
"content": "How should I document API endpoints?"
}
]
}'
Local Model (Ollama)
curl -X POST "http://localhost:3017/v1/partition/alice/instance/local/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [
{
"role": "user",
"content": "What are the benefits of using local LLMs?"
}
]
}'
Integration Examples
Python with OpenAI Library
import openai
# Configure to use Reservoir instead of OpenAI directly
openai.api_base = "http://localhost:3017/v1/partition/alice/instance/coding"
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "user", "content": "How do I optimize this database query?"}
]
)
print(response.choices[0].message.content)
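The snippet above targets the pre-1.0 openai Python package. With openai 1.x the same call looks like this (the base URL and partition/instance values are just examples; the client reads OPENAI_API_KEY from the environment):
from openai import OpenAI

# openai>=1.0 style: point the client at Reservoir instead of api.openai.com
client = OpenAI(base_url="http://localhost:3017/v1/partition/alice/instance/coding")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "How do I optimize this database query?"}],
)
print(response.choices[0].message.content)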
JavaScript/Node.js
const OpenAI = require('openai');
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
baseURL: 'http://localhost:3017/v1/partition/myapp/instance/support'
});
async function chat(message) {
const completion = await openai.chat.completions.create({
messages: [{ role: 'user', content: message }],
model: 'gpt-4',
});
return completion.choices[0].message.content;
}
Streaming Responses
Reservoir supports streaming responses when the underlying model supports it:
import openai
openai.api_base = "http://localhost:3017/v1/partition/alice/instance/chat"
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[{"role": "user", "content": "Explain machine learning"}],
stream=True
)
for chunk in response:
    # Some chunks (e.g. the final one) carry no content, so use .get() to avoid an AttributeError
    print(chunk.choices[0].delta.get("content", "") or "", end="", flush=True)
Advanced Features
Web Search Integration
Some models support web search capabilities:
{
"model": "gpt-4o-search-preview",
"messages": [
{
"role": "user",
"content": "What are the latest developments in AI?"
}
],
"web_search_options": {
"enabled": true
}
}
Message Storage
All messages (user and assistant) are automatically stored with:
- Embeddings: For semantic search and context enrichment
- Timestamps: For chronological ordering
- Partition/Instance: For data organization
- Trace IDs: For linking request/response pairs
Context Control
Control context enrichment via configuration:
# Adjust context size
reservoir config --set semantic_context_size=20
reservoir config --set recent_context_size=15
# View current settings
reservoir config --get semantic_context_size
Performance Considerations
Token Management
- Reservoir automatically manages token limits for each model
- Context is intelligently truncated when necessary
- Priority given to most relevant and recent content
Caching
- Embeddings are cached to avoid recomputation
- Vector indices are optimized for fast similarity search
- Connection pooling for database efficiency
Latency
- Typical latency: 200-500ms for context enrichment
- Parallel processing of semantic search and recent history
- Optimized Neo4j queries for fast retrieval
The Chat Completions endpoint provides the full power of Reservoir's context enrichment while maintaining complete compatibility with existing OpenAI-based applications, making it easy to add conversational memory to any LLM application.