Chat Completions Endpoint

The Chat Completions endpoint is Reservoir's core API, providing full OpenAI API compatibility with intelligent context enrichment. This endpoint automatically enhances your conversations with relevant historical context while maintaining the same request/response format as OpenAI's Chat Completions API.

Endpoint URL

POST /v1/partition/{partition}/instance/{instance}/chat/completions

URL Parameters

Parameter   Description                          Example
partition   Top-level organization boundary      alice, project_name, $USER
instance    Specific context within a partition  coding, research, session_123

Example URLs

# User-specific coding assistant
POST /v1/partition/alice/instance/coding/chat/completions

# Project-specific documentation bot  
POST /v1/partition/docs_project/instance/support/chat/completions

# Personal research assistant
POST /v1/partition/$USER/instance/research/chat/completions

# Default partition/instance (if not specified)
POST /v1/chat/completions  # Uses partition=default, instance=default

Request Format

Headers

Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

Request Body

Reservoir accepts the standard OpenAI Chat Completions request format:

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user", 
      "content": "How do I implement error handling in async functions?"
    }
  ]
}

Supported Models

OpenAI Models:

  • gpt-4.1
  • gpt-4-turbo
  • gpt-4o
  • gpt-4o-mini
  • gpt-3.5-turbo
  • gpt-4o-search-preview

Local Models (via Ollama):

  • llama3.1:8b
  • llama3.1:70b
  • mistral:7b
  • codellama:latest
  • Any Ollama-supported model

Message Roles

Role        Description           Usage
user        User input messages   Questions, requests, instructions
assistant   LLM responses         Previous LLM responses in the conversation
system      System instructions   Behavior modification, context setting

Context Enrichment Process

When you send a request, Reservoir automatically enhances it with relevant context:

1. Message Analysis

// Your original request
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "How do I handle database timeouts?"
    }
  ]
}

2. Context Discovery

Reservoir finds relevant context through:

  • Semantic Search: Messages similar to "database timeouts", ranked by cosine similarity (see the sketch after this list)
  • Recent History: The last 15 messages from the same partition/instance
  • Synapse Connections: Related discussions via SYNAPSE relationships
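
The semantic-search step can be pictured as plain cosine-similarity ranking over stored message embeddings. The sketch below is illustrative only: the function and field names (top_k_similar, stored_messages, "embedding") are assumptions for this example, not Reservoir's internal API.

import numpy as np

def top_k_similar(query_embedding, stored_messages, k=5):
    """Rank stored messages by cosine similarity to the query embedding.

    `stored_messages` is assumed to be a list of dicts, each with an
    "embedding" (list of floats) and a "content" string.
    """
    query = np.asarray(query_embedding, dtype=float)
    scored = []
    for message in stored_messages:
        vector = np.asarray(message["embedding"], dtype=float)
        similarity = float(
            np.dot(query, vector)
            / (np.linalg.norm(query) * np.linalg.norm(vector))
        )
        scored.append((similarity, message["content"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]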

3. Context Injection

// Enriched request sent to the Language Model
{
  "model": "gpt-4", 
  "messages": [
    {
      "role": "system",
      "content": "The following is the result of a semantic search of the most related messages by cosine similarity to previous conversations"
    },
    {
      "role": "user",
      "content": "What's the best way to configure database connection pools?"
    },
    {
      "role": "assistant", 
      "content": "For database connection pools, consider these settings..."
    },
    {
      "role": "system",
      "content": "The following are the most recent messages in the conversation in chronological order"
    },
    {
      "role": "user",
      "content": "I'm working on optimizing database queries"
    },
    {
      "role": "assistant",
      "content": "Here are some query optimization techniques..."
    },
    {
      "role": "user",
      "content": "How do I handle database timeouts?"  // Your original message
    }
  ]
}

Response Format

Reservoir returns responses in the standard OpenAI Chat Completions format:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion", 
  "created": 1677858242,
  "model": "gpt-4",
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 7,
    "total_tokens": 20
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "To handle database timeouts, you should implement retry logic with exponential backoff..."
      },
      "finish_reason": "stop",
      "index": 0
    }
  ]
}

Configuration and Model Selection

Environment Variables

Configure different LLM providers:

# OpenAI (default)
export OPENAI_API_KEY="your-openai-api-key"
export RSV_OPENAI_BASE_URL="https://api.openai.com/v1/chat/completions"

# Ollama (local)
export RSV_OLLAMA_BASE_URL="http://localhost:11434/v1/chat/completions"

# Mistral
export MISTRAL_API_KEY="your-mistral-api-key"
export RSV_MISTRAL_BASE_URL="https://api.mistral.ai/v1/chat/completions"

# Gemini
export GEMINI_API_KEY="your-gemini-api-key"

Model Detection

Reservoir automatically routes requests based on the model name (a simplified sketch follows the list):

  • OpenAI models: gpt-* → OpenAI API
  • Local models: llama*, mistral*, etc. → Ollama API
  • Mistral models: mistral-* → Mistral API
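
A simplified picture of that routing as a sketch: the hyphenated Mistral API prefix is checked before the generic local fallback, since Ollama tags use a colon (mistral:7b) while the Mistral API uses hyphens (mistral-large). The function name and exact prefix rules are assumptions for illustration, not Reservoir's actual implementation.

def route_model(model_name: str) -> str:
    """Pick an upstream provider from the model name (illustrative)."""
    if model_name.startswith("gpt-"):
        return "openai"
    # Hyphenated names go to the Mistral API...
    if model_name.startswith("mistral-"):
        return "mistral"
    # ...everything else (llama3.1:8b, mistral:7b, codellama:latest)
    # is treated as a local Ollama model.
    return "ollama"

print(route_model("llama3.1:8b"))  # -> "ollama"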

Error Handling

Token Limit Errors

If your message exceeds the model's token limit, Reservoir replies with a notice instead of forwarding the request:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Your last message is too long. It contains approximately 5000 tokens, which exceeds the maximum limit of 4096. Please shorten your message."
      },
      "finish_reason": "length",
      "index": 0
    }
  ]
}
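
Because the notice comes back as an ordinary completion with finish_reason set to "length", a client can detect it programmatically. A minimal sketch, assuming the response shape shown above (send_chat is a hypothetical helper, not part of Reservoir):

def is_token_limit_notice(response: dict) -> bool:
    """Detect a token-limit reply (finish_reason == "length").

    Note: finish_reason "length" also marks ordinarily truncated
    completions, so a real client may want to inspect the message
    text as well.
    """
    return response["choices"][0].get("finish_reason") == "length"

# Shorten the prompt and retry when the notice appears:
# if is_token_limit_notice(response):
#     response = send_chat(message[:2000])  # hypothetical helper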

API Connection Errors

{
  "error": {
    "message": "Failed to connect to OpenAI API: Connection timeout. Check your API key and network connection. Using model 'gpt-4' at 'https://api.openai.com/v1/chat/completions'"
  }
}

Invalid Model Errors

{
  "error": {
    "message": "Invalid OpenAI model name: 'gpt-5'. Valid models are: ['gpt-4.1', 'gpt-4-turbo', 'gpt-4o', 'gpt-4o-mini', 'gpt-3.5-turbo', 'gpt-4o-search-preview']"
  }
}

Usage Examples

Basic Request

curl -X POST "http://localhost:3017/v1/partition/alice/instance/coding/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Explain async/await in Python"
      }
    ]
  }'

With System Message

curl -X POST "http://localhost:3017/v1/partition/docs/instance/writing/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": "You are a technical documentation expert. Provide clear, concise explanations."
      },
      {
        "role": "user",
        "content": "How should I document API endpoints?"
      }
    ]
  }'

Local Model (Ollama)

curl -X POST "http://localhost:3017/v1/partition/alice/instance/local/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {
        "role": "user",
        "content": "What are the benefits of using local LLMs?"
      }
    ]
  }'

Integration Examples

Python with OpenAI Library

from openai import OpenAI

# Point the client at Reservoir instead of OpenAI directly; the
# API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(
    base_url="http://localhost:3017/v1/partition/alice/instance/coding"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "How do I optimize this database query?"}
    ]
)

print(response.choices[0].message.content)

JavaScript/Node.js

const OpenAI = require('openai');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://localhost:3017/v1/partition/myapp/instance/support'
});

async function chat(message) {
  const completion = await openai.chat.completions.create({
    messages: [{ role: 'user', content: message }],
    model: 'gpt-4o',
  });

  return completion.choices[0].message.content;
}

Streaming Responses

Reservoir supports streaming responses when the underlying model supports it:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3017/v1/partition/alice/instance/chat"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain machine learning"}],
    stream=True
)

for chunk in stream:
    # The final chunk carries no content, so guard against None
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Advanced Features

Web Search Integration

Some models support web search capabilities:

{
  "model": "gpt-4o-search-preview",
  "messages": [
    {
      "role": "user",
      "content": "What are the latest developments in AI?"
    }
  ],
  "web_search_options": {
    "enabled": true
  }
}

Message Storage

All messages (user and assistant) are automatically stored with the following metadata (a hypothetical record is sketched after the list):

  • Embeddings: For semantic search and context enrichment
  • Timestamps: For chronological ordering
  • Partition/Instance: For data organization
  • Trace IDs: For linking request/response pairs
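
Put together, a stored message can be pictured as a record like the one below. This is a hypothetical shape for illustration; the field names are not Reservoir's actual schema.

# Hypothetical record shape -- field names are illustrative only,
# not Reservoir's actual schema.
stored_message = {
    "partition": "alice",
    "instance": "coding",
    "role": "user",
    "content": "How do I handle database timeouts?",
    "embedding": [0.0123, -0.0456],  # truncated for readability
    "timestamp": 1677858242,
    "trace_id": "chatcmpl-abc123",
}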

Context Control

Control context enrichment via configuration:

# Adjust context size
reservoir config --set semantic_context_size=20
reservoir config --set recent_context_size=15

# View current settings
reservoir config --get semantic_context_size

Performance Considerations

Token Management

  • Reservoir automatically manages token limits for each model
  • Context is intelligently truncated when necessary
  • Priority is given to the most relevant and recent content (sketched below)
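
One way to picture that truncation: keep the highest-priority context messages until a token budget is spent. The sketch below uses a crude four-characters-per-token estimate and is an assumption about the approach, not Reservoir's actual algorithm.

def fit_to_budget(candidates, max_tokens):
    """Keep the highest-priority messages that fit the token budget.

    `candidates` is a list of (priority, content) pairs; token counts
    are estimated with a rough 4-characters-per-token heuristic.
    """
    kept, used = [], 0
    for priority, content in sorted(candidates, reverse=True):
        estimated = max(1, len(content) // 4)
        if used + estimated > max_tokens:
            continue
        kept.append(content)
        used += estimated
    return kept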

Caching

  • Embeddings are cached to avoid recomputation
  • Vector indices are optimized for fast similarity search
  • Connection pooling for database efficiency

Latency

  • Typical latency: 200-500ms for context enrichment
  • Parallel processing of semantic search and recent-history retrieval (see the sketch below)
  • Optimized Neo4j queries for fast retrieval
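
That parallelism can be sketched with asyncio: both lookups run concurrently, so enrichment latency is roughly the slower of the two rather than their sum. The helper functions here are stand-ins for Reservoir's internal retrieval steps, not its real API.

import asyncio

async def semantic_search(query: str) -> list[str]:
    # Stand-in for the vector similarity lookup
    await asyncio.sleep(0.1)
    return [f"similar to: {query}"]

async def recent_history(limit: int) -> list[str]:
    # Stand-in for the chronological fetch
    await asyncio.sleep(0.1)
    return ["recent message"] * limit

async def gather_context(query: str):
    # Run both lookups concurrently: total wait is max(a, b), not a + b
    return await asyncio.gather(semantic_search(query), recent_history(15))

print(asyncio.run(gather_context("database timeouts")))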

The Chat Completions endpoint provides the full power of Reservoir's context enrichment while maintaining complete compatibility with existing OpenAI-based applications, making it easy to add conversational memory to any LLM application.