Token Management

Reservoir intelligently manages token limits to ensure optimal context enrichment while staying within model constraints. The system automatically calculates token usage, prioritizes the most relevant context, and truncates content when necessary to fit within API limits.

Context Token Management

Automatic Context Sizing

Reservoir dynamically adjusts context size based on:

  • Model Token Limits: Respects each model's maximum context window
  • Content Priority: Prioritizes most relevant and recent context
  • Message Truncation: Intelligently cuts content when limits are exceeded
  • Reserve Allocation: Maintains buffer for user input and model response

Token Calculation

The system estimates token usage using standard approximations:

  • English Text: ~4 characters per token
  • Code Content: ~3 characters per token (more tokens due to syntax)
  • Special Characters: Variable token usage
  • Embeddings: Not included in context token count
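
As a rough illustration of these ratios, character counts can be converted into approximate token counts from the shell (the file names below are placeholders, and the divisors are simply the approximations above, not exact tokenizer output):

# Rough prose estimate: ~4 characters per token (notes.txt is a placeholder)
PROSE_TOKENS=$(( $(wc -c < notes.txt) / 4 ))

# Rough code estimate: ~3 characters per token (main.py is a placeholder)
CODE_TOKENS=$(( $(wc -c < main.py) / 3 ))

echo "Estimated tokens - prose: $PROSE_TOKENS, code: $CODE_TOKENS"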

Context Building Strategy

flowchart TD
    A["User Message Arrives"] --> B["Calculate Available Tokens"]
    B --> C["Get Semantic Context"]
    C --> D["Get Recent History"]
    D --> E["Combine Context Sources"]
    E --> F{"Within Token Limit?"}
    F -->|Yes| G["Use Full Context"]
    F -->|No| H["Prioritize and Truncate"]
    H --> I["Recent Messages Priority"]
    I --> J["High Similarity Priority"]  
    J --> K["Truncate Oldest/Lowest Score"]
    K --> G
    G --> L["Send to Model"]
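
A minimal shell sketch of the final check, using the output of reservoir view as a stand-in for the assembled context (the 3,000-token limit is the GPT-3.5-turbo figure from the tables below; the real sizing happens inside Reservoir before the request is sent):

# Stand-in for the assembled context (semantic matches + recent history)
CONTEXT=$(reservoir view 15)

# ~4 characters per token
EST_TOKENS=$(( $(printf '%s' "$CONTEXT" | wc -c) / 4 ))

LIMIT=3000   # available context tokens for GPT-3.5-turbo
if [ "$EST_TOKENS" -le "$LIMIT" ]; then
    echo "Fits: full context of ~$EST_TOKENS tokens would be used"
else
    echo "Over by ~$(( EST_TOKENS - LIMIT )) tokens: context would be prioritized and truncated"
fi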

Token Limits by Model

OpenAI Models

Model         | Context Window | Reservoir Reserve | Available for Context
GPT-3.5-turbo | 4,096 tokens   | 1,024 tokens      | ~3,000 tokens
GPT-4         | 8,192 tokens   | 2,048 tokens      | ~6,000 tokens
GPT-4-turbo   | 128,000 tokens | 8,000 tokens      | ~120,000 tokens
GPT-4o        | 128,000 tokens | 8,000 tokens      | ~120,000 tokens
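
The "Available for Context" column is simply the context window minus the Reservoir reserve, rounded down: for GPT-4, 8,192 - 2,048 = 6,144 tokens, listed as ~6,000 in the table.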

Local Models (Ollama)

Model         | Context Window | Reservoir Reserve | Available for Context
Llama 3.1 8B  | 32,768 tokens  | 2,048 tokens      | ~30,000 tokens
Llama 3.1 70B | 32,768 tokens  | 2,048 tokens      | ~30,000 tokens
Mistral 7B    | 32,768 tokens  | 2,048 tokens      | ~30,000 tokens
CodeLlama     | 16,384 tokens  | 1,024 tokens      | ~15,000 tokens

Context Prioritization

Priority Order

When token limits are exceeded, Reservoir prioritizes context in this order:

  1. User's Current Message: Always included (highest priority)
  2. Recent History: Last 15 messages from the same partition/instance
  3. High Similarity Matches: Messages with similarity score > 0.85
  4. Synapse Connections: Messages connected via SYNAPSE relationships
  5. Older Context: Historical messages (first to be truncated)

Similarity-Based Prioritization

Context is ranked by relevance:

Priority Score = (Similarity Score × 0.7) + (Recency Score × 0.3)

Where:
  • Similarity Score: 0.0-1.0 from semantic search
  • Recency Score: 0.0-1.0 based on message age
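
A quick way to see the weighting in action, using made-up candidate messages (the 0.7/0.3 weights come from the formula above; the scores and message IDs are hypothetical):

# Hypothetical input: "similarity recency message-id", one candidate per line
printf '%s\n' "0.91 0.20 msg-12" "0.78 0.95 msg-41" "0.86 0.60 msg-33" |
awk '{ printf "%s priority=%.2f\n", $3, $1 * 0.7 + $2 * 0.3 }' |
sort -t= -k2 -nr
# msg-41 priority=0.83
# msg-33 priority=0.78
# msg-12 priority=0.70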

Truncation Strategy

When content must be truncated:

  1. Message-Level Truncation: Remove entire messages (preserves coherence)
  2. LIFO for Semantic: Last-In-First-Out for semantic matches
  3. FIFO for Recent: First-In-First-Out for chronological history
  4. Preserve Pairs: Keep user/assistant pairs together when possible
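
As a rough external illustration of the FIFO rule, dropping the oldest chronological messages first looks like this (this assumes reservoir view prints messages oldest-first; the actual truncation happens inside Reservoir):

# Keep only the 10 most recent of the last 15 messages (oldest dropped first)
reservoir view 15 | tail -n 10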

Configuration Options

Context Size Limits

Configure via environment variables or config file:

# Set maximum semantic context messages
reservoir config --set semantic_context_size=20

# Set recent history limit
reservoir config --set recent_context_size=15

# Set token reserve buffer
reservoir config --set token_reserve=2048

Model-Specific Overrides

# In reservoir.toml
[models.gpt-4-turbo]
max_context_tokens = 120000
reserve_tokens = 8000
semantic_context_size = 50

[models.gpt-3.5-turbo]
max_context_tokens = 4096
reserve_tokens = 1024
semantic_context_size = 10

Token Usage Monitoring

Built-in Monitoring

Reservoir automatically tracks:

  • Input Tokens: Context + user message tokens
  • Reserve Usage: How much buffer is being used
  • Truncation Events: When content is cut due to limits
  • Model Utilization: Percentage of context window used

Usage Examples

# View recent messages with estimated token usage
reservoir view 10 | while read -r line; do
    echo "$line (est. tokens: $((${#line}/4)))"
done

# Estimate total context size
TOTAL_CHARS=$(reservoir view 15 | wc -c)
echo "Estimated tokens: $((TOTAL_CHARS/4))"

# Check if context might be truncated for a model
CONTEXT_SIZE=$(($(reservoir view 15 | wc -c) / 4))
echo "Context tokens: $CONTEXT_SIZE"
echo "Fits in GPT-3.5: $([ $CONTEXT_SIZE -lt 3000 ] && echo 'Yes' || echo 'No')"

Optimization Strategies

Reduce Context Size

Adjust Semantic Context

# Reduce semantic matches
reservoir config --set semantic_context_size=10

# Increase similarity threshold (fewer matches)
# Note: This requires code modification currently

Limit Recent History

# Reduce recent message count
reservoir config --set recent_context_size=8

Improve Context Quality

Use Higher Similarity Threshold

  • Fewer but more relevant semantic matches
  • Better context quality with less noise
  • Requires code-level configuration changes

Partition Strategy

  • Use specific partitions for focused contexts
  • Separate unrelated discussions
  • Improves relevance within token limits

# Focused partition for coding discussions
echo "Python async/await question" | reservoir ingest --partition alice --instance coding

# Separate partition for general chat
echo "Weather discussion" | reservoir ingest --partition alice --instance general

Model-Specific Considerations

Small Context Models (GPT-3.5)

Optimization Strategy:

  • Prioritize recent messages heavily
  • Limit semantic context to top 5-10 matches
  • Use aggressive truncation
  • Consider shorter message summaries

# Configuration for small context models
reservoir config --set semantic_context_size=5
reservoir config --set recent_context_size=8

Large Context Models (GPT-4-turbo)

Utilization Strategy:

  • Include extensive semantic context
  • Preserve longer conversation history
  • Enable deeper synapse exploration
  • Allow for more comprehensive context

# Configuration for large context models
reservoir config --set semantic_context_size=30
reservoir config --set recent_context_size=25

Advanced Token Management

Dynamic Context Adjustment

Reservoir can adjust context based on content type:

  • Code-Heavy Contexts: Assume a lower character-to-token ratio, since code produces more tokens per character
  • Natural Language: Use the standard ~4 characters per token ratio
  • Mixed Content: Apply a weighted calculation across content types
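
A sketch of the mixed-content case, assuming a hypothetical 60/40 prose-to-code split and the character ratios from the Token Calculation section (conversation.txt is a placeholder export of the content being estimated):

TOTAL_CHARS=$(wc -c < conversation.txt)

# Hypothetical split: 60% prose at ~4 chars/token, 40% code at ~3 chars/token
PROSE_TOKENS=$(( TOTAL_CHARS * 60 / 100 / 4 ))
CODE_TOKENS=$(( TOTAL_CHARS * 40 / 100 / 3 ))

echo "Weighted estimate: $(( PROSE_TOKENS + CODE_TOKENS )) tokens"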

Future Enhancements

Planned Features:

  1. Semantic Summarization: Summarize older context instead of truncating
  2. Token-Aware Similarity: Consider token cost in similarity ranking
  3. Model-Aware Optimization: Automatic settings per model
  4. Context Compression: Compress historical context intelligently

Custom Token Strategies

Per-Partition Settings

# Different strategies for different use cases
reservoir config --set partitions.coding.semantic_context_size=20
reservoir config --set partitions.research.recent_context_size=30

Content-Type Awareness

# Adjust for code vs text heavy partitions
reservoir config --set partitions.coding.token_multiplier=1.3
reservoir config --set partitions.writing.token_multiplier=0.9

Troubleshooting Token Issues

Common Problems

Context Too Large

# Symptoms: API errors about token limits
# Solution: Reduce context sizes
reservoir config --set semantic_context_size=10
reservoir config --set recent_context_size=5

Context Too Small

# Symptoms: Poor context quality, missing relevant information
# Solution: Increase context sizes (if model supports it)
reservoir config --set semantic_context_size=25
reservoir config --set recent_context_size=20

Frequent Truncation

# Symptoms: Important context being cut off
# Solution: Use larger context model or adjust priorities

Diagnostic Commands

# Estimate current context size
SEMANTIC_SIZE=$(reservoir search --semantic "test" | wc -c)
RECENT_SIZE=$(reservoir view 15 | wc -c)
TOTAL_SIZE=$((SEMANTIC_SIZE + RECENT_SIZE))
echo "Total context estimate: $((TOTAL_SIZE/4)) tokens"

# Check truncation frequency
# (This would require log analysis)
grep -i "truncat" /var/log/reservoir.log | wc -l

Token management in Reservoir balances comprehensive memory against each model's limits: it supplies as much relevant context as the context window allows while reserving room for the user's input and the model's response.