Data Management
Reservoir provides comprehensive data management capabilities for backing up, migrating, and organizing your conversation data. The system supports full data export/import, individual message management, and flexible partitioning strategies.
Export and Import
Export All Data
Export your entire conversation history as JSON for backup or migration:
# Export all messages to stdout
reservoir export
# Save to file with timestamp
reservoir export > backup_$(date +%Y%m%d_%H%M%S).json
# Export and compress for storage
reservoir export | gzip > reservoir_backup.json.gz
Export Format: Each message is exported as a complete MessageNode with all metadata:
[
  {
    "trace_id": "550e8400-e29b-41d4-a716-446655440000",
    "partition": "default",
    "instance": "default",
    "role": "user",
    "content": "How do I implement error handling in async functions?",
    "timestamp": "2024-01-15T10:30:00.000Z",
    "embedding": [0.123, -0.456, 0.789, ...],
    "url": null
  },
  {
    "trace_id": "550e8400-e29b-41d4-a716-446655440001",
    "partition": "default",
    "instance": "default",
    "role": "assistant",
    "content": "Here are several approaches to error handling in async functions...",
    "timestamp": "2024-01-15T10:30:15.000Z",
    "embedding": [0.234, -0.567, 0.890, ...],
    "url": null
  }
]
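To spot-check the structure after an export, you can inspect the keys of the first message; a small jq sketch:
# Inspect the fields of the first exported MessageNode
reservoir export | jq '.[0] | keys'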
Import Data
Import message data from JSON files:
# Import from a backup file
reservoir import backup_20240115.json
# Import from another Reservoir instance
reservoir import exported_conversations.json
# Import compressed backup
gunzip -c reservoir_backup.json.gz | reservoir import /dev/stdin
Import Behavior:
- Validates JSON format and MessageNode structure
- Preserves all metadata including timestamps and embeddings
- Maintains partition/instance organization
- Skips duplicate messages (based on trace_id); see the verification check below
- Rebuilds relationships and synapses
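To verify that duplicate skipping behaved as expected, you can look for repeated trace_id values after an import; a minimal check built only on the documented export command:
# Count duplicated trace_ids after an import (expect 0)
reservoir export | jq -r '.[].trace_id' | sort | uniq -d | wc -l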
Migration Workflows
Complete System Migration:
# On source system
reservoir export > full_backup.json
# Transfer file to new system
scp full_backup.json user@newserver:/path/to/reservoir/
# On destination system
reservoir import full_backup.json
# Verify migration
reservoir view 10
reservoir search --semantic "test query"
Selective Migration:
# Export from specific partition
reservoir export | jq '[.[] | select(.partition=="alice")]' > alice_messages.json
# To import into a different partition, rewrite the partition field first
# (see the jq sketch below), then import
reservoir import alice_messages.json
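Rewriting the partition name can be done with jq instead of hand-editing; a small sketch, where the target name "bob" is an arbitrary example:
# Rewrite the partition field, then import under the new name
jq 'map(.partition = "bob")' alice_messages.json > bob_messages.json
reservoir import bob_messages.json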
Message Management
Manual Message Ingestion
Add messages manually for testing, note-taking, or data entry:
# Add a user message
echo "How do I configure Neo4j for production?" | reservoir ingest
# Add to specific partition/instance
echo "Remember to update dependencies" | reservoir ingest --partition alice --instance notes
# Add assistant message
echo "Here's the production Neo4j configuration..." | reservoir ingest --role assistant
# Ingest from file
cat meeting_notes.txt | reservoir ingest --partition team --instance meetings
Use Cases:
- Documentation: Add important information manually
- Testing: Create test scenarios with known data
- Migration: Import data from other systems
- Notes: Add personal reminders or observations
Viewing Recent Data
Monitor recent activity and verify data integrity:
# View last 10 messages
reservoir view 10
# View from specific partition
reservoir view --partition alice 15
# View from specific instance
reservoir view --partition alice --instance coding 20
# Pipe to other tools for analysis
reservoir view 50 | grep -i "error" | wc -l
Partitioning Strategy
Organizational Structure
Reservoir uses a two-level organizational hierarchy:
- Partition: High-level boundary (user, project, team)
- Instance: Sub-boundary within partition (topic, session, category)
default/
├── default/ # General conversations
├── coding/ # Programming discussions
└── research/ # Research and analysis
alice/
├── personal/ # Personal conversations
├── work/ # Work-related discussions
└── learning/ # Educational content
team/
├── meetings/ # Team meeting notes
├── planning/ # Project planning
└── retrospectives/ # Review sessions
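To see which partition/instance combinations are already in use, you can derive them from an export; a small sketch using jq:
# List all partition/instance pairs currently in use
reservoir export | jq -r '.[] | "\(.partition)/\(.instance)"' | sort -u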
Partition Management
Creating Partitions: Partitions are created automatically when first used:
# Create new partition by using it
echo "Starting new project discussions" | reservoir ingest --partition newproject
# Create instance within partition
echo "Technical architecture discussion" | reservoir ingest --partition newproject --instance architecture
Partition Benefits:
- Isolation: Keep different contexts separate
- Search Scoping: Limit searches to relevant content
- Access Control: Enable future access restrictions
- Organization: Maintain clean separation of concerns
Data Isolation
Partitions provide logical isolation:
- Context Enrichment: Only includes messages from same partition/instance
- Search: Can be scoped to specific partitions
- Export: Can filter by partition (with additional tooling)
- Privacy: Enables separation of personal/professional content
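Scoped exports (the "additional tooling" noted above) can be approximated with jq filtering; a sketch for one partition/instance pair, where "alice" and "personal" are example names:
# Export a single partition/instance pair as a valid JSON array
reservoir export | jq '[.[] | select(.partition=="alice" and .instance=="personal")]' > alice_personal.json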
Data Integrity
Backup Strategies
Daily Backups:
#!/bin/bash
# Daily backup script
BACKUP_DIR="/backup/reservoir"
DATE=$(date +%Y%m%d)
TIMESTAMP=$(date +%H%M%S)
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Export data
reservoir export > "$BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json"
# Compress older backups
find "$BACKUP_DIR" -name "*.json" -mtime +7 -exec gzip {} \;
# Clean old backups (keep 30 days)
find "$BACKUP_DIR" -name "*.json.gz" -mtime +30 -delete
# Log backup
echo "$(date): Backup completed - $BACKUP_DIR/$DATE/reservoir_$TIMESTAMP.json" >> /var/log/reservoir_backup.log
Incremental Exports:
# Export messages from the last 24 hours as a valid JSON array (GNU date)
reservoir export | jq '[.[] | select(.timestamp > "'$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%S)'")]' > incremental_backup.json
Data Validation
Verify Data Integrity:
# Check message count
TOTAL_MESSAGES=$(reservoir export | jq length)
echo "Total messages: $TOTAL_MESSAGES"
# Verify embeddings
EMBEDDED_COUNT=$(reservoir export | jq '[.[] | select(.embedding != null)] | length')
echo "Messages with embeddings: $EMBEDDED_COUNT"
# Check partition distribution
reservoir export | jq -r '.[] | .partition' | sort | uniq -c
Recovery Procedures
Restore from Backup:
# Stop Reservoir (if running as service)
systemctl stop reservoir
# Clear existing data (WARNING: destructive)
# This requires manual Neo4j database clearing
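# Sketch (assumption: direct cypher-shell access; this irreversibly
# deletes ALL nodes and relationships in the Neo4j database):
cypher-shell -u neo4j -p '<password>' "MATCH (n) DETACH DELETE n"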
# Import backup
reservoir import /backup/reservoir/20240115/reservoir_full.json
# Verify restoration
reservoir view 10
reservoir search --semantic "test"
# Restart service
systemctl start reservoir
Advanced Data Operations
Data Analysis
Export for Analysis:
# Export specific fields for analysis
reservoir export | jq -r '.[] | [.timestamp, .partition, .role, (.content | length)] | @csv' > message_stats.csv
# Analyze conversation patterns
reservoir export | jq -r '.[] | .partition' | sort | uniq -c | sort -nr
# Find most active time periods
reservoir export | jq -r '.[] | .timestamp[0:10]' | sort | uniq -c | sort -nr
Data Transformation
Format Conversion:
# Convert to CSV format
reservoir export | jq -r '.[] | [.timestamp, .partition, .instance, .role, .content] | @csv' > conversations.csv
# Extract just message content
reservoir export | jq -r '.[] | .content' > all_messages.txt
# Create markdown format
reservoir export | jq -r '.[] | "## " + .timestamp + " (" + .role + ")\n\n" + .content + "\n"' > conversations.md
Embedding Management
Replay Embeddings: Regenerate embeddings when the embedding model changes or after data recovery:
# Replay embeddings for all messages
reservoir replay
# Replay embeddings for a specific embedding model
reservoir replay bge-large-en-v15
# Monitor embedding progress
# (Check logs for embedding generation status)
tail -f /var/log/reservoir.log | grep -i embedding
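After a replay finishes, you can confirm no messages were left without embeddings; a check built on the documented export command:
# Count messages still missing embeddings after replay (expect 0)
reservoir export | jq '[.[] | select(.embedding == null)] | length'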
Best Practices
Regular Maintenance
- Schedule Regular Backups: Daily exports with compression
- Monitor Disk Usage: Embeddings require significant storage
- Validate Data Integrity: Regular checks for missing embeddings
- Clean Old Logs: Rotate and archive log files (see the logrotate sketch after this list)
- Test Recovery: Periodically test backup restoration
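For the log rotation item above, a minimal logrotate sketch, assuming Reservoir logs to /var/log/reservoir.log as in the earlier examples (the config path and retention values are assumptions):
# Install a logrotate policy for the Reservoir log (run as root)
cat > /etc/logrotate.d/reservoir <<'EOF'
/var/log/reservoir.log {
    daily
    rotate 14
    compress
    missingok
    notifempty
}
EOF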
Storage Optimization
- Compress Backups: Use gzip for long-term storage
- Archive Old Data: Move historical data to cold storage
- Monitor Neo4j Storage: Regular database maintenance
- Embedding Efficiency: Consider embedding model size vs. quality
Security Considerations
- Encrypt Backups: Sensitive conversation data should be encrypted (see the sketch after this list)
- Access Controls: Limit access to export/import capabilities
- Audit Trails: Log all data management operations
- Data Retention: Define policies for data lifecycle management
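For the encryption item above, a simple sketch using symmetric gpg; key and passphrase management are left to you:
# Export, compress, and encrypt in one pipeline
reservoir export | gzip | gpg --symmetric --cipher-algo AES256 -o reservoir_backup.json.gz.gpg
# Restore: decrypt, decompress, and import
gpg --decrypt reservoir_backup.json.gz.gpg | gunzip | reservoir import /dev/stdin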
Data management in Reservoir is designed to be straightforward while providing enterprise-grade capabilities for backup, migration, and organization of your conversation data.