# Ollama Integration

Setup and configuration guide for local LLM inference using Ollama.
## Table of contents

- [Installation](#installation)
- [Model Management](#model-management)
- [Configuration](#configuration)
- [Health Check](#health-check)
- [Troubleshooting](#troubleshooting)
- [Performance Tuning](#performance-tuning)
- [Related Resources](#related-resources)
## Installation

### macOS / Linux

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the service
ollama serve
```

### Windows

Download the installer from [ollama.com](https://ollama.com).
## Model Management

### Pull Models

```bash
# Pull popular models
ollama pull llama3.1:8b
ollama pull qwen2.5
ollama pull deepseek-coder

# List installed models
ollama list
```
### Recommended Models

| Model | Parameters | Use Case |
|---|---|---|
| llama3.2 | 3B | Fast general Q&A |
| llama3.1:8b | 8B | Balanced performance |
| qwen2.5 | 7B | Chinese-optimized |
| deepseek-coder | 6.7B | Code generation |
## Configuration

### Default Settings

```python
OLLAMA_CONFIG = {
    "url": "http://localhost:11434",
    "default_model": "llama3.1:8b",
    "timeout": 30,  # request timeout in seconds
    "generation_options": {
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
    },
}
```
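As a sketch of how these settings might drive a call to Ollama's `/api/generate` endpoint (non-streaming), assuming the `requests` package is installed and `OLLAMA_CONFIG` from above is in scope; the `generate` helper name is our own:

```python
import requests

def generate(prompt: str, config: dict = OLLAMA_CONFIG) -> str:
    """Send one non-streaming generation request to Ollama."""
    resp = requests.post(
        f"{config['url']}/api/generate",
        json={
            "model": config["default_model"],
            "prompt": prompt,
            "stream": False,  # return a single JSON object instead of a stream
            "options": config["generation_options"],
        },
        timeout=config["timeout"],
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Why is the sky blue?"))
```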
### Custom Endpoint

By default the Ollama server listens only on localhost; to reach it from another machine, start it with `OLLAMA_HOST=0.0.0.0 ollama serve` so it binds all interfaces, then point the client at its address:

```python
# If Ollama runs on a different port/host
ollama_url = "http://192.168.1.100:11434"
```
## Health Check

### Connection Test

```bash
# Test the API
curl http://localhost:11434/api/tags

# Expected response:
# {"models": [...]}
```
### Web UI Test

- Navigate to the RAG tab
- Click “Check Ollama Connection”
- Should show: “✅ Connected, models: […]”
## Troubleshooting

### Connection Failed

```bash
# Check whether Ollama is running
ps aux | grep ollama

# Restart Ollama
killall ollama
ollama serve
```
### Model Not Found

```bash
# Pull the model
ollama pull llama3.1:8b

# Verify it's installed
ollama list
```
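To guard against this at startup, here is a sketch that pulls a model only if it is missing, using the `/api/tags` and `/api/pull` endpoints (`ensure_model` is our own helper name; with `stream` set to false, `/api/pull` blocks until the download completes):

```python
import requests

def ensure_model(name: str, url: str = "http://localhost:11434") -> None:
    """Pull a model if it is not already installed locally."""
    tags = requests.get(f"{url}/api/tags", timeout=5).json()
    installed = {m["name"] for m in tags.get("models", [])}
    if name not in installed:
        resp = requests.post(
            f"{url}/api/pull",
            json={"model": name, "stream": False},
            timeout=3600,  # large models can take a long time to download
        )
        resp.raise_for_status()

ensure_model("llama3.1:8b")
```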
## Performance Tuning

### GPU Acceleration

Ollama automatically uses a GPU when one is available:

```bash
# Check GPU usage (Linux/Windows, NVIDIA)
nvidia-smi

# macOS: uses Metal automatically
```
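To confirm from code where a loaded model is resident, the `/api/ps` endpoint (the API behind `ollama ps`) reports per-model memory usage. A sketch, assuming a nonzero `size_vram` means the model is at least partly in GPU memory:

```python
import requests

# List currently loaded models and whether they occupy GPU memory.
resp = requests.get("http://localhost:11434/api/ps", timeout=5)
resp.raise_for_status()
for m in resp.json().get("models", []):
    placement = "GPU" if m.get("size_vram", 0) > 0 else "CPU"
    print(f"{m['name']}: {placement}")
```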
### Memory Management

```bash
# Limit how many models are kept loaded at once
export OLLAMA_MAX_LOADED_MODELS=1

# Remove unused models (this deletes the weights from disk)
ollama rm model-name
```
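A model can also be unloaded from memory on demand, without deleting its weights, by sending a generate request with `keep_alive` set to 0. A sketch:

```python
import requests

# An empty generate request with keep_alive=0 evicts the model from memory
# immediately; the weights stay on disk.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "keep_alive": 0},
    timeout=30,
)
```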