# Ollama Integration

Setup and configuration guide for local LLM inference using Ollama.
## Table of contents

- [Installation](#installation)
- [Model Management](#model-management)
- [Configuration](#configuration)
- [Health Check](#health-check)
- [Troubleshooting](#troubleshooting)
- [Performance Tuning](#performance-tuning)
- [Related Resources](#related-resources)
## Installation

### macOS / Linux

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the service
ollama serve
```

### Windows

Download the installer from [ollama.com](https://ollama.com).
## Model Management

### Pull Models

```bash
# Pull popular models
ollama pull llama3.1:8b
ollama pull qwen2.5
ollama pull deepseek-coder

# List installed models
ollama list
```
### Recommended Models

| Model | Parameters | Use Case |
|---|---|---|
| llama3.2 | 3B | Fast general Q&A |
| llama3.1:8b | 8B | Balanced performance |
| qwen2.5 | 7B | Chinese-optimized |
| deepseek-coder | 6.7B | Code generation |
## Configuration

### Default Settings

```python
OLLAMA_CONFIG = {
    "url": "http://localhost:11434",
    "default_model": "llama3.1:8b",
    "timeout": 30,  # request timeout in seconds
    "generation_options": {
        "temperature": 0.7,
        "top_p": 0.9,
        "top_k": 40,
    },
}
```
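As a sketch of how these settings might drive a call to Ollama's `/api/generate` endpoint (non-streaming), assuming the `requests` package is installed and `OLLAMA_CONFIG` from above is in scope; the `generate` helper name is our own:

```python
import requests

def generate(prompt: str, config: dict = OLLAMA_CONFIG) -> str:
    """Send one non-streaming generation request to Ollama."""
    resp = requests.post(
        f"{config['url']}/api/generate",
        json={
            "model": config["default_model"],
            "prompt": prompt,
            "stream": False,  # return a single JSON object instead of a stream
            "options": config["generation_options"],
        },
        timeout=config["timeout"],
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Why is the sky blue?"))
```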
### Custom Endpoint

By default the Ollama server listens only on localhost; to reach it from another machine, start it with `OLLAMA_HOST=0.0.0.0 ollama serve` so it binds all interfaces, then point the client at its address:

```python
# If Ollama runs on a different port/host
ollama_url = "http://192.168.1.100:11434"
```
## Health Check

### Connection Test

```bash
# Test the API
curl http://localhost:11434/api/tags

# Expected response:
# {"models": [...]}
```
### Web UI Test

- Navigate to the RAG tab
- Click “Check Ollama Connection”
- Should show: “✅ Connected, models: […]”
## Troubleshooting

### Connection Failed

```bash
# Check whether Ollama is running
ps aux | grep ollama

# Restart Ollama
killall ollama
ollama serve
```
### Model Not Found

```bash
# Pull the model
ollama pull llama3.1:8b

# Verify it's installed
ollama list
```
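To guard against this at startup, here is a sketch that pulls a model only if it is missing, using the `/api/tags` and `/api/pull` endpoints (`ensure_model` is our own helper name; with `stream` set to false, `/api/pull` blocks until the download completes):

```python
import requests

def ensure_model(name: str, url: str = "http://localhost:11434") -> None:
    """Pull a model if it is not already installed locally."""
    tags = requests.get(f"{url}/api/tags", timeout=5).json()
    installed = {m["name"] for m in tags.get("models", [])}
    if name not in installed:
        resp = requests.post(
            f"{url}/api/pull",
            json={"model": name, "stream": False},
            timeout=3600,  # large models can take a long time to download
        )
        resp.raise_for_status()

ensure_model("llama3.1:8b")
```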
## Performance Tuning

### GPU Acceleration

Ollama automatically uses a GPU when one is available:

```bash
# Check GPU usage (Linux/Windows, NVIDIA)
nvidia-smi

# macOS: uses Metal automatically
```
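To confirm from code where a loaded model is resident, the `/api/ps` endpoint (the API behind `ollama ps`) reports per-model memory usage. A sketch, assuming a nonzero `size_vram` means the model is at least partly in GPU memory:

```python
import requests

# List currently loaded models and whether they occupy GPU memory.
resp = requests.get("http://localhost:11434/api/ps", timeout=5)
resp.raise_for_status()
for m in resp.json().get("models", []):
    placement = "GPU" if m.get("size_vram", 0) > 0 else "CPU"
    print(f"{m['name']}: {placement}")
```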
### Memory Management

```bash
# Limit how many models are kept loaded at once
export OLLAMA_MAX_LOADED_MODELS=1

# Remove unused models (this deletes the weights from disk)
ollama rm model-name
```
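A model can also be unloaded from memory on demand, without deleting its weights, by sending a generate request with `keep_alive` set to 0. A sketch:

```python
import requests

# An empty generate request with keep_alive=0 evicts the model from memory
# immediately; the weights stay on disk.
requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.1:8b", "keep_alive": 0},
    timeout=30,
)
```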