Ollama Integration

Setup and configuration guide for local LLM inference using Ollama.

Table of contents

  1. Installation
    1. macOS / Linux
    2. Windows
  2. Model Management
    1. Pull Models
    2. Recommended Models
  3. Configuration
    1. Default Settings
    2. Custom Endpoint
  4. Health Check
    1. Connection Test
    2. Web UI Test
  5. Troubleshooting
    1. Connection Failed
    2. Model Not Found
  6. Performance Tuning
    1. GPU Acceleration
    2. Memory Management
  7. Related Resources

Installation

macOS / Linux

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start service
ollama serve

Windows

Download the installer from ollama.com and run it.


Model Management

Pull Models

# Pull popular models
ollama pull llama3.1:8b
ollama pull qwen2.5
ollama pull deepseek-coder

# List installed models
ollama list

Recommended Models

Model            Size   Use Case
llama3.2         3B     Fast general Q&A
llama3.1:8b      8B     Balanced performance
qwen2.5          7B     Chinese-optimized
deepseek-coder   6.7B   Code generation

Configuration

Default Settings

OLLAMA_CONFIG = {
    "url": "http://localhost:11434",   # default Ollama endpoint
    "default_model": "llama3.1:8b",
    "timeout": 30,                     # request timeout in seconds
    "generation_options": {
        "temperature": 0.7,  # sampling randomness
        "top_p": 0.9,        # nucleus sampling cutoff
        "top_k": 40          # top-k sampling cutoff
    }
}
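
How these settings might be used is sketched below in Python, assuming the
requests library is installed; generate_text is a hypothetical helper, not
part of Ollama:

import requests

# Hypothetical helper: send one prompt to Ollama's /api/generate endpoint
# using the settings from OLLAMA_CONFIG above.
def generate_text(prompt, config=OLLAMA_CONFIG):
    response = requests.post(
        f"{config['url']}/api/generate",
        json={
            "model": config["default_model"],
            "prompt": prompt,
            "stream": False,  # return a single JSON object, not a stream
            "options": config["generation_options"],
        },
        timeout=config["timeout"],
    )
    response.raise_for_status()
    return response.json()["response"]

print(generate_text("Why is the sky blue?"))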

Custom Endpoint

# If Ollama runs on different port/host
ollama_url = "http://192.168.1.100:11434"
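
In application code, the endpoint can be made configurable instead of
hard-coded; a minimal sketch, assuming an OLLAMA_URL environment variable
(an arbitrary name for this example, not one Ollama itself reads):

import os

# Fall back to the local default when no override is set
ollama_url = os.environ.get("OLLAMA_URL", "http://localhost:11434")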

Health Check

Connection Test

# Test API
curl http://localhost:11434/api/tags

# Expected response:
# {"models": [...]}
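
The same check can be scripted; a minimal Python sketch using the requests
library (check_ollama is a hypothetical helper name):

import requests

# Hypothetical helper: return installed model names, or raise if the
# Ollama server is unreachable.
def check_ollama(url="http://localhost:11434"):
    response = requests.get(f"{url}/api/tags", timeout=5)
    response.raise_for_status()
    return [m["name"] for m in response.json()["models"]]

print(check_ollama())  # e.g. ['llama3.1:8b', 'qwen2.5:latest']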

Web UI Test

  1. Navigate to RAG tab
  2. Click “Check Ollama Connection”
  3. Should show: “✅ Connected, models: […]”

Troubleshooting

Connection Failed

# Check if Ollama is running
ps aux | grep ollama

# Restart Ollama
killall ollama
ollama serve
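
Since the server can take a few seconds to come back up, a short retry loop
avoids spurious failures in client code; a sketch (wait_for_ollama is a
hypothetical helper):

import time
import requests

# Hypothetical helper: poll /api/tags until Ollama responds or retries
# are exhausted.
def wait_for_ollama(url="http://localhost:11434", retries=5):
    for attempt in range(retries):
        try:
            requests.get(f"{url}/api/tags", timeout=2)
            return True
        except requests.ConnectionError:
            time.sleep(2 ** attempt)  # backoff: 1s, 2s, 4s, ...
    return False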

Model Not Found

# Pull the model
ollama pull llama3.1:8b

# Verify it's installed
ollama list
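
The check-and-pull can also be automated; a sketch against Ollama's
/api/pull endpoint (ensure_model is a hypothetical helper, and the pull
blocks until the download completes):

import requests

# Hypothetical helper: pull the model through the API if it is not
# already installed.
def ensure_model(name, url="http://localhost:11434"):
    installed = [m["name"] for m in
                 requests.get(f"{url}/api/tags", timeout=5).json()["models"]]
    if name not in installed:
        # stream=False makes the request block until the pull finishes
        requests.post(f"{url}/api/pull",
                      json={"model": name, "stream": False}, timeout=None)

ensure_model("llama3.1:8b")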

Performance Tuning

GPU Acceleration

Ollama uses the GPU automatically when one is available:

# Check GPU usage (Linux/Windows)
nvidia-smi

# macOS: Uses Metal automatically
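
To confirm from code that a loaded model actually sits on the GPU, recent
Ollama versions expose GET /api/ps; a nonzero size_vram in its response
indicates GPU offload. A minimal sketch:

import requests

# List loaded models and how much of each is in GPU memory
for m in requests.get("http://localhost:11434/api/ps").json()["models"]:
    print(m["name"], "VRAM bytes:", m.get("size_vram", 0))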

Memory Management

# Limit how many models stay loaded in memory at once
export OLLAMA_MAX_LOADED_MODELS=1

# Unload a running model without deleting it
ollama stop model-name

# Remove a model from disk entirely
ollama rm model-name
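
The unload can also be triggered over the API: per the Ollama API docs, a
generate request with an empty prompt and keep_alive set to 0 unloads the
model from memory without deleting it from disk. A minimal sketch:

import requests

# Ask Ollama to unload llama3.1:8b immediately; the model files stay on disk
requests.post("http://localhost:11434/api/generate",
              json={"model": "llama3.1:8b", "keep_alive": 0})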