CTR Prediction Models
Machine learning models for predicting click-through rates to improve search ranking and recommendation quality.
Table of contents
- Overview
- Supported Models
- Feature Engineering
- Training Process
- Model Evaluation
- Usage Guide
- Best Practices
- Troubleshooting
- Related Resources
Overview
What is CTR Prediction?
Click-Through Rate (CTR) prediction is the task of estimating the probability that a user will click on a search result or recommendation item. It’s a fundamental component of modern search and recommendation systems.
Key Concepts:
- CTR = Number of clicks / Number of impressions
- Higher CTR indicates better relevance and user satisfaction
- CTR prediction enables intelligent ranking beyond simple keyword matching
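For example, a result shown 1,000 times and clicked 120 times has a CTR of $120 / 1000 = 0.12$; CTR prediction estimates this probability for each query-result pair before the impressions are actually served.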
Why CTR Prediction Matters
Business Value:
- User Experience: Show most relevant results first, improving user satisfaction
- Engagement: Higher CTR means users find what they’re looking for
- Revenue: In advertising systems, CTR directly impacts revenue
- Personalization: Adapt ranking to individual user preferences
Technical Benefits:
- Beyond TF-IDF: TF-IDF considers only text similarity; CTR prediction also incorporates user behavior
- Learning from Feedback: System improves as users interact with results
- Handling Cold Start: Predict CTR for new items from content and context features, even without click history
- Multi-Factor Ranking: Combine content relevance, user preferences, and context
How CTR Prediction Works
Basic Workflow:
graph LR
A[User Query] --> B[Retrieve Candidates]
B --> C[Extract Features]
C --> D[CTR Model]
D --> E[Predicted CTR]
E --> F[Rank Results]
F --> G[Display to User]
G --> H[User Clicks?]
H --> I[Update Training Data]
I --> D
Key Steps:
- Feature Extraction: Extract features from query, document, and user context
- Model Prediction: Use the trained model to predict CTR for each candidate (see the sketch after this list)
- Ranking: Sort results by predicted CTR (and other factors)
- Feedback Loop: Collect user clicks to improve model over time
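A minimal sketch of steps 2–3, assuming the `extract_features` helper and `model` objects defined later on this page (candidate retrieval and click logging are omitted):

```python
import numpy as np

def rank_candidates(query, candidates, model, historical_data, user_id=None):
    """Score each candidate with the CTR model and sort by predicted CTR."""
    X = np.array([
        extract_features(query, doc, position, historical_data, user_id)
        for position, doc in enumerate(candidates, start=1)
    ])
    scores = model.predict(X)          # One predicted click probability per candidate
    order = np.argsort(scores)[::-1]   # Highest predicted CTR first
    return [candidates[i] for i in order]
```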
Supported Models
The system supports two model types, each with different characteristics:
Model Comparison
| Model | Complexity | Interpretability | Performance | Use Case |
|---|---|---|---|---|
| Logistic Regression | Low | High | Good baseline | Quick prototyping, interpretable results |
| Wide & Deep | High | Medium | Better accuracy | Production systems, complex patterns |
1. Logistic Regression
Overview
What is Logistic Regression? A linear model that uses the logistic function to map features to a probability between 0 and 1. It’s simple, interpretable, and provides a strong baseline for CTR prediction.
Why Use Logistic Regression?
- ✅ Fast Training: Convex objective with efficient, reliable solvers
- ✅ Interpretable: Feature coefficients show importance
- ✅ Stable: Less prone to overfitting
- ✅ Baseline: Good starting point before trying complex models
Limitations:
- ❌ Linear Assumptions: Cannot capture complex feature interactions
- ❌ Manual Feature Engineering: Requires careful feature design
Model Architecture
Mathematical Formulation:
Prediction Function:
Logistic Regression models the probability of a click using the sigmoid function:
\[P(\text{click} = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^T \mathbf{x} + b)}}\]
Where:
- $\mathbf{x} \in \mathbb{R}^d$: Feature vector (7-dimensional in our system)
- $\mathbf{w} \in \mathbb{R}^d$: Learned weight vector (one weight per feature)
- $b \in \mathbb{R}$: Bias term (intercept)
- $\sigma(z) = \frac{1}{1 + e^{-z}}$: Sigmoid function, maps real numbers to [0, 1]
- $\mathbf{w}^T \mathbf{x} + b$: Linear combination of features (logit)
Training Objective:
The model is trained to minimize the binary cross-entropy loss:
\[\mathcal{L} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right]\]
Where:
- $n$: Number of training samples
- $y_i \in \{0, 1\}$: True label (1 = clicked, 0 = not clicked)
- $\hat{y}_i = P(\text{click} = 1 \mid \mathbf{x}_i)$: Predicted probability for sample $i$
Understanding the Formula:
- Sigmoid Function: Transforms the linear combination into a probability between 0 and 1
- Logit: The term $\mathbf{w}^T \mathbf{x} + b$ represents the log-odds of a click
- Loss Function: Penalizes confident wrong predictions more than uncertain ones
- Optimization: Gradient descent finds weights that minimize the loss on training data
Intuitive Explanation (see the numeric check after this list):
- If $\mathbf{w}^T \mathbf{x} + b$ is large and positive → High click probability (close to 1)
- If $\mathbf{w}^T \mathbf{x} + b$ is large and negative → Low click probability (close to 0)
- If $\mathbf{w}^T \mathbf{x} + b = 0$ → Neutral probability (0.5)
- Feature weights indicate how much each feature contributes to the click probability
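A short numeric check of these three cases (a standalone sketch, not part of the system's code):

```python
import numpy as np

def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(4.0))   # Large positive logit -> ~0.982, near-certain click
print(sigmoid(-4.0))  # Large negative logit -> ~0.018, near-certain skip
print(sigmoid(0.0))   # Zero logit -> 0.5, neutral
```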
Feature Engineering
7-Dimensional Feature Vector:
features = {
"position": 1, # Rank position (1, 2, 3, ...)
"query_length": 5, # Number of words in query
"doc_length": 1000, # Document length in characters
"tfidf_score": 0.85, # TF-IDF relevance score
"match_ratio": 0.6, # Query-document match ratio
"historical_ctr": 0.12, # Historical CTR for this doc
"user_click_history": 0.3 # User's past click rate
}
Implementation
from sklearn.linear_model import LogisticRegression

class CTRModel:
    """Logistic Regression CTR Model"""

    def __init__(self):
        self.model = LogisticRegression(
            max_iter=1000,
            solver='lbfgs',
            C=1.0  # Inverse regularization strength (smaller C = stronger regularization)
        )

    def train(self, X, y):
        """Train on feature matrix X and labels y"""
        self.model.fit(X, y)
        return self

    def predict(self, X):
        """Predict CTR probabilities"""
        return self.model.predict_proba(X)[:, 1]  # Probability of class 1 (clicked)
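A usage sketch with synthetic data (sample counts and feature values are illustrative, not from the system):

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.random((500, 7))          # 500 samples, 7 features each
y_train = rng.integers(0, 2, size=500)  # Binary click labels

ctr_model = CTRModel().train(X_train, y_train)

X_candidates = rng.random((2, 7))       # Two candidate results
print(ctr_model.predict(X_candidates))  # Two click probabilities in [0, 1]
```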
2. Wide & Deep Neural Network
Overview
What is Wide & Deep? A neural network architecture that combines:
- Wide Component: Linear model for memorizing feature interactions
- Deep Component: Multi-layer neural network for learning complex patterns
Why Use Wide & Deep?
- ✅ Best of Both Worlds: Memorization (wide) + Generalization (deep)
- ✅ Automatic Feature Learning: Deep layers learn complex interactions
- ✅ Handles Sparse Features: Wide component handles categorical features
- ✅ Better Accuracy: Typically outperforms linear models
Architecture:
graph TB
A[Input Features] --> B[Wide Path]
A --> C[Deep Path]
B --> B1[Linear Layer]
B1 --> D[Concatenate]
C --> C1[Embedding Layer]
C1 --> C2[Dense Layer 1]
C2 --> C3[Dense Layer 2]
C3 --> D
D --> E[Output Layer]
E --> F[CTR Prediction]
Model Architecture
Wide Component:
- Linear model: $y_{\text{wide}} = \mathbf{w}_{\text{wide}}^T \mathbf{x} + b_{\text{wide}}$
- Captures explicit feature interactions
- Good for sparse, categorical features
Deep Component:
- Multi-layer feedforward network
- Learns implicit feature interactions
- Good for dense, continuous features
Combined Output:
\[P(\text{click} = 1 \mid \mathbf{x}) = \sigma(y_{\text{wide}} + y_{\text{deep}})\]
Implementation
import tensorflow as tf
from tensorflow import keras

class WideAndDeepCTRModel:
    """Wide & Deep CTR Model"""

    def __init__(self, input_dim, hidden_units=(128, 64)):
        # Wide component (linear)
        wide_input = keras.Input(shape=(input_dim,), name='wide_input')
        wide_output = keras.layers.Dense(1, activation='linear')(wide_input)

        # Deep component
        deep_input = keras.Input(shape=(input_dim,), name='deep_input')
        deep = deep_input
        for units in hidden_units:
            deep = keras.layers.Dense(units, activation='relu')(deep)
            deep = keras.layers.Dropout(0.3)(deep)
        deep_output = keras.layers.Dense(1, activation='linear')(deep)

        # Combine: sigmoid over the sum of the wide and deep logits,
        # matching P(click = 1 | x) = sigmoid(y_wide + y_deep)
        combined = keras.layers.Add()([wide_output, deep_output])
        output = keras.layers.Activation('sigmoid')(combined)

        self.model = keras.Model(
            inputs=[wide_input, deep_input],
            outputs=output
        )
        self.model.compile(
            optimizer='adam',
            loss='binary_crossentropy',
            metrics=['accuracy', keras.metrics.AUC(name='auc')]
        )

    def train(self, X, y, epochs=10, batch_size=32):
        """Train the model"""
        self.model.fit(
            [X, X],  # Same features feed both the wide and deep paths
            y,
            epochs=epochs,
            batch_size=batch_size,
            validation_split=0.2
        )
        return self

    def predict(self, X):
        """Predict CTR probabilities"""
        return self.model.predict([X, X]).flatten()
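Usage mirrors the logistic model; a sketch with synthetic data (shapes, epochs, and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 7)).astype('float32')          # 1,000 samples, 7 features
y = rng.integers(0, 2, size=1000).astype('float32')  # Binary click labels

wd_model = WideAndDeepCTRModel(input_dim=7)
wd_model.train(X, y, epochs=5, batch_size=64)
print(wd_model.predict(X[:3]))  # Three predicted click probabilities
```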
Feature Engineering
Feature Types
1. Position Features
- Rank position in search results
- Higher positions typically have higher CTR
- Example:
position = 1, 2, 3, ...
2. Query Features
- Query length, complexity
- Query type (informational, navigational, transactional)
- Example:
query_length = 5, query_type = "informational"
3. Document Features
- Document length, quality score
- Historical performance metrics
- Example:
doc_length = 1000, historical_ctr = 0.12
4. Interaction Features
- Query-document match score (TF-IDF)
- Semantic similarity
- Example:
tfidf_score = 0.85, match_ratio = 0.6
5. User Features (if available)
- User click history
- User preferences
- Example:
user_click_rate = 0.3
Feature Extraction Pipeline
import numpy as np

def extract_features(query, doc, position, historical_data, user_id=None):
    """Extract 7-dimensional feature vector"""
    # compute_tfidf, compute_match_ratio, and get_user_click_rate are
    # helpers assumed to be defined elsewhere in the system
    features = {
        # Position feature
        "position": position,
        # Query features
        "query_length": len(query.split()),
        # Document features
        "doc_length": len(doc['content']),
        "historical_ctr": historical_data.get(doc['id'], 0.0),
        # Interaction features
        "tfidf_score": compute_tfidf(query, doc),
        "match_ratio": compute_match_ratio(query, doc),
        # User features (if available)
        "user_click_history": get_user_click_rate(user_id)
    }
    return np.array([
        features["position"],
        features["query_length"],
        features["doc_length"],
        features["tfidf_score"],
        features["match_ratio"],
        features["historical_ctr"],
        features["user_click_history"]
    ])
Training Process
Data Collection
Click Feedback Collection:
- System logs user interactions (clicks, impressions)
- Each interaction becomes a training sample
- Positive samples: clicked items (label = 1)
- Negative samples: shown but not clicked (label = 0)
Sample Format:
{
"query": "machine learning",
"doc_id": "doc_123",
"position": 2,
"features": [2, 2, 1000, 0.85, 0.6, 0.12, 0.3],
"label": 1 # clicked
}
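A minimal sketch of turning logged samples in this format into training arrays (the list of samples is assumed to come from the system's interaction log):

```python
import numpy as np

def build_training_set(samples):
    """Convert logged interaction samples into (X, y) training arrays."""
    X = np.array([s["features"] for s in samples])  # Shape: (n_samples, 7)
    y = np.array([s["label"] for s in samples])     # 1 = clicked, 0 = not clicked
    return X, y
```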
Training Workflow
sequenceDiagram
participant User
participant System
participant DataService
participant ModelService
participant Trainer
User->>System: Search query
System->>System: Display results
User->>System: Click on result
System->>DataService: Log interaction
DataService->>DataService: Accumulate samples
Note over DataService,Trainer: When enough data collected
DataService->>Trainer: Training dataset
Trainer->>Trainer: Train model
Trainer->>ModelService: Update model
ModelService->>System: New model ready
Training Steps
- Collect Training Data: Gather click feedback over time
- Feature Extraction: Extract features for all samples
- Data Splitting: Split into train/validation/test sets (see the sketch after this list)
- Model Training: Train model on training set
- Evaluation: Evaluate on validation/test sets
- Deployment: Deploy trained model to production
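A sketch of steps 3–5 with scikit-learn, assuming `X` and `y` arrays as built above (split ratios are illustrative; `evaluate_model` is defined in the next section):

```python
from sklearn.model_selection import train_test_split

# Step 3: hold out 20% for testing, then 20% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)

# Steps 4-5: train on the training set, evaluate on held-out data
model = CTRModel().train(X_train, y_train)
print(evaluate_model(model, X_val, y_val))
```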
Model Evaluation
Evaluation Metrics
1. Accuracy
- Overall correctness: (TP + TN) / (TP + TN + FP + FN)
- Limitation: Can be misleading with imbalanced data
2. Precision
- Of predicted clicks, how many were actual clicks: TP / (TP + FP)
- Use Case: When false positives are costly
3. Recall
- Of actual clicks, how many were predicted: TP / (TP + FN)
- Use Case: When we want to catch all clicks
4. AUC-ROC
- Area under ROC curve
- Best Metric: Measures ranking quality, handles imbalanced data well
- Target: AUC > 0.7 for good performance
5. Log Loss
- Logarithmic loss for probability predictions
- Use Case: Penalizes confident wrong predictions
Evaluation Example
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, roc_auc_score, log_loss
)

def evaluate_model(model, X_test, y_test):
    """Evaluate CTR model"""
    y_pred = model.predict(X_test)
    y_pred_binary = (y_pred > 0.5).astype(int)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred_binary),
        "precision": precision_score(y_test, y_pred_binary),
        "recall": recall_score(y_test, y_pred_binary),
        "auc": roc_auc_score(y_test, y_pred),
        "log_loss": log_loss(y_test, y_pred)
    }
    return metrics
# Example output
{
"accuracy": 0.85,
"precision": 0.78,
"recall": 0.82,
"auc": 0.88,
"log_loss": 0.32
}
Usage Guide
Training a Model
In the Web UI:
- Navigate to “📊 Data Collection & Training” tab
- Review collected samples and statistics
- Select model type:
- Logistic Regression: Fast, interpretable
- Wide & Deep: Better accuracy, more complex
- Configure training parameters:
- Training epochs
- Learning rate
- Batch size
- Click “Train CTR Model”
- View training progress and results
Using Trained Model
In Search Interface:
- Navigate to “🔍 Online Retrieval & Ranking” tab
- Enter search query
- Select ranking mode: CTR (instead of TF-IDF)
- Click “🔬 Execute Search”
- Results are ranked by predicted CTR
Model Comparison
Compare TF-IDF vs CTR Ranking:
- Perform same query with both ranking modes
- Compare result order and relevance
- CTR ranking typically shows more user-relevant results
Best Practices
Model Selection
Use Logistic Regression When:
- ✅ Quick prototyping needed
- ✅ Interpretability is important
- ✅ Limited training data (< 10K samples)
- ✅ Simple feature interactions sufficient
Use Wide & Deep When:
- ✅ Production system with sufficient data (> 10K samples)
- ✅ Complex feature interactions expected
- ✅ Maximum accuracy needed
- ✅ Can handle longer training time
Feature Engineering Tips
- Include Position: Position is a strong signal for CTR
- Normalize Features: Scale features to similar ranges (see the sketch after this list)
- Handle Missing Values: Use default values or imputation
- Feature Interactions: Consider explicit interaction features for linear models
- Temporal Features: Include time-based features if available
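For tip 2, a common choice is standardization; a sketch with scikit-learn, assuming `X_train`/`X_test` arrays as above:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit scaling parameters on training data only, then reuse them,
# so no information leaks from the test set
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```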
Training Tips
- Collect Enough Data: Aim for at least 1K positive samples
- Handle Imbalance: Use class weights or sampling techniques (sketched after this list)
- Cross-Validation: Use k-fold CV to assess generalization
- Regularization: Prevent overfitting with L1/L2 regularization
- Monitor Metrics: Track AUC and log loss during training
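Sketches for tips 2 and 3, using scikit-learn's built-in class weighting and cross-validation (hyperparameter values are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Tip 2: reweight classes inversely to their frequency
model = LogisticRegression(max_iter=1000, class_weight='balanced')

# Tip 3: 5-fold cross-validated AUC as a generalization estimate
auc_scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(auc_scores.mean(), auc_scores.std())
```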
Troubleshooting
Low Model Performance
Problem: AUC < 0.6 or poor ranking quality
Solutions:
- More Training Data: Collect more click feedback
- Better Features: Add more relevant features
- Feature Quality: Check for data quality issues
- Model Complexity: Try Wide & Deep if using Logistic Regression
- Hyperparameter Tuning: Adjust learning rate, regularization
Overfitting
Problem: High training accuracy but low validation accuracy
Solutions:
- Regularization: Increase L1/L2 regularization strength
- More Data: Collect more training samples
- Feature Selection: Remove irrelevant features
- Early Stopping: Stop training when validation loss stops improving (see the sketch below)
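For the Wide & Deep model, early stopping is available through a Keras callback; a sketch where `model` stands for the underlying `keras.Model` (the patience value is illustrative, and `WideAndDeepCTRModel.train()` above would need a small extension to forward callbacks):

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',         # Watch validation loss
    patience=3,                 # Stop after 3 epochs without improvement
    restore_best_weights=True   # Roll back to the best epoch's weights
)

model.fit([X, X], y, epochs=50, validation_split=0.2, callbacks=[early_stop])
```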
Cold Start Problem
Problem: New documents have no historical CTR
Solutions:
- Default CTR: Use average CTR as default
- Content Features: Rely more on content similarity (TF-IDF)
- Gradual Transition: Start with TF-IDF and shift to CTR ranking as click data accumulates (see the sketch below)
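A sketch of the gradual-transition idea: blend TF-IDF and predicted CTR, shifting weight toward CTR as impressions accumulate (the blending schedule and the constant `k` are assumptions, not the system's actual logic):

```python
def blended_score(tfidf_score, predicted_ctr, impressions, k=100):
    """Weight shifts from TF-IDF toward CTR as impression data accumulates."""
    w = impressions / (impressions + k)  # 0 for brand-new docs, approaches 1 with data
    return (1 - w) * tfidf_score + w * predicted_ctr
```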
Related Resources
- Model Evaluation - Detailed evaluation metrics
- Interpretability Analysis - Understanding model decisions
- AutoML Optimization - Hyperparameter tuning
- Wide & Deep Paper - Original research paper