System Architecture

Detailed architecture design of the search and recommendation system.

Design Principles
1. Goals
2. Technology Selection
Layered Architecture
1. Layer Descriptions
Component Design
1. Inverted Index Component
2. CTR Model Component
Process Flows
1. Online Retrieval Flow
2. Offline Training Flow
Scalability Design
1. Horizontal Scaling
2. Vertical Scaling
Monitoring & Operations
1. Metrics System
2. Disaster Recovery
Implementation Roadmap
1. Development Phases
2. Risk Control
Further Reading

Design Principles

Goals

Performance: Fast retrieval response with good concurrency handling
Effectiveness: Improve user click-through rate (CTR) and satisfaction
Maintainability: Modular design supporting rapid algorithm iteration
Scalability: Linear scaling for data volume and concurrency

Technology Selection

Maturity: Choose battle-tested open-source technologies
Performance First: In-memory computation over disk I/O
Interpretability: Business-interpretable ranking algorithms
Progressive Upgrade: Support smooth transition from traditional to intelligent recommendation

Layered Architecture

graph TB
    subgraph "Application Layer"
        A[User Query] --> B[Search Interface]
        B --> B1[Query Preprocessing]
        B1 --> B2[Parameter Validation]
    end
    
    subgraph "Service Layer"
        B2 --> C[Index Service]
        C --> C1[Retrieval Orchestration]
        C1 --> C2[Result Aggregation]
        
        D[Model Service] --> D1[Feature Engineering]
        D1 --> D2[CTR Prediction]
        
        E[Data Service] --> E1[Click Collection]
        E1 --> E2[Sample Generation]
    end
    
    subgraph "Algorithm Layer"
        C2 --> F[Recall Algorithm]
        F --> F1[TF-IDF Calculation]
        F1 --> F2[Candidate Documents]
        
        F2 --> G[Ranking Algorithm]
        G --> G1{Ranking Strategy}
        G1 -->|Traditional| G2[Relevance Ranking]
        G1 -->|Intelligent| D1
        D2 --> G3[CTR Reranking]
    end
    
    subgraph "Storage Layer"
        H[Inverted Index] --> F1
        I[Document Storage] --> F1
        J[Model Storage] --> D
        K[Click Logs] --> E
    end
    
    G2 --> L[Return Results]
    G3 --> L
    L --> M[User Feedback]
    M --> E1

Layer Descriptions

Application Layer: Request handling, parameter validation, interface adaptation
Service Layer: Business logic orchestration, component coordination
Algorithm Layer: Core retrieval and ranking algorithm implementation
Storage Layer: Data persistence, index storage

Component Design

Inverted Index Component

TF-IDF Scoring Formula:

The system uses TF-IDF (Term Frequency-Inverse Document Frequency) to score document relevance:

\[\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)\]

Where:

$\text{TF}(t, d) = \frac{\text{count of term } t \text{ in document } d}{\text{total terms in document } d}$: Term frequency (normalized by document length)
$\text{IDF}(t) = \log \frac{N}{df(t)}$: Inverse document frequency
- $N$: Total number of documents
- $df(t)$: Number of documents containing term $t$

Query-Document Score:

For a query $q$ with terms ${t_1, t_2, …, t_n}$, the relevance score for document $d$ is:

\[\text{Score}(q, d) = \sum_{t \in q} \text{TF-IDF}(t, d)\]

Intuitive Explanation:

TF (Term Frequency): Terms that appear more frequently in a document are more relevant
IDF (Inverse Document Frequency): Rare terms across the corpus are more discriminative
Combined: TF-IDF balances local relevance (TF) with global discriminativeness (IDF)
Query Matching: Documents containing more query terms with higher TF-IDF scores rank higher

Core Data Structures:

class InvertedIndex:
    self.index = defaultdict(set)          # term -> document ID set
    self.doc_lengths = {}                  # document ID -> length
    self.documents = {}                    # document ID -> content
    self.term_freq = defaultdict(dict)     # term -> {doc_id: frequency}
    self.doc_freq = defaultdict(int)       # term -> document frequency

Performance Optimizations:

Memory layout optimization using sets
Query optimization with short-circuit evaluation
Hot query result caching
Horizontal sharding support

CTR Model Component

Feature Engineering (7-dimensional vector):

Position Feature: Absolute position in results
Document Length: Total document length
Query Length: Query string length
Summary Length: Summary text length
Match Score: Query-summary word overlap
Query Historical CTR: Past CTR for this query
Document Historical CTR: Past CTR for this document

Model Architecture Options:

Logistic Regression: Fast training, interpretable (baseline)
Wide & Deep: Feature interaction support (production)
Deep Models: DIN/DSSM etc. (future extension)

Process Flows

Online Retrieval Flow

sequenceDiagram
    participant U as User
    participant SI as SearchInterface
    participant IS as IndexService
    participant II as InvertedIndex
    participant MS as ModelService
    
    U->>SI: Search Request(query)
    SI->>SI: Parameter Validation
    SI->>IS: search(query, top_k)
    
    IS->>II: Recall Candidates
    II->>II: TF-IDF Calculation
    II-->>IS: Candidate List
    
    alt CTR Ranking Mode
        IS->>MS: Feature Construction & Prediction
        MS->>MS: CTR Model Inference
        MS-->>IS: CTR Scores
        IS->>IS: Reranking
    end
    
    IS-->>SI: Final Results
    SI-->>U: Display Results
    
    U->>SI: Click Feedback
    SI->>DataService: Log Click Event

Offline Training Flow

sequenceDiagram
    participant DS as DataService
    participant TS as TrainingService
    participant MS as ModelService
    
    DS->>DS: Collect Click Logs
    DS->>TS: Trigger Model Training
    
    TS->>TS: Sample Construction & Feature Engineering
    TS->>TS: Model Training & Evaluation
    TS->>MS: Model Deployment
    
    MS->>MS: Hot Model Update
    Note over MS: Seamless service switchover

Scalability Design

Horizontal Scaling

# Index sharding strategy
def get_shard_id(term: str, shard_num: int) -> int:
    return hash(term) % shard_num

# Distributed retrieval coordination
class DistributedIndexService:
    def search(self, query: str) -> List:
        # 1. Query dispatch to all shards
        shard_results = self.query_all_shards(query)
        # 2. Merge and rerank results
        return self.merge_and_rank(shard_results)

Vertical Scaling

Memory Management: LRU cache + tiered storage
Compute Optimization: Multithreading + SIMD vectorization
Model Compression: Quantization + distillation

Monitoring & Operations

Metrics System

class SearchMetrics:
    # Business Metrics
    ctr: float           # Click-through rate
    mrr: float           # Mean reciprocal rank
    ndcg_at_10: float    # Normalized discounted cumulative gain
    
    # System Metrics
    avg_latency: float   # Average latency
    p95_latency: float   # 95th percentile latency
    qps: float           # Queries per second
    
    # Data Metrics
    index_coverage: float        # Index coverage rate
    click_data_quality: float    # Click data quality score

Disaster Recovery

Failure Type	Detection	Handling Strategy	Recovery
Index Corruption	Query anomaly detection	Auto rebuild + backup restore	Quick recovery
Model Failure	Prediction anomaly	Downgrade to TF-IDF	Real-time fallback
Out of Memory	System monitoring	LRU cleanup + rate limiting	Auto handling
Service Overload	QPS monitoring	Circuit breaker + downgrade	Real-time response