Scaling

Scalability Design¶

Horizontal Scaling¶

Each service can be scaled independently:

graph LR
    subgraph "Load Balancer"
        LB[Nginx/HAProxy]
    end

    subgraph "API Gateway Cluster"
        GW1[Gateway 1]
        GW2[Gateway 2]
        GW3[Gateway N]
    end

    subgraph "Queue Worker Cluster"
        QW1[Worker 1]
        QW2[Worker 2]
        QW3[Worker N]
    end

    subgraph "Indexing Service Cluster"
        IS1[Indexer 1]
        IS2[Indexer 2]
        IS3[Indexer N]
    end

    LB --> GW1
    LB --> GW2
    LB --> GW3

    GW1 -.-> QW1
    GW2 -.-> QW2
    GW3 -.-> QW3

    QW1 -.-> IS1
    QW2 -.-> IS2
    QW3 -.-> IS3

Performance Characteristics¶

Based on recent benchmarking:

API Gateway: Handles 245+ RPS with median 1.2s latency
Queue Worker: Processes thousands of messages per minute
Indexing Service: Maintains search indices for millions of documents

Scaling Strategies¶

API Gateway Scaling:

Add more instances behind a load balancer
Optimize for I/O-bound webhook processing
Scale based on HTTP request volume

Queue Worker Scaling:

Increase worker instances to handle queue backlog
Monitor queue depth and processing time
Scale based on notification volume

Indexing Service Scaling:

Scale for write-heavy workloads to MongoDB and Elasticsearch
Batch operations for improved throughput
Scale based on indexing latency and volume