Skip to content

Scaling

Scalability Design

Horizontal Scaling

Each service can be scaled independently:

graph LR
    subgraph "Load Balancer"
        LB[Nginx/HAProxy]
    end

    subgraph "API Gateway Cluster"
        GW1[Gateway 1]
        GW2[Gateway 2]
        GW3[Gateway N]
    end

    subgraph "Queue Worker Cluster"
        QW1[Worker 1]
        QW2[Worker 2]
        QW3[Worker N]
    end

    subgraph "Indexing Service Cluster"
        IS1[Indexer 1]
        IS2[Indexer 2]
        IS3[Indexer N]
    end

    LB --> GW1
    LB --> GW2
    LB --> GW3

    GW1 -.-> QW1
    GW2 -.-> QW2
    GW3 -.-> QW3

    QW1 -.-> IS1
    QW2 -.-> IS2
    QW3 -.-> IS3

Performance Characteristics

Based on recent benchmarking:

  • API Gateway: Handles 245+ RPS with median 1.2s latency
  • Queue Worker: Processes thousands of messages per minute
  • Indexing Service: Maintains search indices for millions of documents

Scaling Strategies

API Gateway Scaling:

  • Add more instances behind a load balancer
  • Optimize for I/O-bound webhook processing
  • Scale based on HTTP request volume

Queue Worker Scaling:

  • Increase worker instances to handle queue backlog
  • Monitor queue depth and processing time
  • Scale based on notification volume

Indexing Service Scaling:

  • Scale for write-heavy workloads to MongoDB and Elasticsearch
  • Batch operations for improved throughput
  • Scale based on indexing latency and volume