Data Flow
Data Flow Architecture
The system processes data through a series of well-defined stages:
sequenceDiagram
participant YT as YouTube
participant GW as API Gateway
participant NQ as Notification Queue
participant QW as Queue Worker
participant MQ as Metadata Queue
participant IS as Index Service
participant DB as MongoDB
participant ES as Elasticsearch
YT->>GW: PubSubHubbub Notification
GW->>GW: Validate & Authenticate
GW->>NQ: Enqueue Notification
GW->>YT: HTTP 200 OK
NQ->>QW: Dequeue Notification
QW->>QW: Parse XML Data
QW->>QW: Extract Metadata
QW->>MQ: Enqueue Metadata
MQ->>IS: Dequeue Metadata
IS->>IS: Enrich Metadata
IS->>DB: Store Video Metadata
IS->>ES: Update Search Index
IS->>IS: Update Channel Stats
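The handoff between the two queues in the diagram can be sketched in a few lines. This is a minimal in-process illustration, not the system's actual implementation: `queue.Queue` stands in for the Notification Queue and Metadata Queue, plain dictionaries stand in for MongoDB and Elasticsearch, and every function name here (`gateway_receive`, `queue_worker_step`, `index_service_step`) is hypothetical. Validation and XML parsing are omitted, so the notification is assumed to arrive as an already-parsed dictionary.

```python
import queue
from collections import Counter

# In-process stand-ins for the real infrastructure (all hypothetical).
notification_queue = queue.Queue()  # Notification Queue
metadata_queue = queue.Queue()      # Metadata Queue
document_store = {}                 # stands in for MongoDB
search_index = {}                   # stands in for Elasticsearch
channel_stats = Counter()           # videos seen per channel


def gateway_receive(notification: dict) -> int:
    """Enqueue the notification and acknowledge it right away."""
    notification_queue.put(notification)
    return 200  # the gateway answers before any processing happens


def queue_worker_step() -> None:
    """Dequeue one notification, extract metadata, and pass it on."""
    notification = notification_queue.get()
    metadata = {
        "video_id": notification["video_id"],
        "channel_id": notification["channel_id"],
        "title": notification["title"],
    }
    metadata_queue.put(metadata)


def index_service_step() -> None:
    """Dequeue one metadata record, store it, index it, update stats."""
    metadata = metadata_queue.get()
    video_id = metadata["video_id"]
    is_new_video = video_id not in document_store
    document_store[video_id] = metadata
    search_index[video_id] = metadata["title"].lower()
    if is_new_video:
        # Count each video once, even if it is re-notified later.
        channel_stats[metadata["channel_id"]] += 1


status = gateway_receive(
    {"video_id": "v1", "channel_id": "c1", "title": "Hello World"}
)
queue_worker_step()
index_service_step()
```

Because the gateway only enqueues and returns, its response time does not depend on how long extraction or indexing takes; each later stage can be scaled or retried independently.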
Processing Stages
- Notification Receipt: YouTube sends PubSubHubbub notifications to the API Gateway
- Validation: The gateway validates the notification signature and source
- Queuing: Valid notifications are placed on the Notification Queue for asynchronous processing, and the gateway returns HTTP 200 OK to YouTube
- Extraction: The Queue Worker parses XML and extracts video metadata
- Enrichment: The Index Service performs additional processing and validation on the extracted metadata
- Storage: The Index Service persists data to MongoDB
- Indexing: Video metadata is indexed in Elasticsearch for search
- Statistics: Channel-level statistics are updated
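The Validation and Extraction stages above might look roughly like the following sketch. It assumes the subscription was created with a shared secret, so that the hub sends an `X-Hub-Signature: sha1=<hex>` header computed as an HMAC-SHA1 over the raw request body, and it assumes the notification is an Atom feed whose `<entry>` carries `yt:videoId` and `yt:channelId` elements. The function names, the sample payload, and the secret are illustrative, not taken from the system itself.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# XML namespaces used in the Atom notification payload.
NAMESPACES = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def is_valid_signature(body: bytes, signature_header: str, secret: bytes) -> bool:
    """Check a 'sha1=<hex>' signature header against the raw request body."""
    algorithm, _, received = signature_header.partition("=")
    if algorithm != "sha1" or not received:
        return False
    expected = hmac.new(secret, body, hashlib.sha1).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


def extract_metadata(body: bytes) -> dict:
    """Parse the Atom payload and pull out the fields of the first entry."""
    root = ET.fromstring(body)
    entry = root.find("atom:entry", NAMESPACES)
    if entry is None:
        raise ValueError("notification contains no <entry> element")
    return {
        "video_id": entry.findtext("yt:videoId", namespaces=NAMESPACES),
        "channel_id": entry.findtext("yt:channelId", namespaces=NAMESPACES),
        "title": entry.findtext("atom:title", namespaces=NAMESPACES),
        "published": entry.findtext("atom:published", namespaces=NAMESPACES),
    }


# Illustrative payload with placeholder identifiers.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <yt:videoId>VIDEO_ID</yt:videoId>
    <yt:channelId>CHANNEL_ID</yt:channelId>
    <title>Example title</title>
    <published>2015-03-06T21:40:57+00:00</published>
  </entry>
</feed>"""

secret = b"shared-secret"  # hypothetical value for the example
header = "sha1=" + hmac.new(secret, SAMPLE, hashlib.sha1).hexdigest()

if is_valid_signature(SAMPLE, header, secret):
    metadata = extract_metadata(SAMPLE)
```

The signature must be computed over the exact bytes received, before any decoding or re-serialization. Note also that Python's standard-library XML parser is not hardened against maliciously constructed input, so a production worker would likely want a hardened parser for data arriving from the network.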