Distributed Stream Processing System
Designed and implemented a fault-tolerant, exactly-once stream processing engine that demystifies Apache Flink-style internals while meeting real-time correctness and latency requirements.
Contribution: Implemented Chandy-Lamport distributed snapshots for exactly-once semantics, built a self-healing control plane with heartbeat-based recovery (<60s) from S3/GCS, and deployed HA workloads on GKE with autoscaling and persistent state.
Outcome: Achieved 10–50ms end-to-end latency for stateless pipelines and 50–200ms for stateful windows; delivered production-ready orchestration with observability dashboards and durable checkpointing.