Projects

Applied research, engineered for impact

A selection of projects spanning ML research, infrastructure, and product-focused builds.

Distributed Systems & Infrastructure

Distributed Stream Processing System

Designed and implemented a fault-tolerant, exactly-once stream processing engine that demystifies Apache Flink-style internals while meeting real-time correctness and latency requirements.

Contribution: Implemented Chandy-Lamport distributed snapshots for exactly-once semantics, built a self-healing control plane with heartbeat-based failure detection that recovers from S3/GCS checkpoints in under 60 s, and deployed HA workloads on GKE with autoscaling and persistent state.
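As an illustration of the snapshot technique named above (not the project's actual code), here is a minimal sketch of Chandy-Lamport marker handling. It assumes a toy model: two processes that exchange integer "transfers" over FIFO channels. The `Process` class, its fields, and `run_demo` are hypothetical names chosen for this example; in a stream engine, the markers would play the role of checkpoint barriers flowing between operators.

```python
from collections import deque

MARKER = "MARKER"  # control message separating pre- and post-snapshot traffic


class Process:
    """One node in a toy network with FIFO channels (hypothetical model)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance   # local state to capture
        self.inbox = {}          # sender name -> deque acting as a FIFO channel
        self.peers = {}          # receiver name -> Process
        self.snapshot = None     # recorded local state, None until snapshotted
        self.channel_log = {}    # sender name -> messages caught in flight
        self.recording = set()   # incoming channels still being recorded

    def connect(self, other):
        """Open a one-way FIFO channel from self to other."""
        self.peers[other.name] = other
        other.inbox[self.name] = deque()

    def send(self, to, amount):
        self.balance -= amount
        self.peers[to].inbox[self.name].append(amount)

    def start_snapshot(self):
        # Record local state, start recording every incoming channel,
        # then emit a marker on every outgoing channel.
        self.snapshot = self.balance
        self.recording = set(self.inbox)
        self.channel_log = {src: [] for src in self.inbox}
        for peer in self.peers.values():
            peer.inbox[self.name].append(MARKER)

    def receive(self, src):
        msg = self.inbox[src].popleft()
        if msg == MARKER:
            if self.snapshot is None:
                self.start_snapshot()      # first marker triggers the local snapshot
            self.recording.discard(src)    # channel from src is now fully recorded
        else:
            self.balance += msg
            if src in self.recording:
                # Sent before the sender's snapshot, received after ours:
                # the message belongs to the channel state.
                self.channel_log[src].append(msg)


def run_demo():
    a, b = Process("A", 100), Process("B", 50)
    a.connect(b)
    b.connect(a)

    a.send("B", 30)      # 30 is in flight on A -> B
    a.start_snapshot()   # A records its state; marker queues behind the 30
    b.send("A", 5)       # B sends before it has seen the marker

    b.receive("A")       # B applies the 30 (pre-snapshot for B)
    b.receive("A")       # B sees the marker and snapshots
    a.receive("B")       # A logs the 5 as in-flight channel state
    a.receive("B")       # A sees B's marker and stops recording

    in_flight = sum(sum(log) for p in (a, b) for log in p.channel_log.values())
    return a, b, a.snapshot + b.snapshot + in_flight
```

In this toy run the recorded local states plus the logged in-flight message add up to the original total of 150, which is the consistency property a checkpoint relies on. A real engine would additionally persist the snapshot to durable storage and replay from it on recovery.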

Outcome: Achieved 10–50 ms end-to-end latency for stateless pipelines and 50–200 ms for stateful windows; delivered production-ready orchestration with observability dashboards and durable checkpointing.

Python, FastAPI, gRPC, Protocol Buffers, Apache Kafka, RocksDB, PostgreSQL, MinIO, S3, GCS, Prometheus, Grafana, Docker Compose, Kubernetes, GKE

GitHub, YouTube Demo

Machine Learning & NLP

Political QA Clarity Detection (SemEval Task 6)

Built an end-to-end NLP system to detect clarity and evasiveness in political interview responses using QEvasion question–answer pairs.

Contribution: Framed the task as multi-class classification (Clear Reply, Ambivalent Reply, Clear Non-Reply), ran EDA on class imbalance and answer length patterns, and evaluated lexical baselines alongside transformer models and hybrid ensembles using Macro F1.
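To show why Macro F1 is a natural metric under class imbalance, here is a small pure-Python sketch. The label names come from the task framing above; the ten example labels are made up for illustration and are not drawn from QEvasion. scikit-learn's `f1_score(..., average="macro")` computes the same quantity; the hand-rolled version is used only to keep the example dependency-free.

```python
LABELS = ["Clear Reply", "Ambivalent Reply", "Clear Non-Reply"]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, so every class counts equally."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, deliberately imbalanced labels (illustrative only).
y_true = ["Ambivalent Reply"] * 8 + ["Clear Reply", "Clear Non-Reply"]
# A degenerate classifier that always predicts the majority class.
y_majority = ["Ambivalent Reply"] * 10

majority_accuracy = accuracy(y_true, y_majority)
majority_macro_f1 = macro_f1(y_true, y_majority)
```

On this toy data the majority-class predictor looks strong on accuracy but scores poorly on Macro F1, because two of the three classes receive an F1 of zero.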

Outcome: TF-IDF + Linear SVM delivered a strong baseline; RoBERTa generalized best on hidden evaluation, while simple ensembling showed limited gains without careful calibration.

Python, scikit-learn, Hugging Face Transformers, PyTorch, RoBERTa, BERT, DeBERTa, TF-IDF, Linear SVM, Logistic Regression, Pandas, Matplotlib, Google Colab

GitHub, YouTube Demo

EEG Cognitive State Classification

Classified cognitive states (rest vs. mental arithmetic) from EEG signals using deep learning and spectral analysis.

Contribution: Loaded PhysioNet mental arithmetic EEG data, computed band-wise PSD (delta–gamma), engineered features, and trained EEGNet and TSception models with accuracy/precision/recall/F1 evaluation.
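The band-wise PSD step can be sketched roughly as follows. This is an illustrative NumPy-only version, not the project's MNE-based pipeline: the band edges, the 250 Hz sampling rate, the Hann-windowed periodogram, and the function names `band_powers` and `demo` are all assumptions made for this example.

```python
import numpy as np

# Commonly used band edges in Hz (assumed here; conventions vary by study).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 45.0),
}


def band_powers(signal, sfreq, bands=BANDS):
    """Estimate power per frequency band from a single-channel signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the DC offset
    window = np.hanning(x.size)            # taper to reduce spectral leakage
    spectrum = np.fft.rfft(x * window)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sfreq)

    # One-sided periodogram, normalised by the window energy.
    psd = np.abs(spectrum) ** 2 / (sfreq * np.sum(window ** 2))
    if x.size % 2 == 0:
        psd[1:-1] *= 2                     # keep DC and Nyquist bins unscaled
    else:
        psd[1:] *= 2                       # odd length has no Nyquist bin

    df = freqs[1] - freqs[0]               # frequency resolution in Hz
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].sum() * df)
        for name, (lo, hi) in bands.items()
    }


def demo():
    sfreq = 250.0                          # assumed sampling rate, not the dataset's
    t = np.arange(0, 4.0, 1.0 / sfreq)
    rng = np.random.default_rng(0)
    # Synthetic 10 Hz oscillation plus a little noise, standing in for EEG.
    signal = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)
    return band_powers(signal, sfreq)
```

For the synthetic 10 Hz signal, the alpha band carries most of the power. In a feature-engineering setting, the resulting dictionary would be computed per channel and per epoch and then stacked into a feature vector.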

Outcome: EEGNet achieved 91.78% accuracy and 91.69% F1, outperforming TSception (80.78% accuracy, 79.73% F1), which suggests EEGNet's architecture was better suited to this dataset.

Python, MNE, EEGNet, TSception, Jupyter Notebook, PhysioNet, Power Spectral Density, NumPy, Pandas, Matplotlib

GitHub

Shakespeare Text Generator

Generated Shakespeare-style text using a character-level RNN with LSTM layers trained on a subset of Shakespeare’s works.

Contribution: Preprocessed text into fixed-length character sequences, trained an LSTM-based model with softmax outputs, and tuned sampling via temperature to control creativity.
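Two of the steps above, fixed-length windowing and temperature-controlled sampling, can be sketched without any deep learning framework. The function names, the `step` parameter, and the example probabilities are hypothetical; in the actual project the probabilities would come from the trained model's softmax output.

```python
import math
import random


def make_sequences(text, seq_len, step=1):
    """Slice text into fixed-length inputs, each paired with the next character."""
    inputs, targets = [], []
    for i in range(0, len(text) - seq_len, step):
        inputs.append(text[i:i + seq_len])
        targets.append(text[i + seq_len])
    return inputs, targets


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; lower temperature sharpens it."""
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, temperature, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


# Stand-in for a model's softmax output over a three-character vocabulary.
example_probs = [0.5, 0.3, 0.2]
sharp = apply_temperature(example_probs, 0.5)   # more conservative sampling
flat = apply_temperature(example_probs, 2.0)    # more varied sampling
```

A temperature below 1 concentrates probability on the most likely character, while a temperature above 1 flattens the distribution and produces more surprising output; a temperature of exactly 1 leaves the distribution unchanged.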

Outcome: Produced coherent, Shakespeare-like text samples using a lightweight RNN architecture and character-level generation.

Python, TensorFlow, Keras, LSTM, RNN, One-hot encoding, RMSprop

GitHub