Personal project
Built an end-to-end machine learning platform on the full 7-million-review Yelp dataset that powers two services: a business recommendation engine and a sentiment classifier, served through one REST API. The work spans large-scale data processing, model training, API serving, containerization, and automated testing.
Links
Stack
System architecture.
Overview
A production-style machine learning platform built on the full Yelp Open Dataset. It takes raw review data all the way to a served API, with two models: one that recommends businesses and one that scores the sentiment of review text.
Approach
Results
The recommender reached a Recall@10 of 5.5 percent, 6.2 times that of a most-popular baseline, and a bias-augmented variant reached an RMSE of 1.17. The sentiment classifier reached 86.3 percent accuracy and a macro-F1 of 0.73 with class weighting. The exported inference path runs at a p99 latency of 0.11 ms with 100 percent parity with the training model.
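For readers unfamiliar with the two headline metrics, here is a minimal sketch of how Recall@K and macro-F1 are commonly computed. It is not the project's evaluation code: the function names `recall_at_k` and `macro_f1` are hypothetical, and the per-user recall definition used here (hits divided by the number of held-out relevant items) is one common convention that the project may or may not follow.

```python
from collections import defaultdict


def recall_at_k(recommended, relevant, k=10):
    """Fraction of one user's held-out relevant items found in the top-k list.

    Assumed convention: hits / number of relevant items. Other definitions exist.
    """
    if not relevant:
        return 0.0
    top_k = set(recommended[:k])
    return len(top_k & set(relevant)) / len(relevant)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro-F1 can sit well below accuracy when classes are imbalanced, which is the usual reason to report both: a classifier that labels every review as the majority class still scores high accuracy but gets an F1 of zero on the minority class.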
Engineering
PySpark for ETL and model training, FastAPI for serving, MLflow for experiment tracking, and Docker Compose for orchestration, with a Pytest suite and GitHub Actions CI. Every headline number is backed by committed, provenance-stamped benchmark outputs.
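As an illustration of what a provenance-stamped latency benchmark could look like, the sketch below times a prediction function, takes a nearest-rank p99, and records it alongside a hash of the inputs and a timestamp. Everything here is an assumption made for illustration: the names `p99_ms` and `benchmark`, the record fields, and the choice of stamp contents are hypothetical and do not reflect the project's actual benchmark format.

```python
import hashlib
import json
import platform
import time
from datetime import datetime, timezone


def p99_ms(samples_ns):
    """Nearest-rank 99th percentile of latency samples, converted to milliseconds."""
    ordered = sorted(samples_ns)
    rank = max(1, (99 * len(ordered) + 99) // 100)  # integer ceil(0.99 * n)
    return ordered[rank - 1] / 1e6


def benchmark(predict, inputs):
    """Time predict() on each input and return a stamped result record.

    The record layout is hypothetical; inputs must be JSON-serializable.
    """
    samples = []
    for x in inputs:
        start = time.perf_counter_ns()
        predict(x)
        samples.append(time.perf_counter_ns() - start)
    payload = json.dumps(inputs, sort_keys=True).encode()
    return {
        "p99_ms": p99_ms(samples),
        "n_samples": len(samples),
        "inputs_sha256": hashlib.sha256(payload).hexdigest(),  # ties the number to its inputs
        "python": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Committing a record like this next to the code lets a reader check which inputs and environment produced a reported figure. A real setup would likely also record the source commit and model version.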