ML · Full-Stack · Personal Project
Baseball
Oracle.

An ML-powered prediction engine for MLB games — trained on 13,000+ games across five seasons, achieving ~60% accuracy versus the 52.5% baseline. Paired with a live FanDuel odds scraper, a bet tracker, and a full production deployment on Railway.

How It Works

From raw data
to a prediction.

Every morning at 9:15 AM, the system automatically collects the day's matchups, enriches them with historical and real-time data, and produces win probabilities for every game on the schedule.

01
📡
Collect
Game schedules, pitcher stats, batting splits, bullpen ERA and fatigue, umpire tendencies, and ballpark-specific weather are pulled from the MLB Stats API and Open-Meteo via a scheduled cron job.
02
🧠
Train
A Python trainer builds 53 features (rolling team stats, Pythagorean win%, pitcher FIP/K9/BB9/HR9, LHP/RHP splits, park factors, H2H records), then trains both a RandomForest and an XGBoost model and keeps whichever scores better under walk-forward CV.
03
Predict
Node.js walks the exported decision-tree JSON directly, so no Python is required at inference time. Predictions surface in the app with calibrated win probabilities and Kelly-sized bet recommendations based on live FanDuel moneyline odds.
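
As a rough illustration of the Train step, a few of the named features could be computed as below. This is a minimal Python sketch, not the project's trainer: the function names, the exponent of 2, the FIP constant of 3.10 (the real constant varies by season), and the rolling-window helper are all assumptions.

```python
from typing import Optional


def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    if rs + ra == 0:
        return 0.5  # assumption: neutral value before any runs are recorded
    return rs / (rs + ra)


def per_nine(count: float, innings_pitched: float) -> float:
    """Rate per nine innings (K/9, BB/9, HR/9).

    Assumes innings_pitched is a true decimal (6.333), not box-score
    notation (6.1).
    """
    return 9.0 * count / innings_pitched if innings_pitched > 0 else 0.0


def fip(hr: float, bb: float, hbp: float, k: float, innings_pitched: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching: (13*HR + 3*(BB+HBP) - 2*K) / IP + C."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings_pitched + constant


def rolling_mean_before(values: list[float], window: int) -> list[Optional[float]]:
    """Mean of the previous `window` games for each game.

    Game i never sees its own value, so the feature cannot leak the outcome
    it is meant to predict.
    """
    out: list[Optional[float]] = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]
        out.append(sum(past) / len(past) if past else None)
    return out
```

Excluding the current game from its own rolling window is the per-feature counterpart of the temporal split used for evaluation.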

Key Architecture Decision

Training and inference are completely decoupled. Python trains the model and exports the decision trees as plain JSON files. Node.js then walks those trees at request time using a custom pure-JS inference engine (rfPredict.js) — meaning the production server has zero Python dependencies and prediction latency stays low.

Python (train) → trees.json → rfPredict.js → Prediction
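
The tree walk itself is small. The sketch below shows the idea in Python for readability only; rfPredict.js is JavaScript and its real JSON schema is not documented here. The node keys (`feature`, `threshold`, `left`, `right`, `value`), the rule "go left when the feature is <= the threshold" (scikit-learn's convention), and averaging leaf probabilities across trees are all assumptions.

```python
import json


def walk_tree(node: dict, x: list[float]) -> float:
    """Descend from the root to a leaf and return the leaf's probability."""
    while "value" not in node:  # internal nodes carry a split, leaves a value
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


def forest_predict(trees: list[dict], x: list[float]) -> float:
    """Average the per-tree leaf probabilities, as a random forest does."""
    return sum(walk_tree(tree, x) for tree in trees) / len(trees)


# Hypothetical two-tree export standing in for trees.json.
TREES_JSON = """
[
  {"feature": 0, "threshold": 0.5,
   "left": {"value": 0.40},
   "right": {"feature": 1, "threshold": 4.0,
             "left": {"value": 0.70},
             "right": {"value": 0.55}}},
  {"feature": 1, "threshold": 3.5,
   "left": {"value": 0.60},
   "right": {"value": 0.45}}
]
"""

trees = json.loads(TREES_JSON)
# Made-up feature vector: [home Pythagorean win%, away starter FIP]
home_win_prob = forest_predict(trees, [0.58, 3.4])
```

Because inference is only comparisons and an average, the same loop ports to JavaScript without any ML runtime.
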
Model Performance

The numbers.

Honest accuracy estimates from temporal walk-forward cross-validation: the model only ever trains on past data and predicts forward, the same way it runs in production.

~60%
Prediction Accuracy
vs. the 52.5% baseline for MLB game outcomes
13K+
Training Games
2021–2026 seasons with a strict temporal train/test split to prevent data leakage
53
Model Features
Team rolling stats, pitcher FIP/K9/BB9, bullpen fatigue, weather, park factors, umpire tendencies, H2H records
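
The walk-forward evaluation behind these numbers can be sketched as follows. This is a minimal Python illustration under stated assumptions: folds are cut at season boundaries, the game records are dicts with `season` and `home_win` keys, and the "model" is a toy home-win-rate baseline on made-up data, not the project's RandomForest or XGBoost.

```python
from typing import Callable, Iterator


def walk_forward_splits(seasons: list[int],
                        min_train: int = 1) -> Iterator[tuple[list[int], int]]:
    """Yield (training seasons, test season); training always precedes test."""
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


def walk_forward_accuracy(games: list[dict], fit: Callable, predict: Callable,
                          min_train: int = 1) -> float:
    """Pooled accuracy over all forward folds."""
    correct = total = 0
    seasons = [g["season"] for g in games]
    for train_seasons, test_season in walk_forward_splits(seasons, min_train):
        train = [g for g in games if g["season"] in train_seasons]
        test = [g for g in games if g["season"] == test_season]
        model = fit(train)  # refit from scratch on past data only
        for g in test:
            picked_home = predict(model, g) >= 0.5
            correct += int(picked_home == g["home_win"])
            total += 1
    return correct / total if total else 0.0


# Toy stand-in model: predict the home-win rate seen in training.
def fit_home_rate(train: list[dict]) -> float:
    return sum(g["home_win"] for g in train) / len(train)


def predict_home_rate(model: float, game: dict) -> float:
    return model


# Made-up records, for illustration only.
toy_games = [
    {"season": 2021, "home_win": True},
    {"season": 2021, "home_win": True},
    {"season": 2021, "home_win": False},
    {"season": 2022, "home_win": True},
    {"season": 2022, "home_win": False},
    {"season": 2023, "home_win": True},
    {"season": 2023, "home_win": True},
]
toy_accuracy = walk_forward_accuracy(toy_games, fit_home_rate, predict_home_rate)
```

No fold ever trains on games from its own test season or later, which is what keeps the estimate free of look-ahead leakage.
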
Features

Built to use,
not just demo.

Beyond the model, Baseball Oracle is a full product — deployed, accessible on mobile, and designed for daily use during the MLB season.

🔮

Daily Predictions

Win probabilities auto-update each morning for every game on the schedule. Every training run pits RandomForest against XGBoost and keeps the better-performing model.

💸

FanDuel Odds Scraper

Playwright with a stealth plugin scrapes live moneyline odds from FanDuel and feeds them into a Kelly criterion calculator, which sizes each bet in proportion to the model's edge.
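
For reference, Kelly sizing from a moneyline works roughly like this. It is a minimal Python sketch of the standard formula f = (b·p − q) / b, where b is the net payout per unit staked, p the model's win probability, and q = 1 − p. The function names and the fractional-Kelly `multiplier` parameter are assumptions, not the app's actual calculator.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American moneyline odds to decimal odds (stake included)."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    return 1 + odds / 100 if odds > 0 else 1 + 100 / abs(odds)


def kelly_fraction(win_prob: float, american_odds: int,
                   multiplier: float = 1.0) -> float:
    """Fraction of bankroll to stake; 0 when the bet has no positive edge."""
    b = american_to_decimal(american_odds) - 1  # net profit per unit staked
    q = 1 - win_prob
    full_kelly = (b * win_prob - q) / b
    return max(0.0, full_kelly) * multiplier


# Model says 55% on a -110 line: stake about 5.5% of bankroll at full Kelly.
stake = kelly_fraction(0.55, -110)
# Half Kelly (multiplier=0.5) is a common way to reduce variance.
half_stake = kelly_fraction(0.55, -110, multiplier=0.5)
```

Clamping at zero means the calculator recommends no bet whenever the model's probability does not beat the price implied by the odds.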

📊

Bet Tracker

Log bets directly in the app with client-side localStorage — no accounts, no server round-trips. Track outcomes, ROI, and bet history across the season.

📱

PWA — Installable

Fully installable as a home-screen app via a service worker and manifest. Mobile-first responsive design with bottom navigation for quick daily access.

Tech Stack

What it's
built with.

Languages
JavaScript (Node.js) · Python · HTML / CSS
Machine Learning
scikit-learn · XGBoost · NumPy · Platt Scaling · Walk-Forward CV
Backend
Express · Commander (CLI) · node-cron · child_process
Data & APIs
MLB Stats API · Open-Meteo · FanDuel (Playwright scrape) · Polymarket
Infrastructure
Railway · Docker · GitHub (private) · PWA / Service Worker