Baseball Oracle — theConnorChannel

ML · Full-Stack · Personal Project

Baseball
Oracle.

An ML-powered prediction engine for MLB games — trained on 13,000+ games across five seasons, achieving ~60% accuracy versus the 52.5% baseline. Paired with a live FanDuel odds scraper, a bet tracker, and a full production deployment on Railway.


How It Works

From raw data
to a prediction.

Every morning at 9:15 AM, the system automatically collects the day's matchups, enriches them with historical and real-time data, and produces win probabilities for every game on the schedule.

📡

Collect

Game schedules, pitcher stats, batting splits, bullpen ERA and fatigue, umpire tendencies, and ballpark-specific weather are pulled from the MLB Stats API and Open-Meteo via a scheduled cron job.
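A minimal sketch of the collection step, assuming the public MLB Stats API `schedule` endpoint (https://statsapi.mlb.com/api/v1/schedule with `hydrate=probablePitcher`) and Open-Meteo's `/v1/forecast` endpoint. The helper names (`scheduleUrl`, `weatherUrl`, `parseMatchups`) are hypothetical, and the response fields read here follow the commonly returned schedule shape rather than the project's actual collector.

```javascript
// Hypothetical collector helpers; not the project's actual code.
// Assumes the public MLB Stats API schedule endpoint and the Open-Meteo forecast API.
const MLB_BASE = "https://statsapi.mlb.com/api/v1";
const METEO_BASE = "https://api.open-meteo.com/v1/forecast";

// Schedule URL for one date (YYYY-MM-DD), asking the API to include probable starters.
function scheduleUrl(date) {
  const params = new URLSearchParams({ sportId: "1", date, hydrate: "probablePitcher" });
  return `${MLB_BASE}/schedule?${params}`;
}

// Hourly forecast URL for a ballpark's coordinates on a single date.
function weatherUrl(lat, lon, date) {
  const params = new URLSearchParams({
    latitude: String(lat),
    longitude: String(lon),
    hourly: "temperature_2m,wind_speed_10m,wind_direction_10m,precipitation_probability",
    start_date: date,
    end_date: date,
    timezone: "auto", // let Open-Meteo use the ballpark's local time zone
  });
  return `${METEO_BASE}?${params}`;
}

// Flatten a schedule response into one record per game.
function parseMatchups(schedule) {
  const games = (schedule.dates ?? []).flatMap((d) => d.games ?? []);
  return games.map((g) => ({
    gamePk: g.gamePk,
    home: g.teams.home.team.name,
    away: g.teams.away.team.name,
    homeStarter: g.teams.home.probablePitcher?.fullName ?? null, // null if not yet announced
    awayStarter: g.teams.away.probablePitcher?.fullName ?? null,
    venue: g.venue?.name ?? null,
  }));
}

// Usage (Node 18+ ships a global fetch):
// const res = await fetch(scheduleUrl("2025-06-01"));
// const games = parseMatchups(await res.json());
```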

🧠

Train

A Python trainer builds 53 features — rolling team stats, Pythagorean win%, pitcher FIP/K9/BB9/HR9, LHP/RHP splits, park factors, H2H records — then trains both a RandomForest and XGBoost model, keeping the winner via walk-forward CV.
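Several of the listed features have standard public definitions. The sketch below covers Pythagorean win%, the pitcher rate stats, and a rolling mean; the exponent, the FIP constant, and the window size are assumptions, since the page does not state the project's choices, and the function names are hypothetical.

```javascript
// Hypothetical feature helpers showing the standard formulas; the project's
// exact definitions (exponent, FIP constant, window size) may differ.

// Pythagorean expectation: RS^x / (RS^x + RA^x). x = 2 is the original form;
// 1.83 is a common refinement.
function pythagoreanWinPct(runsScored, runsAllowed, exponent = 1.83) {
  const rs = runsScored ** exponent;
  const ra = runsAllowed ** exponent;
  return rs + ra === 0 ? 0.5 : rs / (rs + ra);
}

// Pitcher rate stats from counting stats. Innings are derived from outs recorded,
// which sidesteps box-score notation where "6.2" means 6 and 2/3 innings.
function pitcherRates({ outs, k, bb, hbp = 0, hr }, fipConstant = 3.1) {
  const ip = outs / 3;
  if (ip === 0) return null; // no innings pitched, no rates
  return {
    k9: (9 * k) / ip,
    bb9: (9 * bb) / ip,
    hr9: (9 * hr) / ip,
    // FIP = (13*HR + 3*(BB+HBP) - 2*K) / IP + league constant (assumed ~3.1; recomputed each season)
    fip: (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fipConstant,
  };
}

// Mean of the last `window` values, e.g. runs per game over the last 10 games.
function rollingMean(values, window) {
  if (window <= 0) return null;
  const recent = values.slice(-window);
  return recent.length === 0 ? null : recent.reduce((a, b) => a + b, 0) / recent.length;
}
```

To avoid leakage, the rolling inputs for a given game should be computed only from games played before that game's date.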

⚾

Predict

Node.js walks the exported decision-tree JSON directly, so no Python is required at inference time. Predictions surface in the app as calibrated win probabilities, alongside Kelly-sized bet recommendations based on live FanDuel moneyline odds.
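The page describes the win probabilities as calibrated and lists Platt scaling in the stack. Platt scaling maps a raw model score s to p = 1 / (1 + exp(A*s + B)) and fits A and B by minimizing log-loss on held-out games. This is a minimal sketch under assumptions: it uses plain gradient descent (Platt's original procedure also smooths the 0/1 targets and uses a second-order optimizer), and the names `fitPlatt` and `calibrate`, the learning rate, and the iteration count are illustrative.

```javascript
// Minimal Platt scaling sketch: p = 1 / (1 + exp(A * s + B)), fit by gradient descent on log-loss.
// Learning rate and iteration count are illustrative assumptions.

function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Fit A and B on held-out (score, outcome) pairs, where outcome = 1 if the home team won.
function fitPlatt(scores, outcomes, { lr = 0.1, iters = 5000 } = {}) {
  let A = 0;
  let B = 0;
  const n = scores.length;
  for (let t = 0; t < iters; t++) {
    let gA = 0;
    let gB = 0;
    for (let i = 0; i < n; i++) {
      const p = sigmoid(-(A * scores[i] + B));
      // With z = -(A*s + B), d(log-loss)/dz = p - y, and dz/dA = -s, dz/dB = -1.
      const err = p - outcomes[i];
      gA += -err * scores[i];
      gB += -err;
    }
    A -= (lr * gA) / n;
    B -= (lr * gB) / n;
  }
  return { A, B };
}

// Apply the fitted calibration to a new raw score.
function calibrate(score, { A, B }) {
  return sigmoid(-(A * score + B));
}
```

In a decoupled setup like this one, only the two fitted numbers A and B would need to ship alongside trees.json for Node.js to apply the calibration.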

Key Architecture Decision

Training and inference are completely decoupled. Python trains the model and exports the decision trees as plain JSON files. Node.js then walks those trees at request time using a custom pure-JS inference engine (rfPredict.js) — meaning the production server has zero Python dependencies and prediction latency stays low.

Python (train) → trees.json → rfPredict.js → Prediction
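The page does not publish the trees.json schema, so the node format below ({ feature, threshold, left, right } for splits, { value } for leaves) is a hypothetical stand-in, as are the function names. It shows the core idea behind a pure-JS engine like rfPredict.js: descend each tree using scikit-learn's convention that `x[feature] <= threshold` goes left, then average the leaf probabilities across trees, which is how scikit-learn's RandomForestClassifier combines trees in `predict_proba`.

```javascript
// Hypothetical tree-walking sketch; the node schema is an assumption,
// not the actual trees.json format read by rfPredict.js.
// Split node: { feature: <index>, threshold: <number>, left: <node>, right: <node> }
// Leaf node:  { value: <P(home win) at this leaf> }

function walkTree(node, features) {
  let current = node;
  while (current.value === undefined) {
    // scikit-learn convention: go left when x[feature] <= threshold
    current = features[current.feature] <= current.threshold ? current.left : current.right;
  }
  return current.value;
}

// A random forest's probability is the mean of its trees' leaf probabilities.
function forestPredict(trees, features) {
  const total = trees.reduce((sum, tree) => sum + walkTree(tree, features), 0);
  return total / trees.length;
}

// Example: two single-split trees on feature 0.
const trees = [
  { feature: 0, threshold: 0.5, left: { value: 0.625 }, right: { value: 0.375 } },
  { feature: 0, threshold: 1.0, left: { value: 0.75 }, right: { value: 0.5 } },
];
console.log(forestPredict(trees, [0.8])); // (0.375 + 0.75) / 2 = 0.5625
```

An exported XGBoost model would be combined differently: for a binary logistic objective, per-tree leaf values are summed as log-odds (plus a base margin) and passed through a sigmoid, rather than averaged.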

Model Performance

The numbers.

Honest accuracy estimates using temporal walk-forward cross-validation — the model only ever trains on past data and predicts forward, the same way it runs in production.
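Walk-forward validation can be sketched as an expanding window over seasons: each fold trains on every season before the test season and scores that season only. The season-level granularity, the `minTrainSeasons` parameter, and the `fit` interface are assumptions for illustration; the project may step forward by date or month instead.

```javascript
// Hypothetical expanding-window walk-forward validation by season.
// Each fold trains only on seasons strictly before the test season.

function walkForwardFolds(games, minTrainSeasons = 2) {
  const seasons = [...new Set(games.map((g) => g.season))].sort((a, b) => a - b);
  const folds = [];
  for (let i = minTrainSeasons; i < seasons.length; i++) {
    const testSeason = seasons[i];
    folds.push({
      testSeason,
      train: games.filter((g) => g.season < testSeason),
      test: games.filter((g) => g.season === testSeason),
    });
  }
  return folds;
}

// Mean accuracy across folds. `fit(train)` is a hypothetical interface that
// returns a function mapping a game to P(home win).
function walkForwardAccuracy(games, fit, minTrainSeasons = 2) {
  const folds = walkForwardFolds(games, minTrainSeasons);
  const accs = folds.map(({ train, test }) => {
    const predict = fit(train);
    const correct = test.filter((g) => (predict(g) >= 0.5) === g.homeWon).length;
    return correct / test.length;
  });
  return accs.reduce((a, b) => a + b, 0) / accs.length;
}
```

Scoring each candidate (RandomForest and XGBoost, for instance) on the same folds and keeping the higher mean is one way to implement a keep-the-winner step; log-loss or Brier score would be reasonable alternative metrics when calibrated probabilities matter.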

~60%

Prediction Accuracy

vs. the 52.5% naive baseline for MLB game outcomes

13K+

Training Games

2021–2026 seasons with a strict temporal train/test split to prevent data leakage

53

Model Features

Team rolling stats, pitcher FIP/K9/BB9, bullpen fatigue, weather, park factors, umpire tendencies, H2H records

Features

Built to use,
not just demo.

Beyond the model, Baseball Oracle is a full product — deployed, accessible on mobile, and designed for daily use during the MLB season.

🔮

Daily Predictions

Win probabilities auto-update each morning for every game on the schedule. Each training run pits RandomForest against XGBoost and keeps whichever model performs better.

💸

FanDuel Odds Scraper

Playwright with a stealth plugin scrapes live moneyline odds from FanDuel; the odds feed a Kelly criterion calculator that sizes each bet according to the model's edge over the posted price.
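The Kelly criterion stakes f* = (b*p - q) / b of bankroll, where p is the model's win probability, q = 1 - p, and b is the net payout per unit staked at the posted price. A minimal sketch for American moneylines follows; the function names are hypothetical, and the page does not say whether the app uses full or fractional Kelly, so the `fraction` parameter is an assumption.

```javascript
// Kelly criterion sketch for American moneyline odds.
// f* = (b*p - q) / b, with b = net payout per unit staked, p = model win prob, q = 1 - p.

// Net payout per unit staked: +150 -> 1.5, -120 -> 100/120.
function netOdds(american) {
  return american > 0 ? american / 100 : 100 / -american;
}

// Implied probability of the posted price (still includes the bookmaker's margin).
function impliedProb(american) {
  return 1 / (1 + netOdds(american));
}

// Fraction of bankroll to stake; 0 when the model sees no edge.
// `fraction` < 1 gives fractional Kelly (an assumption; not stated by the project).
function kellyStake(p, american, fraction = 1) {
  const b = netOdds(american);
  const f = (b * p - (1 - p)) / b;
  return Math.max(0, f) * fraction;
}
```

For example, p = 0.55 on a +110 price gives b = 1.1 and f* = (1.1 * 0.55 - 0.45) / 1.1, about 0.141, so full Kelly would stake roughly 14% of bankroll. That sensitivity to errors in p is why fractional Kelly is a common choice.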

📊

Bet Tracker

Log bets directly in the app with client-side localStorage — no accounts, no server round-trips. Track outcomes, ROI, and bet history across the season.
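A minimal sketch of a localStorage-backed bet log and its ROI calculation. The storage key, the bet record shape ({ stake, odds, result }), and the helper names are hypothetical; taking `storage` as a parameter lets the same code run against window.localStorage in the browser or an in-memory stand-in elsewhere.

```javascript
// Hypothetical bet-log helpers. `storage` is anything with getItem/setItem,
// such as window.localStorage; the key name is an assumption.
const STORAGE_KEY = "oracle.bets";

function loadBets(storage) {
  const raw = storage.getItem(STORAGE_KEY);
  return raw ? JSON.parse(raw) : [];
}

function saveBet(storage, bet) {
  const bets = loadBets(storage);
  bets.push(bet);
  storage.setItem(STORAGE_KEY, JSON.stringify(bets));
  return bets;
}

// Profit on one bet at American odds; pending bets and pushes contribute 0.
function betProfit({ stake, odds, result }) {
  if (result === "win") return odds > 0 ? (stake * odds) / 100 : (stake * 100) / -odds;
  if (result === "loss") return -stake;
  return 0;
}

// ROI = total profit / total amount staked, over bets settled as a win or loss.
function roi(bets) {
  const settled = bets.filter((b) => b.result === "win" || b.result === "loss");
  const staked = settled.reduce((s, b) => s + b.stake, 0);
  const profit = settled.reduce((s, b) => s + betProfit(b), 0);
  return staked === 0 ? 0 : profit / staked;
}
```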

📱

PWA — Installable

Fully installable as a home-screen app via a service worker and manifest. Mobile-first responsive design with bottom navigation for quick daily access.

Tech Stack

What it's
built with.

Languages

JavaScript (Node.js) · Python · HTML / CSS

Machine Learning

scikit-learn · XGBoost · NumPy · Platt Scaling · Walk-Forward CV

Backend

Express · Commander (CLI) · node-cron · child_process

Data & APIs

MLB Stats API · Open-Meteo · FanDuel (Playwright scrape) · Polymarket

Infrastructure

Railway · Docker · GitHub (private) · PWA / Service Worker
