Web Scraping for Sports, Betting & Fantasy: How AI Agents Track Odds, Stats & Player Data in 2026
The global sports betting market surpassed $85 billion in 2025, with the US alone generating $15B+ in handle after legalization swept through 38 states. Fantasy sports adds another $22B. And behind every successful bettor, DFS player, and sports analytics startup sits one common need: better data, faster.
Yet professional sports data is shockingly expensive. Sportradar charges $1M+/year for real-time feeds. Stats Perform (Opta) commands $500K-$2M annually. Even basic historical data packages from providers like SportsDataIO start at $500-$2,000/month. All while much of the underlying data (odds, box scores, injury reports, roster moves) is published freely across dozens of public websites.
In this guide, you'll build an AI-powered sports intelligence system that scrapes betting odds across sportsbooks, tracks player stats and injury reports, monitors line movements, generates fantasy projections, and uses GPT-4o to find edges that humans miss.
Why AI Agents Are Transforming Sports Analytics
Sports data has characteristics that make it perfect for AI agent automation:
- Time-critical: Odds move in seconds after injury news breaks. A 30-second edge in line shopping can mean the difference between +EV and -EV. Manual monitoring of 10+ sportsbooks is physically impossible.
- Massive volume: The NFL alone generates 50,000+ player-game stat combinations per season. Add NBA, MLB, NHL, soccer, tennis, and you're looking at millions of data points per week.
- Fragmented sources: Odds live on DraftKings, FanDuel, BetMGM, and 20+ other books. Stats span ESPN, Basketball-Reference, FBRef, and league official sites. Injury reports come from team Twitter accounts, beat reporters, and official league transactions.
- Pattern-rich: Line movements correlate with sharp money. Injury impact varies by position and scheme. Weather affects totals. These patterns are discoverable, provided you have the data.
Architecture: The 6-Step Sports Intelligence Pipeline
- Source Discovery: Identify sportsbooks, stats sites, injury feeds, and transaction wires to monitor
- AI-Powered Extraction: Use the Mantis WebPerception API to scrape and structure sports data from complex, JS-heavy pages
- SQLite Storage: Store historical odds, player stats, injuries, and line movements locally
- Edge Detection: Flag line discrepancies, sharp money movements, injury impacts, and value bets
- GPT-4o Analysis: AI generates projections, identifies correlations, and produces betting/fantasy insights
- Slack/Discord Alerts: Real-time notifications for odds changes, injury news, and detected edges
Step 1: Define Your Sports Data Models
First, create Pydantic schemas for structured sports data extraction:
```python
from pydantic import BaseModel
from typing import Optional
from enum import Enum


class Sport(str, Enum):
    NFL = "nfl"
    NBA = "nba"
    MLB = "mlb"
    NHL = "nhl"
    NCAAF = "ncaaf"
    NCAAB = "ncaab"
    SOCCER = "soccer"
    TENNIS = "tennis"
    MMA = "mma"


class BettingOdds(BaseModel):
    """Betting odds from a sportsbook for a specific market."""
    sportsbook: str    # "draftkings", "fanduel", "betmgm", "caesars"
    sport: Sport
    event_name: str    # "Chiefs vs Eagles" or "Lakers vs Celtics"
    event_date: str
    market_type: str   # "spread", "moneyline", "total", "player_prop"
    selection: str     # "Chiefs -3.5", "Over 47.5", "Mahomes Over 2.5 TDs"
    odds_american: int # -110, +150, etc.
    odds_decimal: Optional[float] = None
    line: Optional[float] = None  # Spread or total number
    implied_probability: Optional[float] = None
    previous_odds: Optional[int] = None
    previous_line: Optional[float] = None
    movement_direction: Optional[str] = None  # "steam", "reverse", "stable"
    scraped_at: str


class PlayerStats(BaseModel):
    """Player performance statistics from a game or season."""
    player_name: str
    team: str
    sport: Sport
    season: str  # "2025-26"
    game_date: Optional[str] = None
    opponent: Optional[str] = None
    minutes_played: Optional[float] = None
    # Universal stats
    points: Optional[float] = None
    assists: Optional[float] = None
    rebounds: Optional[float] = None
    # Sport-specific
    passing_yards: Optional[float] = None
    rushing_yards: Optional[float] = None
    touchdowns: Optional[int] = None
    strikeouts: Optional[int] = None
    era: Optional[float] = None
    goals: Optional[int] = None
    shots_on_target: Optional[int] = None
    # Advanced
    usage_rate: Optional[float] = None
    true_shooting_pct: Optional[float] = None
    war: Optional[float] = None
    epa_per_play: Optional[float] = None
    xg: Optional[float] = None  # Expected goals (soccer)
    source: str
    scraped_at: str


class InjuryReport(BaseModel):
    """Player injury status from official or media sources."""
    player_name: str
    team: str
    sport: Sport
    injury_type: str  # "knee", "hamstring", "concussion", "illness"
    status: str       # "out", "doubtful", "questionable", "probable", "day-to-day"
    details: Optional[str] = None
    game_date: Optional[str] = None
    estimated_return: Optional[str] = None
    impact_rating: Optional[str] = None  # "high", "medium", "low"
    source: str  # "official_report", "beat_reporter", "team_announcement"
    reported_at: str


class LineMovement(BaseModel):
    """Tracked line movement over time for a specific market."""
    sport: Sport
    event_name: str
    event_date: str
    market_type: str
    sportsbook: str
    opening_line: float
    current_line: float
    opening_odds: int
    current_odds: int
    movement_size: float  # Absolute change
    movement_pct: float
    tickets_pct: Optional[float] = None  # % of tickets on this side
    money_pct: Optional[float] = None    # % of money on this side
    sharp_indicator: bool  # True if reverse line movement detected
    timestamp: str
```
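The `odds_decimal` and `implied_probability` fields are better filled in client-side than delegated to the extraction model's arithmetic. A minimal sketch of the standard conversions (the helper names here are my own):

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds to decimal (European) odds."""
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / abs(odds)


def american_to_implied_prob(odds: int) -> float:
    """Convert American odds to the implied win probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)
```

For a standard -110 line, decimal odds are about 1.909 and implied probability about 52.4%; two -110 sides sum to roughly 104.8%, and that overround is the book's vig.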
Step 2: Scrape Betting Odds Across Sportsbooks
Use the Mantis WebPerception API to extract real-time odds from multiple sportsbooks:
```python
import requests
import json
from datetime import datetime

MANTIS_API_KEY = "your-mantis-api-key"
BASE_URL = "https://api.mantisapi.com/v1"


def scrape_sportsbook_odds(sportsbook: str, sport: str, url: str) -> list[BettingOdds]:
    """Scrape current betting odds from a sportsbook."""
    # Step 1: Capture the odds page with JS rendering.
    # Sportsbooks are heavily JS-rendered with dynamic updates.
    response = requests.post(
        f"{BASE_URL}/scrape",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={
            "url": url,
            "render_js": True,
            "wait_for": "[class*='odds'], [class*='line'], [class*='spread']",
            "timeout": 30000
        }
    )
    page_data = response.json()

    # Step 2: AI-powered extraction of odds
    extraction = requests.post(
        f"{BASE_URL}/extract",
        headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
        json={
            "content": page_data["content"],
            "schema": BettingOdds.model_json_schema(),
            "prompt": f"""Extract ALL betting odds from this {sportsbook} {sport} page.
For each game/event, capture:
- Both sides of the spread with odds (e.g., Chiefs -3.5 (-110), Eagles +3.5 (-110))
- Both moneyline odds (e.g., Chiefs -175, Eagles +150)
- Total (over/under) with odds for both sides
- Any featured player props if visible
Convert all odds to American format.
Calculate implied probability from odds.
Note the event date/time.""",
            "multiple": True
        }
    )
    return [BettingOdds(**o) for o in extraction.json()["data"]]


# Monitor major US sportsbooks
sportsbook_urls = {
    "DraftKings": {
        "nfl": "https://sportsbook.draftkings.com/leagues/football/nfl",
        "nba": "https://sportsbook.draftkings.com/leagues/basketball/nba",
        "mlb": "https://sportsbook.draftkings.com/leagues/baseball/mlb",
        "nhl": "https://sportsbook.draftkings.com/leagues/hockey/nhl",
    },
    "FanDuel": {
        "nfl": "https://sportsbook.fanduel.com/football/nfl",
        "nba": "https://sportsbook.fanduel.com/basketball/nba",
        "mlb": "https://sportsbook.fanduel.com/baseball/mlb",
        "nhl": "https://sportsbook.fanduel.com/hockey/nhl",
    },
    "BetMGM": {
        "nfl": "https://sports.betmgm.com/en/sports/football/nfl",
        "nba": "https://sports.betmgm.com/en/sports/basketball/nba",
        "mlb": "https://sports.betmgm.com/en/sports/baseball/mlb",
        "nhl": "https://sports.betmgm.com/en/sports/hockey/nhl",
    },
    "Caesars": {
        "nfl": "https://www.caesars.com/sportsbook-and-casino/sports/football/nfl",
        "nba": "https://www.caesars.com/sportsbook-and-casino/sports/basketball/nba",
        "mlb": "https://www.caesars.com/sportsbook-and-casino/sports/baseball/mlb",
        "nhl": "https://www.caesars.com/sportsbook-and-casino/sports/hockey/nhl",
    },
}

# Scrape all books for today's games
for book, sports in sportsbook_urls.items():
    for sport, url in sports.items():
        try:
            odds = scrape_sportsbook_odds(book, sport, url)
            print(f"✅ {book} {sport.upper()}: {len(odds)} odds captured")
        except Exception as e:
            print(f"❌ {book} {sport}: {e}")
```
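The pipeline's later steps (line-movement detection, injury impact) query local `betting_odds` and `player_stats` tables that need to exist before the first scrape runs. A minimal schema sketch, with column names matching the Pydantic models; the `tickets_pct`/`money_pct` columns that `detect_line_movement` reads would be populated from a separate betting-splits source, which is an assumption of this sketch:

```python
import sqlite3


def init_db(path: str = "sports.db") -> sqlite3.Connection:
    """Create the tables the pipeline reads and writes, if missing."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS betting_odds (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sportsbook TEXT, sport TEXT, event_name TEXT, event_date TEXT,
            market_type TEXT, selection TEXT,
            odds_american INTEGER, line REAL,
            tickets_pct REAL,  -- from betting-splits sources, if available
            money_pct REAL,
            scraped_at TEXT
        );
        CREATE TABLE IF NOT EXISTS player_stats (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            player_name TEXT, team TEXT, sport TEXT, season TEXT,
            game_date TEXT, minutes_played REAL,
            points REAL, assists REAL, rebounds REAL,
            source TEXT, scraped_at TEXT
        );
        -- Speeds up the "opening line" lookup done per market
        CREATE INDEX IF NOT EXISTS idx_odds_market
            ON betting_odds (sportsbook, event_name, market_type, selection, scraped_at);
    """)
    return conn
```

Call `init_db()` once at startup and insert a snapshot row per scraped market on every pass; the history of snapshots is what makes movement detection possible.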
Step 3: Track Player Stats & Performance
Scrape player statistics from multiple sources for comprehensive coverage:
```python
def scrape_player_stats(sport: str) -> list[PlayerStats]:
    """Scrape player stats from sports reference sites."""
    stat_sources = {
        "nba": [
            {
                "url": "https://www.basketball-reference.com/leagues/NBA_2026_per_game.html",
                "source": "basketball_reference",
                "prompt": """Extract per-game statistics for all NBA players:
- Player name, team, games played, minutes
- Points, rebounds, assists, steals, blocks
- Field goal %, 3-point %, free throw %
- Turnovers, personal fouls
- Usage rate and true shooting % if available"""
            },
            {
                "url": "https://www.nba.com/stats/players/traditional",
                "source": "nba_official",
                "prompt": """Extract official NBA player statistics:
- All traditional stats (PTS, REB, AST, STL, BLK)
- Minutes per game
- Shooting splits (FG%, 3P%, FT%)
- Plus/minus"""
            }
        ],
        "nfl": [
            {
                "url": "https://www.pro-football-reference.com/years/2025/passing.htm",
                "source": "pfr",
                "prompt": """Extract NFL passing statistics:
- Player name, team, games
- Completions, attempts, completion %
- Passing yards, TDs, interceptions
- Passer rating, QBR if available
- Yards per attempt, sack data"""
            },
            {
                "url": "https://www.pro-football-reference.com/years/2025/rushing.htm",
                "source": "pfr",
                "prompt": """Extract NFL rushing statistics:
- Player name, team, games
- Rushing attempts, yards, TDs
- Yards per carry, longest run
- Fumbles and fumbles lost"""
            }
        ],
        "mlb": [
            {
                "url": "https://www.baseball-reference.com/leagues/majors/2026-standard-batting.shtml",
                "source": "baseball_reference",
                "prompt": """Extract MLB batting statistics:
- Player name, team, games, plate appearances
- AVG, OBP, SLG, OPS
- Home runs, RBIs, stolen bases
- WAR if available"""
            }
        ],
        "soccer": [
            {
                "url": "https://fbref.com/en/comps/9/stats/Premier-League-Stats",
                "source": "fbref",
                "prompt": """Extract Premier League player statistics:
- Player name, team, position, games, minutes
- Goals, assists, xG, xAG
- Shots, shots on target
- Progressive passes, progressive carries
- Tackles, interceptions"""
            }
        ]
    }

    all_stats = []
    for src in stat_sources.get(sport, []):
        try:
            response = requests.post(
                f"{BASE_URL}/scrape",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={"url": src["url"], "render_js": True, "timeout": 30000}
            )
            extraction = requests.post(
                f"{BASE_URL}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "content": response.json()["content"],
                    "schema": PlayerStats.model_json_schema(),
                    "prompt": src["prompt"],
                    "multiple": True
                }
            )
            stats = [PlayerStats(**s) for s in extraction.json()["data"]]
            all_stats.extend(stats)
            print(f"✅ {src['source']}: {len(stats)} player records")
        except Exception as e:
            print(f"❌ {src['source']}: {e}")
    return all_stats
```
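Because the same sport is scraped from multiple sources, the combined list will contain duplicate player rows. One reasonable approach, sketched below with a hypothetical helper, is to keep a single record per `(player_name, team)` and prefer sources in a configurable order (official site over aggregator, say):

```python
def dedupe_stats(records: list[dict], preferred: list[str]) -> list[dict]:
    """Keep one record per (player_name, team), preferring sources that
    appear earlier in `preferred`. Unknown sources rank last."""
    rank = {src: i for i, src in enumerate(preferred)}
    best: dict[tuple, dict] = {}
    for rec in records:
        key = (rec["player_name"].lower(), rec["team"].lower())
        current = best.get(key)
        if current is None or rank.get(rec["source"], 99) < rank.get(current["source"], 99):
            best[key] = rec
    return list(best.values())
```

Run it over `[s.model_dump() for s in all_stats]` before storage; key on a league player ID instead of name if your sources expose one, since name spellings vary.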
Step 4: Monitor Injuries & Roster Moves
Injury information is the single biggest edge in sports betting, because injuries move lines more than anything else:
```python
def scrape_injury_reports(sport: str) -> list[InjuryReport]:
    """Scrape injury reports from official and media sources."""
    injury_sources = {
        "nba": [
            {
                "url": "https://www.espn.com/nba/injuries",
                "prompt": """Extract ALL NBA injury reports:
- Player name and team
- Injury type (knee, ankle, back, illness, etc.)
- Status: Out, Doubtful, Questionable, Day-to-Day, Probable
- Injury details and estimated return timeline
- Date of report/update"""
            },
            {
                "url": "https://www.cbssports.com/nba/injuries/",
                "prompt": """Extract NBA injury information:
- Player name, team, position
- Injury description
- Status and expected return
- Date updated"""
            }
        ],
        "nfl": [
            {
                "url": "https://www.espn.com/nfl/injuries",
                "prompt": """Extract ALL NFL injury reports:
- Player name, team, position
- Injury type
- Practice participation status (Full, Limited, DNP)
- Game status (Out, Doubtful, Questionable)
- Week and opponent"""
            }
        ],
        "mlb": [
            {
                "url": "https://www.espn.com/mlb/injuries",
                "prompt": """Extract ALL MLB injury reports including IL placements:
- Player name, team, position
- Injury type and IL designation (10-day, 15-day, 60-day)
- Expected return date
- Rehab assignment details if any"""
            }
        ]
    }

    all_injuries = []
    for src in injury_sources.get(sport, []):
        try:
            response = requests.post(
                f"{BASE_URL}/scrape",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={"url": src["url"], "render_js": True, "timeout": 30000}
            )
            extraction = requests.post(
                f"{BASE_URL}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "content": response.json()["content"],
                    "schema": InjuryReport.model_json_schema(),
                    "prompt": src["prompt"],
                    "multiple": True
                }
            )
            injuries = [InjuryReport(**i) for i in extraction.json()["data"]]
            all_injuries.extend(injuries)
        except Exception as e:
            print(f"❌ Injury source error: {e}")
    return all_injuries


def assess_injury_impact(injury: InjuryReport, conn) -> dict:
    """Assess the betting impact of an injury using historical data."""
    cursor = conn.cursor()

    # Average the player's LAST 10 games to gauge importance.
    # Aggregate over a subquery: AVG with ORDER BY/LIMIT on the outer
    # query would average over all rows, not just the most recent 10.
    cursor.execute("""
        SELECT AVG(points), AVG(assists), AVG(rebounds), AVG(minutes_played)
        FROM (
            SELECT points, assists, rebounds, minutes_played
            FROM player_stats
            WHERE player_name = ? AND sport = ?
            ORDER BY game_date DESC LIMIT 10
        )
    """, (injury.player_name, injury.sport))
    recent_stats = cursor.fetchone()

    # Games played vs. missed this season
    cursor.execute("""
        SELECT
            COUNT(CASE WHEN minutes_played > 0 THEN 1 END) AS games_played,
            COUNT(CASE WHEN minutes_played = 0 OR minutes_played IS NULL THEN 1 END) AS games_missed
        FROM player_stats
        WHERE player_name = ? AND sport = ? AND season = '2025-26'
    """, (injury.player_name, injury.sport))
    availability = cursor.fetchone()

    return {
        "player": injury.player_name,
        "team": injury.team,
        "status": injury.status,
        "avg_stats": recent_stats,
        "games_played": availability[0] if availability else 0,
        "games_missed": availability[1] if availability else 0,
        "estimated_line_impact": "TBD"
    }
```
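Different sources spell statuses differently ("Day-To-Day", "GTD", "Questionable"), and downstream filters compare lowercase strings, so it pays to normalize at ingestion. A small sketch; the alias table is illustrative, not exhaustive:

```python
# Common abbreviations mapped to the canonical statuses used in InjuryReport.
# This mapping is illustrative; extend it per source as you encounter variants.
STATUS_ALIASES = {
    "gtd": "questionable",  # "game-time decision"
    "dtd": "day-to-day",
    "ir": "out",            # injured reserve
    "o": "out",
    "d": "doubtful",
    "q": "questionable",
    "p": "probable",
}


def normalize_status(raw: str) -> str:
    """Lowercase, trim, and map common abbreviations to canonical statuses."""
    s = raw.strip().lower().replace("_", "-")
    return STATUS_ALIASES.get(s, s)
```

Apply it to each `InjuryReport.status` right after extraction so comparisons like `status in ("out", "doubtful")` behave consistently across sources.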
Step 5: AI-Powered Edge Detection & Analysis
Use GPT-4o to find betting edges, generate fantasy projections, and produce actionable insights:
```python
from openai import OpenAI

client = OpenAI()


def find_betting_edges(odds_data: list, injuries: list, stats: list):
    """Use AI to identify potential betting edges across markets."""
    analysis = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are a professional sports analyst and quantitative
betting researcher. Analyze odds, injury, and statistical data to:

1. LINE SHOPPING: Identify the best available odds across sportsbooks
   for each market. Flag lines where one book is significantly off
   from consensus (potential value).
2. INJURY IMPACT: Assess how current injuries should move lines vs
   how they actually have moved. Flag potential under-reactions.
3. SHARP vs PUBLIC: When ticket% and money% diverge, that indicates
   sharp action. Flag reverse line movements.
4. STATISTICAL EDGES: Identify players whose recent performance
   suggests their props are mispriced (e.g., a player averaging 28 PPG
   last 5 games with an O/U of 23.5).
5. CORRELATION PLAYS: Identify same-game parlay correlations that
   books may not properly account for.

Be specific with numbers. Provide expected value calculations.
Always note this is analysis, not gambling advice."""
        }, {
            "role": "user",
            "content": f"""Analyze today's sports betting landscape:

CURRENT ODDS (across sportsbooks):
{json.dumps([o.model_dump() for o in odds_data[:50]], indent=2)}

ACTIVE INJURIES:
{json.dumps([i.model_dump() for i in injuries[:30]], indent=2)}

RECENT PLAYER STATS:
{json.dumps([s.model_dump() for s in stats[:40]], indent=2)}

Find edges, line discrepancies, and value opportunities."""
        }]
    )
    return analysis.choices[0].message.content


def generate_fantasy_projections(stats: list, injuries: list, sport: str):
    """Generate AI-powered fantasy sports projections."""
    projections = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"""You are an expert {sport.upper()} fantasy sports analyst.
Generate player projections based on:

1. RECENT FORM: Weight recent games more heavily (last 5 > last 15 > season)
2. MATCHUP: Consider opponent's defensive rankings at each position
3. INJURY CONTEXT: Adjust for teammates out (more opportunity) or
   player limitations (reduced minutes/snap count)
4. PACE & GAME ENVIRONMENT: High totals = more fantasy points
5. HOME/AWAY SPLITS: Some players perform significantly differently

Output specific stat projections and DFS salary value ratings.
Format as a ranked list with confidence levels."""
        }, {
            "role": "user",
            "content": f"""Generate fantasy projections for today's {sport.upper()} slate:

PLAYER STATS (recent):
{json.dumps([s.model_dump() for s in stats[:30]], indent=2)}

INJURIES AFFECTING SLATE:
{json.dumps([i.model_dump() for i in injuries[:20]], indent=2)}

Provide specific stat projections, DFS value ratings, and stack recommendations."""
        }]
    )
    return projections.choices[0].message.content
```
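The prompts above serialize dozens of records as indented JSON, and most of the models' `Optional` fields will be `None`, which just burns tokens. A small helper worth considering (name and threshold are my own) drops empty fields and uses compact separators before the records hit the prompt:

```python
import json


def compact(records: list[dict], keep_max: int = 50) -> str:
    """Serialize records for an LLM prompt: drop None/empty fields and
    use compact separators to cut the token count substantially."""
    slim = [
        {k: v for k, v in rec.items() if v not in (None, "", [])}
        for rec in records[:keep_max]
    ]
    return json.dumps(slim, separators=(",", ":"))
```

Use `compact([o.model_dump() for o in odds_data])` in place of `json.dumps(..., indent=2)`; for sparse models like `PlayerStats` this can shrink the payload several-fold.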
```python
def detect_line_movement(conn, current_odds: list) -> list[LineMovement]:
    """Detect significant line movements by comparing to stored odds."""
    movements = []
    cursor = conn.cursor()
    for odds in current_odds:
        # Earliest stored snapshot for this market = the opening line
        cursor.execute("""
            SELECT line, odds_american, scraped_at
            FROM betting_odds
            WHERE sportsbook = ? AND event_name = ? AND market_type = ? AND selection = ?
            ORDER BY scraped_at ASC LIMIT 1
        """, (odds.sportsbook, odds.event_name, odds.market_type, odds.selection))
        opening = cursor.fetchone()

        if opening and odds.line is not None:
            open_line, open_odds, open_time = opening
            if open_line is not None:
                movement = abs(odds.line - open_line)
                if movement >= 0.5:  # Significant movement threshold
                    # Check for reverse line movement (sharp indicator)
                    cursor.execute("""
                        SELECT tickets_pct, money_pct
                        FROM betting_odds
                        WHERE event_name = ? AND market_type = ?
                        ORDER BY scraped_at DESC LIMIT 1
                    """, (odds.event_name, odds.market_type))
                    action = cursor.fetchone()

                    sharp = False
                    if action and action[0] and action[1]:
                        # Reverse line movement: line moves opposite to public tickets
                        if action[0] > 60 and odds.line < open_line:
                            sharp = True
                        elif action[0] < 40 and odds.line > open_line:
                            sharp = True

                    movements.append(LineMovement(
                        sport=odds.sport,
                        event_name=odds.event_name,
                        event_date=odds.event_date,
                        market_type=odds.market_type,
                        sportsbook=odds.sportsbook,
                        opening_line=open_line,
                        current_line=odds.line,
                        opening_odds=open_odds,
                        current_odds=odds.odds_american,
                        movement_size=movement,
                        movement_pct=(movement / abs(open_line)) * 100 if open_line != 0 else 0,
                        tickets_pct=action[0] if action else None,
                        money_pct=action[1] if action else None,
                        sharp_indicator=sharp,
                        timestamp=datetime.now().isoformat()
                    ))
    return movements
```
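The `BettingOdds.movement_direction` field ("steam", "reverse", "stable") is never populated above. A heuristic sketch that mirrors the reverse-line-movement test in `detect_line_movement`; the half-point and 60/40 thresholds are assumptions, not established constants:

```python
from typing import Optional


def classify_movement(open_line: float, current_line: float,
                      tickets_pct: Optional[float]) -> str:
    """Heuristic movement label:
    - "stable":  moved less than half a point
    - "reverse": line moved opposite to a lopsided public side (RLM)
    - "steam":   any other significant move
    """
    delta = current_line - open_line
    if abs(delta) < 0.5:
        return "stable"
    if tickets_pct is not None:
        # Same convention as detect_line_movement's sharp check
        if tickets_pct > 60 and delta < 0:
            return "reverse"
        if tickets_pct < 40 and delta > 0:
            return "reverse"
    return "steam"
```

Set `odds.movement_direction = classify_movement(open_line, odds.line, tickets_pct)` when building each `LineMovement`, so snapshots stored back in SQLite carry the label.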
Step 6: Real-Time Alerting
Send alerts to Slack or Discord when edges are detected:
```python
import os

SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK_URL")
DISCORD_WEBHOOK = os.environ.get("DISCORD_WEBHOOK_URL")


def send_sports_alert(edges: str, movements: list, injuries: list):
    """Send sports intelligence alerts."""
    blocks = [{
        "type": "header",
        "text": {"type": "plain_text", "text": "📊 Sports Intelligence Alert"}
    }]

    # Sharp line movements
    sharp_moves = [m for m in movements if m.sharp_indicator]
    if sharp_moves:
        move_text = "\n".join([
            f"• 🔥 {m.event_name}: {m.market_type} moved {m.opening_line} → {m.current_line} "
            f"({m.sportsbook}): SHARP ACTION (public {m.tickets_pct:.0f}% vs money {m.money_pct:.0f}%)"
            for m in sharp_moves[:5]
        ])
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*🎯 Sharp Money Detected:*\n{move_text}"}
        })

    # High-impact injuries (statuses lowercased so "Out"/"OUT" also match)
    high_impact = [i for i in injuries if i.status.lower() in ("out", "doubtful")]
    if high_impact:
        injury_text = "\n".join([
            f"• 🏥 {i.player_name} ({i.team}): {i.status.upper()}, {i.injury_type}"
            for i in high_impact[:8]
        ])
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*🏥 Key Injuries:*\n{injury_text}"}
        })

    # AI edge analysis (truncated)
    if edges:
        summary = edges[:2000] + "..." if len(edges) > 2000 else edges
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": f"*🤖 AI Edge Analysis:*\n{summary}"}
        })

    if SLACK_WEBHOOK:
        requests.post(SLACK_WEBHOOK, json={"blocks": blocks})

    # Also send to Discord for the sports community
    if DISCORD_WEBHOOK:
        discord_msg = "## 📊 Sports Intelligence Alert\n\n"
        if sharp_moves:
            discord_msg += "**Sharp Money:**\n" + "\n".join([
                f"- {m.event_name}: {m.opening_line} → {m.current_line} (SHARP)"
                for m in sharp_moves[:5]
            ]) + "\n\n"
        requests.post(DISCORD_WEBHOOK, json={"content": discord_msg[:2000]})
```
🚀 Start Building Your Sports Intelligence Agent
Track odds across sportsbooks, monitor injuries, and detect edges automatically. Free tier includes 100 API calls/month.
Get Your API Key →

Cost Comparison: Traditional vs. AI Agent
| Platform | Annual Cost | Coverage | Customization |
|---|---|---|---|
| Sportradar | $100K - $2M+ | Real-time feeds, all major leagues | API access, fixed schemas |
| Stats Perform (Opta) | $500K - $2M | Soccer, cricket, deep event data | API + widgets |
| SportsDataIO | $6K - $24K | US sports, odds, projections | REST API |
| Action Network Pro | $600 - $2,400 | Odds, sharp action, basic analytics | Dashboard only |
| AI Agent + Mantis | $348 - $3,588 | Any public source, real-time | Fully customizable, your models |
Sportradar and Stats Perform own the real-time, in-play data market, and that is genuinely hard to replicate. But for pre-game odds comparison, injury monitoring, line movement tracking, and historical stats, an AI agent checking public sources every few minutes delivers tremendous value at a fraction of the cost.
Use Cases by Segment
1. Sports Bettors & Syndicates
Line shop across 10+ sportsbooks simultaneously to always get the best number. Track line movements from open to close to identify sharp vs. public action. Monitor injury news and quantify expected line impact before books adjust. Build custom models using historical odds and results data that would cost $50K+ from a data provider.
2. Fantasy Sports Players (DFS & Season-Long)
Generate ownership projections by scraping DFS optimizer sites and forums. Track late-breaking injury news that affects player value minutes before lock. Build custom projection models using multi-source stat data. Monitor lineup percentages to find contrarian plays in GPP tournaments.
3. Sports Media & Content Creators
Automate data-driven content: "This week's biggest line movements" or "Injury report breakdown." Generate real-time odds comparison graphics for social media. Track betting market consensus to identify the games where sharps and the public disagree most, which make for great content hooks.
4. Sports Analytics Startups
Bootstrap your data layer without $1M+ Sportradar contracts. Build MVP products using scraped public data, then upgrade to official feeds once you have revenue. Focus your funding on model development and UX instead of data acquisition. Validate product-market fit before committing to expensive data partnerships.
Advanced: Multi-Book Arbitrage & Value Detection
Combine odds from multiple sportsbooks to find guaranteed profit opportunities:
```python
def find_arbitrage_opportunities(all_odds: list) -> list:
    """Find arbitrage opportunities across sportsbooks."""
    # Group odds by event and market
    markets = {}
    for odds in all_odds:
        key = f"{odds.event_name}|{odds.market_type}"
        markets.setdefault(key, []).append(odds)

    arb_opportunities = []
    for market_key, odds_list in markets.items():
        # For two-way markets (spread, moneyline, total):
        # find the best odds on each side across all books.
        sides = {}
        for odds in odds_list:
            # Naive side detection: first word of the selection ("Chiefs", "Over", ...)
            selection_side = odds.selection.split()[0] if odds.selection else "unknown"
            sides.setdefault(selection_side, []).append(odds)

        if len(sides) == 2:
            side_keys = list(sides.keys())
            best_a = max(sides[side_keys[0]], key=lambda x: x.odds_american)
            best_b = max(sides[side_keys[1]], key=lambda x: x.odds_american)

            # Convert to implied probability
            prob_a = american_to_implied(best_a.odds_american)
            prob_b = american_to_implied(best_b.odds_american)
            total_implied = prob_a + prob_b

            if total_implied < 1.0:  # Arbitrage exists!
                profit_pct = (1.0 / total_implied - 1.0) * 100
                # Stake each side in proportion to its own implied probability
                # so the payout is identical whichever side wins.
                arb_opportunities.append({
                    "market": market_key,
                    "profit_pct": profit_pct,
                    "side_a": {
                        "selection": best_a.selection,
                        "sportsbook": best_a.sportsbook,
                        "odds": best_a.odds_american,
                        "stake_pct": (prob_a / total_implied) * 100
                    },
                    "side_b": {
                        "selection": best_b.selection,
                        "sportsbook": best_b.sportsbook,
                        "odds": best_b.odds_american,
                        "stake_pct": (prob_b / total_implied) * 100
                    }
                })
    return sorted(arb_opportunities, key=lambda x: -x["profit_pct"])


def american_to_implied(odds: int) -> float:
    """Convert American odds to implied probability."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


def find_expected_value_bets(odds: list, model_probabilities: dict) -> list:
    """Compare model probabilities to market odds to find +EV bets."""
    ev_bets = []
    for o in odds:
        market_key = f"{o.event_name}|{o.selection}"
        if market_key in model_probabilities:
            model_prob = model_probabilities[market_key]
            implied_prob = american_to_implied(o.odds_american)
            edge = model_prob - implied_prob
            if edge > 0.03:  # 3%+ edge threshold
                # Kelly Criterion for optimal bet sizing
                decimal_odds = (o.odds_american / 100 + 1) if o.odds_american > 0 \
                    else (100 / abs(o.odds_american) + 1)
                kelly = (model_prob * decimal_odds - 1) / (decimal_odds - 1)
                ev_bets.append({
                    "selection": o.selection,
                    "event": o.event_name,
                    "sportsbook": o.sportsbook,
                    "odds": o.odds_american,
                    "model_probability": model_prob,
                    "implied_probability": implied_prob,
                    "edge": edge,
                    "kelly_fraction": kelly,
                    "half_kelly": kelly / 2  # Conservative sizing
                })
    return sorted(ev_bets, key=lambda x: -x["edge"])
```
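A quick sanity-check of the arbitrage math with invented numbers: if two books each post +105 on opposite sides of a total, the implied probabilities sum to less than 1 and a risk-free return exists.

```python
def american_to_implied(odds: int) -> float:
    """Same conversion used in the pipeline, repeated for self-containment."""
    if odds > 0:
        return 100 / (odds + 100)
    return abs(odds) / (abs(odds) + 100)


# Hypothetical disagreement: Over +105 at book A, Under +105 at book B.
prob_over = american_to_implied(105)   # 100/205 ≈ 0.488
prob_under = american_to_implied(105)
total = prob_over + prob_under          # ≈ 0.976, under 1.0: an arb exists
profit_pct = (1 / total - 1) * 100      # ≈ 2.5% locked-in return
stake_over = prob_over / total          # stake each side ∝ its implied probability
```

With equal odds on both sides the stake split is exactly 50/50; as one side's price improves, more of the bankroll shifts toward the other side to keep the payout equal either way.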
Compliance & Responsible Gambling
Sports data scraping carries unique legal and ethical considerations:
- Public odds data: Sportsbook odds displayed on public-facing websites are generally considered public information. However, some books explicitly prohibit scraping in their Terms of Service. Respect robots.txt and rate limits.
- Stats are facts: Sports statistics (scores, box scores, player stats) are factual data and generally not copyrightable. However, proprietary advanced metrics and compiled databases may have protections.
- Sportsbook ToS: Many sportsbooks prohibit automated odds scraping. Using scraped odds for personal analysis is different from redistributing them commercially. Be aware of the distinction.
- State regulations: Sports betting legality varies by state. Ensure your use complies with local gambling regulations. Some states restrict certain types of betting analysis tools.
- Responsible gambling: Any system that facilitates sports betting should include responsible gambling resources. Set loss limits, track ROI honestly, and never bet more than you can afford to lose.
- Rate limiting: Sportsbook websites handle heavy traffic but aggressive scraping can trigger IP bans. Use reasonable intervals (30+ seconds between requests to the same domain) and cache effectively.
- No insider information: Using non-public injury or team information for betting purposes may violate sports integrity regulations. Only use publicly available data.
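The 30-seconds-per-domain guidance above is easy to enforce mechanically. A small sketch (class name and design are my own; the clock and sleep functions are injectable so the behavior is testable without real waiting):

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval: float = 30.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the domain is safe to hit again; return seconds waited."""
        domain = urlparse(url).netloc
        now = self._clock()
        waited = 0.0
        last = self._last.get(domain)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last[domain] = self._clock()
        return waited
```

Call `throttle.wait(url)` immediately before each `requests.post` to the scrape endpoint; different domains proceed without delay, repeat hits to the same domain are spaced out.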
Getting Started
Ready to build your sports intelligence system? Here's the quick start:
- Get a Mantis API key at mantisapi.com: the free tier includes 100 API calls/month
- Start with one sport: pick NFL, NBA, or MLB and scrape odds from 3-4 sportsbooks
- Add injury monitoring: scrape ESPN and CBS Sports injury reports every 30 minutes
- Track line movements: store odds snapshots and detect significant movements
- Layer in AI analysis: GPT-4o finds edges, generates projections, and identifies sharp action
- Scale across sports: once your pipeline works for one sport, adding others is straightforward
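These cadences can be wired up with plain cron; a sketch of a possible schedule, with placeholder paths and script names standing in for your own entry points:

```shell
# Illustrative crontab; all paths and script names are placeholders.
# Odds snapshots every 5 minutes:
*/5 * * * * /usr/bin/python3 /opt/sports/scrape_odds.py
# Injury reports every 30 minutes:
*/30 * * * * /usr/bin/python3 /opt/sports/scrape_injuries.py
# Full player-stats refresh every 6 hours:
0 */6 * * * /usr/bin/python3 /opt/sports/scrape_stats.py
```

Keep the odds cadence within the free tier's call budget, or tighten it only for the hours before game time when lines actually move.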
🚀 Build Your Sports Intelligence Agent
Track odds, injuries, and line movements across every major sport. Free tier includes 100 API calls/month.
Get Your API Key →

Further Reading
- The Complete Guide to Web Scraping with AI Agents in 2026
- Web Scraping for Price Monitoring: Build an AI-Powered Price Tracker
- Web Scraping for Market Research: Analyze Competitors, Trends & Opportunities
- Structured Data Extraction with AI: Clean Data from Any Page
- Web Scraping for Financial Data: Track Markets, SEC Filings & Earnings
- Web Scraping for Media & Entertainment: Track Content, Streaming & Ad Data