Web Scraping for Gaming & Esports: How AI Agents Track Game Data, Player Stats, Market Prices & Competitive Intelligence in 2026
The global gaming industry generates over $200 billion annually, with esports alone surpassing $2 billion in revenue and reaching 600+ million viewers worldwide by 2026. From Steam marketplace economies worth billions to professional esports leagues with multi-million dollar prize pools, the gaming ecosystem produces massive amounts of actionable data, most of it scattered across dozens of platforms, APIs, and community sites.
Traditional gaming analytics platforms like Newzoo ($15K-60K/year), Sensor Tower ($5K-25K/mo), and Esports Charts ($2K-10K/mo) provide valuable insights but at enterprise price points that exclude indie studios, content creators, and smaller esports organizations. AI agents powered by web scraping can deliver 80-90% of these insights at 95% lower cost.
In this guide, you'll learn how to build AI-powered gaming intelligence systems that automatically track Steam market prices, esports tournament results, player performance analytics, game review sentiment, streaming metrics, and competitive release calendars.
Why AI Agents Are Transforming Gaming Intelligence
The gaming industry's data landscape is uniquely fragmented:
- Marketplace data across Steam, Epic Games Store, PlayStation Store, Xbox Marketplace, and Nintendo eShop, each with a different structure
- Esports stats scattered across HLTV (CS2), OP.GG/U.GG (League of Legends), Liquipedia (multi-game), Tracker.gg (Valorant, Apex), FACEIT, and ESEA
- Streaming metrics on Twitch, YouTube Gaming, and Kick: viewer counts, streamer analytics, game popularity trends
- Community sentiment across Reddit (r/gaming, game-specific subs), Steam reviews, Metacritic, OpenCritic, and Discord communities
- Market data: Steam Community Market prices, skin trading sites (CSFloat, Buff163), game key resellers
An AI agent can unify all these sources, detect patterns humans would miss, and deliver real-time intelligence that would take a team of analysts days to compile manually.
Step 1: Steam Marketplace & Game Store Intelligence
The Steam Community Market processes millions of transactions daily, with some CS2 skins trading for over $100,000. Monitoring price movements, detecting arbitrage opportunities, and tracking new listings requires continuous data collection.
Defining Your Data Models
```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional
from enum import Enum

class GamePlatform(str, Enum):
    STEAM = "steam"
    EPIC = "epic"
    PLAYSTATION = "playstation"
    XBOX = "xbox"
    NINTENDO = "nintendo"

class GameListing(BaseModel):
    """Game store listing with pricing and metadata"""
    title: str
    platform: GamePlatform
    price_current: float
    price_original: Optional[float] = None
    discount_pct: Optional[int] = None
    release_date: Optional[str] = None
    developer: str
    publisher: str
    genres: list[str] = []
    tags: list[str] = []
    review_score: Optional[float] = None
    review_count: Optional[int] = None
    peak_concurrent: Optional[int] = None
    url: str
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class MarketItem(BaseModel):
    """Steam Community Market or skin marketplace item"""
    item_name: str
    game: str
    price_usd: float
    price_7d_avg: Optional[float] = None
    price_30d_avg: Optional[float] = None
    volume_24h: Optional[int] = None
    listings_count: int
    float_value: Optional[float] = None  # For CS2 skins
    rarity: Optional[str] = None
    wear: Optional[str] = None
    stickers: list[str] = []
    source: str  # steam_market, csfloat, buff163
    url: str
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class PriceAlert(BaseModel):
    """Triggered when market conditions match criteria"""
    item_name: str
    alert_type: str  # price_drop, price_spike, arbitrage, new_listing
    current_price: float
    reference_price: float
    change_pct: float
    opportunity_score: float  # 0-100
    source: str
    details: str
```
Multi-Platform Game Price Monitoring
```python
import requests
from datetime import datetime

MANTIS_API = "https://api.mantisapi.com/v1"
API_KEY = "your_mantis_api_key"

def scrape_steam_game(app_id: str) -> GameListing:
    """Scrape a Steam game page for pricing and metadata"""
    url = f"https://store.steampowered.com/app/{app_id}"
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": url,
            "extract": {
                "type": "schema",
                "schema": GameListing.model_json_schema()
            },
            "javascript": True,  # Steam uses JS rendering
            "cookies": {"birthtime": "0", "mature_content": "1"}
        }
    )
    return GameListing(**response.json()["data"])

def scrape_steam_market_item(market_hash_name: str) -> MarketItem:
    """Scrape Steam Community Market item pricing"""
    url = f"https://steamcommunity.com/market/listings/730/{market_hash_name}"
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": url,
            "extract": {
                "type": "schema",
                "schema": MarketItem.model_json_schema()
            },
            "javascript": True
        }
    )
    return MarketItem(**response.json()["data"])

def monitor_sale_events():
    """Track major sale events across platforms"""
    sale_pages = [
        "https://store.steampowered.com/specials",
        "https://store.epicgames.com/en-US/free-games",
        "https://www.playstation.com/en-us/ps-store/deals/",
        "https://www.xbox.com/en-US/games/deals"
    ]
    deals = []
    for url in sale_pages:
        response = requests.post(
            f"{MANTIS_API}/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "url": url,
                "extract": {
                    "type": "list",
                    "schema": GameListing.model_json_schema(),
                    "max_items": 50
                },
                "javascript": True
            }
        )
        deals.extend(response.json()["data"])
    return deals
```
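Once you have a price snapshot and a trailing average for an item, turning them into a `PriceAlert` is a simple deviation check. The sketch below is illustrative, not part of the Mantis API: the `build_price_alert` helper, the 15% threshold, and the opportunity-score formula are all assumptions; it emits plain dicts whose keys mirror the `PriceAlert` fields above.

```python
from typing import Optional

def build_price_alert(item_name: str, current_price: float,
                      avg_7d: float, threshold_pct: float = 15.0) -> Optional[dict]:
    """Return an alert dict when a price deviates from its 7-day average."""
    change_pct = (current_price - avg_7d) / avg_7d * 100
    if abs(change_pct) < threshold_pct:
        return None  # within normal range, no alert
    alert_type = "price_drop" if change_pct < 0 else "price_spike"
    # Crude opportunity score: larger deviation -> higher score, capped at 100
    opportunity_score = min(abs(change_pct) * 2, 100)
    return {
        "item_name": item_name,
        "alert_type": alert_type,
        "current_price": current_price,
        "reference_price": avg_7d,
        "change_pct": round(change_pct, 2),
        "opportunity_score": opportunity_score,
        "source": "steam_market",
        "details": f"{item_name} moved {change_pct:+.1f}% vs 7d average",
    }

# An item trading 15% under its 7-day average triggers a price_drop alert
alert = build_price_alert("AK-47 | Redline (Field-Tested)", 17.00, 20.00)
```

In practice you would feed this from `scrape_steam_market_item` results, using `price_7d_avg` as the reference.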
Step 2: Esports Tournament & Player Analytics
Professional esports generates terabytes of performance data across leagues like LEC/LCS (League of Legends), BLAST/ESL (CS2), VCT (Valorant), OWL (Overwatch), and CDL (Call of Duty). Tracking player stats, team performance, roster changes, and tournament results is critical for teams, analysts, bettors, and content creators.
Esports Data Models
```python
class EsportsPlayer(BaseModel):
    """Professional esports player profile"""
    username: str
    real_name: Optional[str] = None
    team: str
    game: str
    role: str  # entry, awp, igl, support, etc.
    country: str
    rating: Optional[float] = None  # e.g., HLTV 2.0 rating
    kd_ratio: Optional[float] = None
    headshot_pct: Optional[float] = None
    maps_played: Optional[int] = None
    winrate: Optional[float] = None
    earnings_usd: Optional[float] = None
    source: str
    profile_url: str
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class TournamentResult(BaseModel):
    """Esports tournament or match result"""
    tournament_name: str
    game: str
    tier: str  # S-tier, A-tier, B-tier
    prize_pool_usd: Optional[float] = None
    team_1: str
    team_2: str
    score: str  # e.g., "2-1", "16-12"
    map_name: Optional[str] = None
    date: str
    mvp: Optional[str] = None
    vod_url: Optional[str] = None
    source: str
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class RosterChange(BaseModel):
    """Team roster move (transfer, bench, retire)"""
    player: str
    from_team: Optional[str] = None
    to_team: Optional[str] = None
    game: str
    move_type: str  # transfer, bench, retire, loan, stand-in
    confirmed: bool
    reported_by: str
    source_url: str
    date: str
    scraped_at: datetime = Field(default_factory=datetime.utcnow)
```
Multi-Source Esports Scraping
```python
def scrape_hltv_rankings() -> list[dict]:
    """Scrape HLTV world team rankings for CS2 (returns raw ranking dicts)"""
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": "https://www.hltv.org/ranking/teams",
            "extract": {
                "type": "list",
                "schema": {
                    "team_name": "string",
                    "ranking": "integer",
                    "points": "integer",
                    "players": "list[string]",
                    "change": "integer"
                },
                "max_items": 30
            },
            "javascript": True,
            "headers": {
                "User-Agent": "Mozilla/5.0 (compatible; research bot)"
            }
        }
    )
    return response.json()["data"]

def scrape_tournament_results(game: str = "cs2") -> list[TournamentResult]:
    """Scrape recent tournament results from Liquipedia"""
    game_paths = {
        "cs2": "counterstrike",
        "valorant": "valorant",
        "lol": "leagueoflegends",
        "dota2": "dota2"
    }
    url = f"https://liquipedia.net/{game_paths[game]}/Tournaments"
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": url,
            "extract": {
                "type": "list",
                "schema": TournamentResult.model_json_schema(),
                "max_items": 20
            }
        }
    )
    return [TournamentResult(**t) for t in response.json()["data"]]

def track_roster_moves() -> list[RosterChange]:
    """Monitor roster changes from multiple sources"""
    sources = [
        "https://www.hltv.org/news/archive",
        "https://www.vlr.gg/news",
        "https://dotesports.com/league-of-legends",
        "https://liquipedia.net/counterstrike/Portal:Transfers"
    ]
    all_moves = []
    for url in sources:
        response = requests.post(
            f"{MANTIS_API}/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "url": url,
                "extract": {
                    "type": "list",
                    "schema": RosterChange.model_json_schema(),
                    "filter": "roster changes, transfers, benchings"
                }
            }
        )
        all_moves.extend(response.json()["data"])
    return all_moves
```
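Because `track_roster_moves()` pulls from several sites, the same transfer often appears more than once. A minimal dedupe pass keyed on player and destination team, preferring confirmed reports, cleans this up; the `dedupe_roster_moves` helper below is an illustrative sketch whose input dicts mirror a few `RosterChange` fields.

```python
def dedupe_roster_moves(moves: list[dict]) -> list[dict]:
    """Collapse duplicate reports of the same move, keeping confirmed ones."""
    best: dict[tuple, dict] = {}
    for move in moves:
        key = (move["player"].lower(), (move.get("to_team") or "").lower())
        existing = best.get(key)
        # Prefer a confirmed report over an unconfirmed one for the same move
        if existing is None or (move.get("confirmed") and not existing.get("confirmed")):
            best[key] = move
    return list(best.values())

moves = [
    {"player": "s1mple", "to_team": "Team A", "confirmed": False, "reported_by": "vlr.gg"},
    {"player": "s1mple", "to_team": "Team A", "confirmed": True, "reported_by": "HLTV"},
    {"player": "ZywOo", "to_team": "Team B", "confirmed": True, "reported_by": "HLTV"},
]
unique = dedupe_roster_moves(moves)  # two moves survive; s1mple's confirmed report wins
```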
Step 3: Streaming & Content Creator Analytics
Twitch alone averages 2.5+ million concurrent viewers, with top streamers generating millions in revenue. Tracking streaming trends reveals which games are gaining or losing cultural relevance, often weeks before sales data reflects it.
```python
class StreamMetrics(BaseModel):
    """Streaming platform metrics for a game or streamer"""
    name: str  # game title or streamer name
    entity_type: str  # game or streamer
    platform: str  # twitch, youtube, kick
    current_viewers: int
    peak_viewers_24h: Optional[int] = None
    avg_viewers_7d: Optional[int] = None
    hours_watched_7d: Optional[int] = None
    active_channels: Optional[int] = None
    top_streamers: list[str] = []
    viewer_change_pct: Optional[float] = None  # vs previous period
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class ContentTrend(BaseModel):
    """Gaming content trend detected across platforms"""
    topic: str
    game: str
    trend_type: str  # viral_clip, new_meta, controversy, update_hype
    platforms_detected: list[str]
    estimated_reach: Optional[int] = None
    sentiment: float  # -1 to 1
    velocity: float  # how fast it's growing
    first_seen: datetime
    peak_time: Optional[datetime] = None
    key_creators: list[str] = []
    example_urls: list[str] = []

def track_game_streaming_metrics() -> list[StreamMetrics]:
    """Track top games by viewership across streaming platforms"""
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": "https://sullygnome.com/Games/30Days/Watched",
            "extract": {
                "type": "list",
                "schema": StreamMetrics.model_json_schema(),
                "max_items": 50
            },
            "javascript": True
        }
    )
    return [StreamMetrics(**g) for g in response.json()["data"]]

def detect_trending_content() -> list[dict]:
    """Identify viral gaming content and trending topics (raw post dicts)"""
    subreddits = [
        "https://www.reddit.com/r/gaming/top/?t=day",
        "https://www.reddit.com/r/pcgaming/top/?t=day",
        "https://www.reddit.com/r/Games/top/?t=day",
        "https://www.reddit.com/r/GlobalOffensive/top/?t=day",
        "https://www.reddit.com/r/leagueoflegends/top/?t=day"
    ]
    trends = []
    for url in subreddits:
        response = requests.post(
            f"{MANTIS_API}/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "url": url,
                "extract": {
                    "type": "list",
                    "schema": {
                        "title": "string",
                        "upvotes": "integer",
                        "comments": "integer",
                        "url": "string",
                        "flair": "string"
                    },
                    "max_items": 10
                }
            }
        )
        trends.extend(response.json()["data"])
    return trends
```
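The `ContentTrend` model carries a `velocity` field; one simple way to estimate it is upvote growth per hour between two scrape snapshots of the same post. The helper names and the 200-upvotes-per-hour breakout baseline below are assumptions for illustration, not established constants.

```python
def compute_velocity(upvotes_then: int, upvotes_now: int, hours_elapsed: float) -> float:
    """Upvote growth rate per hour between two scrape snapshots of one post."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return (upvotes_now - upvotes_then) / hours_elapsed

def is_breakout(velocity: float, baseline_per_hour: float = 200.0) -> bool:
    """Flag posts growing much faster than an assumed typical front-page rate."""
    return velocity >= baseline_per_hour

# A post that went from 1,200 to 4,800 upvotes in 3 hours
v = compute_velocity(1200, 4800, hours_elapsed=3.0)  # 1200 upvotes/hour
```

Running `detect_trending_content()` on a schedule (say, hourly) gives you the paired snapshots this comparison needs.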
Step 4: Game Review Sentiment & Launch Intelligence
Steam users alone post millions of reviews per month. Monitoring sentiment shifts, especially during game launches, major updates, or controversies, provides early signals for player retention and revenue trajectory.
```python
class GameReviewSentiment(BaseModel):
    """Aggregated review sentiment for a game"""
    game_title: str
    platform: str
    overall_score: float  # 0-100
    recent_score: float  # last 30 days
    total_reviews: int
    recent_reviews_count: int
    positive_pct: float
    negative_pct: float
    common_praises: list[str]  # top positive themes
    common_complaints: list[str]  # top negative themes
    review_bombed: bool  # detected review bombing
    sentiment_trend: str  # improving, stable, declining
    scraped_at: datetime = Field(default_factory=datetime.utcnow)

class GameLaunchTracker(BaseModel):
    """Upcoming game release tracking"""
    title: str
    developer: str
    publisher: str
    release_date: str
    platforms: list[str]
    genre: str
    hype_score: Optional[float] = None  # based on wishlists, social buzz
    steam_wishlists: Optional[int] = None
    pre_order_available: bool
    beta_access: bool
    trailer_views: Optional[int] = None
    subreddit_subscribers: Optional[int] = None
    source: str

def analyze_game_sentiment(app_id: str) -> GameReviewSentiment:
    """Analyze Steam review sentiment with AI"""
    response = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": f"https://store.steampowered.com/app/{app_id}",
            "extract": {
                "type": "schema",
                "schema": GameReviewSentiment.model_json_schema(),
                "instructions": "Analyze the review section. Identify common themes in positive and negative reviews. Detect if there's review bombing (sudden spike in negative reviews). Calculate sentiment trend."
            },
            "javascript": True
        }
    )
    return GameReviewSentiment(**response.json()["data"])

def track_upcoming_releases() -> list[GameLaunchTracker]:
    """Monitor upcoming game releases and hype levels"""
    sources = [
        "https://store.steampowered.com/explore/upcoming",
        "https://www.metacritic.com/browse/game/all/upcoming/date",
        "https://howlongtobeat.com/steam/most_anticipated"
    ]
    releases = []
    for url in sources:
        response = requests.post(
            f"{MANTIS_API}/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "url": url,
                "extract": {
                    "type": "list",
                    "schema": GameLaunchTracker.model_json_schema(),
                    "max_items": 20
                },
                "javascript": True
            }
        )
        releases.extend(GameLaunchTracker(**r) for r in response.json()["data"])
    return releases
```
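The `review_bombed` flag above is delegated to the AI extractor, but a deterministic heuristic makes a useful cross-check: flag a game when its recent positive percentage falls far below its lifetime percentage while review volume spikes. The thresholds below (a 25-point score gap and 3x normal volume) are illustrative defaults, not established constants.

```python
def looks_review_bombed(overall_positive_pct: float, recent_positive_pct: float,
                        recent_reviews: int, avg_monthly_reviews: int,
                        score_gap: float = 25.0, volume_mult: float = 3.0) -> bool:
    """Heuristic: recent score far below lifetime score, amid a volume spike."""
    score_dropped = (overall_positive_pct - recent_positive_pct) >= score_gap
    volume_spiked = recent_reviews >= volume_mult * max(avg_monthly_reviews, 1)
    return score_dropped and volume_spiked

# Example: 88% lifetime positive, 41% recent, 12k recent reviews vs ~2k/month normal
bombed = looks_review_bombed(88.0, 41.0, recent_reviews=12000, avg_monthly_reviews=2000)
```

Requiring both signals avoids false positives from games whose recent score drifts down slowly (a balance patch, say) without any unusual review activity.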
Step 5: In-Game Economy & Virtual Asset Intelligence
Virtual economies in games like CS2, Dota 2, and various MMOs represent billions of dollars in traded value. The CS2 skin market alone processes an estimated $2+ billion annually across Steam Market, third-party sites, and P2P trades.
```python
class SkinMarketAnalysis(BaseModel):
    """Cross-platform skin/item price analysis"""
    item_name: str
    game: str
    steam_price: Optional[float] = None
    buff163_price: Optional[float] = None
    csfloat_price: Optional[float] = None
    skinport_price: Optional[float] = None
    price_spread_pct: float  # max difference across platforms
    arbitrage_opportunity: bool
    estimated_profit: Optional[float] = None
    volume_trend: str  # increasing, stable, decreasing
    price_trend_7d: str  # up, stable, down
    rarity_tier: str
    collection: Optional[str] = None

class EconomyAlert(BaseModel):
    """Alert for significant in-game economy events"""
    game: str
    alert_type: str  # case_drop, operation_launch, trade_hold_change, market_crash, new_collection
    description: str
    affected_items: list[str]
    estimated_impact: str  # price increase/decrease prediction
    confidence: float
    source_url: str
    detected_at: datetime = Field(default_factory=datetime.utcnow)

def cross_platform_arbitrage_scan() -> list[SkinMarketAnalysis]:
    """Scan for price discrepancies across skin marketplaces"""
    # Get top traded items from Steam Market
    steam_items = requests.post(
        f"{MANTIS_API}/scrape",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "url": "https://steamcommunity.com/market/search?appid=730&q=&sort_column=quantity&sort_dir=desc",
            "extract": {
                "type": "list",
                "schema": {
                    "item_name": "string",
                    "price": "float",
                    "quantity": "integer",
                    "url": "string"
                },
                "max_items": 100
            },
            "javascript": True
        }
    ).json()["data"]

    # Cross-reference with Buff163 and CSFloat
    analyses = []
    for item in steam_items[:50]:  # Top 50 by volume
        buff_price = requests.post(
            f"{MANTIS_API}/scrape",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "url": f"https://buff.163.com/market/csgo#tab=selling&search={item['item_name']}",
                "extract": {"type": "schema", "schema": {"lowest_price": "float", "listings": "integer"}},
                "javascript": True
            }
        ).json()["data"]
        spread = abs(item["price"] - buff_price.get("lowest_price", item["price"])) / item["price"] * 100
        analyses.append(SkinMarketAnalysis(
            item_name=item["item_name"],
            game="CS2",
            steam_price=item["price"],
            buff163_price=buff_price.get("lowest_price"),
            price_spread_pct=spread,
            arbitrage_opportunity=spread > 10,
            estimated_profit=item["price"] * spread / 100 if spread > 10 else None,
            volume_trend="stable",
            price_trend_7d="stable",
            rarity_tier="varies"
        ))
    return analyses
```
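Note that the raw spread overstates real profit, because marketplaces take a cut on the sell side; Steam's combined fee for CS2 items is roughly 15% (about a 5% Steam transaction fee plus a 10% game-specific fee), and third-party sites charge their own. A sketch of the net calculation, with the fee rate as an adjustable assumption:

```python
def net_arbitrage_profit(buy_price: float, sell_price: float,
                         sell_fee_pct: float = 15.0) -> float:
    """Profit after the seller-side fee; default ~15% approximates Steam's CS2 cut."""
    proceeds = sell_price * (1 - sell_fee_pct / 100)
    return round(proceeds - buy_price, 2)

# Buy at $100 on a third-party site, sell at $125 on Steam: $6.25 net, not $25
profit = net_arbitrage_profit(buy_price=100.0, sell_price=125.0)
```

Also keep in mind that Steam sale proceeds land in your Steam wallet and cannot be withdrawn as cash, which limits one direction of the arbitrage.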
Step 6: AI-Powered Gaming Intelligence Engine
The real power comes from combining all data sources into an intelligent analysis engine that identifies patterns, predicts trends, and generates actionable insights.
```python
import json

from openai import OpenAI

client = OpenAI()

class GamingIntelligenceBrief(BaseModel):
    """Daily gaming intelligence summary"""
    date: str
    market_summary: str
    top_trending_games: list[dict]  # game, reason, metrics
    esports_highlights: list[dict]  # event, result, significance
    upcoming_events: list[dict]  # event, date, expected_impact
    market_opportunities: list[dict]  # opportunity, confidence, action
    content_trends: list[dict]  # trend, platform, velocity
    risk_alerts: list[dict]  # risk, game, severity

def generate_daily_brief(
    market_data: list[MarketItem],
    esports_results: list[TournamentResult],
    stream_metrics: list[StreamMetrics],
    reviews: list[GameReviewSentiment],
    upcoming: list[GameLaunchTracker],
    roster_moves: list[RosterChange]
) -> GamingIntelligenceBrief:
    """Generate AI-powered daily gaming intelligence brief"""
    context = f"""
MARKET DATA (top movers):
{[{"item": m.item_name, "price": m.price_usd, "volume": m.volume_24h} for m in market_data[:20]]}

ESPORTS RESULTS (last 24h):
{[{"tournament": r.tournament_name, "teams": f"{r.team_1} vs {r.team_2}", "score": r.score} for r in esports_results[:10]]}

STREAMING (top games):
{[{"game": s.name, "viewers": s.current_viewers, "change": s.viewer_change_pct} for s in stream_metrics[:15]]}

REVIEW SENTIMENT (notable shifts):
{[{"game": r.game_title, "recent_score": r.recent_score, "trend": r.sentiment_trend} for r in reviews if r.sentiment_trend != "stable"]}

UPCOMING RELEASES (next 30 days):
{[{"title": u.title, "date": u.release_date, "wishlists": u.steam_wishlists} for u in upcoming[:10]]}

ROSTER MOVES:
{[{"player": m.player, "from": m.from_team, "to": m.to_team, "game": m.game} for m in roster_moves[:10]]}
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are an expert gaming industry analyst.
Analyze the data and produce a comprehensive daily intelligence brief.
Focus on: market-moving events, emerging trends, investment opportunities,
content creator angles, and risk factors. Be specific with numbers.
Respond with a JSON object whose keys match the GamingIntelligenceBrief fields."""},
            {"role": "user", "content": f"Generate today's gaming intelligence brief:\n{context}"}
        ],
        response_format={"type": "json_object"}  # JSON mode requires "JSON" in the prompt
    )
    return GamingIntelligenceBrief(**json.loads(response.choices[0].message.content))

# Schedule the complete pipeline
def run_gaming_intelligence_pipeline():
    """Run the full pipeline and deliver alerts"""
    # TRACKED_GAMES (Steam app IDs) and send_slack_alert are assumed defined elsewhere
    # Collect data from all sources
    market_data = cross_platform_arbitrage_scan()
    esports = scrape_tournament_results("cs2") + scrape_tournament_results("valorant")
    streams = track_game_streaming_metrics()
    reviews = [analyze_game_sentiment(app_id) for app_id in TRACKED_GAMES]
    upcoming = track_upcoming_releases()
    roster = track_roster_moves()

    # Generate AI brief
    brief = generate_daily_brief(market_data, esports, streams, reviews, upcoming, roster)

    # Send alerts for high-priority items
    for opp in brief.market_opportunities:
        if opp["confidence"] > 0.8:
            send_slack_alert(f"Gaming opportunity: {opp['opportunity']}\nAction: {opp['action']}")
    for alert in brief.risk_alerts:
        if alert["severity"] == "high":
            send_slack_alert(f"Gaming risk: {alert['risk']}\nGame: {alert['game']}")
    return brief
```
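The pipeline above calls `send_slack_alert()`, which the snippets leave undefined. A minimal implementation using a Slack incoming webhook could look like this; the webhook URL is a placeholder you generate in your own Slack workspace.

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def build_payload(message: str) -> dict:
    """Slack incoming webhooks accept a JSON body with a 'text' field."""
    return {"text": message}

def send_slack_alert(message: str) -> bool:
    """Post a plain-text alert to a Slack channel; True on HTTP 200."""
    response = requests.post(SLACK_WEBHOOK_URL, json=build_payload(message), timeout=10)
    return response.status_code == 200
```

A Discord webhook works the same way, except the message field is named `content` instead of `text`.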
Enterprise Alternatives vs. AI Agent Approach
| Platform | Cost | Strengths | Limitations |
|---|---|---|---|
| Newzoo | $15K-60K/yr | Market sizing, forecasts, global coverage | Quarterly updates, limited real-time data |
| Sensor Tower | $5K-25K/mo | Mobile game analytics, download estimates | Mobile-focused, no esports/PC depth |
| Esports Charts | $2K-10K/mo | Esports viewership, tournament data | Streaming only, no marketplace/economy data |
| SteamDB | Free (limited) | Steam player counts, price history | Steam only, no cross-platform, no API |
| data.ai | $10K-50K/mo | Mobile app intelligence, revenue estimates | Mobile-centric, enterprise pricing |
| AI Agent + Mantis | $29-299/mo | Cross-platform, real-time, customizable | Requires setup, no proprietary panel data |
Use Cases by Audience
1. Game Studios & Publishers
- Monitor competitor launches, pricing, and player reception in real-time
- Track community sentiment during and after game updates
- Analyze streaming metrics to gauge organic marketing effectiveness
- Detect review bombing early and prepare response strategies
2. Esports Organizations & Teams
- Scout player talent by tracking stats across multiple platforms
- Monitor roster moves and free agent availability
- Analyze opponent strategies from match history and VODs
- Track sponsorship value through streaming viewership metrics
3. Gaming Investors & Analysts
- Predict game revenue trajectories from early player count and sentiment data
- Track in-game economy health as indicator of player engagement
- Monitor streaming trends for early signals of breakout hits
- Analyze publisher pipeline and competitive landscape
4. Content Creators & Streamers
- Identify trending games and content opportunities before saturation
- Track which content formats perform best across platforms
- Monitor competitor streamers for scheduling and content strategy
- Detect viral moments and trending topics for rapid content creation
Build Your Gaming Intelligence Agent
Start tracking game data, esports stats, and market prices with Mantis API. 100 free API calls/month.
Getting Started
- Define your scope: which games, esports titles, and marketplaces matter to your use case?
- Set up data models: use the Pydantic schemas above as starting points and customize for your needs
- Build the scraping pipeline: start with 2-3 sources and expand as you validate the data quality
- Add AI analysis: use GPT-4o to correlate signals across sources and generate actionable insights
- Automate alerts: set up Slack/Discord webhooks for price movements, roster changes, and sentiment shifts
- Iterate on value: track which alerts drive actual decisions and refine your pipeline accordingly
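The steps above can be wired together with a simple stdlib-only scheduler. This sketch assumes `run_gaming_intelligence_pipeline()` from Step 6; a production setup would more likely use cron, APScheduler, or a hosted scheduler.

```python
import time
from datetime import datetime, timedelta

def seconds_until(hour: int, minute: int = 0) -> float:
    """Seconds from now until the next occurrence of hour:minute (local time)."""
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # already past today, so schedule for tomorrow
    return (target - now).total_seconds()

def run_daily(job, hour: int = 7):
    """Run `job` (e.g. run_gaming_intelligence_pipeline) once a day at `hour`."""
    while True:
        time.sleep(seconds_until(hour))
        try:
            job()
        except Exception as exc:  # keep the loop alive through scraper hiccups
            print(f"pipeline run failed: {exc}")
```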