Web Scraping for Education & EdTech: How AI Agents Track Courses, Pricing, Reviews & Academic Data in 2026
The global education market exceeds $7 trillion annually, with the EdTech segment alone projected to surpass $400 billion by 2028. Online learning platforms like Coursera, Udemy, and edX host millions of courses, while universities compete for shrinking enrollment pools. Yet most education organizations still track competitors manually: browsing course catalogs, checking pricing pages, and reading reviews one by one.
Enterprise education analytics platforms like EAB, Keystone Academic Solutions, and Lightcast (formerly Burning Glass) charge $3,000–$30,000+ per month for market intelligence. But the underlying data (course catalogs, pricing, student reviews, enrollment trends) is publicly available across hundreds of platforms. AI agents powered by web scraping APIs can build equivalent intelligence at a fraction of the cost.
In this guide, you'll build a complete education intelligence system using Python, the Mantis WebPerception API, and GPT-4o, covering course catalog monitoring, pricing intelligence, student sentiment analysis, and competitive landscape tracking.
Why Education Organizations Need Web Scraping
The education market is more competitive than ever. Online learning has eliminated geographic barriers, meaning a university in Ohio competes with Coursera, Udemy, and universities worldwide for the same student. The organizations that win are the ones with real-time market intelligence:
- Course catalog monitoring – Track competitor course launches, curriculum changes, and program expansions across Coursera, Udemy, edX, LinkedIn Learning, Skillshare, and university catalogs
- Pricing intelligence – Monitor tuition changes, subscription pricing, course discounts, financial aid offerings, and promotional campaigns
- Student review analysis – Aggregate reviews from Class Central, Course Report, RateMyProfessors, Trustpilot, and Reddit to understand student satisfaction
- Enrollment trend tracking – Monitor waitlists, seat availability, cohort sizes, and program capacity signals
- Accreditation & compliance – Track accreditation status changes, regulatory updates, and institutional announcements
- Skills demand mapping – Scrape job postings to identify which skills employers demand, then align curriculum accordingly
- Faculty & instructor tracking – Monitor instructor movements, new hires, and teaching assignments across institutions
Build Education Intelligence Agents with Mantis
Scrape course catalogs, pricing, reviews, and enrollment data from any education platform with one API call. AI-powered extraction handles every platform format.
Get Free API Key →
Architecture: The 6-Step Education Intelligence Pipeline
- Course catalog scraping – Monitor course offerings, descriptions, prerequisites, and learning outcomes across platforms
- Pricing & discount tracking – Track tuition, subscription tiers, promotional pricing, and financial aid changes
- Student review monitoring – Aggregate sentiment from review platforms, forums, and social media
- Enrollment & demand signals – Track waitlists, seat counts, cohort sizes, and program capacity
- GPT-4o competitive analysis – Identify curriculum gaps, pricing opportunities, and market positioning insights
- Alert delivery – Route competitor launches, price changes, and review spikes to product and marketing teams via Slack
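Before diving into each step, here is how one cycle of the pipeline might be wired together. This is a minimal orchestration sketch: the stub coroutines and their return shapes are placeholders standing in for the real scrapers built in the steps below, kept trivial so the `asyncio` wiring itself can be run and tested.

```python
import asyncio

# Stub coroutines: placeholders for the scrapers defined in Steps 2-5.
async def scrape_catalogs():
    return [{"title": "Intro to Python", "provider": "coursera"}]

async def track_pricing(courses):
    return [{"course": c["title"], "price_usd": 49.0} for c in courses]

async def scrape_reviews(courses):
    return [{"course": c["title"], "rating": 4.6} for c in courses]

async def run_pipeline() -> dict:
    """One cycle: catalogs first, then pricing and reviews concurrently,
    then a combined summary for downstream analysis and alerting."""
    courses = await scrape_catalogs()
    pricing, reviews = await asyncio.gather(
        track_pricing(courses),
        scrape_reviews(courses),
    )
    return {
        "courses": len(courses),
        "pricing_points": len(pricing),
        "reviews": len(reviews),
    }

summary = asyncio.run(run_pipeline())
print(summary)  # {'courses': 1, 'pricing_points': 1, 'reviews': 1}
```

In production you would run this cycle on a schedule (cron or a task queue) and pass the real model objects between steps.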
Step 1: Define Your Education Data Models
```python
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
from enum import Enum


class PlatformType(str, Enum):
    MOOC = "mooc"  # Coursera, edX, Udemy
    UNIVERSITY = "university"
    BOOTCAMP = "bootcamp"
    CORPORATE_LEARNING = "corporate_learning"
    K12 = "k12"
    CERTIFICATION = "certification"


class ContentFormat(str, Enum):
    SELF_PACED = "self_paced"
    COHORT_BASED = "cohort_based"
    LIVE_ONLINE = "live_online"
    HYBRID = "hybrid"
    IN_PERSON = "in_person"


class CourseListing(BaseModel):
    """Course or program from an education platform."""
    title: str
    provider: str  # "coursera", "udemy", "edx", "harvard", etc.
    platform_type: PlatformType
    url: str
    category: str  # "data-science", "web-development", "business", etc.
    subcategory: Optional[str] = None
    description: str
    instructor: Optional[str] = None
    institution: Optional[str] = None  # partner university/company
    format: ContentFormat
    duration_hours: Optional[float] = None
    duration_weeks: Optional[int] = None
    skill_level: Optional[str] = None  # "beginner", "intermediate", "advanced"
    skills_taught: List[str]
    prerequisites: List[str]
    has_certificate: bool
    certificate_type: Optional[str] = None  # "professional certificate", "MicroMasters", "degree"
    language: str = "English"
    rating: Optional[float] = None
    review_count: Optional[int] = None
    enrollment_count: Optional[int] = None
    last_updated: Optional[str] = None
    scraped_at: datetime


class CoursePricing(BaseModel):
    """Pricing details for a course or program."""
    course_title: str
    provider: str
    url: str
    pricing_model: str  # "one_time", "subscription", "per_course", "tuition"
    price_usd: Optional[float] = None
    original_price_usd: Optional[float] = None  # before discount
    discount_pct: Optional[float] = None
    subscription_monthly: Optional[float] = None
    subscription_annual: Optional[float] = None
    financial_aid_available: bool
    free_trial_days: Optional[int] = None
    free_audit_available: bool
    refund_policy: Optional[str] = None
    promo_code: Optional[str] = None
    promo_expires: Optional[datetime] = None
    currency: str = "USD"
    scraped_at: datetime


class StudentReview(BaseModel):
    """Student review from a course or institution."""
    course_title: str
    provider: str
    reviewer_name: Optional[str] = None
    rating: float  # 1-5
    title: Optional[str] = None
    text: str
    pros: List[str]
    cons: List[str]
    verified_purchase: bool
    completion_status: Optional[str] = None  # "completed", "in_progress", "dropped"
    review_date: Optional[str] = None
    helpful_votes: Optional[int] = None
    source: str  # "class_central", "coursera", "trustpilot", "reddit"
    scraped_at: datetime


class EnrollmentData(BaseModel):
    """Enrollment and demand signals for a course or program."""
    course_title: str
    provider: str
    url: str
    total_enrolled: Optional[int] = None
    enrolled_this_month: Optional[int] = None
    waitlist_count: Optional[int] = None
    seats_available: Optional[int] = None
    seats_total: Optional[int] = None
    next_cohort_date: Optional[str] = None
    cohort_count: Optional[int] = None  # number of cohorts running
    trending_rank: Optional[int] = None  # platform's trending/popular rank
    category_rank: Optional[int] = None
    completion_rate_pct: Optional[float] = None
    scraped_at: datetime
```
Step 2: Scrape Course Catalogs Across Platforms
```python
from mantis import MantisClient
import asyncio

mantis = MantisClient(api_key="your-mantis-api-key")


async def scrape_coursera_catalog(
    categories: List[str] = ["data-science", "computer-science", "business", "information-technology"]
) -> List[CourseListing]:
    """
    Scrape Coursera's course catalog by category.

    Coursera hosts 7,000+ courses from 300+ university and industry partners.
    """
    courses = []
    for category in categories:
        result = await mantis.scrape(
            url=f"https://www.coursera.org/courses?query={category}&sortBy=BEST_MATCH",
            extract={
                "courses": [{
                    "title": "string",
                    "institution": "string (partner university or company)",
                    "instructor": "string or null",
                    "description": "string",
                    "rating": "number (1-5) or null",
                    "review_count": "number or null",
                    "enrollment_count": "number or null",
                    "duration": "string",
                    "level": "string (beginner, intermediate, advanced, mixed)",
                    "skills": ["string"],
                    "type": "string (course, specialization, professional certificate, degree)",
                    "url": "string (relative path)",
                    "has_financial_aid": "boolean",
                    "is_free_audit": "boolean"
                }],
                "total_results": "number",
                "page": "number"
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="coursera",
                platform_type=PlatformType.MOOC,
                url=f"https://www.coursera.org{c.get('url', '')}",
                category=category,
                description=c.get("description", ""),
                instructor=c.get("instructor"),
                institution=c.get("institution"),
                format=ContentFormat.SELF_PACED,
                skill_level=c.get("level"),
                skills_taught=c.get("skills", []),
                prerequisites=[],
                has_certificate=True,
                certificate_type=c.get("type"),
                rating=c.get("rating"),
                review_count=c.get("review_count"),
                enrollment_count=c.get("enrollment_count"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses


async def scrape_udemy_catalog(
    categories: List[str] = ["development", "data-science", "business", "it-and-software"]
) -> List[CourseListing]:
    """
    Scrape Udemy's course marketplace.

    Udemy hosts 200,000+ courses with instructor-set pricing.
    """
    courses = []
    for category in categories:
        result = await mantis.scrape(
            url=f"https://www.udemy.com/courses/{category}/?sort=popularity&ratings=4.0",
            extract={
                "courses": [{
                    "title": "string",
                    "instructor": "string",
                    "description": "string (subtitle/tagline)",
                    "rating": "number",
                    "review_count": "number",
                    "enrollment_count": "number",
                    "price_current": "number or null",
                    "price_original": "number or null",
                    "duration_hours": "number",
                    "lecture_count": "number",
                    "level": "string",
                    "last_updated": "string",
                    "url": "string"
                }]
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="udemy",
                platform_type=PlatformType.MOOC,
                url=f"https://www.udemy.com{c.get('url', '')}",
                category=category,
                description=c.get("description", ""),
                instructor=c.get("instructor"),
                format=ContentFormat.SELF_PACED,
                duration_hours=c.get("duration_hours"),
                skill_level=c.get("level"),
                skills_taught=[],
                prerequisites=[],
                has_certificate=True,
                certificate_type="completion",
                rating=c.get("rating"),
                review_count=c.get("review_count"),
                enrollment_count=c.get("enrollment_count"),
                last_updated=c.get("last_updated"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses


async def scrape_edx_catalog(
    subjects: List[str] = ["computer-science", "data-science", "business-management"]
) -> List[CourseListing]:
    """
    Scrape the edX catalog: courses from MIT, Harvard, Berkeley, and 160+ institutions.

    Includes MicroMasters, Professional Certificates, and online degrees.
    """
    courses = []
    for subject in subjects:
        result = await mantis.scrape(
            url=f"https://www.edx.org/search?q={subject}&tab=course",
            extract={
                "courses": [{
                    "title": "string",
                    "institution": "string",
                    "description": "string",
                    "level": "string (introductory, intermediate, advanced)",
                    "duration_weeks": "number",
                    "effort_hours_per_week": "number",
                    "price": "number or null",
                    "is_free_audit": "boolean",
                    "program_type": "string (course, MicroMasters, Professional Certificate, XSeries, degree) or null",
                    "subject": "string",
                    "language": "string",
                    "url": "string"
                }]
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="edx",
                platform_type=PlatformType.MOOC,
                url=f"https://www.edx.org{c.get('url', '')}",
                category=subject,
                description=c.get("description", ""),
                institution=c.get("institution"),
                format=ContentFormat.SELF_PACED,
                duration_weeks=c.get("duration_weeks"),
                skill_level=c.get("level"),
                skills_taught=[],
                prerequisites=[],
                has_certificate=True,
                certificate_type=c.get("program_type"),
                language=c.get("language", "English"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses
```
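Running all three scrapers produces overlapping listings: the same course often appears under slightly different titles, and repeated runs re-fetch everything. A small dedup pass before storage keeps the dataset clean. This sketch works on plain dicts rather than the `CourseListing` model so it stands alone; the normalization rules are illustrative.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so
    near-identical listings compare equal across runs."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def dedupe_listings(listings: list[dict]) -> list[dict]:
    """Keep the first listing seen for each (normalized title, provider) pair."""
    seen = set()
    unique = []
    for item in listings:
        key = (normalize_title(item["title"]), item["provider"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

raw = [
    {"title": "Machine Learning", "provider": "coursera"},
    {"title": "Machine  Learning!", "provider": "coursera"},  # same course, messy title
    {"title": "Machine Learning", "provider": "udemy"},       # different provider, kept
]
unique = dedupe_listings(raw)
print(len(unique))  # 2
```

Keying on `(title, provider)` rather than title alone preserves the same course offered on different platforms, which you want for cross-platform price comparison.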
Step 3: Track Pricing Changes & Promotional Campaigns
```python
async def track_course_pricing(
    courses: List[CourseListing]
) -> List[CoursePricing]:
    """
    Scrape detailed pricing for tracked courses.

    Udemy runs frequent 80-90% off sales; Coursera offers free audits
    with paid certificates; edX has verified track pricing.
    """
    pricing_data = []
    for course in courses:
        result = await mantis.scrape(
            url=course.url,
            extract={
                "price_current": "number or null",
                "price_original": "number or null (before discount)",
                "discount_pct": "number or null",
                "subscription_option": {
                    "monthly": "number or null",
                    "annual": "number or null",
                    "name": "string or null (Coursera Plus, etc.)"
                },
                "free_audit_available": "boolean",
                "free_trial_days": "number or null",
                "financial_aid_available": "boolean",
                "refund_policy": "string or null",
                "promo_banner": "string or null (current promotion text)",
                "bundle_pricing": {
                    "courses_in_bundle": "number or null",
                    "bundle_price": "number or null"
                }
            }
        )
        pricing = CoursePricing(
            course_title=course.title,
            provider=course.provider,
            url=course.url,
            pricing_model="subscription" if result.get("subscription_option", {}).get("monthly") else "one_time",
            price_usd=result.get("price_current"),
            original_price_usd=result.get("price_original"),
            discount_pct=result.get("discount_pct"),
            subscription_monthly=result.get("subscription_option", {}).get("monthly"),
            subscription_annual=result.get("subscription_option", {}).get("annual"),
            financial_aid_available=result.get("financial_aid_available", False),
            free_trial_days=result.get("free_trial_days"),
            free_audit_available=result.get("free_audit_available", False),
            refund_policy=result.get("refund_policy"),
            scraped_at=datetime.now()
        )
        pricing_data.append(pricing)
    return pricing_data


async def detect_pricing_changes(
    current_pricing: List[CoursePricing],
    historical_db: str = "education_intelligence.db"
) -> dict:
    """
    Compare current pricing against historical data to detect
    significant changes, new promotions, and competitive moves.
    """
    import sqlite3

    conn = sqlite3.connect(historical_db)
    alerts = {"price_drops": [], "price_increases": [], "new_promos": [], "free_changes": []}
    for pricing in current_pricing:
        prev = conn.execute("""
            SELECT price_usd, discount_pct, free_audit_available,
                   subscription_monthly, financial_aid_available
            FROM pricing_history
            WHERE course_title = ? AND provider = ?
            ORDER BY scraped_at DESC LIMIT 1
        """, (pricing.course_title, pricing.provider)).fetchone()
        if prev:
            prev_price, prev_discount, prev_free, prev_sub, prev_aid = prev
            # Price drop >20%
            if prev_price and pricing.price_usd and pricing.price_usd < prev_price * 0.80:
                alerts["price_drops"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "old_price": prev_price,
                    "new_price": pricing.price_usd,
                    "drop_pct": round(((prev_price - pricing.price_usd) / prev_price) * 100, 1),
                    "url": pricing.url
                })
            # Price increase >10%
            if prev_price and pricing.price_usd and pricing.price_usd > prev_price * 1.10:
                alerts["price_increases"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "old_price": prev_price,
                    "new_price": pricing.price_usd,
                    "increase_pct": round(((pricing.price_usd - prev_price) / prev_price) * 100, 1)
                })
            # Free audit removed
            if prev_free and not pricing.free_audit_available:
                alerts["free_changes"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "change": "Free audit REMOVED",
                    "url": pricing.url
                })
        # Store current pricing
        conn.execute(
            """INSERT INTO pricing_history
               (course_title, provider, price_usd, discount_pct, free_audit_available,
                subscription_monthly, financial_aid_available, scraped_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
            (pricing.course_title, pricing.provider, pricing.price_usd,
             pricing.discount_pct, pricing.free_audit_available,
             pricing.subscription_monthly, pricing.financial_aid_available,
             pricing.scraped_at.isoformat())
        )
    conn.commit()
    conn.close()
    return alerts
```
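`detect_pricing_changes` assumes a `pricing_history` table already exists. One possible schema, matching the columns the queries above read and write (any columns beyond those are assumptions), with an in-memory demo of the same lookup pattern:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pricing_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    course_title TEXT NOT NULL,
    provider TEXT NOT NULL,
    price_usd REAL,
    discount_pct REAL,
    free_audit_available INTEGER,
    subscription_monthly REAL,
    financial_aid_available INTEGER,
    scraped_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_pricing_course
    ON pricing_history (course_title, provider, scraped_at);
"""

conn = sqlite3.connect(":memory:")  # use education_intelligence.db in production
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO pricing_history (course_title, provider, price_usd, scraped_at) "
    "VALUES (?, ?, ?, ?)",
    ("Intro to Python", "coursera", 49.0, "2026-01-01T00:00:00"),
)
# Same "most recent price" lookup detect_pricing_changes performs
row = conn.execute(
    "SELECT price_usd FROM pricing_history WHERE course_title = ? AND provider = ? "
    "ORDER BY scraped_at DESC LIMIT 1",
    ("Intro to Python", "coursera"),
).fetchone()
print(row[0])  # 49.0
```

The composite index on `(course_title, provider, scraped_at)` keeps the latest-price lookup fast as history accumulates.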
Step 4: Student Review Monitoring & Sentiment Analysis
```python
async def scrape_student_reviews(
    courses: List[CourseListing],
    sources: List[str] = ["class_central", "coursera", "trustpilot"]
) -> List[StudentReview]:
    """
    Aggregate student reviews from multiple platforms.

    Class Central aggregates reviews across MOOCs; Trustpilot covers bootcamps;
    Reddit r/learnprogramming and r/datascience have authentic student experiences.
    """
    reviews = []
    source_configs = {
        "class_central": {
            "url_template": "https://www.classcentral.com/search?q={}",
            "extract": {
                "reviews": [{
                    "reviewer": "string or null",
                    "rating": "number (1-5)",
                    "text": "string",
                    "date": "string",
                    "course_taken": "string",
                    "provider": "string"
                }]
            }
        },
        "trustpilot": {
            "url_template": "https://www.trustpilot.com/review/{}",
            "extract": {
                "reviews": [{
                    "reviewer": "string",
                    "rating": "number (1-5)",
                    "title": "string",
                    "text": "string",
                    "date": "string",
                    "verified": "boolean"
                }]
            }
        }
    }
    for course in courses:
        for source in sources:
            config = source_configs.get(source)
            if not config:
                continue
            query = course.title.replace(" ", "+")
            url = config["url_template"].format(query if source != "trustpilot" else course.provider + ".com")
            result = await mantis.scrape(
                url=url,
                extract=config["extract"]
            )
            for r in result.get("reviews", []):
                review = StudentReview(
                    course_title=course.title,
                    provider=course.provider,
                    reviewer_name=r.get("reviewer"),
                    rating=r.get("rating", 3),
                    title=r.get("title"),
                    text=r.get("text", ""),
                    pros=[],
                    cons=[],
                    verified_purchase=r.get("verified", False),
                    review_date=r.get("date"),
                    source=source,
                    scraped_at=datetime.now()
                )
                reviews.append(review)
    return reviews


async def analyze_review_sentiment(reviews: List[StudentReview]) -> dict:
    """
    Use GPT-4o to extract sentiment themes from student reviews.
    Identifies what students love, hate, and wish for.
    """
    from openai import OpenAI

    client = OpenAI()
    # Group reviews by course
    by_course = {}
    for r in reviews:
        key = f"{r.course_title} ({r.provider})"
        by_course.setdefault(key, []).append(r)
    analysis = {}
    for course_key, course_reviews in by_course.items():
        review_texts = "\n".join([
            f"Rating: {r.rating}/5 - {r.text[:500]}"
            for r in course_reviews[:30]  # limit for token efficiency
        ])
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Analyze these student reviews and extract:
1. OVERALL SENTIMENT (positive/mixed/negative) with confidence score
2. TOP 3 PROS - what students consistently praise
3. TOP 3 CONS - what students consistently complain about
4. CURRICULUM GAPS - skills/topics students wish were covered
5. INSTRUCTOR QUALITY - teaching effectiveness themes
6. CAREER IMPACT - do students report career outcomes?
7. VALUE FOR MONEY - sentiment on pricing vs. value received
8. COMPLETION EXPERIENCE - common reasons for dropping out

Be specific and quantitative where possible."""
            }, {
                "role": "user",
                "content": f"Course: {course_key}\n{len(course_reviews)} reviews:\n\n{review_texts}"
            }],
            temperature=0.2
        )
        analysis[course_key] = {
            "review_count": len(course_reviews),
            "avg_rating": round(sum(r.rating for r in course_reviews) / len(course_reviews), 2),
            "analysis": response.choices[0].message.content
        }
    return analysis
```
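Each GPT-4o call costs tokens, so it can pay to compute cheap aggregates first and only send courses with meaningful rating movement to the model. A minimal stdlib sketch; the "detractor" cutoff of 2 stars is an illustrative assumption:

```python
from statistics import mean

def rating_summary(ratings: list[float]) -> dict:
    """Cheap pre-filter before spending GPT-4o tokens: average rating
    and the share of 1-2 star reviews (detractors)."""
    detractors = sum(1 for r in ratings if r <= 2)
    return {
        "avg_rating": round(mean(ratings), 2),
        "review_count": len(ratings),
        "detractor_share": round(detractors / len(ratings), 2),
    }

summary = rating_summary([5, 4, 4.5, 2, 1, 5])
print(summary)  # {'avg_rating': 3.58, 'review_count': 6, 'detractor_share': 0.33}
```

A course whose detractor share jumps between scrape cycles is a strong candidate for the full LLM analysis; a flat, high-rated course can be skipped.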
Step 5: Enrollment Signals & Market Demand
```python
import re


async def track_enrollment_signals(
    courses: List[CourseListing]
) -> List[EnrollmentData]:
    """
    Scrape enrollment counts, waitlists, and popularity signals.
    Many platforms display enrollment numbers publicly.
    """
    enrollment_data = []
    for course in courses:
        result = await mantis.scrape(
            url=course.url,
            extract={
                "total_enrolled": "number or null",
                "enrolled_recently": "string or null (e.g., '5,000 enrolled this month')",
                "waitlist_info": "string or null",
                "seats_available": "number or null",
                "seats_total": "number or null",
                "next_session_date": "string or null",
                "cohort_info": "string or null",
                "trending_badge": "boolean",
                "bestseller_badge": "boolean",
                "category_rank": "number or null",
                "completion_rate": "number or null"
            }
        )
        # Parse "enrolled recently" text to a number
        enrolled_monthly = None
        if result.get("enrolled_recently"):
            nums = re.findall(r'[\d,]+', result["enrolled_recently"])
            if nums:
                enrolled_monthly = int(nums[0].replace(",", ""))
        enrollment = EnrollmentData(
            course_title=course.title,
            provider=course.provider,
            url=course.url,
            total_enrolled=result.get("total_enrolled"),
            enrolled_this_month=enrolled_monthly,
            waitlist_count=None,
            seats_available=result.get("seats_available"),
            seats_total=result.get("seats_total"),
            next_cohort_date=result.get("next_session_date"),
            trending_rank=1 if result.get("trending_badge") else None,
            category_rank=result.get("category_rank"),
            completion_rate_pct=result.get("completion_rate"),
            scraped_at=datetime.now()
        )
        enrollment_data.append(enrollment)
    return enrollment_data
```
```python
from collections import Counter


async def map_skills_demand(
    job_boards: List[str] = ["linkedin", "indeed"],
    roles: List[str] = ["data scientist", "machine learning engineer", "full stack developer"]
) -> dict:
    """
    Scrape job postings to identify in-demand skills,
    then compare against course curriculum to find gaps.
    """
    skills_demand = {}
    board_urls = {
        "linkedin": "https://www.linkedin.com/jobs/search/?keywords={}",
        "indeed": "https://www.indeed.com/jobs?q={}"
    }
    for role in roles:
        all_skills = []
        jobs_analyzed = 0
        for board in job_boards:
            url = board_urls[board].format(role.replace(" ", "+"))
            result = await mantis.scrape(
                url=url,
                extract={
                    "jobs": [{
                        "title": "string",
                        "company": "string",
                        "required_skills": ["string"],
                        "preferred_skills": ["string"],
                        "education_requirement": "string or null",
                        "experience_years": "number or null",
                        "salary_range": "string or null",
                        "certifications_mentioned": ["string"]
                    }]
                }
            )
            for job in result.get("jobs", []):
                jobs_analyzed += 1
                all_skills.extend(job.get("required_skills", []))
                all_skills.extend(job.get("preferred_skills", []))
        # Count skill frequency across all postings for this role
        skill_counts = Counter(s.lower().strip() for s in all_skills)
        skills_demand[role] = {
            "top_skills": skill_counts.most_common(20),
            "total_jobs_analyzed": jobs_analyzed,
            "certifications": []  # populated from job data
        }
    return skills_demand
```
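One wrinkle with the `Counter` above: job postings spell the same skill many ways, which splits demand counts across aliases. A small normalization pass before counting fixes this; the alias table here is illustrative and would grow as you inspect real postings:

```python
from collections import Counter

# Illustrative alias table mapping common spellings to one canonical name
SKILL_ALIASES = {
    "js": "javascript",
    "reactjs": "react",
    "react.js": "react",
    "postgres": "postgresql",
    "ml": "machine learning",
}

def canonical_skill(raw: str) -> str:
    """Lowercase, trim, and collapse known aliases to one name."""
    skill = raw.lower().strip()
    return SKILL_ALIASES.get(skill, skill)

skills = ["JS", "javascript", "ReactJS", "react", "Postgres"]
counts = Counter(canonical_skill(s) for s in skills)
print(counts.most_common(2))  # [('javascript', 2), ('react', 2)]
```

Without the alias pass, "JS" and "javascript" would each count once and neither might reach the top-20 cut.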
Step 6: AI-Powered Competitive Intelligence & Alerts
```python
import json

from openai import OpenAI

openai_client = OpenAI()


async def generate_education_intelligence(
    courses: List[CourseListing],
    pricing: List[CoursePricing],
    reviews_analysis: dict,
    enrollment: List[EnrollmentData],
    pricing_alerts: dict,
    skills_demand: dict
) -> dict:
    """Generate a comprehensive education market intelligence briefing."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are an education market analytics expert. Analyze the data
and produce an actionable competitive intelligence briefing:

1. MARKET LANDSCAPE
   - New course launches by competitors
   - Category growth/decline trends
   - Platform strategy shifts (pricing model changes, new features)
2. PRICING INTELLIGENCE
   - Significant price changes and what they signal
   - Promotional patterns (Udemy sales cycles, Coursera Plus promotions)
   - Value positioning opportunities
3. STUDENT SENTIMENT
   - Courses with improving/declining satisfaction
   - Common complaints that represent opportunity for differentiation
   - Curriculum gaps students are requesting
4. DEMAND SIGNALS
   - Fastest-growing courses by enrollment
   - Emerging topics gaining traction
   - Skills employers demand that lack quality courses
5. CURRICULUM RECOMMENDATIONS
   - Topics to launch courses on (high demand, low competition)
   - Existing courses to update based on student feedback
   - Certification partnerships to pursue
6. TOP 3 ACTIONS THIS WEEK
   - Prioritized with expected impact

Be quantitative and specific."""
        }, {
            "role": "user",
            "content": f"""Courses tracked: {len(courses)} across {len(set(c.provider for c in courses))} platforms
Pricing alerts: {json.dumps(pricing_alerts, default=str)}
Review analysis (sample): {json.dumps(dict(list(reviews_analysis.items())[:5]), default=str)}
Enrollment trends: {json.dumps([e.model_dump() for e in enrollment[:20]], default=str)}
Skills demand: {json.dumps(skills_demand, default=str)}"""
        }],
        temperature=0.2
    )
    return {
        "briefing": response.choices[0].message.content,
        "generated_at": datetime.now().isoformat(),
        "data_summary": {
            "courses_tracked": len(courses),
            "platforms_monitored": len(set(c.provider for c in courses)),
            "pricing_changes": sum(len(v) for v in pricing_alerts.values()),
            "reviews_analyzed": sum(a["review_count"] for a in reviews_analysis.values()),
            "enrollment_signals": len(enrollment)
        }
    }
```
```python
async def deliver_education_alerts(pricing_alerts: dict, enrollment: List[EnrollmentData], slack_webhook: str):
    """Route education alerts to product and marketing teams."""
    import httpx

    msg_parts = []
    # Price drops (competitor discounting)
    if pricing_alerts.get("price_drops"):
        msg_parts.append("📉 *Competitor Price Drops*\n")
        for drop in pricing_alerts["price_drops"][:5]:
            msg_parts.append(
                f"• *{drop['course']}* ({drop['provider']}): "
                f"${drop['old_price']} → ${drop['new_price']} (-{drop['drop_pct']}%)\n"
            )
    # New competitor courses
    if pricing_alerts.get("new_courses"):
        msg_parts.append("\n🆕 *New Competitor Courses*\n")
        for course in pricing_alerts["new_courses"][:5]:
            msg_parts.append(f"• *{course['title']}* - {course['provider']} ({course['category']})\n")
    # Trending courses
    trending = [e for e in enrollment if e.trending_rank and e.trending_rank <= 10]
    if trending:
        msg_parts.append("\n🔥 *Trending Courses*\n")
        for t in trending[:5]:
            msg_parts.append(
                f"• *{t.course_title}* ({t.provider}): "
                f"{t.total_enrolled or 0:,} enrolled, rank #{t.trending_rank}\n"
            )
    if msg_parts:
        async with httpx.AsyncClient() as client:
            await client.post(slack_webhook, json={
                "text": "📊 Education Intelligence Update\n\n" + "".join(msg_parts),
                "unfurl_links": False
            })
```
Advanced: Curriculum Gap Analysis Engine
Build an engine that identifies high-demand topics that lack quality courses – the blue ocean for new content:
```python
async def find_curriculum_gaps(
    courses: List[CourseListing],
    skills_demand: dict,
    reviews_analysis: dict
) -> dict:
    """
    Cross-reference employer skill demands against available courses
    to identify gaps worth filling. The holy grail of EdTech product strategy.
    """
    # Build skill-to-course mapping
    skill_coverage = {}
    for course in courses:
        for skill in course.skills_taught:
            skill_lower = skill.lower().strip()
            if skill_lower not in skill_coverage:
                skill_coverage[skill_lower] = []
            skill_coverage[skill_lower].append({
                "title": course.title,
                "provider": course.provider,
                "rating": course.rating,
                "enrollment": course.enrollment_count
            })
    gaps = []
    for role, data in skills_demand.items():
        for skill, demand_count in data["top_skills"]:
            existing = skill_coverage.get(skill, [])
            # High-quality coverage = courses with rating >= 4.0 and enrollment >= 1000
            quality_courses = [
                c for c in existing
                if (c.get("rating") or 0) >= 4.0 and (c.get("enrollment") or 0) >= 1000
            ]
            gap_score = 0
            # High demand, no courses
            if demand_count >= 10 and len(existing) == 0:
                gap_score = 100
            # High demand, no quality courses
            elif demand_count >= 10 and len(quality_courses) == 0:
                gap_score = 80
            # High demand, few quality courses (< 3)
            elif demand_count >= 10 and len(quality_courses) < 3:
                gap_score = 60
            # Moderate demand, no coverage
            elif demand_count >= 5 and len(existing) == 0:
                gap_score = 50
            if gap_score >= 50:
                gaps.append({
                    "skill": skill,
                    "role_context": role,
                    "demand_mentions": demand_count,
                    "existing_courses": len(existing),
                    "quality_courses": len(quality_courses),
                    "gap_score": gap_score,
                    "opportunity": "No courses exist" if len(existing) == 0 else
                                   f"{len(existing)} courses but only {len(quality_courses)} quality options"
                })
    # Add gaps from review analysis (topics students request)
    for course_key, analysis in reviews_analysis.items():
        # Parse curriculum gaps from the AI analysis
        if "curriculum gap" in analysis.get("analysis", "").lower():
            gaps.append({
                "skill": f"Gap identified in reviews for {course_key}",
                "role_context": "student_feedback",
                "demand_mentions": analysis["review_count"],
                "gap_score": 55,
                "opportunity": "Students requesting this in reviews"
            })
    return {
        "gaps": sorted(gaps, key=lambda x: x["gap_score"], reverse=True),
        "total_gaps_found": len(gaps),
        "top_opportunities": sorted(gaps, key=lambda x: x["gap_score"], reverse=True)[:10]
    }
```
Cost Comparison: AI Agents vs. Education Analytics Platforms
| Platform | Monthly Cost | Best For |
|---|---|---|
| EAB (formerly Education Advisory Board) | $5,000–$30,000 | Enrollment management, student success, institutional strategy |
| Keystone Academic Solutions | $2,000–$10,000 | Student recruitment, program marketing, international enrollment |
| Lightcast (formerly Burning Glass) | $3,000–$15,000 | Labor market analytics, skills mapping, program alignment |
| Course Report | $1,000–$5,000 | Bootcamp reviews, school profiles, lead generation |
| Class Central (enterprise) | $500–$3,000 | MOOC aggregation, course discovery, institution profiles |
| Hanover Research | $5,000–$25,000 | Custom research, competitive benchmarking, market sizing |
| AI Agent + Mantis | $29–$299 | Course monitoring, pricing, reviews, enrollment, fully custom |
Honest caveat: Enterprise platforms like EAB and Lightcast have proprietary datasets (years of historical enrollment data, employer survey panels, and institutional partnerships) that no amount of web scraping can replicate. EAB's enrollment predictions are trained on decades of higher education data. The AI agent approach excels at real-time competitive monitoring: tracking what competitors launch, how they price, and how students review them. That is the fast-moving market intelligence layer that enterprise platforms update quarterly while you update daily. For EdTech startups and smaller institutions, an AI agent covers 80–90% of competitive intelligence needs at 5% of the cost.
Use Cases by Education Segment
1. Online Learning Platforms (Coursera, Udemy, edX competitors)
Track competitor course launches within hours, not weeks. Monitor pricing strategies: when does Udemy run sales? What is Coursera charging for new Professional Certificates? Identify curriculum gaps, meaning skills employers demand for which no quality course exists. Analyze student reviews to understand what your competitors do poorly, and differentiate.
2. Universities & Colleges
Monitor how online competitors price programs that compete with your degrees. Track enrollment signals to identify growing vs. declining fields. Benchmark your program reviews against competitors. Map labor market demand to inform new program development, launching programs aligned with employer needs rather than faculty preferences.
3. Corporate Learning & Development
Evaluate thousands of courses across platforms to curate the best learning paths for your workforce. Track pricing changes to optimize L&D budgets: buy Udemy courses during sales, and negotiate Coursera for Business rates armed with usage data. Monitor skill demand trends to keep training current with market needs.
4. EdTech Startups
Validate market opportunity before building: is there demand for your topic with limited quality supply? Price competitively by understanding the full pricing landscape. Track established players' moves to anticipate market shifts. Build SEO and content strategy around topics with high search volume but poor existing content.
Compliance & Best Practices
- Course catalogs are public – platforms publish course details, descriptions, and pricing to attract students; this is inherently public marketing material
- FERPA applies to student records, not course data – enrollment counts, course catalogs, and pricing are not protected student records; individual student academic records are protected
- Reviews are public – student reviews on Class Central, Trustpilot, and platform review sections are voluntarily published public content
- Rate limiting – respect platform rate limits; cache course data aggressively since catalogs update weekly, not hourly
- Terms of Service – some platforms restrict automated access in their ToS; focus on publicly accessible pages and respect robots.txt
- NCES/IPEDS data is public – US Department of Education statistics on enrollment, graduation rates, and institutional data are public records
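The rate-limiting point above can be enforced with a small per-host limiter in the scraping loop. This is one possible sketch; the 2-second default interval is an assumption to tune per platform, and the clock is injectable so the behavior can be tested without sleeping:

```python
import time

class MinIntervalLimiter:
    """Enforce a minimum gap between requests to each host."""
    def __init__(self, min_interval: float = 2.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_call: dict[str, float] = {}

    def wait_needed(self, host: str) -> float:
        """Seconds to sleep before the next request to this host."""
        now = self.clock()
        last = self.last_call.get(host)
        self.last_call[host] = now
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (now - last))

# Deterministic demo with a fake clock instead of real time
ticks = iter([0.0, 0.5, 10.0])
limiter = MinIntervalLimiter(min_interval=2.0, clock=lambda: next(ticks))
waits = [limiter.wait_needed("coursera.org") for _ in range(3)]
print(waits)  # [0.0, 1.5, 0.0]
```

In the scrapers you would call `await asyncio.sleep(limiter.wait_needed(host))` before each `mantis.scrape`, and pair it with a local cache so unchanged catalog pages are not re-fetched at all.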
Getting Started
- Identify your competitive set – list the top 20–30 courses/programs that compete directly with yours
- Set up Mantis API access – sign up for a free API key (100 calls/month free)
- Start with pricing monitoring – track competitor pricing weekly; you'll immediately spot patterns (Udemy's sales cycles, Coursera's pricing experiments)
- Add review aggregation – understanding what students love and hate about competitors is the #1 input for product differentiation
- Map skills demand – cross-reference job postings against your curriculum to find gaps worth filling
- Automate alerts – route competitor launches, price drops >20%, and negative review spikes to your product team via Slack