Web Scraping for Education & EdTech: How AI Agents Track Courses, Pricing, Reviews & Academic Data in 2026
The global education market exceeds $7 trillion annually, with the EdTech segment alone projected to surpass $400 billion by 2028. Online learning platforms like Coursera, Udemy, and edX host millions of courses, while universities compete for shrinking enrollment pools. Yet most education organizations still track competitors manually: browsing course catalogs, checking pricing pages, and reading reviews one by one.
Enterprise education analytics platforms like EAB, Keystone Academic Solutions, and Lightcast (formerly Burning Glass) charge $3,000–$30,000+ per month for market intelligence. But the underlying data (course catalogs, pricing, student reviews, enrollment trends) is publicly available across hundreds of platforms. AI agents powered by web scraping APIs can build equivalent intelligence at a fraction of the cost.
In this guide, you'll build a complete education intelligence system using Python, the Mantis WebPerception API, and GPT-4o, covering course catalog monitoring, pricing intelligence, student sentiment analysis, and competitive landscape tracking.
Why Education Organizations Need Web Scraping
The education market is more competitive than ever. Online learning has eliminated geographic barriers, meaning a university in Ohio competes with Coursera, Udemy, and universities worldwide for the same student. The organizations that win are the ones with real-time market intelligence:
- Course catalog monitoring – Track competitor course launches, curriculum changes, and program expansions across Coursera, Udemy, edX, LinkedIn Learning, Skillshare, and university catalogs
- Pricing intelligence – Monitor tuition changes, subscription pricing, course discounts, financial aid offerings, and promotional campaigns
- Student review analysis – Aggregate reviews from Class Central, Course Report, RateMyProfessors, Trustpilot, and Reddit to understand student satisfaction
- Enrollment trend tracking – Monitor waitlists, seat availability, cohort sizes, and program capacity signals
- Accreditation & compliance – Track accreditation status changes, regulatory updates, and institutional announcements
- Skills demand mapping – Scrape job postings to identify which skills employers demand, then align curriculum accordingly
- Faculty & instructor tracking – Monitor instructor movements, new hires, and teaching assignments across institutions
Build Education Intelligence Agents with Mantis
Scrape course catalogs, pricing, reviews, and enrollment data from any education platform with one API call. AI-powered extraction handles every platform format.
Get Free API Key →
Architecture: The 6-Step Education Intelligence Pipeline
- Course catalog scraping – Monitor course offerings, descriptions, prerequisites, and learning outcomes across platforms
- Pricing & discount tracking – Track tuition, subscription tiers, promotional pricing, and financial aid changes
- Student review monitoring – Aggregate sentiment from review platforms, forums, and social media
- Enrollment & demand signals – Track waitlists, seat counts, cohort sizes, and program capacity
- GPT-4o competitive analysis – Identify curriculum gaps, pricing opportunities, and market positioning insights
- Alert delivery – Route competitor launches, price changes, and review spikes to product and marketing teams via Slack
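Before diving into each step, here is how one cycle of the pipeline might be wired together. This is a minimal orchestration sketch: the stub coroutines and their return shapes are placeholders standing in for the real scrapers built in the steps below, kept trivial so the `asyncio` wiring itself can be run and tested.

```python
import asyncio

# Stub coroutines: placeholders for the scrapers defined in Steps 2-5.
async def scrape_catalogs():
    return [{"title": "Intro to Python", "provider": "coursera"}]

async def track_pricing(courses):
    return [{"course": c["title"], "price_usd": 49.0} for c in courses]

async def scrape_reviews(courses):
    return [{"course": c["title"], "rating": 4.6} for c in courses]

async def run_pipeline() -> dict:
    """One cycle: catalogs first, then pricing and reviews concurrently,
    then a combined summary for downstream analysis and alerting."""
    courses = await scrape_catalogs()
    pricing, reviews = await asyncio.gather(
        track_pricing(courses),
        scrape_reviews(courses),
    )
    return {
        "courses": len(courses),
        "pricing_points": len(pricing),
        "reviews": len(reviews),
    }

summary = asyncio.run(run_pipeline())
print(summary)  # {'courses': 1, 'pricing_points': 1, 'reviews': 1}
```

In production you would run this cycle on a schedule (cron or a task queue) and pass the real model objects between steps.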
Step 1: Define Your Education Data Models
```python
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
from enum import Enum


class PlatformType(str, Enum):
    MOOC = "mooc"  # Coursera, edX, Udemy
    UNIVERSITY = "university"
    BOOTCAMP = "bootcamp"
    CORPORATE_LEARNING = "corporate_learning"
    K12 = "k12"
    CERTIFICATION = "certification"


class ContentFormat(str, Enum):
    SELF_PACED = "self_paced"
    COHORT_BASED = "cohort_based"
    LIVE_ONLINE = "live_online"
    HYBRID = "hybrid"
    IN_PERSON = "in_person"


class CourseListing(BaseModel):
    """Course or program from an education platform."""
    title: str
    provider: str  # "coursera", "udemy", "edx", "harvard", etc.
    platform_type: PlatformType
    url: str
    category: str  # "data-science", "web-development", "business", etc.
    subcategory: Optional[str] = None
    description: str
    instructor: Optional[str] = None
    institution: Optional[str] = None  # partner university/company
    format: ContentFormat
    duration_hours: Optional[float] = None
    duration_weeks: Optional[int] = None
    skill_level: Optional[str] = None  # "beginner", "intermediate", "advanced"
    skills_taught: List[str]
    prerequisites: List[str]
    has_certificate: bool
    certificate_type: Optional[str] = None  # "professional certificate", "MicroMasters", "degree"
    language: str = "English"
    rating: Optional[float] = None
    review_count: Optional[int] = None
    enrollment_count: Optional[int] = None
    last_updated: Optional[str] = None
    scraped_at: datetime


class CoursePricing(BaseModel):
    """Pricing details for a course or program."""
    course_title: str
    provider: str
    url: str
    pricing_model: str  # "one_time", "subscription", "per_course", "tuition"
    price_usd: Optional[float] = None
    original_price_usd: Optional[float] = None  # before discount
    discount_pct: Optional[float] = None
    subscription_monthly: Optional[float] = None
    subscription_annual: Optional[float] = None
    financial_aid_available: bool
    free_trial_days: Optional[int] = None
    free_audit_available: bool
    refund_policy: Optional[str] = None
    promo_code: Optional[str] = None
    promo_expires: Optional[datetime] = None
    currency: str = "USD"
    scraped_at: datetime


class StudentReview(BaseModel):
    """Student review from a course or institution."""
    course_title: str
    provider: str
    reviewer_name: Optional[str] = None
    rating: float  # 1-5
    title: Optional[str] = None
    text: str
    pros: List[str]
    cons: List[str]
    verified_purchase: bool
    completion_status: Optional[str] = None  # "completed", "in_progress", "dropped"
    review_date: Optional[str] = None
    helpful_votes: Optional[int] = None
    source: str  # "class_central", "coursera", "trustpilot", "reddit"
    scraped_at: datetime


class EnrollmentData(BaseModel):
    """Enrollment and demand signals for a course or program."""
    course_title: str
    provider: str
    url: str
    total_enrolled: Optional[int] = None
    enrolled_this_month: Optional[int] = None
    waitlist_count: Optional[int] = None
    seats_available: Optional[int] = None
    seats_total: Optional[int] = None
    next_cohort_date: Optional[str] = None
    cohort_count: Optional[int] = None  # number of cohorts running
    trending_rank: Optional[int] = None  # platform's trending/popular rank
    category_rank: Optional[int] = None
    completion_rate_pct: Optional[float] = None
    scraped_at: datetime
```
Step 2: Scrape Course Catalogs Across Platforms
```python
from mantis import MantisClient
import asyncio

mantis = MantisClient(api_key="your-mantis-api-key")


async def scrape_coursera_catalog(
    categories: List[str] = ["data-science", "computer-science", "business", "information-technology"]
) -> List[CourseListing]:
    """
    Scrape Coursera's course catalog by category.

    Coursera hosts 7,000+ courses from 300+ university and industry partners.
    """
    courses = []
    for category in categories:
        result = await mantis.scrape(
            url=f"https://www.coursera.org/courses?query={category}&sortBy=BEST_MATCH",
            extract={
                "courses": [{
                    "title": "string",
                    "institution": "string (partner university or company)",
                    "instructor": "string or null",
                    "description": "string",
                    "rating": "number (1-5) or null",
                    "review_count": "number or null",
                    "enrollment_count": "number or null",
                    "duration": "string",
                    "level": "string (beginner, intermediate, advanced, mixed)",
                    "skills": ["string"],
                    "type": "string (course, specialization, professional certificate, degree)",
                    "url": "string (relative path)",
                    "has_financial_aid": "boolean",
                    "is_free_audit": "boolean"
                }],
                "total_results": "number",
                "page": "number"
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="coursera",
                platform_type=PlatformType.MOOC,
                url=f"https://www.coursera.org{c.get('url', '')}",
                category=category,
                description=c.get("description", ""),
                instructor=c.get("instructor"),
                institution=c.get("institution"),
                format=ContentFormat.SELF_PACED,
                skill_level=c.get("level"),
                skills_taught=c.get("skills", []),
                prerequisites=[],
                has_certificate=True,
                certificate_type=c.get("type"),
                rating=c.get("rating"),
                review_count=c.get("review_count"),
                enrollment_count=c.get("enrollment_count"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses


async def scrape_udemy_catalog(
    categories: List[str] = ["development", "data-science", "business", "it-and-software"]
) -> List[CourseListing]:
    """
    Scrape Udemy's course marketplace.

    Udemy hosts 200,000+ courses with instructor-set pricing.
    """
    courses = []
    for category in categories:
        result = await mantis.scrape(
            url=f"https://www.udemy.com/courses/{category}/?sort=popularity&ratings=4.0",
            extract={
                "courses": [{
                    "title": "string",
                    "instructor": "string",
                    "description": "string (subtitle/tagline)",
                    "rating": "number",
                    "review_count": "number",
                    "enrollment_count": "number",
                    "price_current": "number or null",
                    "price_original": "number or null",
                    "duration_hours": "number",
                    "lecture_count": "number",
                    "level": "string",
                    "last_updated": "string",
                    "url": "string"
                }]
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="udemy",
                platform_type=PlatformType.MOOC,
                url=f"https://www.udemy.com{c.get('url', '')}",
                category=category,
                description=c.get("description", ""),
                instructor=c.get("instructor"),
                format=ContentFormat.SELF_PACED,
                duration_hours=c.get("duration_hours"),
                skill_level=c.get("level"),
                skills_taught=[],
                prerequisites=[],
                has_certificate=True,
                certificate_type="completion",
                rating=c.get("rating"),
                review_count=c.get("review_count"),
                enrollment_count=c.get("enrollment_count"),
                last_updated=c.get("last_updated"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses


async def scrape_edx_catalog(
    subjects: List[str] = ["computer-science", "data-science", "business-management"]
) -> List[CourseListing]:
    """
    Scrape the edX catalog: courses from MIT, Harvard, Berkeley, and 160+ institutions.

    Includes MicroMasters, Professional Certificates, and online degrees.
    """
    courses = []
    for subject in subjects:
        result = await mantis.scrape(
            url=f"https://www.edx.org/search?q={subject}&tab=course",
            extract={
                "courses": [{
                    "title": "string",
                    "institution": "string",
                    "description": "string",
                    "level": "string (introductory, intermediate, advanced)",
                    "duration_weeks": "number",
                    "effort_hours_per_week": "number",
                    "price": "number or null",
                    "is_free_audit": "boolean",
                    "program_type": "string (course, MicroMasters, Professional Certificate, XSeries, degree) or null",
                    "subject": "string",
                    "language": "string",
                    "url": "string"
                }]
            }
        )
        for c in result.get("courses", []):
            course = CourseListing(
                title=c.get("title", ""),
                provider="edx",
                platform_type=PlatformType.MOOC,
                url=f"https://www.edx.org{c.get('url', '')}",
                category=subject,
                description=c.get("description", ""),
                institution=c.get("institution"),
                format=ContentFormat.SELF_PACED,
                duration_weeks=c.get("duration_weeks"),
                skill_level=c.get("level"),
                skills_taught=[],
                prerequisites=[],
                has_certificate=True,
                certificate_type=c.get("program_type"),
                language=c.get("language", "English"),
                scraped_at=datetime.now()
            )
            courses.append(course)
    return courses
```
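Running all three scrapers produces overlapping listings: the same course often appears under slightly different titles, and repeated runs re-fetch everything. A small dedup pass before storage keeps the dataset clean. This sketch works on plain dicts rather than the `CourseListing` model so it stands alone; the normalization rules are illustrative.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so
    near-identical listings compare equal across runs."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()

def dedupe_listings(listings: list[dict]) -> list[dict]:
    """Keep the first listing seen for each (normalized title, provider) pair."""
    seen = set()
    unique = []
    for item in listings:
        key = (normalize_title(item["title"]), item["provider"])
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

raw = [
    {"title": "Machine Learning", "provider": "coursera"},
    {"title": "Machine  Learning!", "provider": "coursera"},  # same course, messy title
    {"title": "Machine Learning", "provider": "udemy"},       # different provider, kept
]
unique = dedupe_listings(raw)
print(len(unique))  # 2
```

Keying on `(title, provider)` rather than title alone preserves the same course offered on different platforms, which you want for cross-platform price comparison.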
Step 3: Track Pricing Changes & Promotional Campaigns
```python
async def track_course_pricing(
    courses: List[CourseListing]
) -> List[CoursePricing]:
    """
    Scrape detailed pricing for tracked courses.

    Udemy runs frequent 80-90% off sales; Coursera offers free audits
    with paid certificates; edX has verified track pricing.
    """
    pricing_data = []
    for course in courses:
        result = await mantis.scrape(
            url=course.url,
            extract={
                "price_current": "number or null",
                "price_original": "number or null (before discount)",
                "discount_pct": "number or null",
                "subscription_option": {
                    "monthly": "number or null",
                    "annual": "number or null",
                    "name": "string or null (Coursera Plus, etc.)"
                },
                "free_audit_available": "boolean",
                "free_trial_days": "number or null",
                "financial_aid_available": "boolean",
                "refund_policy": "string or null",
                "promo_banner": "string or null (current promotion text)",
                "bundle_pricing": {
                    "courses_in_bundle": "number or null",
                    "bundle_price": "number or null"
                }
            }
        )
        pricing = CoursePricing(
            course_title=course.title,
            provider=course.provider,
            url=course.url,
            pricing_model="subscription" if result.get("subscription_option", {}).get("monthly") else "one_time",
            price_usd=result.get("price_current"),
            original_price_usd=result.get("price_original"),
            discount_pct=result.get("discount_pct"),
            subscription_monthly=result.get("subscription_option", {}).get("monthly"),
            subscription_annual=result.get("subscription_option", {}).get("annual"),
            financial_aid_available=result.get("financial_aid_available", False),
            free_trial_days=result.get("free_trial_days"),
            free_audit_available=result.get("free_audit_available", False),
            refund_policy=result.get("refund_policy"),
            scraped_at=datetime.now()
        )
        pricing_data.append(pricing)
    return pricing_data


async def detect_pricing_changes(
    current_pricing: List[CoursePricing],
    historical_db: str = "education_intelligence.db"
) -> dict:
    """
    Compare current pricing against historical data to detect
    significant changes, new promotions, and competitive moves.
    """
    import sqlite3

    conn = sqlite3.connect(historical_db)
    alerts = {"price_drops": [], "price_increases": [], "new_promos": [], "free_changes": []}
    for pricing in current_pricing:
        prev = conn.execute("""
            SELECT price_usd, discount_pct, free_audit_available,
                   subscription_monthly, financial_aid_available
            FROM pricing_history
            WHERE course_title = ? AND provider = ?
            ORDER BY scraped_at DESC LIMIT 1
        """, (pricing.course_title, pricing.provider)).fetchone()
        if prev:
            prev_price, prev_discount, prev_free, prev_sub, prev_aid = prev
            # Price drop >20%
            if prev_price and pricing.price_usd and pricing.price_usd < prev_price * 0.80:
                alerts["price_drops"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "old_price": prev_price,
                    "new_price": pricing.price_usd,
                    "drop_pct": round(((prev_price - pricing.price_usd) / prev_price) * 100, 1),
                    "url": pricing.url
                })
            # Price increase >10%
            if prev_price and pricing.price_usd and pricing.price_usd > prev_price * 1.10:
                alerts["price_increases"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "old_price": prev_price,
                    "new_price": pricing.price_usd,
                    "increase_pct": round(((pricing.price_usd - prev_price) / prev_price) * 100, 1)
                })
            # Free audit removed
            if prev_free and not pricing.free_audit_available:
                alerts["free_changes"].append({
                    "course": pricing.course_title,
                    "provider": pricing.provider,
                    "change": "Free audit REMOVED",
                    "url": pricing.url
                })
        # Store current pricing
        conn.execute(
            """INSERT INTO pricing_history
               (course_title, provider, price_usd, discount_pct, free_audit_available,
                subscription_monthly, financial_aid_available, scraped_at)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
            (pricing.course_title, pricing.provider, pricing.price_usd,
             pricing.discount_pct, pricing.free_audit_available,
             pricing.subscription_monthly, pricing.financial_aid_available,
             pricing.scraped_at.isoformat())
        )
    conn.commit()
    conn.close()
    return alerts
```
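`detect_pricing_changes` assumes a `pricing_history` table already exists. One possible schema, matching the columns the queries above read and write (any columns beyond those are assumptions), with an in-memory demo of the same lookup pattern:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pricing_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    course_title TEXT NOT NULL,
    provider TEXT NOT NULL,
    price_usd REAL,
    discount_pct REAL,
    free_audit_available INTEGER,
    subscription_monthly REAL,
    financial_aid_available INTEGER,
    scraped_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_pricing_course
    ON pricing_history (course_title, provider, scraped_at);
"""

conn = sqlite3.connect(":memory:")  # use education_intelligence.db in production
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO pricing_history (course_title, provider, price_usd, scraped_at) "
    "VALUES (?, ?, ?, ?)",
    ("Intro to Python", "coursera", 49.0, "2026-01-01T00:00:00"),
)
# Same "most recent price" lookup detect_pricing_changes performs
row = conn.execute(
    "SELECT price_usd FROM pricing_history WHERE course_title = ? AND provider = ? "
    "ORDER BY scraped_at DESC LIMIT 1",
    ("Intro to Python", "coursera"),
).fetchone()
print(row[0])  # 49.0
```

The composite index on `(course_title, provider, scraped_at)` keeps the latest-price lookup fast as history accumulates.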
Step 4: Student Review Monitoring & Sentiment Analysis
```python
async def scrape_student_reviews(
    courses: List[CourseListing],
    sources: List[str] = ["class_central", "coursera", "trustpilot"]
) -> List[StudentReview]:
    """
    Aggregate student reviews from multiple platforms.

    Class Central aggregates reviews across MOOCs; Trustpilot covers bootcamps;
    Reddit r/learnprogramming and r/datascience have authentic student experiences.
    """
    reviews = []
    source_configs = {
        "class_central": {
            "url_template": "https://www.classcentral.com/search?q={}",
            "extract": {
                "reviews": [{
                    "reviewer": "string or null",
                    "rating": "number (1-5)",
                    "text": "string",
                    "date": "string",
                    "course_taken": "string",
                    "provider": "string"
                }]
            }
        },
        "trustpilot": {
            "url_template": "https://www.trustpilot.com/review/{}",
            "extract": {
                "reviews": [{
                    "reviewer": "string",
                    "rating": "number (1-5)",
                    "title": "string",
                    "text": "string",
                    "date": "string",
                    "verified": "boolean"
                }]
            }
        }
    }
    for course in courses:
        for source in sources:
            config = source_configs.get(source)
            if not config:
                continue
            query = course.title.replace(" ", "+")
            url = config["url_template"].format(query if source != "trustpilot" else course.provider + ".com")
            result = await mantis.scrape(
                url=url,
                extract=config["extract"]
            )
            for r in result.get("reviews", []):
                review = StudentReview(
                    course_title=course.title,
                    provider=course.provider,
                    reviewer_name=r.get("reviewer"),
                    rating=r.get("rating", 3),
                    title=r.get("title"),
                    text=r.get("text", ""),
                    pros=[],
                    cons=[],
                    verified_purchase=r.get("verified", False),
                    review_date=r.get("date"),
                    source=source,
                    scraped_at=datetime.now()
                )
                reviews.append(review)
    return reviews


async def analyze_review_sentiment(reviews: List[StudentReview]) -> dict:
    """
    Use GPT-4o to extract sentiment themes from student reviews.
    Identifies what students love, hate, and wish for.
    """
    from openai import OpenAI

    client = OpenAI()
    # Group reviews by course
    by_course = {}
    for r in reviews:
        key = f"{r.course_title} ({r.provider})"
        by_course.setdefault(key, []).append(r)
    analysis = {}
    for course_key, course_reviews in by_course.items():
        review_texts = "\n".join([
            f"Rating: {r.rating}/5 - {r.text[:500]}"
            for r in course_reviews[:30]  # limit for token efficiency
        ])
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "system",
                "content": """Analyze these student reviews and extract:
1. OVERALL SENTIMENT (positive/mixed/negative) with confidence score
2. TOP 3 PROS - what students consistently praise
3. TOP 3 CONS - what students consistently complain about
4. CURRICULUM GAPS - skills/topics students wish were covered
5. INSTRUCTOR QUALITY - teaching effectiveness themes
6. CAREER IMPACT - do students report career outcomes?
7. VALUE FOR MONEY - sentiment on pricing vs. value received
8. COMPLETION EXPERIENCE - common reasons for dropping out

Be specific and quantitative where possible."""
            }, {
                "role": "user",
                "content": f"Course: {course_key}\n{len(course_reviews)} reviews:\n\n{review_texts}"
            }],
            temperature=0.2
        )
        analysis[course_key] = {
            "review_count": len(course_reviews),
            "avg_rating": round(sum(r.rating for r in course_reviews) / len(course_reviews), 2),
            "analysis": response.choices[0].message.content
        }
    return analysis
```
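Each GPT-4o call costs tokens, so it can pay to compute cheap aggregates first and only send courses with meaningful rating movement to the model. A minimal stdlib sketch; the "detractor" cutoff of 2 stars is an illustrative assumption:

```python
from statistics import mean

def rating_summary(ratings: list[float]) -> dict:
    """Cheap pre-filter before spending GPT-4o tokens: average rating
    and the share of 1-2 star reviews (detractors)."""
    detractors = sum(1 for r in ratings if r <= 2)
    return {
        "avg_rating": round(mean(ratings), 2),
        "review_count": len(ratings),
        "detractor_share": round(detractors / len(ratings), 2),
    }

summary = rating_summary([5, 4, 4.5, 2, 1, 5])
print(summary)  # {'avg_rating': 3.58, 'review_count': 6, 'detractor_share': 0.33}
```

A course whose detractor share jumps between scrape cycles is a strong candidate for the full LLM analysis; a flat, high-rated course can be skipped.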
Step 5: Enrollment Signals & Market Demand
```python
import re


async def track_enrollment_signals(
    courses: List[CourseListing]
) -> List[EnrollmentData]:
    """
    Scrape enrollment counts, waitlists, and popularity signals.
    Many platforms display enrollment numbers publicly.
    """
    enrollment_data = []
    for course in courses:
        result = await mantis.scrape(
            url=course.url,
            extract={
                "total_enrolled": "number or null",
                "enrolled_recently": "string or null (e.g., '5,000 enrolled this month')",
                "waitlist_info": "string or null",
                "seats_available": "number or null",
                "seats_total": "number or null",
                "next_session_date": "string or null",
                "cohort_info": "string or null",
                "trending_badge": "boolean",
                "bestseller_badge": "boolean",
                "category_rank": "number or null",
                "completion_rate": "number or null"
            }
        )
        # Parse "enrolled recently" text to a number
        enrolled_monthly = None
        if result.get("enrolled_recently"):
            nums = re.findall(r'[\d,]+', result["enrolled_recently"])
            if nums:
                enrolled_monthly = int(nums[0].replace(",", ""))
        enrollment = EnrollmentData(
            course_title=course.title,
            provider=course.provider,
            url=course.url,
            total_enrolled=result.get("total_enrolled"),
            enrolled_this_month=enrolled_monthly,
            waitlist_count=None,
            seats_available=result.get("seats_available"),
            seats_total=result.get("seats_total"),
            next_cohort_date=result.get("next_session_date"),
            trending_rank=1 if result.get("trending_badge") else None,
            category_rank=result.get("category_rank"),
            completion_rate_pct=result.get("completion_rate"),
            scraped_at=datetime.now()
        )
        enrollment_data.append(enrollment)
    return enrollment_data
```
```python
from collections import Counter


async def map_skills_demand(
    job_boards: List[str] = ["linkedin", "indeed"],
    roles: List[str] = ["data scientist", "machine learning engineer", "full stack developer"]
) -> dict:
    """
    Scrape job postings to identify in-demand skills,
    then compare against course curriculum to find gaps.
    """
    skills_demand = {}
    board_urls = {
        "linkedin": "https://www.linkedin.com/jobs/search/?keywords={}",
        "indeed": "https://www.indeed.com/jobs?q={}"
    }
    for role in roles:
        all_skills = []
        jobs_analyzed = 0
        for board in job_boards:
            url = board_urls[board].format(role.replace(" ", "+"))
            result = await mantis.scrape(
                url=url,
                extract={
                    "jobs": [{
                        "title": "string",
                        "company": "string",
                        "required_skills": ["string"],
                        "preferred_skills": ["string"],
                        "education_requirement": "string or null",
                        "experience_years": "number or null",
                        "salary_range": "string or null",
                        "certifications_mentioned": ["string"]
                    }]
                }
            )
            for job in result.get("jobs", []):
                jobs_analyzed += 1
                all_skills.extend(job.get("required_skills", []))
                all_skills.extend(job.get("preferred_skills", []))
        # Count skill frequency across all postings for this role
        skill_counts = Counter(s.lower().strip() for s in all_skills)
        skills_demand[role] = {
            "top_skills": skill_counts.most_common(20),
            "total_jobs_analyzed": jobs_analyzed,
            "certifications": []  # populated from job data
        }
    return skills_demand
```
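One wrinkle with the `Counter` above: job postings spell the same skill many ways, which splits demand counts across aliases. A small normalization pass before counting fixes this; the alias table here is illustrative and would grow as you inspect real postings:

```python
from collections import Counter

# Illustrative alias table mapping common spellings to one canonical name
SKILL_ALIASES = {
    "js": "javascript",
    "reactjs": "react",
    "react.js": "react",
    "postgres": "postgresql",
    "ml": "machine learning",
}

def canonical_skill(raw: str) -> str:
    """Lowercase, trim, and collapse known aliases to one name."""
    skill = raw.lower().strip()
    return SKILL_ALIASES.get(skill, skill)

skills = ["JS", "javascript", "ReactJS", "react", "Postgres"]
counts = Counter(canonical_skill(s) for s in skills)
print(counts.most_common(2))  # [('javascript', 2), ('react', 2)]
```

Without the alias pass, "JS" and "javascript" would each count once and neither might reach the top-20 cut.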
Step 6: AI-Powered Competitive Intelligence & Alerts
```python
import json

from openai import OpenAI

openai_client = OpenAI()


async def generate_education_intelligence(
    courses: List[CourseListing],
    pricing: List[CoursePricing],
    reviews_analysis: dict,
    enrollment: List[EnrollmentData],
    pricing_alerts: dict,
    skills_demand: dict
) -> dict:
    """Generate a comprehensive education market intelligence briefing."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": """You are an education market analytics expert. Analyze the data
and produce an actionable competitive intelligence briefing:

1. MARKET LANDSCAPE
   - New course launches by competitors
   - Category growth/decline trends
   - Platform strategy shifts (pricing model changes, new features)
2. PRICING INTELLIGENCE
   - Significant price changes and what they signal
   - Promotional patterns (Udemy sales cycles, Coursera Plus promotions)
   - Value positioning opportunities
3. STUDENT SENTIMENT
   - Courses with improving/declining satisfaction
   - Common complaints that represent opportunity for differentiation
   - Curriculum gaps students are requesting
4. DEMAND SIGNALS
   - Fastest-growing courses by enrollment
   - Emerging topics gaining traction
   - Skills employers demand that lack quality courses
5. CURRICULUM RECOMMENDATIONS
   - Topics to launch courses on (high demand, low competition)
   - Existing courses to update based on student feedback
   - Certification partnerships to pursue
6. TOP 3 ACTIONS THIS WEEK
   - Prioritized with expected impact

Be quantitative and specific."""
        }, {
            "role": "user",
            "content": f"""Courses tracked: {len(courses)} across {len(set(c.provider for c in courses))} platforms
Pricing alerts: {json.dumps(pricing_alerts, default=str)}
Review analysis (sample): {json.dumps(dict(list(reviews_analysis.items())[:5]), default=str)}
Enrollment trends: {json.dumps([e.model_dump() for e in enrollment[:20]], default=str)}
Skills demand: {json.dumps(skills_demand, default=str)}"""
        }],
        temperature=0.2
    )
    return {
        "briefing": response.choices[0].message.content,
        "generated_at": datetime.now().isoformat(),
        "data_summary": {
            "courses_tracked": len(courses),
            "platforms_monitored": len(set(c.provider for c in courses)),
            "pricing_changes": sum(len(v) for v in pricing_alerts.values()),
            "reviews_analyzed": sum(a["review_count"] for a in reviews_analysis.values()),
            "enrollment_signals": len(enrollment)
        }
    }
```
```python
async def deliver_education_alerts(pricing_alerts: dict, enrollment: List[EnrollmentData], slack_webhook: str):
    """Route education alerts to product and marketing teams."""
    import httpx

    msg_parts = []
    # Price drops (competitor discounting)
    if pricing_alerts.get("price_drops"):
        msg_parts.append("📉 *Competitor Price Drops*\n")
        for drop in pricing_alerts["price_drops"][:5]:
            msg_parts.append(
                f"• *{drop['course']}* ({drop['provider']}): "
                f"${drop['old_price']} → ${drop['new_price']} (-{drop['drop_pct']}%)\n"
            )
    # New competitor courses
    if pricing_alerts.get("new_courses"):
        msg_parts.append("\n🆕 *New Competitor Courses*\n")
        for course in pricing_alerts["new_courses"][:5]:
            msg_parts.append(f"• *{course['title']}* - {course['provider']} ({course['category']})\n")
    # Trending courses
    trending = [e for e in enrollment if e.trending_rank and e.trending_rank <= 10]
    if trending:
        msg_parts.append("\n🔥 *Trending Courses*\n")
        for t in trending[:5]:
            msg_parts.append(
                f"• *{t.course_title}* ({t.provider}): "
                f"{t.total_enrolled or 0:,} enrolled, rank #{t.trending_rank}\n"
            )
    if msg_parts:
        async with httpx.AsyncClient() as client:
            await client.post(slack_webhook, json={
                "text": "📊 Education Intelligence Update\n\n" + "".join(msg_parts),
                "unfurl_links": False
            })
```
Advanced: Curriculum Gap Analysis Engine
Build an engine that identifies high-demand topics that lack quality courses – the blue ocean for new content:
```python
async def find_curriculum_gaps(
    courses: List[CourseListing],
    skills_demand: dict,
    reviews_analysis: dict
) -> dict:
    """
    Cross-reference employer skill demands against available courses
    to identify gaps worth filling. The holy grail of EdTech product strategy.
    """
    # Build skill-to-course mapping
    skill_coverage = {}
    for course in courses:
        for skill in course.skills_taught:
            skill_lower = skill.lower().strip()
            if skill_lower not in skill_coverage:
                skill_coverage[skill_lower] = []
            skill_coverage[skill_lower].append({
                "title": course.title,
                "provider": course.provider,
                "rating": course.rating,
                "enrollment": course.enrollment_count
            })
    gaps = []
    for role, data in skills_demand.items():
        for skill, demand_count in data["top_skills"]:
            existing = skill_coverage.get(skill, [])
            # High-quality coverage = courses with rating >= 4.0 and enrollment >= 1000
            quality_courses = [
                c for c in existing
                if (c.get("rating") or 0) >= 4.0 and (c.get("enrollment") or 0) >= 1000
            ]
            gap_score = 0
            # High demand, no courses
            if demand_count >= 10 and len(existing) == 0:
                gap_score = 100
            # High demand, no quality courses
            elif demand_count >= 10 and len(quality_courses) == 0:
                gap_score = 80
            # High demand, few quality courses (< 3)
            elif demand_count >= 10 and len(quality_courses) < 3:
                gap_score = 60
            # Moderate demand, no coverage
            elif demand_count >= 5 and len(existing) == 0:
                gap_score = 50
            if gap_score >= 50:
                gaps.append({
                    "skill": skill,
                    "role_context": role,
                    "demand_mentions": demand_count,
                    "existing_courses": len(existing),
                    "quality_courses": len(quality_courses),
                    "gap_score": gap_score,
                    "opportunity": "No courses exist" if len(existing) == 0 else
                                   f"{len(existing)} courses but only {len(quality_courses)} quality options"
                })
    # Add gaps from review analysis (topics students request)
    for course_key, analysis in reviews_analysis.items():
        # Parse curriculum gaps from the AI analysis
        if "curriculum gap" in analysis.get("analysis", "").lower():
            gaps.append({
                "skill": f"Gap identified in reviews for {course_key}",
                "role_context": "student_feedback",
                "demand_mentions": analysis["review_count"],
                "gap_score": 55,
                "opportunity": "Students requesting this in reviews"
            })
    return {
        "gaps": sorted(gaps, key=lambda x: x["gap_score"], reverse=True),
        "total_gaps_found": len(gaps),
        "top_opportunities": sorted(gaps, key=lambda x: x["gap_score"], reverse=True)[:10]
    }
```
Cost Comparison: AI Agents vs. Education Analytics Platforms
| Platform | Monthly Cost | Best For |
|---|---|---|
| EAB (formerly Education Advisory Board) | $5,000–$30,000 | Enrollment management, student success, institutional strategy |
| Keystone Academic Solutions | $2,000–$10,000 | Student recruitment, program marketing, international enrollment |
| Lightcast (formerly Burning Glass) | $3,000–$15,000 | Labor market analytics, skills mapping, program alignment |
| Course Report | $1,000–$5,000 | Bootcamp reviews, school profiles, lead generation |
| Class Central (enterprise) | $500–$3,000 | MOOC aggregation, course discovery, institution profiles |
| Hanover Research | $5,000–$25,000 | Custom research, competitive benchmarking, market sizing |
| AI Agent + Mantis | $29–$299 | Course monitoring, pricing, reviews, enrollment, fully custom |
Honest caveat: Enterprise platforms like EAB and Lightcast have proprietary datasets (years of historical enrollment data, employer survey panels, and institutional partnerships) that no amount of web scraping can replicate. EAB's enrollment predictions are trained on decades of higher education data. The AI agent approach excels at real-time competitive monitoring: tracking what competitors launch, how they price, and how students review them. That is the fast-moving market intelligence layer that enterprise platforms update quarterly while you update daily. For EdTech startups and smaller institutions, an AI agent covers 80–90% of competitive intelligence needs at 5% of the cost.
Use Cases by Education Segment
1. Online Learning Platforms (Coursera, Udemy, edX competitors)
Track competitor course launches within hours, not weeks. Monitor pricing strategies: when does Udemy run sales? What is Coursera charging for new Professional Certificates? Identify curriculum gaps, meaning skills employers demand for which no quality course exists. Analyze student reviews to understand what your competitors do poorly, and differentiate.
2. Universities & Colleges
Monitor how online competitors price programs that compete with your degrees. Track enrollment signals to identify growing vs. declining fields. Benchmark your program reviews against competitors. Map labor market demand to inform new program development, launching programs aligned with employer needs rather than faculty preferences.
3. Corporate Learning & Development
Evaluate thousands of courses across platforms to curate the best learning paths for your workforce. Track pricing changes to optimize L&D budgets: buy Udemy courses during sales, and negotiate Coursera for Business rates armed with usage data. Monitor skill demand trends to keep training current with market needs.
4. EdTech Startups
Validate market opportunity before building: is there demand for your topic with limited quality supply? Price competitively by understanding the full pricing landscape. Track established players' moves to anticipate market shifts. Build SEO and content strategy around topics with high search volume but poor existing content.
Compliance & Best Practices
- Course catalogs are public – platforms publish course details, descriptions, and pricing to attract students; this is inherently public marketing material
- FERPA applies to student records, not course data – enrollment counts, course catalogs, and pricing are not protected student records; individual student academic records are protected
- Reviews are public – student reviews on Class Central, Trustpilot, and platform review sections are voluntarily published public content
- Rate limiting – respect platform rate limits; cache course data aggressively since catalogs update weekly, not hourly
- Terms of Service – some platforms restrict automated access in their ToS; focus on publicly accessible pages and respect robots.txt
- NCES/IPEDS data is public – US Department of Education statistics on enrollment, graduation rates, and institutional data are public records
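The rate-limiting point above can be enforced with a small per-host limiter in the scraping loop. This is one possible sketch; the 2-second default interval is an assumption to tune per platform, and the clock is injectable so the behavior can be tested without sleeping:

```python
import time

class MinIntervalLimiter:
    """Enforce a minimum gap between requests to each host."""
    def __init__(self, min_interval: float = 2.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_call: dict[str, float] = {}

    def wait_needed(self, host: str) -> float:
        """Seconds to sleep before the next request to this host."""
        now = self.clock()
        last = self.last_call.get(host)
        self.last_call[host] = now
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (now - last))

# Deterministic demo with a fake clock instead of real time
ticks = iter([0.0, 0.5, 10.0])
limiter = MinIntervalLimiter(min_interval=2.0, clock=lambda: next(ticks))
waits = [limiter.wait_needed("coursera.org") for _ in range(3)]
print(waits)  # [0.0, 1.5, 0.0]
```

In the scrapers you would call `await asyncio.sleep(limiter.wait_needed(host))` before each `mantis.scrape`, and pair it with a local cache so unchanged catalog pages are not re-fetched at all.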
Getting Started
- Identify your competitive set – list the top 20–30 courses/programs that compete directly with yours
- Set up Mantis API access – sign up for a free API key (100 calls/month free)
- Start with pricing monitoring – track competitor pricing weekly; you'll immediately spot patterns (Udemy's sales cycles, Coursera's pricing experiments)
- Add review aggregation – understanding what students love and hate about competitors is the #1 input for product differentiation
- Map skills demand – cross-reference job postings against your curriculum to find gaps worth filling
- Automate alerts – route competitor launches, price drops >20%, and negative review spikes to your product team via Slack