Web Scraping for Real Estate: How AI Agents Track Properties, Prices & Market Trends in 2026

Published March 10, 2026 · 15 min read · By the Mantis Team

Real estate data is worth billions. Zillow, Redfin, and Realtor.com built empires on aggregating property information — but their APIs are limited, expensive, or locked behind partnerships. Meanwhile, investors, agents, and proptech startups need real-time, comprehensive property data to make decisions.

AI agents change the game. Instead of manually checking listings or paying $500+/month for data feeds, you can build an autonomous agent that scrapes property listings, extracts structured data with AI, tracks price changes, and surfaces investment opportunities — all for a fraction of the cost.

In this guide, you'll build a complete real estate intelligence system using Python, the Mantis WebPerception API, and OpenAI.

Why AI-Powered Real Estate Scraping?

Traditional real estate data collection has serious limitations:

  1. Brittle scrapers — CSS selectors break every time a listing site redesigns its pages
  2. Expensive data feeds — official feeds run $500+/month and often lag the live site
  3. Restricted APIs — Zillow, Redfin, and Realtor.com limit access or require partnerships
  4. Manual checking — refreshing searches by hand doesn't scale past a handful of listings

AI extraction solves these problems. Instead of brittle selectors, the AI understands the page content and extracts structured property data regardless of layout changes.

Architecture: The Real Estate Intelligence Stack

Here's what we're building:

  1. Discovery — Scrape listing pages to find properties matching criteria
  2. Extraction — AI extracts structured data: price, beds, baths, sqft, address, features
  3. Storage — SQLite database with full history
  4. Change Detection — Track price drops, status changes, new listings
  5. AI Analysis — LLM evaluates deals, estimates ROI, flags opportunities
  6. Alerts — Slack/email notifications for hot deals

Step 1: Define Property Schemas

First, define what data we want to extract from each listing:

from pydantic import BaseModel
from typing import Optional
from datetime import datetime

class PropertyListing(BaseModel):
    """Structured property listing data."""
    address: str
    city: str
    state: str
    zip_code: str
    price: int
    bedrooms: int
    bathrooms: float
    sqft: Optional[int] = None
    lot_size: Optional[str] = None
    year_built: Optional[int] = None
    property_type: str  # single_family, condo, townhouse, multi_family
    status: str  # active, pending, sold, price_reduced
    days_on_market: Optional[int] = None
    price_per_sqft: Optional[float] = None
    hoa_fee: Optional[int] = None
    description: Optional[str] = None
    features: list[str] = []
    listing_url: str

class PropertySearchResults(BaseModel):
    """Results from a listing search page."""
    properties: list[PropertyListing]
    total_results: Optional[int] = None
    page: int = 1

Step 2: Scrape and Extract Listings

Use the Mantis WebPerception API to scrape listing pages and extract structured data:

import os
import httpx
import json

MANTIS_API_KEY = os.environ.get("MANTIS_API_KEY", "your-api-key")  # read from the environment rather than hardcoding
MANTIS_URL = "https://api.mantisapi.com/v1"

async def scrape_listings(search_url: str) -> PropertySearchResults:
    """Scrape a property listing page and extract structured data."""
    async with httpx.AsyncClient(timeout=30) as client:
        # Step 1: Scrape the page
        response = await client.post(
            f"{MANTIS_URL}/scrape",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": search_url,
                "render_js": True,
                "wait_for": "networkidle"
            }
        )
        response.raise_for_status()
        page_data = response.json()

        # Step 2: AI extraction
        extract_response = await client.post(
            f"{MANTIS_URL}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "html": page_data["html"],
                "schema": PropertySearchResults.model_json_schema(),
                "prompt": "Extract all property listings from this real estate page. Include price, address, beds, baths, sqft, and all available details."
            }
        )
        extract_response.raise_for_status()
        data = extract_response.json()
        return PropertySearchResults(**data["extracted"])

import asyncio

# Example: Search for homes in Austin, TX under $500K
async def main():
    results = await scrape_listings(
        "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000"
    )
    print(f"Found {len(results.properties)} properties")

asyncio.run(main())
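Search results paginate, and the /pg-N suffix visible in the example URLs in this guide can be generated with a small helper. The pattern below is an observation from those URLs, not a documented API; verify it against whichever site you target:

```python
def page_urls(base_url: str, pages: int) -> list[str]:
    """Build paginated search URLs; page 1 is the bare URL, later pages append /pg-N."""
    urls = [base_url]
    for n in range(2, pages + 1):
        urls.append(f"{base_url.rstrip('/')}/pg-{n}")
    return urls

urls = page_urls(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000", 3
)
```

Feed the resulting list straight into scrape_listings, one call per page.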

Step 3: Store and Track Changes

Store listings in SQLite with full history to detect price changes and new listings:

import sqlite3
from datetime import datetime

def init_db(db_path: str = "real_estate.db"):
    """Initialize the real estate database."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            address TEXT NOT NULL,
            city TEXT NOT NULL,
            state TEXT NOT NULL,
            zip_code TEXT,
            listing_url TEXT UNIQUE,
            property_type TEXT,
            bedrooms INTEGER,
            bathrooms REAL,
            sqft INTEGER,
            year_built INTEGER,
            first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            listing_id INTEGER REFERENCES listings(id),
            price INTEGER NOT NULL,
            status TEXT NOT NULL,
            days_on_market INTEGER,
            recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            listing_id INTEGER REFERENCES listings(id),
            alert_type TEXT NOT NULL,
            details TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            sent BOOLEAN DEFAULT FALSE
        )
    """)
    conn.commit()
    return conn

def upsert_listing(conn, prop: PropertyListing) -> dict:
    """Insert or update a listing, track price changes."""
    # Check if listing exists
    existing = conn.execute(
        "SELECT id FROM listings WHERE listing_url = ?",
        (prop.listing_url,)
    ).fetchone()

    if existing:
        listing_id = existing[0]
        # Check last known price
        last_price = conn.execute(
            "SELECT price, status FROM price_history WHERE listing_id = ? ORDER BY recorded_at DESC LIMIT 1",
            (listing_id,)
        ).fetchone()

        changes = {}
        if last_price and last_price[0] != prop.price:
            changes["price_change"] = {
                "old": last_price[0],
                "new": prop.price,
                "diff": prop.price - last_price[0],
                "pct": round((prop.price - last_price[0]) / last_price[0] * 100, 1)
            }
        if last_price and last_price[1] != prop.status:
            changes["status_change"] = {
                "old": last_price[1],
                "new": prop.status
            }

        # Record new price point
        conn.execute(
            "INSERT INTO price_history (listing_id, price, status, days_on_market) VALUES (?, ?, ?, ?)",
            (listing_id, prop.price, prop.status, prop.days_on_market)
        )
        conn.commit()
        return {"action": "updated", "listing_id": listing_id, "changes": changes}
    else:
        # New listing
        cursor = conn.execute(
            """INSERT INTO listings (address, city, state, zip_code, listing_url, property_type, bedrooms, bathrooms, sqft, year_built)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (prop.address, prop.city, prop.state, prop.zip_code, prop.listing_url,
             prop.property_type, prop.bedrooms, prop.bathrooms, prop.sqft, prop.year_built)
        )
        listing_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO price_history (listing_id, price, status, days_on_market) VALUES (?, ?, ?, ?)",
            (listing_id, prop.price, prop.status, prop.days_on_market)
        )
        conn.commit()
        return {"action": "new", "listing_id": listing_id}
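The comparison logic buried inside upsert_listing is easier to unit-test if you pull it out as a pure function. This sketch mirrors the exact changes dict shape used above; the sample prices are arbitrary:

```python
def detect_changes(last, new_price, new_status):
    """Compare the newest scrape against the last recorded price point.

    `last` is a (price, status) tuple from price_history, or None for a
    brand-new listing. Returns the same `changes` dict used by upsert_listing."""
    changes = {}
    if last is None:
        return changes
    old_price, old_status = last
    if old_price != new_price:
        changes["price_change"] = {
            "old": old_price,
            "new": new_price,
            "diff": new_price - old_price,
            "pct": round((new_price - old_price) / old_price * 100, 1),
        }
    if old_status != new_status:
        changes["status_change"] = {"old": old_status, "new": new_status}
    return changes

changes = detect_changes((500_000, "active"), 475_000, "price_reduced")
```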

Step 4: AI-Powered Deal Analysis

Use GPT-4o to evaluate properties and flag investment opportunities:

import json
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def analyze_deal(prop: PropertyListing, market_avg_ppsf: float) -> dict:
    """Use AI to analyze a property as an investment opportunity."""
    prompt = f"""Analyze this property listing as a real estate investment:

Property: {prop.address}, {prop.city}, {prop.state} {prop.zip_code}
Price: ${prop.price:,}
Type: {prop.property_type}
Beds/Baths: {prop.bedrooms}bd / {prop.bathrooms}ba
Sqft: {prop.sqft or 'Unknown'}
Year Built: {prop.year_built or 'Unknown'}
Days on Market: {prop.days_on_market or 'Unknown'}
Price/sqft: ${prop.price_per_sqft or 'Unknown'}
HOA: ${prop.hoa_fee or 0}/month
Status: {prop.status}
Features: {', '.join(prop.features[:10])}

Market Context:
- Average price/sqft in this area: ${market_avg_ppsf:.0f}
- This property's price/sqft: ${(prop.price / prop.sqft) if prop.sqft else 0:.0f}

Evaluate:
1. Is this priced below, at, or above market?
2. Estimated monthly rent potential (based on beds/baths/sqft/area)
3. Estimated cap rate and cash-on-cash return (assume 25% down, 7% interest)
4. Red flags or concerns
5. Overall deal rating: HOT_DEAL, GOOD, FAIR, PASS

Return JSON: {{"market_position": "below|at|above", "estimated_rent": 0, "cap_rate": 0.0, "cash_on_cash": 0.0, "red_flags": [], "rating": "HOT_DEAL|GOOD|FAIR|PASS", "summary": "..."}}"""

    # AsyncOpenAI keeps the event loop free; a sync client here would block it
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)

# Analyze a property (call from inside an async context, e.g. the pipeline below;
# property_listing is a PropertyListing produced by Step 2)
analysis = await analyze_deal(property_listing, market_avg_ppsf=285.0)
if analysis["rating"] in ("HOT_DEAL", "GOOD"):
    print(f"🔥 {analysis['rating']}: {analysis['summary']}")
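The cap rate and cash-on-cash figures the prompt asks for are worth sanity-checking deterministically, since LLM arithmetic drifts. This sketch uses the same financing assumptions as the prompt (25% down, 7% interest, 30-year amortization); the 40% operating-expense ratio and the sample price and rent are illustrative assumptions, not figures from the article:

```python
def investment_metrics(price, monthly_rent, down_pct=0.25, rate=0.07, years=30,
                       expense_ratio=0.40):
    """Rough cap rate and cash-on-cash return, as percentages."""
    noi = monthly_rent * 12 * (1 - expense_ratio)      # net operating income
    cap_rate = noi / price
    loan = price * (1 - down_pct)
    r, n = rate / 12, years * 12
    monthly_payment = loan * r / (1 - (1 + r) ** -n)   # standard amortization formula
    annual_cash_flow = noi - monthly_payment * 12
    cash_on_cash = annual_cash_flow / (price * down_pct)
    return {"cap_rate": round(cap_rate * 100, 2),
            "cash_on_cash": round(cash_on_cash * 100, 2)}

metrics = investment_metrics(price=450_000, monthly_rent=2_800)
```

Comparing the model's numbers against this kind of baseline is a cheap way to catch hallucinated returns before they trigger an alert.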

Step 5: Automated Alerts

Send alerts when the system detects opportunities:

import httpx

async def send_slack_alert(webhook_url: str, prop: PropertyListing, analysis: dict, changes: dict = None):
    """Send a Slack alert for a hot deal or price drop."""
    emoji = {"HOT_DEAL": "🔥", "GOOD": "✅", "FAIR": "⚠️", "PASS": "❌"}
    blocks = []

    if changes and "price_change" in changes:
        change = changes["price_change"]
        blocks.append({
            "type": "header",
            "text": {"type": "plain_text", "text": f"💰 Price Drop: {prop.address}"}
        })
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text":
                f"*{prop.address}*\n{prop.city}, {prop.state} {prop.zip_code}\n\n"
                f"Old: ${change['old']:,} → New: *${change['new']:,}* ({change['pct']}%)\n"
                f"{prop.bedrooms}bd / {prop.bathrooms}ba · {prop.sqft or '?'} sqft\n"
                f"Rating: {emoji.get(analysis['rating'], '❓')} {analysis['rating']}\n"
                f"Est. rent: ${analysis['estimated_rent']:,}/mo · Cap rate: {analysis['cap_rate']}%"
            }
        })
    else:
        blocks.append({
            "type": "header",
            "text": {"type": "plain_text", "text": f"{emoji.get(analysis['rating'], '🏠')} New {analysis['rating']}: {prop.address}"}
        })
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text":
                f"*${prop.price:,}* · {prop.bedrooms}bd/{prop.bathrooms}ba · {prop.sqft or '?'} sqft\n"
                f"${prop.price_per_sqft or 0:.0f}/sqft (market avg: $285/sqft)\n"
                f"Est. rent: ${analysis['estimated_rent']:,}/mo · Cap rate: {analysis['cap_rate']}%\n\n"
                f"{analysis['summary']}"
            }
        })

    # Use the client as a context manager so the connection is closed cleanly
    async with httpx.AsyncClient(timeout=10) as client:
        await client.post(webhook_url, json={"blocks": blocks})

Step 6: Full Pipeline — Automated Daily Runs

Tie it all together into an autonomous monitoring pipeline:

import asyncio

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # your incoming webhook URL

SEARCH_URLS = [
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000",
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000/pg-2",
    # Add more search URLs for your target markets
]

async def run_daily_scan():
    """Run the full property scanning pipeline."""
    conn = init_db()
    new_listings = []
    price_drops = []
    hot_deals = []

    for url in SEARCH_URLS:
        try:
            results = await scrape_listings(url)
            for prop in results.properties:
                result = upsert_listing(conn, prop)

                if result["action"] == "new":
                    new_listings.append(prop)
                elif result["changes"].get("price_change", {}).get("diff", 0) < 0:
                    price_drops.append((prop, result["changes"]))

                # Analyze promising properties
                if (result["action"] == "new" or
                    result["changes"].get("price_change", {}).get("pct", 0) < -3):
                    analysis = await analyze_deal(prop, market_avg_ppsf=285.0)
                    if analysis["rating"] in ("HOT_DEAL", "GOOD"):
                        hot_deals.append((prop, analysis, result.get("changes")))
        except Exception as e:
            print(f"Error scraping {url}: {e}")

    # Send alerts for hot deals
    for prop, analysis, changes in hot_deals:
        await send_slack_alert(SLACK_WEBHOOK, prop, analysis, changes)

    print(f"Scan complete: {len(new_listings)} new, {len(price_drops)} price drops, {len(hot_deals)} hot deals")
    conn.close()

# Run daily via cron or scheduler
if __name__ == "__main__":
    asyncio.run(run_daily_scan())
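If you prefer an in-process scheduler over cron, a small helper can compute how long to sleep until the next daily run. The 6:00 AM run time below is just an example:

```python
from datetime import datetime, timedelta
from typing import Optional

def seconds_until(hour: int, minute: int = 0, now: Optional[datetime] = None) -> float:
    """Seconds until the next occurrence of hour:minute, local time."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # already past today's slot; run tomorrow
    return (target - now).total_seconds()

# Inside a long-running process:
# while True:
#     await asyncio.sleep(seconds_until(6))  # wait until 6:00 AM
#     await run_daily_scan()
```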

Cost Comparison: Traditional vs. AI Agent Approach

| Approach                 | Monthly Cost  | Listings Tracked | AI Analysis | Custom Alerts  |
|--------------------------|---------------|------------------|-------------|----------------|
| Zillow Data Feed         | $500-$2,000   | Limited by plan  | -           | Basic          |
| Commercial Data Provider | $1,000-$5,000 | Full MLS         | -           | -              |
| PropTech SaaS Tools      | $200-$500     | Platform limits  | Basic       | Template-based |
| AI Agent + Mantis API    | $29-$99       | Unlimited        | Full GPT-4o | Fully custom   |

With the Mantis Starter plan ($29/mo for 5,000 API calls), you can track thousands of listings across multiple markets. Add OpenAI costs (~$0.01 per analysis), and the total is a fraction of traditional data providers.
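Those numbers turn into a back-of-envelope calculator easily. The $29 plan and 5,000-call quota come from the paragraph above; the two-calls-per-page figure matches scrape_listings (one scrape plus one extract), and the per-analysis OpenAI cost is an approximation:

```python
def estimate_monthly_cost(pages_per_day: int, analyses_per_day: int,
                          plan_cost: float = 29.0, included_calls: int = 5000,
                          openai_per_analysis: float = 0.01) -> dict:
    """Rough monthly cost; each page scan is 2 API calls (scrape + extract)."""
    api_calls = pages_per_day * 2 * 30
    openai_cost = analyses_per_day * 30 * openai_per_analysis
    return {"api_calls": api_calls,
            "within_plan": api_calls <= included_calls,
            "total": round(plan_cost + openai_cost, 2)}

# 40 search pages and ~25 AI analyses per day
cost = estimate_monthly_cost(pages_per_day=40, analyses_per_day=25)
```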

Use Cases

1. Real Estate Investors

Automatically scan markets for underpriced properties. Get AI-powered cap rate estimates and deal ratings before anyone else sees the listing.

2. Proptech Startups

Build property data pipelines without expensive MLS partnerships. Aggregate listings from multiple sources into your platform.

3. Real Estate Agents

Monitor price changes in your farm area. Get alerts when listings drop below market value so you can reach buyers first.

4. Market Researchers

Track housing trends across neighborhoods: median prices, days on market, inventory levels. Generate automated market reports for clients.

Start Tracking Real Estate Data Today

The Mantis WebPerception API gives your agent eyes on the web. Scrape listings, extract structured data, and build your real estate intelligence system in minutes.

Get Your Free API Key →

Best Practices

Respect Rate Limits and Terms of Service

Check each site's robots.txt and terms before scraping, throttle your requests, and cache results so you never fetch a page more often than you need to.
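One simple way to throttle is an asyncio.Semaphore that caps how many scrapes run at once, with a pause inside each slot. The sleep below is a stand-in for the real scrape call:

```python
import asyncio

async def polite_gather(urls, max_concurrent=2, delay=0.1):
    """Run fetches with at most max_concurrent in flight, spaced by delay seconds."""
    sem = asyncio.Semaphore(max_concurrent)

    async def fetch_one(url):
        async with sem:
            await asyncio.sleep(delay)  # stand-in for scrape_listings(url)
            return url

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(fetch_one(u) for u in urls))

fetched = asyncio.run(polite_gather([f"https://example.com/pg-{i}" for i in range(1, 5)]))
```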

Optimize for Accuracy

Spot-check extracted listings against the live page, let the Pydantic schema validate field types, and re-run extraction when required fields like price or address come back empty.

Scale Gradually

Start with one market and a handful of search URLs, confirm the data is clean, then add markets and increase scan frequency once the pipeline is stable.

What's Next?

This real estate intelligence system gives you a foundation. From here you can add more markets, richer analysis, or alert channels beyond Slack.

Ready to build? Check out our Complete Guide to Web Scraping with AI for the fundamentals, or dive into Structured Data Extraction with AI to master the extraction patterns used in this guide.

For price tracking foundations, see our Price Monitoring Guide. And if you're building a full research pipeline, check out Web Scraping for Market Research.