Web Scraping for Real Estate: How AI Agents Track Properties, Prices & Market Trends in 2026
Real estate data is worth billions. Zillow, Redfin, and Realtor.com built empires on aggregating property information — but their APIs are limited, expensive, or locked behind partnerships. Meanwhile, investors, agents, and proptech startups need real-time, comprehensive property data to make decisions.
AI agents change the game. Instead of manually checking listings or paying $500+/month for data feeds, you can build an autonomous agent that scrapes property listings, extracts structured data with AI, tracks price changes, and surfaces investment opportunities — all for a fraction of the cost.
In this guide, you'll build a complete real estate intelligence system using Python, the Mantis WebPerception API, and OpenAI.
Why AI-Powered Real Estate Scraping?
Traditional real estate data collection has serious limitations:
- MLS access is restricted — only licensed agents can access it, and data sharing is limited
- Portal APIs are limited — Zillow's API was deprecated, Redfin has no public API
- Data feeds are expensive — commercial real estate data providers charge $500-$5,000/month
- Manual tracking doesn't scale — you can't watch 10,000 listings by hand
- Websites change constantly — CSS selector scrapers break every few weeks
AI extraction solves these problems. Instead of brittle selectors, the AI understands the page content and extracts structured property data regardless of layout changes.
Architecture: The Real Estate Intelligence Stack
Here's what we're building:
- Discovery — Scrape listing pages to find properties matching criteria
- Extraction — AI extracts structured data: price, beds, baths, sqft, address, features
- Storage — SQLite database with full history
- Change Detection — Track price drops, status changes, new listings
- AI Analysis — LLM evaluates deals, estimates ROI, flags opportunities
- Alerts — Slack/email notifications for hot deals
Step 1: Define Property Schemas
First, define what data we want to extract from each listing:
```python
from typing import Optional

from pydantic import BaseModel, Field

class PropertyListing(BaseModel):
    """Structured property listing data."""
    address: str
    city: str
    state: str
    zip_code: str
    price: int
    bedrooms: int
    bathrooms: float
    sqft: Optional[int] = None
    lot_size: Optional[str] = None
    year_built: Optional[int] = None
    property_type: str  # single_family, condo, townhouse, multi_family
    status: str  # active, pending, sold, price_reduced
    days_on_market: Optional[int] = None
    price_per_sqft: Optional[float] = None
    hoa_fee: Optional[int] = None
    description: Optional[str] = None
    features: list[str] = Field(default_factory=list)
    listing_url: str

class PropertySearchResults(BaseModel):
    """Results from a listing search page."""
    properties: list[PropertyListing]
    total_results: Optional[int] = None
    page: int = 1
```
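As a quick sanity check, it helps to see how these models are used: Pydantic generates the JSON Schema that gets handed to the extraction API, and raw extraction payloads are validated (and type-coerced) on the way back in. A standalone sketch with trimmed copies of the models above, so it runs on its own:

```python
from typing import Optional

from pydantic import BaseModel

class PropertyListing(BaseModel):  # trimmed copy of the full model above
    address: str
    price: int
    sqft: Optional[int] = None

class PropertySearchResults(BaseModel):
    properties: list[PropertyListing]
    total_results: Optional[int] = None
    page: int = 1

# The JSON Schema passed to the extraction API in Step 2
schema = PropertySearchResults.model_json_schema()

# A raw payload round-trips through validation: types are coerced,
# missing optional fields get their defaults
raw = {"properties": [{"address": "123 Main St", "price": "450000"}]}
results = PropertySearchResults.model_validate(raw)
```

If the extractor returns a price as a string, validation coerces it to `int`; if `sqft` is missing, it defaults to `None` instead of raising.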
Step 2: Scrape and Extract Listings
Use the Mantis WebPerception API to scrape listing pages and extract structured data:
```python
import asyncio

import httpx

MANTIS_API_KEY = "your-api-key"
MANTIS_URL = "https://api.mantisapi.com/v1"

async def scrape_listings(search_url: str) -> PropertySearchResults:
    """Scrape a property listing page and extract structured data."""
    async with httpx.AsyncClient(timeout=30) as client:
        # Step 1: Scrape the page (JS rendering enabled for dynamic listing sites)
        response = await client.post(
            f"{MANTIS_URL}/scrape",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": search_url,
                "render_js": True,
                "wait_for": "networkidle"
            }
        )
        response.raise_for_status()
        page_data = response.json()

        # Step 2: AI extraction against the Pydantic schema
        extract_response = await client.post(
            f"{MANTIS_URL}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "html": page_data["html"],
                "schema": PropertySearchResults.model_json_schema(),
                "prompt": "Extract all property listings from this real estate page. Include price, address, beds, baths, sqft, and all available details."
            }
        )
        extract_response.raise_for_status()
        data = extract_response.json()
        return PropertySearchResults(**data["extracted"])

# Example: search for homes in Austin, TX under $500K
results = asyncio.run(scrape_listings(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000"
))
print(f"Found {len(results.properties)} properties")
```
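Search results are paginated, so one search usually means several scrapes. A small helper, assuming the `/pg-N` URL convention seen in the example URLs above (page 1 is the bare URL; later pages append a segment), generates the URLs to feed into `scrape_listings`:

```python
def paginated_urls(base_url: str, pages: int) -> list[str]:
    """Expand a search URL into per-page URLs using the /pg-N convention."""
    return [base_url] + [f"{base_url}/pg-{n}" for n in range(2, pages + 1)]

# First three pages of the Austin search
urls = paginated_urls(
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000", 3
)
```

Other portals use different pagination schemes (query parameters, offsets), so treat this pattern as site-specific.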
Step 3: Store and Track Changes
Store listings in SQLite with full history to detect price changes and new listings:
```python
import sqlite3

def init_db(db_path: str = "real_estate.db"):
    """Initialize the real estate database."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            address TEXT NOT NULL,
            city TEXT NOT NULL,
            state TEXT NOT NULL,
            zip_code TEXT,
            listing_url TEXT UNIQUE,
            property_type TEXT,
            bedrooms INTEGER,
            bathrooms REAL,
            sqft INTEGER,
            year_built INTEGER,
            first_seen TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            listing_id INTEGER REFERENCES listings(id),
            price INTEGER NOT NULL,
            status TEXT NOT NULL,
            days_on_market INTEGER,
            recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            listing_id INTEGER REFERENCES listings(id),
            alert_type TEXT NOT NULL,
            details TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            sent BOOLEAN DEFAULT FALSE
        )
    """)
    conn.commit()
    return conn
```
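With history in place, price drops fall out of a single query: compare each listing's first recorded price with its latest. A self-contained sketch using an in-memory database and a minimal version of the schema above:

```python
import sqlite3

# In-memory stand-in for the Step 3 schema, with one listing and two price points
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, address TEXT, listing_url TEXT UNIQUE)")
conn.execute("CREATE TABLE price_history (id INTEGER PRIMARY KEY, listing_id INTEGER, "
             "price INTEGER, recorded_at TIMESTAMP)")
conn.execute("INSERT INTO listings VALUES (1, '123 Main St', 'https://example.com/1')")
conn.executemany(
    "INSERT INTO price_history (listing_id, price, recorded_at) VALUES (?, ?, ?)",
    [(1, 500_000, "2026-01-01"), (1, 475_000, "2026-01-15")],
)

# Listings whose latest recorded price is below their first recorded price
drops = conn.execute("""
    SELECT l.address, ph_first.price AS old_price, ph_last.price AS new_price
    FROM listings l
    JOIN price_history ph_first ON ph_first.id =
        (SELECT id FROM price_history WHERE listing_id = l.id ORDER BY recorded_at ASC LIMIT 1)
    JOIN price_history ph_last ON ph_last.id =
        (SELECT id FROM price_history WHERE listing_id = l.id ORDER BY recorded_at DESC LIMIT 1)
    WHERE ph_last.price < ph_first.price
""").fetchall()
```

The same shape of query works against the full schema that `init_db` creates, since the column names match.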
```python
def upsert_listing(conn, prop: PropertyListing) -> dict:
    """Insert or update a listing and track price changes."""
    # Check if the listing already exists
    existing = conn.execute(
        "SELECT id FROM listings WHERE listing_url = ?",
        (prop.listing_url,)
    ).fetchone()

    if existing:
        listing_id = existing[0]
        # Compare against the last recorded price point
        last_price = conn.execute(
            "SELECT price, status FROM price_history WHERE listing_id = ? "
            "ORDER BY recorded_at DESC LIMIT 1",
            (listing_id,)
        ).fetchone()

        changes = {}
        if last_price and last_price[0] != prop.price:
            changes["price_change"] = {
                "old": last_price[0],
                "new": prop.price,
                "diff": prop.price - last_price[0],
                "pct": round((prop.price - last_price[0]) / last_price[0] * 100, 1)
            }
        if last_price and last_price[1] != prop.status:
            changes["status_change"] = {
                "old": last_price[1],
                "new": prop.status
            }

        # Record the new price point
        conn.execute(
            "INSERT INTO price_history (listing_id, price, status, days_on_market) VALUES (?, ?, ?, ?)",
            (listing_id, prop.price, prop.status, prop.days_on_market)
        )
        conn.commit()
        return {"action": "updated", "listing_id": listing_id, "changes": changes}
    else:
        # New listing: insert the record plus its first price point
        cursor = conn.execute(
            """INSERT INTO listings (address, city, state, zip_code, listing_url,
                                     property_type, bedrooms, bathrooms, sqft, year_built)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (prop.address, prop.city, prop.state, prop.zip_code, prop.listing_url,
             prop.property_type, prop.bedrooms, prop.bathrooms, prop.sqft, prop.year_built)
        )
        listing_id = cursor.lastrowid
        conn.execute(
            "INSERT INTO price_history (listing_id, price, status, days_on_market) VALUES (?, ?, ?, ?)",
            (listing_id, prop.price, prop.status, prop.days_on_market)
        )
        conn.commit()
        return {"action": "new", "listing_id": listing_id}
```
Step 4: AI-Powered Deal Analysis
Use GPT-4o to evaluate properties and flag investment opportunities:
```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def analyze_deal(prop: PropertyListing, market_avg_ppsf: float) -> dict:
    """Use AI to analyze a property as an investment opportunity."""
    prompt = f"""Analyze this property listing as a real estate investment:

Property: {prop.address}, {prop.city}, {prop.state} {prop.zip_code}
Price: ${prop.price:,}
Type: {prop.property_type}
Beds/Baths: {prop.bedrooms}bd / {prop.bathrooms}ba
Sqft: {prop.sqft or 'Unknown'}
Year Built: {prop.year_built or 'Unknown'}
Days on Market: {prop.days_on_market or 'Unknown'}
Price/sqft: ${prop.price_per_sqft or 'Unknown'}
HOA: ${prop.hoa_fee or 0}/month
Status: {prop.status}
Features: {', '.join(prop.features[:10])}

Market Context:
- Average price/sqft in this area: ${market_avg_ppsf:.0f}
- This property's price/sqft: ${(prop.price / prop.sqft) if prop.sqft else 0:.0f}

Evaluate:
1. Is this priced below, at, or above market?
2. Estimated monthly rent potential (based on beds/baths/sqft/area)
3. Estimated cap rate and cash-on-cash return (assume 25% down, 7% interest)
4. Red flags or concerns
5. Overall deal rating: HOT_DEAL, GOOD, FAIR, PASS

Return JSON: {{"market_position": "below|at|above", "estimated_rent": 0, "cap_rate": 0.0, "cash_on_cash": 0.0, "red_flags": [], "rating": "HOT_DEAL|GOOD|FAIR|PASS", "summary": "..."}}"""

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)

# Analyze a property (run inside an async context)
analysis = await analyze_deal(property_listing, market_avg_ppsf=285.0)
if analysis["rating"] in ("HOT_DEAL", "GOOD"):
    print(f"🔥 {analysis['rating']}: {analysis['summary']}")
```
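The LLM's cap-rate and cash-on-cash figures are estimates, so it's worth recomputing them deterministically once you have a rent number. Here's a sketch of the standard formulas under the same assumptions the prompt states (25% down, 7% interest), plus an assumed 40% operating-expense ratio and a 30-year fixed loan:

```python
def deal_metrics(price: int, monthly_rent: float, down_pct: float = 0.25,
                 rate: float = 0.07, expense_ratio: float = 0.40,
                 years: int = 30) -> dict:
    """Back-of-envelope investment math: cap rate and cash-on-cash return."""
    noi = monthly_rent * 12 * (1 - expense_ratio)      # net operating income
    cap_rate = noi / price
    loan = price * (1 - down_pct)
    m = rate / 12                                      # monthly interest rate
    n = years * 12                                     # number of payments
    payment = loan * m / (1 - (1 + m) ** -n)           # monthly principal + interest
    cash_flow = noi - payment * 12
    cash_on_cash = cash_flow / (price * down_pct)
    return {"cap_rate": round(cap_rate, 4),
            "cash_on_cash": round(cash_on_cash, 4),
            "monthly_payment": round(payment, 2)}

metrics = deal_metrics(price=400_000, monthly_rent=2_600)
```

Comparing the model's numbers against this arithmetic is a cheap way to catch hallucinated returns before they trigger an alert.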
Step 5: Automated Alerts
Send alerts when the system detects opportunities:
```python
from typing import Optional

import httpx

async def send_slack_alert(webhook_url: str, prop: PropertyListing,
                           analysis: dict, changes: Optional[dict] = None):
    """Send a Slack alert for a hot deal or price drop."""
    emoji = {"HOT_DEAL": "🔥", "GOOD": "✅", "FAIR": "⚠️", "PASS": "❌"}
    blocks = []

    if changes and "price_change" in changes:
        change = changes["price_change"]
        blocks.append({
            "type": "header",
            "text": {"type": "plain_text", "text": f"💰 Price Drop: {prop.address}"}
        })
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text":
                f"*{prop.address}*\n{prop.city}, {prop.state} {prop.zip_code}\n\n"
                f"Old: ${change['old']:,} → New: *${change['new']:,}* ({change['pct']}%)\n"
                f"{prop.bedrooms}bd / {prop.bathrooms}ba · {prop.sqft or '?'} sqft\n"
                f"Rating: {emoji.get(analysis['rating'], '❓')} {analysis['rating']}\n"
                f"Est. rent: ${analysis['estimated_rent']:,}/mo · Cap rate: {analysis['cap_rate']}%"
            }
        })
    else:
        blocks.append({
            "type": "header",
            "text": {"type": "plain_text",
                     "text": f"{emoji.get(analysis['rating'], '🏠')} New {analysis['rating']}: {prop.address}"}
        })
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text":
                f"*${prop.price:,}* · {prop.bedrooms}bd/{prop.bathrooms}ba · {prop.sqft or '?'} sqft\n"
                f"${prop.price_per_sqft or 0:.0f}/sqft (market avg: $285/sqft)\n"
                f"Est. rent: ${analysis['estimated_rent']:,}/mo · Cap rate: {analysis['cap_rate']}%\n\n"
                f"{analysis['summary']}"
            }
        })

    async with httpx.AsyncClient() as client:
        await client.post(webhook_url, json={"blocks": blocks})
```
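To keep a daily pipeline from firing the same alert repeatedly, the `alerts` table created in Step 3 can double as a deduplication log. A minimal, self-contained sketch (in-memory database, same table shape) that records an alert only the first time a given listing/alert-type pair is seen:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE alerts (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        listing_id INTEGER,
        alert_type TEXT NOT NULL,
        details TEXT,
        sent BOOLEAN DEFAULT FALSE
    )
""")

def should_alert(conn, listing_id: int, alert_type: str) -> bool:
    """True only the first time this (listing, alert_type) pair is seen."""
    seen = conn.execute(
        "SELECT 1 FROM alerts WHERE listing_id = ? AND alert_type = ?",
        (listing_id, alert_type),
    ).fetchone()
    if seen:
        return False
    conn.execute(
        "INSERT INTO alerts (listing_id, alert_type, details) VALUES (?, ?, ?)",
        (listing_id, alert_type, "{}"),
    )
    conn.commit()
    return True
```

Gate `send_slack_alert` behind `should_alert` so a listing that stays hot for a week only pings you once per alert type.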
Step 6: Full Pipeline — Automated Daily Runs
Tie it all together into an autonomous monitoring pipeline:
```python
import asyncio

SLACK_WEBHOOK = "https://hooks.slack.com/services/your/webhook/url"  # your webhook URL

SEARCH_URLS = [
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000",
    "https://www.realtor.com/realestateandhomes-search/Austin_TX/price-na-500000/pg-2",
    # Add more search URLs for your target markets
]

async def run_daily_scan():
    """Run the full property scanning pipeline."""
    conn = init_db()
    new_listings = []
    price_drops = []
    hot_deals = []

    for url in SEARCH_URLS:
        try:
            results = await scrape_listings(url)
            for prop in results.properties:
                result = upsert_listing(conn, prop)
                if result["action"] == "new":
                    new_listings.append(prop)
                elif result["changes"].get("price_change", {}).get("diff", 0) < 0:
                    price_drops.append((prop, result["changes"]))

                # Analyze new listings and meaningful price cuts (more than 3%)
                if (result["action"] == "new" or
                        result["changes"].get("price_change", {}).get("pct", 0) < -3):
                    analysis = await analyze_deal(prop, market_avg_ppsf=285.0)
                    if analysis["rating"] in ("HOT_DEAL", "GOOD"):
                        hot_deals.append((prop, analysis, result.get("changes")))
        except Exception as e:
            print(f"Error scraping {url}: {e}")

    # Send alerts for hot deals
    for prop, analysis, changes in hot_deals:
        await send_slack_alert(SLACK_WEBHOOK, prop, analysis, changes)

    print(f"Scan complete: {len(new_listings)} new, "
          f"{len(price_drops)} price drops, {len(hot_deals)} hot deals")
    conn.close()

# Run daily via cron or a scheduler
asyncio.run(run_daily_scan())
```
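For the "run daily via cron" part, a crontab entry like the following works (the project path and script name are placeholders for wherever you save the pipeline):

```cron
# Run the property scan every morning at 7:00; adjust paths to your setup
0 7 * * * cd /path/to/project && python3 daily_scan.py >> scan.log 2>&1
```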
Cost Comparison: Traditional vs. AI Agent Approach
| Approach | Monthly Cost | Listings Tracked | AI Analysis | Custom Alerts |
|---|---|---|---|---|
| Zillow Data Feed | $500-$2,000 | Limited by plan | ❌ | Basic |
| Commercial Data Provider | $1,000-$5,000 | Full MLS | ❌ | ❌ |
| PropTech SaaS Tools | $200-$500 | Platform limits | Basic | Template-based |
| AI Agent + Mantis API | $29-$99 | Unlimited | Full GPT-4o | Fully custom |
With the Mantis Starter plan ($29/mo for 5,000 API calls), you can track thousands of listings across multiple markets. Add OpenAI costs (~$0.01 per analysis), and the total is a fraction of traditional data providers.
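To sanity-check that claim with rough arithmetic, assume about 20 listings per search page (an assumption; it varies by site and filters) and two API calls per page, one `/scrape` plus one `/extract`:

```python
plan_calls = 5_000          # Mantis Starter plan, calls per month
calls_per_page = 2          # one /scrape + one /extract
listings_per_page = 20      # assumption; varies by site and filters

pages_per_day = plan_calls / 30 / calls_per_page
listings_per_day = pages_per_day * listings_per_page
print(f"{pages_per_day:.0f} pages/day ≈ {listings_per_day:.0f} listings/day")
```

Even with daily re-scans, that budget covers on the order of a thousand-plus listings per day under these assumptions.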
Use Cases
1. Real Estate Investors
Automatically scan markets for underpriced properties. Get AI-powered cap rate estimates and deal ratings before anyone else sees the listing.
2. Proptech Startups
Build property data pipelines without expensive MLS partnerships. Aggregate listings from multiple sources into your platform.
3. Real Estate Agents
Monitor price changes in your farm area. Get alerts when listings drop below market value so you can reach buyers first.
4. Market Researchers
Track housing trends across neighborhoods: median prices, days on market, inventory levels. Generate automated market reports for clients.
Start Tracking Real Estate Data Today
The Mantis WebPerception API gives your agent eyes on the web. Scrape listings, extract structured data, and build your real estate intelligence system in minutes.
Get Your Free API Key →
Best Practices
Respect Rate Limits and Terms of Service
- Throttle requests — Add delays between requests (2-5 seconds) to avoid overloading listing sites
- Cache aggressively — Don't re-scrape listings that haven't changed
- Check robots.txt — Respect crawling restrictions
- Use data responsibly — Don't republish scraped listing photos or agent contact info
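The throttling advice above is easy to wrap in a small helper: a semaphore caps concurrency, and a randomized sleep spaces out requests so the timing doesn't look robotic. A sketch (delay bounds are parameters, so the demo uses tiny ones):

```python
import asyncio
import random

async def polite_gather(urls, fetch, max_concurrency: int = 2,
                        min_delay: float = 2.0, max_delay: float = 5.0):
    """Run fetch(url) for each URL with bounded concurrency and jittered delays."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(url):
        async with sem:
            # Randomized pause (2-5s by default) keeps request timing polite
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))

# Demo with a stand-in fetcher; pass scrape_listings as fetch for real runs
async def demo_fetch(url):
    return url.upper()

out = asyncio.run(polite_gather(["a", "b"], demo_fetch, min_delay=0, max_delay=0.01))
```

`asyncio.gather` preserves input order, so the results line up with the URL list even when requests finish out of order.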
Optimize for Accuracy
- Validate extracted data — Cross-check prices and sqft against reasonable ranges
- Handle edge cases — Auction listings, pre-foreclosures, and land-only lots have different data patterns
- Update market averages — Recalculate your area price/sqft benchmarks monthly
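Simple range checks catch most extraction mistakes before they pollute the database. A sketch operating on a raw extracted dict (the bounds here are illustrative; tune them to your markets):

```python
def sanity_issues(listing: dict) -> list[str]:
    """Return a list of problems with an extracted listing; empty means plausible."""
    issues = []
    price = listing.get("price") or 0
    if not 10_000 <= price <= 50_000_000:
        issues.append(f"price out of range: {price}")
    sqft = listing.get("sqft")
    if sqft is not None and not 200 <= sqft <= 30_000:
        issues.append(f"sqft out of range: {sqft}")
    if sqft and price and not 20 <= price / sqft <= 5_000:
        issues.append(f"implausible price/sqft: {price / sqft:.0f}")
    return issues
```

Run this between extraction and `upsert_listing`, and log (rather than store) anything that comes back with issues.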
Scale Gradually
- Start with one market — Perfect your pipeline before expanding
- Add markets incrementally — Each market may have different listing site layouts
- Monitor extraction quality — Spot-check AI extractions weekly
What's Next?
This real estate intelligence system gives you a foundation. Extend it with:
- Comparable analysis (comps) — Automatically pull and analyze nearby sold properties
- Neighborhood scoring — Scrape school ratings, crime data, walkability scores
- Rental market tracking — Monitor Craigslist and Apartments.com for rent comps
- Multi-market dashboards — Build a Streamlit or Next.js dashboard for your portfolio
Ready to build? Check out our Complete Guide to Web Scraping with AI for the fundamentals, or dive into Structured Data Extraction with AI to master the extraction patterns used in this guide.
For price tracking foundations, see our Price Monitoring Guide. And if you're building a full research pipeline, check out Web Scraping for Market Research.