Table of Contents
- Why Scrape LinkedIn?
- What Data Can You Extract?
- Method 1: Python + BeautifulSoup
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Cheerio
- Method 4: Web Scraping API (Easiest)
- Beating LinkedIn's Anti-Bot Detection
- LinkedIn Official API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape LinkedIn?
LinkedIn is the world's largest professional network with over 1 billion members and 67 million company profiles. That professional data powers some of the most valuable business intelligence in B2B:
- Lead generation – Build targeted prospect lists based on title, company, industry, and location
- Recruiting & talent sourcing – Find candidates matching specific skills, experience levels, and locations
- Job market analysis – Track hiring trends, salary ranges, skill demand, and emerging roles
- Competitive intelligence – Monitor competitor headcount, hiring patterns, and organizational changes
- Sales intelligence – Enrich CRM records with current titles, companies, and professional details
- AI agent research tools – Give AI assistants the ability to research people, companies, and job markets
- Market research – Analyze industry trends, company growth trajectories, and workforce composition
Whether you're building a recruiting tool, a sales prospecting engine, or an AI research agent, LinkedIn data is foundational for any B2B data strategy.
What Data Can You Extract?
LinkedIn has three main data categories, each with different accessibility levels:
Public Profiles
| Data Point | Availability | Notes |
|---|---|---|
| Full Name | Usually public | Some profiles show first name + last initial only |
| Headline | Public | Current role/tagline |
| Current Company & Title | Public | Most recent experience entry |
| Location | Public | Metro area (e.g., "San Francisco Bay Area") |
| About/Summary | Usually public | May be truncated without login |
| Experience History | Usually public | Companies, titles, dates, descriptions |
| Education | Usually public | Schools, degrees, dates |
| Skills | Partially public | Top skills visible; full list may require login |
| Certifications | Usually public | Issuing org, credential ID |
| Profile Photo URL | Varies | Some users restrict to connections only |
Job Listings
| Data Point | Availability | Notes |
|---|---|---|
| Job Title | Public | Accessible via linkedin.com/jobs |
| Company Name | Public | Links to company page |
| Location | Public | Includes remote/hybrid/on-site labels |
| Job Description | Public | Full text on job detail page |
| Posting Date | Public | Relative ("2 days ago") or absolute |
| Seniority Level | Public | Entry, Mid, Senior, Director, etc. |
| Employment Type | Public | Full-time, Part-time, Contract |
| Applicant Count | Sometimes | "Over 100 applicants" shown on some listings |
| Salary Range | Sometimes | Shown when employer provides it |
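Since the Posting Date field is often relative ("2 days ago"), a small normalizer is handy downstream. A hypothetical sketch (`parse_posted` is my own helper, not part of any library; "month" is approximated as 30 days):

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Rough conversion factors; "month" is an approximation
UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}

def parse_posted(text: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Best-effort conversion of '2 days ago' into an absolute datetime."""
    now = now or datetime.utcnow()
    match = re.search(r"(\d+)\s+(minute|hour|day|week|month)s?\s+ago", text)
    if not match:
        return None  # e.g. "just now" or an absolute date
    count, unit = int(match.group(1)), match.group(2)
    return now - count * UNITS[unit]
```

When the listing's `<time>` element carries a `datetime` attribute, prefer that; fall back to parsing the visible text only when it doesn't.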
Company Pages
| Data Point | Availability | Notes |
|---|---|---|
| Company Name | Public | Official name |
| Industry | Public | Self-reported category |
| Employee Count | Public | Range (e.g., "501-1,000") |
| Headquarters | Public | City, state, country |
| Description | Public | About section |
| Website | Public | Company URL |
| Founded Year | Public | When available |
| Specialties | Public | Self-reported tags |
Method 1: Python + BeautifulSoup
Best for scraping public LinkedIn pages that don't require login – particularly job listings and public profiles. LinkedIn serves some content without JavaScript, making simple HTTP requests viable for basic data extraction.
Install Dependencies
pip install requests beautifulsoup4 lxml
Public Job Listings Scraper
# linkedin_jobs.py
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/125.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}

def scrape_linkedin_jobs(keyword: str, location: str = "", pages: int = 3) -> list:
    """Scrape LinkedIn public job listings."""
    jobs = []
    for page in range(pages):
        start = page * 25
        url = (
            f"https://www.linkedin.com/jobs/search/"
            f"?keywords={keyword.replace(' ', '%20')}"
            f"&location={location.replace(' ', '%20')}"
            f"&start={start}"
        )
        resp = requests.get(url, headers=HEADERS, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "lxml")
        for card in soup.select(".base-card"):
            title_el = card.select_one(".base-search-card__title")
            company_el = card.select_one(".base-search-card__subtitle a")
            location_el = card.select_one(".job-search-card__location")
            link_el = card.select_one("a.base-card__full-link")
            date_el = card.select_one("time")
            jobs.append({
                "title": title_el.text.strip() if title_el else None,
                "company": company_el.text.strip() if company_el else None,
                "location": location_el.text.strip() if location_el else None,
                "url": link_el["href"] if link_el else None,
                "posted": date_el.get("datetime") if date_el else None,
            })
        time.sleep(random.uniform(3, 8))
    return jobs

# Example: search for Python developer jobs
results = scrape_linkedin_jobs("python developer", "San Francisco", pages=2)
print(f"Found {len(results)} jobs")
print(json.dumps(results[:3], indent=2))
Public Profile Scraper
# linkedin_profile.py
import json
import requests
from bs4 import BeautifulSoup
from linkedin_jobs import HEADERS  # reuse the headers defined above

def scrape_public_profile(profile_url: str) -> dict:
    """Scrape a public LinkedIn profile page."""
    resp = requests.get(profile_url, headers=HEADERS, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")

    def text(selector):
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None

    # Extract experience entries
    experiences = []
    for exp in soup.select(".experience-item"):
        title = exp.select_one(".experience-item__title")
        company = exp.select_one(".experience-item__subtitle")
        duration = exp.select_one(".experience-item__duration")
        experiences.append({
            "title": title.text.strip() if title else None,
            "company": company.text.strip() if company else None,
            "duration": duration.text.strip() if duration else None,
        })

    # Extract education
    education = []
    for edu in soup.select(".education__list-item"):
        school = edu.select_one(".education__item--school")
        degree = edu.select_one(".education__item--degree")
        education.append({
            "school": school.text.strip() if school else None,
            "degree": degree.text.strip() if degree else None,
        })

    return {
        "name": text(".top-card-layout__title"),
        "headline": text(".top-card-layout__headline"),
        "location": text(".top-card-layout__first-subline"),
        "about": text(".summary .core-section-container__content"),
        "experience": experiences,
        "education": education,
        "url": profile_url,
    }

profile = scrape_public_profile(
    "https://www.linkedin.com/in/example-profile"
)
print(json.dumps(profile, indent=2))
The examples above scrape only publicly accessible LinkedIn pages – no login required. LinkedIn shows significantly more data to logged-in users, but scraping while authenticated carries much higher legal risk and more clearly violates LinkedIn's Terms of Service. For production use, stick to public data or use an API.
Method 2: Playwright (Headless Browser)
LinkedIn relies heavily on JavaScript for rendering profile sections, job details, and infinite scroll. Playwright renders the full page, giving you access to dynamically loaded content.
Install
pip install playwright
playwright install chromium
Job Detail Scraper
# playwright_linkedin.py
import asyncio
import json
from playwright.async_api import async_playwright

async def scrape_job_detail(job_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Block images and tracking for speed
        await page.route(
            "**/*.{png,jpg,jpeg,gif,svg}",
            lambda route: route.abort(),
        )
        await page.route("**/tracking/**", lambda route: route.abort())

        await page.goto(job_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(3000)

        # Click "Show more" to expand the description
        try:
            show_more = page.locator("button.show-more-less-html__button")
            if await show_more.count() > 0:
                await show_more.first.click()
                await page.wait_for_timeout(500)
        except Exception:
            pass

        job = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const criteria = {};
            document.querySelectorAll('.description__job-criteria-item')
                .forEach(item => {
                    const label = item.querySelector(
                        '.description__job-criteria-subheader'
                    );
                    const value = item.querySelector(
                        '.description__job-criteria-text'
                    );
                    if (label && value) {
                        criteria[label.textContent.trim()
                            .toLowerCase().replace(/ /g, '_')] =
                            value.textContent.trim();
                    }
                });
            return {
                title: text('.top-card-layout__title'),
                company: text('.topcard__org-name-link')
                    || text('.top-card-layout__second-subline a'),
                location: text('.topcard__flavor--bullet'),
                description: text('.show-more-less-html__markup'),
                posted: text('.posted-time-ago__text'),
                applicants: text('.num-applicants__caption'),
                criteria: criteria,
            };
        }""")

        await browser.close()
        return job

# Scrape a specific job listing
job = asyncio.run(scrape_job_detail(
    "https://www.linkedin.com/jobs/view/1234567890"
))
print(json.dumps(job, indent=2))
Profile Scraper with Infinite Scroll
# playwright_profile.py
async def scrape_profile_full(profile_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await page.goto(profile_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(2000)

        # Scroll to load lazy sections
        for _ in range(5):
            await page.evaluate("window.scrollBy(0, window.innerHeight)")
            await page.wait_for_timeout(800)

        profile = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const allText = (sel) => {
                return [...document.querySelectorAll(sel)]
                    .map(el => el.textContent.trim())
                    .filter(Boolean);
            };
            return {
                name: text('.top-card-layout__title'),
                headline: text('.top-card-layout__headline'),
                location: text('.top-card-layout__first-subline'),
                about: text('.summary .core-section-container__content p'),
                experience: allText(
                    '.experience-item .experience-item__title'
                ),
                education: allText('.education__item--school-name'),
                skills: allText('.skill-categories-card li'),
            };
        }""")

        await browser.close()
        return profile
LinkedIn shows an "authwall" modal prompting login after a few page views. Adding ?trk=public_profile to profile URLs and using clean browser contexts (no cookies) can help. Rotating IP addresses between requests also resets LinkedIn's session tracking.
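The URL tweak can be wrapped in a small helper that strips any existing query string and appends the parameter (a sketch using only the standard library; `public_profile_url` is my own name for it, and the parameter nudges LinkedIn toward the public view but is no guarantee):

```python
from urllib.parse import urlsplit, urlunsplit

def public_profile_url(url: str) -> str:
    """Rewrite a profile URL to request the public (logged-out) view."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme or "https",
        parts.netloc,
        parts.path.rstrip("/"),  # drop trailing slash for a canonical form
        "trk=public_profile",    # replaces any existing query string
        "",                      # no fragment
    ))
```

Pair this with a fresh browser context per batch so no session cookies accumulate between profiles.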
Method 3: Node.js + Cheerio
Lightweight and fast – ideal for scraping LinkedIn job listings from a Node.js backend or serverless function.
Install
npm install cheerio node-fetch
Job Search Scraper
// linkedin-jobs.mjs
import fetch from "node-fetch";
import * as cheerio from "cheerio";

const HEADERS = {
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
  Accept: "text/html",
};

async function scrapeJobs(keyword, location = "", pages = 3) {
  const jobs = [];
  for (let page = 0; page < pages; page++) {
    const start = page * 25;
    const url =
      `https://www.linkedin.com/jobs/search/` +
      `?keywords=${encodeURIComponent(keyword)}` +
      `&location=${encodeURIComponent(location)}` +
      `&start=${start}`;
    const resp = await fetch(url, { headers: HEADERS });
    const html = await resp.text();
    const $ = cheerio.load(html);
    $(".base-card").each((_, card) => {
      const $card = $(card);
      jobs.push({
        title: $card.find(".base-search-card__title").text().trim() || null,
        company:
          $card.find(".base-search-card__subtitle a").text().trim() || null,
        location:
          $card.find(".job-search-card__location").text().trim() || null,
        url: $card.find("a.base-card__full-link").attr("href") || null,
        posted: $card.find("time").attr("datetime") || null,
      });
    });
    // Polite delay
    await new Promise((r) => setTimeout(r, 3000 + Math.random() * 5000));
  }
  return jobs;
}

// Usage
const jobs = await scrapeJobs("machine learning engineer", "New York");
console.log(`Found ${jobs.length} jobs`);
console.log(JSON.stringify(jobs.slice(0, 3), null, 2));
Company Page Scraper
// linkedin-company.mjs (reuses the HEADERS object from linkedin-jobs.mjs)
import fetch from "node-fetch";
import * as cheerio from "cheerio";

async function scrapeCompany(companySlug) {
  const url = `https://www.linkedin.com/company/${companySlug}/about/`;
  const resp = await fetch(url, { headers: HEADERS });
  const html = await resp.text();
  const $ = cheerio.load(html);

  const text = (sel) => $(sel).first().text().trim() || null;

  // Extract detail items from the dt/dd pairs
  const details = {};
  $(".core-section-container__content dt").each((i, el) => {
    const key = $(el).text().trim().toLowerCase().replace(/ /g, "_");
    const value = $(el).next("dd").text().trim();
    if (key && value) details[key] = value;
  });

  return {
    name: text(".top-card-layout__title"),
    tagline: text(".top-card-layout__headline"),
    description: text(".core-section-container__content p"),
    ...details,
    url: `https://www.linkedin.com/company/${companySlug}`,
  };
}

const company = await scrapeCompany("google");
console.log(JSON.stringify(company, null, 2));
Method 4: Web Scraping API (Easiest)
The most reliable approach for production LinkedIn scraping. A web scraping API handles proxies, anti-detection, JavaScript rendering, and login-wall bypassing – you just send a URL and get structured data back.
Scraping Profiles with Mantis
# One API call – structured LinkedIn profile data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.linkedin.com/in/example-profile",
        "extract": {
            "name": "person's full name",
            "headline": "professional headline",
            "location": "geographic location",
            "about": "about/summary section",
            "current_company": "current employer name",
            "current_title": "current job title",
            "experience": "work experience (array of: company, title, dates, description)",
            "education": "education history (array of: school, degree, dates)",
            "skills": "listed skills (array)",
        },
        "render_js": True,
    },
)
profile = resp.json()
print(profile)
Scraping Job Listings with Mantis
# Bulk job search
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": (
            "https://www.linkedin.com/jobs/search/"
            "?keywords=AI+engineer&location=Remote"
        ),
        "extract": {
            "jobs": (
                "array of job listings with: "
                "title, company, location, url, posted_date"
            ),
        },
        "render_js": True,
    },
)
jobs = resp.json().get("jobs", [])
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']} – {job['location']}")
Skip the Login Walls and IP Bans
Mantis handles LinkedIn's anti-bot detection, proxy rotation, authwall bypassing, and JavaScript rendering – so you don't have to.
Node.js with Mantis
// mantis-linkedin.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.linkedin.com/in/example-profile",
    extract: {
      name: "full name",
      headline: "professional headline",
      current_role: "current job title and company",
      experience: "work history (array)",
      skills: "professional skills (array)",
    },
    render_js: true,
  }),
});
const profile = await resp.json();
console.log(profile);
Beating LinkedIn's Anti-Bot Detection
LinkedIn invests heavily in detecting and blocking scrapers. Here's what you're up against:
LinkedIn's Defense Layers
| Defense | What It Does | Countermeasure |
|---|---|---|
| Authwall | Prompts login after a few page views | Clean sessions, ?trk= params, IP rotation |
| IP Rate Limiting | Blocks IPs with too many requests | Rotating residential proxies |
| CAPTCHA Challenges | Serves CAPTCHA on suspicious patterns | CAPTCHA solving services or API |
| Account Restrictions | Limits or suspends accounts scraping while logged in | Don't use personal accounts; use public pages |
| Browser Fingerprinting | Detects headless browsers via JS | Stealth plugins, real browser profiles |
| Session Tracking | Correlates requests across pageviews | Fresh sessions per request batch |
| Honeypot Links | Invisible links that only bots follow | Only follow visible, user-facing links |
Essential Anti-Detection Techniques
# linkedin_stealth.py
import random
import time
import requests

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "Chrome/124.0.0.0 Safari/537.36",
]

def create_linkedin_session():
    """Create a session configured for LinkedIn scraping."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml",
        "Referer": "https://www.google.com/",
        "DNT": "1",
    })
    return session

def is_authwall(html: str) -> bool:
    """Detect LinkedIn's login prompt."""
    return any(marker in html.lower() for marker in [
        "authwall", "sign in", "join now",
        "login-form", "session_key",
    ])

def scrape_with_retry(url: str, max_retries: int = 3) -> str:
    """Scrape with authwall detection and retry."""
    for attempt in range(max_retries):
        session = create_linkedin_session()
        try:
            resp = session.get(url, timeout=15)
            if resp.status_code == 429:
                wait = 30 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue
            if is_authwall(resp.text):
                print("Authwall detected – rotating proxy")
                continue
            return resp.text
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(random.uniform(5, 15))
    raise Exception(f"Failed after {max_retries} retries")

def polite_delay():
    """Longer delays for LinkedIn – they're aggressive."""
    time.sleep(random.uniform(5, 20))
Stick to public pages. LinkedIn's public job listings (linkedin.com/jobs/) and public profiles (linkedin.com/in/) are the safest targets. Scraping while logged in dramatically increases the risk of account suspension and legal action.
LinkedIn Official API vs Scraping
LinkedIn offers several APIs, but none provide the data most scrapers need:
| Feature | LinkedIn API | Web Scraping | Mantis API |
|---|---|---|---|
| Profile Data | Own profile only (authenticated user) | Any public profile | Any public profile |
| Job Listings | Not available via API | Full search + detail pages | Full search + detail pages |
| Company Data | Managed pages only | Any public company page | Any public company page |
| People Search | Not available | Via Google dorks or site search | Via URL-based extraction |
| Rate Limits | Varies (100-500 calls/day typical) | Depends on proxy pool | Based on plan (up to 100K/mo) |
| Setup | App review required (weeks-months) | Immediate (need proxies) | Immediate (API key) |
| Approval | Requires LinkedIn partnership for most endpoints | None | None |
| Cost | Free (very limited) or enterprise pricing | $50-500+/mo (proxies) | $0-299/mo |
| Reliability | High (when approved) | Medium (authwalls, blocks) | High (maintained infrastructure) |
| Legal Risk | None (authorized) | Low for public data (hiQ ruling) | API handles compliance |
LinkedIn's API is designed for marketing and authentication, not data access. You can post on company pages and sign users in, but you cannot search for people, browse job listings, or access other users' profiles. This is why scraping (especially public pages) remains the primary method for LinkedIn data collection.
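The "Google dorks" route from the People Search row above can be sketched as a small query builder (a hypothetical helper of my own; the exact operators and result quality vary by search engine):

```python
from typing import List, Optional
from urllib.parse import quote_plus

def xray_query(title: str, location: str = "",
               skills: Optional[List[str]] = None) -> str:
    """Build a Google X-ray search URL for public LinkedIn profiles."""
    terms = [f'site:linkedin.com/in/ "{title}"']
    if location:
        terms.append(f'"{location}"')
    for skill in (skills or [])[:3]:  # overly long queries hurt recall
        terms.append(f'"{skill}"')
    return "https://www.google.com/search?q=" + quote_plus(" ".join(terms))
```

The resulting URL can then be fed to any of the scraping methods above to collect profile links from the search results.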
Method Comparison
| Criteria | Python + BS4 | Playwright | Node.js + Cheerio | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 5 min | 2 min |
| JS Rendering | ❌ | ✅ | ❌ | ✅ |
| Authwall Handling | Manual detection | Manual workaround | Manual detection | Built-in |
| Anti-Detection | Basic | Good (with stealth) | Basic | Built-in |
| Speed | Fast | Slow | Fast | Medium |
| Maintenance | High | High | High | None |
| Scale | Low | Low | Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies) | $200-500 (proxies + compute) | $100-300 (proxies) | $99 (Pro plan) |
| Best For | Job listings | Full profiles | Serverless jobs | Production |
Real-World Use Cases
1. Recruiter Lead Generation Tool
Build a pipeline that finds candidates matching specific criteria (title, skills, location) and enriches them with LinkedIn profile data for outreach.
# recruiter_tool.py
import requests
import json

MANTIS_KEY = "YOUR_API_KEY"

def find_candidates(title: str, location: str, skills: list) -> list:
    """Find candidates matching criteria via LinkedIn."""
    # Use Google to find LinkedIn profiles
    query = (
        f'site:linkedin.com/in/ "{title}" "{location}" '
        + " ".join(f'"{s}"' for s in skills[:3])
    )
    search_resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10",
            "extract": {
                "results": (
                    "array of search results with: "
                    "title, url, snippet"
                ),
            },
            "render_js": True,
        },
    )
    results = search_resp.json().get("results", [])

    # Enrich each profile
    candidates = []
    for r in results:
        if "linkedin.com/in/" not in r.get("url", ""):
            continue
        profile_resp = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={
                "Authorization": f"Bearer {MANTIS_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "url": r["url"],
                "extract": {
                    "name": "full name",
                    "headline": "professional headline",
                    "location": "location",
                    "current_company": "current employer",
                    "current_title": "current job title",
                    "experience_years": "total years of experience (estimate)",
                    "skills": "key skills (array, top 10)",
                },
                "render_js": True,
            },
        )
        candidate = profile_resp.json()
        candidate["linkedin_url"] = r["url"]
        candidates.append(candidate)
    return candidates

candidates = find_candidates(
    title="Senior ML Engineer",
    location="San Francisco",
    skills=["PyTorch", "LLM", "MLOps"],
)
print(f"Found {len(candidates)} candidates")
for c in candidates:
    print(f"  {c.get('name')} – {c.get('headline')}")
    print(f"  {c.get('linkedin_url')}\n")
2. Job Market Analyzer
Track hiring trends, salary ranges, and skill demand across industries – useful for career platforms, salary benchmarking tools, and workforce analytics.
# job_analyzer.py
import requests
import json
from collections import Counter
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

def analyze_job_market(keyword: str, location: str = "") -> dict:
    """Analyze job listings for trends."""
    # Scrape job listings
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                f"https://www.linkedin.com/jobs/search/"
                f"?keywords={keyword}&location={location}"
            ),
            "extract": {
                "jobs": (
                    "array of all visible jobs with: title, "
                    "company, location, seniority_level, "
                    "employment_type, posted_date"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])

    # Analyze trends
    companies = Counter(j.get("company", "Unknown") for j in jobs)
    locations = Counter(j.get("location", "Unknown") for j in jobs)
    seniority = Counter(j.get("seniority_level", "Unknown") for j in jobs)

    return {
        "keyword": keyword,
        "total_listings": len(jobs),
        "top_hiring_companies": dict(companies.most_common(10)),
        "top_locations": dict(locations.most_common(10)),
        "seniority_distribution": dict(seniority),
        "analyzed_at": datetime.utcnow().isoformat(),
    }

report = analyze_job_market("AI engineer", "United States")
print(json.dumps(report, indent=2))
3. AI Agent Talent Sourcing
Give an AI agent the ability to search for professionals and generate outreach recommendations – a core building block for AI-powered recruiting and sales assistants.
# agent_talent.py – LangChain tool for LinkedIn research
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def search_linkedin_people(query: str) -> str:
    """Search for professionals on LinkedIn and return their
    profiles with current roles and companies."""
    # Google X-ray search for LinkedIn profiles
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.google.com/search?"
                f"q=site:linkedin.com/in/+{query}&num=5"
            ),
            "extract": {
                "profiles": (
                    "array of LinkedIn profiles with: "
                    "name, title, company, linkedin_url"
                ),
            },
            "render_js": True,
        },
    )
    profiles = resp.json().get("profiles", [])
    if not profiles:
        return f"No LinkedIn profiles found for '{query}'"
    result = f"LinkedIn profiles matching '{query}':\n\n"
    for i, p in enumerate(profiles, 1):
        result += (
            f"{i}. {p.get('name', 'N/A')}\n"
            f"   {p.get('title', 'N/A')} at "
            f"{p.get('company', 'N/A')}\n"
            f"   {p.get('linkedin_url', '')}\n\n"
        )
    return result

@tool
def search_linkedin_jobs(query: str) -> str:
    """Search LinkedIn for job openings and return listings
    with company, location, and posting date."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.linkedin.com/jobs/search/"
                f"?keywords={query}"
            ),
            "extract": {
                "jobs": (
                    "array of top 5 jobs with: title, "
                    "company, location, posted_date, url"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])
    if not jobs:
        return f"No jobs found for '{query}'"
    result = f"LinkedIn jobs for '{query}':\n\n"
    for i, j in enumerate(jobs, 1):
        result += (
            f"{i}. {j.get('title', 'N/A')} at "
            f"{j.get('company', 'N/A')}\n"
            f"   {j.get('location', 'N/A')} | "
            f"Posted: {j.get('posted_date', 'N/A')}\n\n"
        )
    return result

# Use in a LangChain agent
# agent = create_agent(
#     tools=[search_linkedin_people, search_linkedin_jobs]
# )
Legal Considerations
LinkedIn scraping has the most developed legal precedent of any website, thanks to the landmark hiQ Labs case:
- hiQ Labs v. LinkedIn (2022) – The definitive case. The Ninth Circuit ruled that scraping publicly accessible LinkedIn profiles does NOT violate the CFAA, so LinkedIn could not use the CFAA to block hiQ from scraping public data. The ruling was upheld after the Supreme Court remanded it for reconsideration in light of Van Buren. It remains the strongest legal protection for scraping public web data in U.S. law.
- Van Buren v. United States (2021) – The Supreme Court narrowed the CFAA's definition of "exceeds authorized access" to cover only accessing data one is not entitled to access at all, not violating usage restrictions on data one can access. This strengthens the case for scraping public pages.
- LinkedIn Terms of Service – Still prohibit scraping. Violating the ToS is a contract/civil matter, not a criminal one; LinkedIn may send cease-and-desist letters or pursue civil litigation.
- Public vs. Authenticated Data – The hiQ ruling applies specifically to publicly accessible data. Scraping data behind a login wall (requiring authentication) carries significantly higher legal risk.
- GDPR/CCPA – LinkedIn profiles contain personal data. If you process EU or California residents' data, you must comply with data protection regulations and have a lawful basis for processing (legitimate interest is commonly cited).
- State Laws – Some states have computer access laws broader than the CFAA. Check your jurisdiction.
- Only scrape publicly accessible pages (no login required).
- Respect rate limits – don't hammer LinkedIn's servers.
- Don't store personal data longer than necessary.
- Comply with GDPR/CCPA if applicable.
- Never scrape private messages, connection lists, or data behind authentication.
- Document your legitimate business purpose.
- Consult legal counsel for commercial use.
- Consider a web scraping API that handles compliance for you.
Production-Ready LinkedIn Data Extraction
Stop fighting authwalls, IP bans, and broken selectors. Mantis extracts structured LinkedIn data with a single API call.
Frequently Asked Questions
Is it legal to scrape LinkedIn?
The hiQ Labs v. LinkedIn ruling (2022) established that scraping publicly accessible LinkedIn data does not violate the CFAA. However, LinkedIn's Terms of Service still prohibit scraping (a contract matter, not criminal). Scraping data behind login walls carries more risk. Consult legal counsel for commercial use.
How do I scrape LinkedIn without getting blocked?
Use rotating residential proxies, add random delays (5-20 seconds), rotate User-Agents, limit to under 100 profiles per IP per day, avoid scraping while logged in, and handle authwall detection. Or use a web scraping API that handles this automatically.
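The per-IP budget mentioned here can be enforced with a small in-memory counter (a sketch; `ProxyBudget` is a hypothetical helper, and the 100-requests-per-day ceiling is an assumption, not a documented LinkedIn threshold):

```python
import time
from collections import defaultdict

DAILY_LIMIT = 100  # assumed safe budget per proxy per day

class ProxyBudget:
    """Track daily request counts per proxy and pick one with budget left."""

    def __init__(self, proxies, limit=DAILY_LIMIT):
        self.proxies = proxies
        self.limit = limit
        self.counts = defaultdict(int)
        self.day = time.strftime("%Y-%m-%d")

    def acquire(self):
        today = time.strftime("%Y-%m-%d")
        if today != self.day:  # reset counters when the date changes
            self.day, self.counts = today, defaultdict(int)
        for proxy in self.proxies:
            if self.counts[proxy] < self.limit:
                self.counts[proxy] += 1
                return proxy
        return None  # every proxy has exhausted its budget for today
```

Call `acquire()` before each request; a `None` return means scraping should pause until the next day (or more proxies are added to the pool).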
Can I scrape LinkedIn job listings?
Yes. LinkedIn job listings at linkedin.com/jobs are publicly accessible without login. You can extract job titles, companies, locations, descriptions, and posting dates.
What data can I extract from LinkedIn profiles?
From public profiles: name, headline, current company/title, location, about section, experience history, education, skills, and certifications. Visibility depends on individual privacy settings.
Does LinkedIn have an official API for profile data?
LinkedIn's API only allows access to the authenticated user's own basic profile. There is no API to search for people, browse others' profiles, or access job listings in bulk. The API is designed for marketing and authentication, not data access.
How many LinkedIn profiles can I scrape per day?
Without proxies: 50-80 before blocking. With rotating residential proxies: 500-2,000. With Mantis API: up to 100,000/month on the Scale plan.