Table of Contents
- Why Scrape LinkedIn?
- What Data Can You Extract?
- Method 1: Python + BeautifulSoup
- Method 2: Playwright (Headless Browser)
- Method 3: Node.js + Cheerio
- Method 4: Web Scraping API (Easiest)
- Beating LinkedIn's Anti-Bot Detection
- LinkedIn Official API vs Scraping
- Method Comparison
- Real-World Use Cases
- Legal Considerations
- FAQ
Why Scrape LinkedIn?
LinkedIn is the world's largest professional network with over 1 billion members and 67 million company profiles. That professional data powers some of the most valuable business intelligence in B2B:
- Lead generation – Build targeted prospect lists based on title, company, industry, and location
- Recruiting & talent sourcing – Find candidates matching specific skills, experience levels, and locations
- Job market analysis – Track hiring trends, salary ranges, skill demand, and emerging roles
- Competitive intelligence – Monitor competitor headcount, hiring patterns, and organizational changes
- Sales intelligence – Enrich CRM records with current titles, companies, and professional details
- AI agent research tools – Give AI assistants the ability to research people, companies, and job markets
- Market research – Analyze industry trends, company growth trajectories, and workforce composition
Whether you're building a recruiting tool, a sales prospecting engine, or an AI research agent, LinkedIn data is foundational for any B2B data strategy.
What Data Can You Extract?
LinkedIn has three main data categories, each with different accessibility levels:
Public Profiles
| Data Point | Availability | Notes |
|---|---|---|
| Full Name | Usually public | Some profiles show first name + last initial only |
| Headline | Public | Current role/tagline |
| Current Company & Title | Public | Most recent experience entry |
| Location | Public | Metro area (e.g., "San Francisco Bay Area") |
| About/Summary | Usually public | May be truncated without login |
| Experience History | Usually public | Companies, titles, dates, descriptions |
| Education | Usually public | Schools, degrees, dates |
| Skills | Partially public | Top skills visible; full list may require login |
| Certifications | Usually public | Issuing org, credential ID |
| Profile Photo URL | Varies | Some users restrict to connections only |
Job Listings
| Data Point | Availability | Notes |
|---|---|---|
| Job Title | Public | Accessible via linkedin.com/jobs |
| Company Name | Public | Links to company page |
| Location | Public | Includes remote/hybrid/on-site labels |
| Job Description | Public | Full text on job detail page |
| Posting Date | Public | Relative ("2 days ago") or absolute |
| Seniority Level | Public | Entry, Mid, Senior, Director, etc. |
| Employment Type | Public | Full-time, Part-time, Contract |
| Applicant Count | Sometimes | "Over 100 applicants" shown on some listings |
| Salary Range | Sometimes | Shown when employer provides it |
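Since the Posting Date field is often relative ("2 days ago"), a small normalizer is handy downstream. A hypothetical sketch (`parse_posted` is my own helper, not part of any library; "month" is approximated as 30 days):

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Rough conversion factors; "month" is an approximation
UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}

def parse_posted(text: str, now: Optional[datetime] = None) -> Optional[datetime]:
    """Best-effort conversion of '2 days ago' into an absolute datetime."""
    now = now or datetime.utcnow()
    match = re.search(r"(\d+)\s+(minute|hour|day|week|month)s?\s+ago", text)
    if not match:
        return None  # e.g. "just now" or an absolute date
    count, unit = int(match.group(1)), match.group(2)
    return now - count * UNITS[unit]
```

When the listing's `<time>` element carries a `datetime` attribute, prefer that; fall back to parsing the visible text only when it doesn't.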
Company Pages
| Data Point | Availability | Notes |
|---|---|---|
| Company Name | Public | Official name |
| Industry | Public | Self-reported category |
| Employee Count | Public | Range (e.g., "501-1,000") |
| Headquarters | Public | City, state, country |
| Description | Public | About section |
| Website | Public | Company URL |
| Founded Year | Public | When available |
| Specialties | Public | Self-reported tags |
Method 1: Python + BeautifulSoup
Best for scraping public LinkedIn pages that don't require login – particularly job listings and public profiles. LinkedIn serves some content without JavaScript, making simple HTTP requests viable for basic data extraction.
Install Dependencies
pip install requests beautifulsoup4 lxml
Public Job Listings Scraper
# linkedin_jobs.py
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/125.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}

def scrape_linkedin_jobs(keyword: str, location: str = "", pages: int = 3) -> list:
    """Scrape LinkedIn public job listings."""
    jobs = []
    for page in range(pages):
        start = page * 25
        url = (
            f"https://www.linkedin.com/jobs/search/"
            f"?keywords={keyword.replace(' ', '%20')}"
            f"&location={location.replace(' ', '%20')}"
            f"&start={start}"
        )
        resp = requests.get(url, headers=HEADERS, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "lxml")
        for card in soup.select(".base-card"):
            title_el = card.select_one(".base-search-card__title")
            company_el = card.select_one(".base-search-card__subtitle a")
            location_el = card.select_one(".job-search-card__location")
            link_el = card.select_one("a.base-card__full-link")
            date_el = card.select_one("time")
            jobs.append({
                "title": title_el.text.strip() if title_el else None,
                "company": company_el.text.strip() if company_el else None,
                "location": location_el.text.strip() if location_el else None,
                "url": link_el["href"] if link_el else None,
                "posted": date_el.get("datetime") if date_el else None,
            })
        time.sleep(random.uniform(3, 8))
    return jobs

# Example: search for Python developer jobs
results = scrape_linkedin_jobs("python developer", "San Francisco", pages=2)
print(f"Found {len(results)} jobs")
print(json.dumps(results[:3], indent=2))
Public Profile Scraper
# linkedin_profile.py
import json
import requests
from bs4 import BeautifulSoup
from linkedin_jobs import HEADERS  # reuse the headers defined above

def scrape_public_profile(profile_url: str) -> dict:
    """Scrape a public LinkedIn profile page."""
    resp = requests.get(profile_url, headers=HEADERS, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")

    def text(selector):
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None

    # Extract experience entries
    experiences = []
    for exp in soup.select(".experience-item"):
        title = exp.select_one(".experience-item__title")
        company = exp.select_one(".experience-item__subtitle")
        duration = exp.select_one(".experience-item__duration")
        experiences.append({
            "title": title.text.strip() if title else None,
            "company": company.text.strip() if company else None,
            "duration": duration.text.strip() if duration else None,
        })

    # Extract education
    education = []
    for edu in soup.select(".education__list-item"):
        school = edu.select_one(".education__item--school")
        degree = edu.select_one(".education__item--degree")
        education.append({
            "school": school.text.strip() if school else None,
            "degree": degree.text.strip() if degree else None,
        })

    return {
        "name": text(".top-card-layout__title"),
        "headline": text(".top-card-layout__headline"),
        "location": text(".top-card-layout__first-subline"),
        "about": text(".summary .core-section-container__content"),
        "experience": experiences,
        "education": education,
        "url": profile_url,
    }

profile = scrape_public_profile(
    "https://www.linkedin.com/in/example-profile"
)
print(json.dumps(profile, indent=2))
The examples above scrape only publicly accessible LinkedIn pages – no login required. LinkedIn shows significantly more data to logged-in users, but scraping while authenticated carries much higher legal risk and more clearly violates LinkedIn's Terms of Service. For production use, stick to public data or use an API.
Method 2: Playwright (Headless Browser)
LinkedIn relies heavily on JavaScript for rendering profile sections, job details, and infinite scroll. Playwright renders the full page, giving you access to dynamically loaded content.
Install
pip install playwright
playwright install chromium
Job Detail Scraper
# playwright_linkedin.py
import asyncio
import json
from playwright.async_api import async_playwright

async def scrape_job_detail(job_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await context.new_page()

        # Block images and tracking for speed
        await page.route(
            "**/*.{png,jpg,jpeg,gif,svg}",
            lambda route: route.abort(),
        )
        await page.route("**/tracking/**", lambda route: route.abort())

        await page.goto(job_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(3000)

        # Click "Show more" to expand the description
        try:
            show_more = page.locator("button.show-more-less-html__button")
            if await show_more.count() > 0:
                await show_more.first.click()
                await page.wait_for_timeout(500)
        except Exception:
            pass

        job = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const criteria = {};
            document.querySelectorAll('.description__job-criteria-item')
                .forEach(item => {
                    const label = item.querySelector(
                        '.description__job-criteria-subheader'
                    );
                    const value = item.querySelector(
                        '.description__job-criteria-text'
                    );
                    if (label && value) {
                        criteria[label.textContent.trim()
                            .toLowerCase().replace(/ /g, '_')] =
                            value.textContent.trim();
                    }
                });
            return {
                title: text('.top-card-layout__title'),
                company: text('.topcard__org-name-link')
                    || text('.top-card-layout__second-subline a'),
                location: text('.topcard__flavor--bullet'),
                description: text('.show-more-less-html__markup'),
                posted: text('.posted-time-ago__text'),
                applicants: text('.num-applicants__caption'),
                criteria: criteria,
            };
        }""")

        await browser.close()
        return job

# Scrape a specific job listing
job = asyncio.run(scrape_job_detail(
    "https://www.linkedin.com/jobs/view/1234567890"
))
print(json.dumps(job, indent=2))
Profile Scraper with Infinite Scroll
# playwright_profile.py
async def scrape_profile_full(profile_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()
        await page.goto(profile_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(2000)

        # Scroll to load lazy sections
        for _ in range(5):
            await page.evaluate("window.scrollBy(0, window.innerHeight)")
            await page.wait_for_timeout(800)

        profile = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const allText = (sel) => {
                return [...document.querySelectorAll(sel)]
                    .map(el => el.textContent.trim())
                    .filter(Boolean);
            };
            return {
                name: text('.top-card-layout__title'),
                headline: text('.top-card-layout__headline'),
                location: text('.top-card-layout__first-subline'),
                about: text('.summary .core-section-container__content p'),
                experience: allText(
                    '.experience-item .experience-item__title'
                ),
                education: allText('.education__item--school-name'),
                skills: allText('.skill-categories-card li'),
            };
        }""")

        await browser.close()
        return profile
LinkedIn shows an "authwall" modal prompting login after a few page views. Adding ?trk=public_profile to profile URLs and using clean browser contexts (no cookies) can help. Rotating IP addresses between requests also resets LinkedIn's session tracking.
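The URL tweak can be wrapped in a small helper that strips any existing query string and appends the parameter (a sketch using only the standard library; `public_profile_url` is my own name for it, and the parameter nudges LinkedIn toward the public view but is no guarantee):

```python
from urllib.parse import urlsplit, urlunsplit

def public_profile_url(url: str) -> str:
    """Rewrite a profile URL to request the public (logged-out) view."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme or "https",
        parts.netloc,
        parts.path.rstrip("/"),  # drop trailing slash for a canonical form
        "trk=public_profile",    # replaces any existing query string
        "",                      # no fragment
    ))
```

Pair this with a fresh browser context per batch so no session cookies accumulate between profiles.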
Method 3: Node.js + Cheerio
Lightweight and fast – ideal for scraping LinkedIn job listings from a Node.js backend or serverless function.
Install
npm install cheerio node-fetch
Job Search Scraper
// linkedin-jobs.mjs
import fetch from "node-fetch";
import * as cheerio from "cheerio";

const HEADERS = {
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
  Accept: "text/html",
};

async function scrapeJobs(keyword, location = "", pages = 3) {
  const jobs = [];
  for (let page = 0; page < pages; page++) {
    const start = page * 25;
    const url =
      `https://www.linkedin.com/jobs/search/` +
      `?keywords=${encodeURIComponent(keyword)}` +
      `&location=${encodeURIComponent(location)}` +
      `&start=${start}`;
    const resp = await fetch(url, { headers: HEADERS });
    const html = await resp.text();
    const $ = cheerio.load(html);
    $(".base-card").each((_, card) => {
      const $card = $(card);
      jobs.push({
        title: $card.find(".base-search-card__title").text().trim() || null,
        company:
          $card.find(".base-search-card__subtitle a").text().trim() || null,
        location:
          $card.find(".job-search-card__location").text().trim() || null,
        url: $card.find("a.base-card__full-link").attr("href") || null,
        posted: $card.find("time").attr("datetime") || null,
      });
    });
    // Polite delay
    await new Promise((r) => setTimeout(r, 3000 + Math.random() * 5000));
  }
  return jobs;
}

// Usage
const jobs = await scrapeJobs("machine learning engineer", "New York");
console.log(`Found ${jobs.length} jobs`);
console.log(JSON.stringify(jobs.slice(0, 3), null, 2));
Company Page Scraper
// linkedin-company.mjs (reuses the HEADERS object from linkedin-jobs.mjs)
import fetch from "node-fetch";
import * as cheerio from "cheerio";

async function scrapeCompany(companySlug) {
  const url = `https://www.linkedin.com/company/${companySlug}/about/`;
  const resp = await fetch(url, { headers: HEADERS });
  const html = await resp.text();
  const $ = cheerio.load(html);

  const text = (sel) => $(sel).first().text().trim() || null;

  // Extract detail items from the dt/dd pairs
  const details = {};
  $(".core-section-container__content dt").each((i, el) => {
    const key = $(el).text().trim().toLowerCase().replace(/ /g, "_");
    const value = $(el).next("dd").text().trim();
    if (key && value) details[key] = value;
  });

  return {
    name: text(".top-card-layout__title"),
    tagline: text(".top-card-layout__headline"),
    description: text(".core-section-container__content p"),
    ...details,
    url: `https://www.linkedin.com/company/${companySlug}`,
  };
}

const company = await scrapeCompany("google");
console.log(JSON.stringify(company, null, 2));
Method 4: Web Scraping API (Easiest)
The most reliable approach for production LinkedIn scraping. A web scraping API handles proxies, anti-detection, JavaScript rendering, and login-wall bypassing – you just send a URL and get structured data back.
Scraping Profiles with Mantis
# One API call – structured LinkedIn profile data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.linkedin.com/in/example-profile",
        "extract": {
            "name": "person's full name",
            "headline": "professional headline",
            "location": "geographic location",
            "about": "about/summary section",
            "current_company": "current employer name",
            "current_title": "current job title",
            "experience": "work experience (array of: company, title, dates, description)",
            "education": "education history (array of: school, degree, dates)",
            "skills": "listed skills (array)",
        },
        "render_js": True,
    },
)
profile = resp.json()
print(profile)
Scraping Job Listings with Mantis
# Bulk job search
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": (
            "https://www.linkedin.com/jobs/search/"
            "?keywords=AI+engineer&location=Remote"
        ),
        "extract": {
            "jobs": (
                "array of job listings with: "
                "title, company, location, url, posted_date"
            ),
        },
        "render_js": True,
    },
)
jobs = resp.json().get("jobs", [])
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']} – {job['location']}")
Skip the Login Walls and IP Bans
Mantis handles LinkedIn's anti-bot detection, proxy rotation, authwall bypassing, and JavaScript rendering – so you don't have to.
Node.js with Mantis
// mantis-linkedin.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.linkedin.com/in/example-profile",
    extract: {
      name: "full name",
      headline: "professional headline",
      current_role: "current job title and company",
      experience: "work history (array)",
      skills: "professional skills (array)",
    },
    render_js: true,
  }),
});
const profile = await resp.json();
console.log(profile);
Beating LinkedIn's Anti-Bot Detection
LinkedIn invests heavily in detecting and blocking scrapers. Here's what you're up against:
LinkedIn's Defense Layers
| Defense | What It Does | Countermeasure |
|---|---|---|
| Authwall | Prompts login after a few page views | Clean sessions, ?trk= params, IP rotation |
| IP Rate Limiting | Blocks IPs with too many requests | Rotating residential proxies |
| CAPTCHA Challenges | Serves CAPTCHA on suspicious patterns | CAPTCHA solving services or API |
| Account Restrictions | Limits or suspends accounts scraping while logged in | Don't use personal accounts; use public pages |
| Browser Fingerprinting | Detects headless browsers via JS | Stealth plugins, real browser profiles |
| Session Tracking | Correlates requests across pageviews | Fresh sessions per request batch |
| Honeypot Links | Invisible links that only bots follow | Only follow visible, user-facing links |
Essential Anti-Detection Techniques
# linkedin_stealth.py
import random
import time
import requests

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "Chrome/124.0.0.0 Safari/537.36",
]

def create_linkedin_session():
    """Create a session configured for LinkedIn scraping."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml",
        "Referer": "https://www.google.com/",
        "DNT": "1",
    })
    return session

def is_authwall(html: str) -> bool:
    """Detect LinkedIn's login prompt."""
    return any(marker in html.lower() for marker in [
        "authwall", "sign in", "join now",
        "login-form", "session_key",
    ])

def scrape_with_retry(url: str, max_retries: int = 3) -> str:
    """Scrape with authwall detection and retry."""
    for attempt in range(max_retries):
        session = create_linkedin_session()
        try:
            resp = session.get(url, timeout=15)
            if resp.status_code == 429:
                wait = 30 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue
            if is_authwall(resp.text):
                print("Authwall detected – rotating proxy")
                continue
            return resp.text
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(random.uniform(5, 15))
    raise Exception(f"Failed after {max_retries} retries")

def polite_delay():
    """Longer delays for LinkedIn – they're aggressive."""
    time.sleep(random.uniform(5, 20))
Stick to public pages. LinkedIn's public job listings (linkedin.com/jobs/) and public profiles (linkedin.com/in/) are the safest targets. Scraping while logged in dramatically increases the risk of account suspension and legal action.
LinkedIn Official API vs Scraping
LinkedIn offers several APIs, but none provide the data most scrapers need:
| Feature | LinkedIn API | Web Scraping | Mantis API |
|---|---|---|---|
| Profile Data | Own profile only (authenticated user) | Any public profile | Any public profile |
| Job Listings | Not available via API | Full search + detail pages | Full search + detail pages |
| Company Data | Managed pages only | Any public company page | Any public company page |
| People Search | Not available | Via Google dorks or site search | Via URL-based extraction |
| Rate Limits | Varies (100-500 calls/day typical) | Depends on proxy pool | Based on plan (up to 100K/mo) |
| Setup | App review required (weeks-months) | Immediate (need proxies) | Immediate (API key) |
| Approval | Requires LinkedIn partnership for most endpoints | None | None |
| Cost | Free (very limited) or enterprise pricing | $50-500+/mo (proxies) | $0-299/mo |
| Reliability | High (when approved) | Medium (authwalls, blocks) | High (maintained infrastructure) |
| Legal Risk | None (authorized) | Low for public data (hiQ ruling) | API handles compliance |
LinkedIn's API is designed for marketing and authentication, not data access. You can post on company pages and sign users in, but you cannot search for people, browse job listings, or access other users' profiles. This is why scraping (especially public pages) remains the primary method for LinkedIn data collection.
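The "Google dorks" route from the People Search row above can be sketched as a small query builder (a hypothetical helper of my own; the exact operators and result quality vary by search engine):

```python
from typing import List, Optional
from urllib.parse import quote_plus

def xray_query(title: str, location: str = "",
               skills: Optional[List[str]] = None) -> str:
    """Build a Google X-ray search URL for public LinkedIn profiles."""
    terms = [f'site:linkedin.com/in/ "{title}"']
    if location:
        terms.append(f'"{location}"')
    for skill in (skills or [])[:3]:  # overly long queries hurt recall
        terms.append(f'"{skill}"')
    return "https://www.google.com/search?q=" + quote_plus(" ".join(terms))
```

The resulting URL can then be fed to any of the scraping methods above to collect profile links from the search results.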
Method Comparison
| Criteria | Python + BS4 | Playwright | Node.js + Cheerio | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 5 min | 2 min |
| JS Rendering | ❌ | ✅ | ❌ | ✅ |
| Authwall Handling | Manual detection | Manual workaround | Manual detection | Built-in |
| Anti-Detection | Basic | Good (with stealth) | Basic | Built-in |
| Speed | Fast | Slow | Fast | Medium |
| Maintenance | High | High | High | None |
| Scale | Low | Low | Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies) | $200-500 (proxies + compute) | $100-300 (proxies) | $99 (Pro plan) |
| Best For | Job listings | Full profiles | Serverless jobs | Production |
Real-World Use Cases
1. Recruiter Lead Generation Tool
Build a pipeline that finds candidates matching specific criteria (title, skills, location) and enriches them with LinkedIn profile data for outreach.
# recruiter_tool.py
import requests
import json

MANTIS_KEY = "YOUR_API_KEY"

def find_candidates(title: str, location: str, skills: list) -> list:
    """Find candidates matching criteria via LinkedIn."""
    # Use Google to find LinkedIn profiles
    query = (
        f'site:linkedin.com/in/ "{title}" "{location}" '
        + " ".join(f'"{s}"' for s in skills[:3])
    )
    search_resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10",
            "extract": {
                "results": (
                    "array of search results with: "
                    "title, url, snippet"
                ),
            },
            "render_js": True,
        },
    )
    results = search_resp.json().get("results", [])

    # Enrich each profile
    candidates = []
    for r in results:
        if "linkedin.com/in/" not in r.get("url", ""):
            continue
        profile_resp = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={
                "Authorization": f"Bearer {MANTIS_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "url": r["url"],
                "extract": {
                    "name": "full name",
                    "headline": "professional headline",
                    "location": "location",
                    "current_company": "current employer",
                    "current_title": "current job title",
                    "experience_years": "total years of experience (estimate)",
                    "skills": "key skills (array, top 10)",
                },
                "render_js": True,
            },
        )
        candidate = profile_resp.json()
        candidate["linkedin_url"] = r["url"]
        candidates.append(candidate)
    return candidates

candidates = find_candidates(
    title="Senior ML Engineer",
    location="San Francisco",
    skills=["PyTorch", "LLM", "MLOps"],
)
print(f"Found {len(candidates)} candidates")
for c in candidates:
    print(f"  {c.get('name')} – {c.get('headline')}")
    print(f"  {c.get('linkedin_url')}\n")
2. Job Market Analyzer
Track hiring trends, salary ranges, and skill demand across industries – useful for career platforms, salary benchmarking tools, and workforce analytics.
# job_analyzer.py
import requests
import json
from collections import Counter
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

def analyze_job_market(keyword: str, location: str = "") -> dict:
    """Analyze job listings for trends."""
    # Scrape job listings
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                f"https://www.linkedin.com/jobs/search/"
                f"?keywords={keyword}&location={location}"
            ),
            "extract": {
                "jobs": (
                    "array of all visible jobs with: title, "
                    "company, location, seniority_level, "
                    "employment_type, posted_date"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])

    # Analyze trends
    companies = Counter(j.get("company", "Unknown") for j in jobs)
    locations = Counter(j.get("location", "Unknown") for j in jobs)
    seniority = Counter(j.get("seniority_level", "Unknown") for j in jobs)

    return {
        "keyword": keyword,
        "total_listings": len(jobs),
        "top_hiring_companies": dict(companies.most_common(10)),
        "top_locations": dict(locations.most_common(10)),
        "seniority_distribution": dict(seniority),
        "analyzed_at": datetime.utcnow().isoformat(),
    }

report = analyze_job_market("AI engineer", "United States")
print(json.dumps(report, indent=2))
3. AI Agent Talent Sourcing
Give an AI agent the ability to search for professionals and generate outreach recommendations – a core building block for AI-powered recruiting and sales assistants.
# agent_talent.py – LangChain tool for LinkedIn research
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def search_linkedin_people(query: str) -> str:
    """Search for professionals on LinkedIn and return their
    profiles with current roles and companies."""
    # Google X-ray search for LinkedIn profiles
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.google.com/search?"
                f"q=site:linkedin.com/in/+{query}&num=5"
            ),
            "extract": {
                "profiles": (
                    "array of LinkedIn profiles with: "
                    "name, title, company, linkedin_url"
                ),
            },
            "render_js": True,
        },
    )
    profiles = resp.json().get("profiles", [])
    if not profiles:
        return f"No LinkedIn profiles found for '{query}'"
    result = f"LinkedIn profiles matching '{query}':\n\n"
    for i, p in enumerate(profiles, 1):
        result += (
            f"{i}. {p.get('name', 'N/A')}\n"
            f"   {p.get('title', 'N/A')} at "
            f"{p.get('company', 'N/A')}\n"
            f"   {p.get('linkedin_url', '')}\n\n"
        )
    return result

@tool
def search_linkedin_jobs(query: str) -> str:
    """Search LinkedIn for job openings and return listings
    with company, location, and posting date."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.linkedin.com/jobs/search/"
                f"?keywords={query}"
            ),
            "extract": {
                "jobs": (
                    "array of top 5 jobs with: title, "
                    "company, location, posted_date, url"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])
    if not jobs:
        return f"No jobs found for '{query}'"
    result = f"LinkedIn jobs for '{query}':\n\n"
    for i, j in enumerate(jobs, 1):
        result += (
            f"{i}. {j.get('title', 'N/A')} at "
            f"{j.get('company', 'N/A')}\n"
            f"   {j.get('location', 'N/A')} | "
            f"Posted: {j.get('posted_date', 'N/A')}\n\n"
        )
    return result

# Use in a LangChain agent
# agent = create_agent(
#     tools=[search_linkedin_people, search_linkedin_jobs]
# )
Legal Considerations
LinkedIn scraping has the most developed legal precedent of any website, thanks to the landmark hiQ Labs case:
- hiQ Labs v. LinkedIn (2022) – The definitive case. The Ninth Circuit ruled that scraping publicly accessible LinkedIn profiles does NOT violate the CFAA, so LinkedIn could not use the CFAA to block hiQ from scraping public data. The ruling was upheld after the Supreme Court remanded it for reconsideration in light of Van Buren. It remains the strongest legal protection for scraping public web data in U.S. law.
- Van Buren v. United States (2021) – The Supreme Court narrowed the CFAA's definition of "exceeds authorized access" to cover only accessing data one is not entitled to access at all, not violating usage restrictions on data one can access. This strengthens the case for scraping public pages.
- LinkedIn Terms of Service – Still prohibit scraping. Violating the ToS is a contract/civil matter, not a criminal one; LinkedIn may send cease-and-desist letters or pursue civil litigation.
- Public vs. Authenticated Data – The hiQ ruling applies specifically to publicly accessible data. Scraping data behind a login wall (requiring authentication) carries significantly higher legal risk.
- GDPR/CCPA – LinkedIn profiles contain personal data. If you process EU or California residents' data, you must comply with data protection regulations and have a lawful basis for processing (legitimate interest is commonly cited).
- State Laws – Some states have computer access laws broader than the CFAA. Check your jurisdiction.
- Only scrape publicly accessible pages (no login required).
- Respect rate limits – don't hammer LinkedIn's servers.
- Don't store personal data longer than necessary.
- Comply with GDPR/CCPA if applicable.
- Never scrape private messages, connection lists, or data behind authentication.
- Document your legitimate business purpose.
- Consult legal counsel for commercial use.
- Consider a web scraping API that handles compliance for you.
Production-Ready LinkedIn Data Extraction
Stop fighting authwalls, IP bans, and broken selectors. Mantis extracts structured LinkedIn data with a single API call.
Frequently Asked Questions
Is it legal to scrape LinkedIn?
The hiQ Labs v. LinkedIn ruling (2022) established that scraping publicly accessible LinkedIn data does not violate the CFAA. However, LinkedIn's Terms of Service still prohibit scraping (a contract matter, not criminal). Scraping data behind login walls carries more risk. Consult legal counsel for commercial use.
How do I scrape LinkedIn without getting blocked?
Use rotating residential proxies, add random delays (5-20 seconds), rotate User-Agents, limit to under 100 profiles per IP per day, avoid scraping while logged in, and handle authwall detection. Or use a web scraping API that handles this automatically.
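The per-IP budget mentioned here can be enforced with a small in-memory counter (a sketch; `ProxyBudget` is a hypothetical helper, and the 100-requests-per-day ceiling is an assumption, not a documented LinkedIn threshold):

```python
import time
from collections import defaultdict

DAILY_LIMIT = 100  # assumed safe budget per proxy per day

class ProxyBudget:
    """Track daily request counts per proxy and pick one with budget left."""

    def __init__(self, proxies, limit=DAILY_LIMIT):
        self.proxies = proxies
        self.limit = limit
        self.counts = defaultdict(int)
        self.day = time.strftime("%Y-%m-%d")

    def acquire(self):
        today = time.strftime("%Y-%m-%d")
        if today != self.day:  # reset counters when the date changes
            self.day, self.counts = today, defaultdict(int)
        for proxy in self.proxies:
            if self.counts[proxy] < self.limit:
                self.counts[proxy] += 1
                return proxy
        return None  # every proxy has exhausted its budget for today
```

Call `acquire()` before each request; a `None` return means scraping should pause until the next day (or more proxies are added to the pool).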
Can I scrape LinkedIn job listings?
Yes. LinkedIn job listings at linkedin.com/jobs are publicly accessible without login. You can extract job titles, companies, locations, descriptions, and posting dates.
What data can I extract from LinkedIn profiles?
From public profiles: name, headline, current company/title, location, about section, experience history, education, skills, and certifications. Visibility depends on individual privacy settings.
Does LinkedIn have an official API for profile data?
LinkedIn's API only allows access to the authenticated user's own basic profile. There is no API to search for people, browse others' profiles, or access job listings in bulk. The API is designed for marketing and authentication, not data access.
How many LinkedIn profiles can I scrape per day?
Without proxies: 50-80 before blocking. With rotating residential proxies: 500-2,000. With Mantis API: up to 100,000/month on the Scale plan.