Mirror of https://github.com/DoingFedTime/ChangeScraper.git (synced 2026-04-26 02:21:59 +00:00)
Scrapes Change.org petitions. Gets signatures, metadata, and supporter details without touching their API.
Change.org Petition Scraper
A Python scraper that extracts all data from Change.org petitions, including comments via infinite scroll.
Features
- Petition Details: URL, title, author, target, signature count, description, date created
- All Comments: Uses Selenium to handle infinite scroll and extract every comment
- Auto Browser Detection: Automatically uses Edge (Windows), Chrome, or Chromium
- Cross-Platform: Works on Windows, macOS, and Linux
- Clean Output: Formatted console display + structured JSON file
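The auto-detection described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual code in change_scraper.py; the function name and the executable names checked are assumptions:

```python
import platform
import shutil

def pick_browser():
    """Hypothetical sketch of browser auto-detection: prefer Edge on
    Windows, otherwise fall back to Chrome or Chromium found on PATH.
    The real logic in change_scraper.py may differ."""
    if platform.system() == "Windows":
        return "edge"  # Edge ships with Windows 10/11
    for browser, exe in [
        ("chrome", "google-chrome"),
        ("chrome", "google-chrome-stable"),
        ("chromium", "chromium"),
        ("chromium", "chromium-browser"),
    ]:
        if shutil.which(exe):
            return browser
    return None  # would trigger a "No browser available" error

print(pick_browser())
```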
Installation
Windows
git clone https://github.com/DoingFedTime/ChangeScraper.git
cd ChangeScraper
pip install -r requirements.txt
Edge is built-in. You're ready to go.
Linux (Ubuntu/Debian)
git clone https://github.com/DoingFedTime/ChangeScraper.git
cd ChangeScraper
# Create virtual environment (required on modern Linux)
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install Chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt -f install
macOS
git clone https://github.com/DoingFedTime/ChangeScraper.git
cd ChangeScraper
pip3 install -r requirements.txt
Install Chrome from https://google.com/chrome if not already installed.
Usage
python change_scraper.py "https://www.change.org/p/petition-name"
Linux users: Remember to activate your venv first:
source venv/bin/activate
python3 change_scraper.py "https://www.change.org/p/petition-name"
That's it. The scraper will:
- Fetch petition details
- Open a headless browser
- Scroll through all comments (infinite scroll)
- Save everything to petition_data.json
Example
python change_scraper.py "https://www.change.org/p/violet-s-crosswalk-violet-s-way"
Output:
============================================================
Change.org Petition Scraper
============================================================
[*] Petition: violet-s-crosswalk-violet-s-way
[*] Fetching petition page...
[*] Title: Violet's Crosswalk (Violet's Way)
[*] Signatures: 57
[*] Fetching ALL comments (infinite scroll)...
Using Chrome browser
Scrolling to load all comments... 1 2 3 4 5 done (8 scrolls)
[*] Total comments extracted: 56
======================================================================
PETITION DETAILS
======================================================================
URL: https://www.change.org/p/violet-s-crosswalk-violet-s-way
Title: Violet's Crosswalk (Violet's Way)
Author: Lillian Rzepka
Target: Decision Maker: Kevin Corcoran
Signatures: 57
Created: 2025-11-19T05:57:35Z
...
Output Format
Console
Formatted, readable output with word-wrapped text.
JSON File (petition_data.json)
{
"url": "https://www.change.org/p/petition-name",
"title": "Petition Title",
"signatures": 1234,
"description": "Full petition description...",
"author": "Author Name",
"author_url": "https://www.change.org/u/123456",
"target": "Decision Maker Name",
"date_created": "2025-01-01T00:00:00Z",
"comments": [
{
"name": "Commenter Name",
"comment": "Their comment text...",
"date": "3 days ago"
}
],
"total_comments": 56,
"scraped_at": "2025-11-25T19:00:00.000000"
}
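Downstream scripts can consume the JSON file with the standard library alone. A self-contained sketch, parsing a small sample with the same shape as petition_data.json (field names taken from the schema above):

```python
import json

# Sample with the same structure the scraper writes; in practice,
# open("petition_data.json") and json.load() it instead.
sample = """{
  "title": "Petition Title",
  "signatures": 1234,
  "total_comments": 1,
  "comments": [
    {"name": "Commenter Name", "comment": "Their comment text...", "date": "3 days ago"}
  ]
}"""

data = json.loads(sample)
print(f"{data['title']}: {data['signatures']} signatures, "
      f"{data['total_comments']} comment(s)")
for c in data["comments"]:
    print(f"  {c['name']} ({c['date']}): {c['comment']}")
```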
How It Works
Petition Details
- Fetches the petition page via HTTP request
- Parses HTML using BeautifulSoup
- Extracts structured data from data-testid attributes and LD+JSON schema
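The two extraction techniques look roughly like this with BeautifulSoup. The HTML below is a minimal stand-in; the real petition page embeds a much larger schema block, and the data-testid value shown is illustrative:

```python
import json
from bs4 import BeautifulSoup

html = """
<html><head>
<script type="application/ld+json">
{"@type": "Thing", "name": "Violet's Crosswalk (Violet's Way)"}
</script>
</head><body>
<h1 data-testid="petition-title">Violet's Crosswalk (Violet's Way)</h1>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# data-testid attributes are relatively stable hooks for finding
# specific elements even when CSS classes change.
title_el = soup.find(attrs={"data-testid": "petition-title"})

# The LD+JSON block carries machine-readable metadata as plain JSON.
schema = json.loads(soup.find("script", type="application/ld+json").string)

print(title_el.get_text(), "|", schema["name"])
```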
Comments (Infinite Scroll)
- Opens headless Chrome/Edge browser via Selenium
- Navigates to the petition's comments page (/p/slug/c)
- Scrolls to the bottom and waits for content to load
- Repeats until no new content appears (3 consecutive unchanged scrolls)
- Parses all loaded comments from the final HTML
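The stopping rule from the steps above can be sketched without a browser. Here get_height and scroll_once are stand-ins for the real Selenium calls (driver.execute_script on document.body.scrollHeight and window.scrollTo); the loop itself is the part being illustrated:

```python
def scroll_until_stable(get_height, scroll_once, max_scrolls=100):
    """Scroll until the page height is unchanged for 3 consecutive
    scrolls, i.e. the infinite scroll has run out of content."""
    last_height = get_height()
    unchanged = 0
    scrolls = 0
    while unchanged < 3 and scrolls < max_scrolls:
        scroll_once()
        scrolls += 1
        height = get_height()
        if height == last_height:
            unchanged += 1
        else:
            unchanged = 0
            last_height = height
    return scrolls

# Simulated page: height grows for four scrolls, then stops changing.
heights = iter([2000, 3000, 4000, 5000] + [5000] * 20)
state = {"h": 1000}

def fake_scroll():
    state["h"] = next(heights)

def fake_height():
    return state["h"]

n = scroll_until_stable(fake_height, fake_scroll)
print(n)  # 7: four scrolls that load content, then three unchanged ones
```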
Requirements
- Python 3.7+
- Google Chrome, Microsoft Edge, or Chromium browser
- Python packages (installed via requirements.txt):
requests, beautifulsoup4, selenium, webdriver-manager
Troubleshooting
Linux: "externally-managed-environment" error
Use a virtual environment:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Linux: "chromium-browser has no installation candidate"
Install Chrome directly:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt -f install
"No browser available"
- Windows: Edge is built-in and should work automatically
- Linux: Install Chrome using the command above
- macOS: Install Chrome from https://google.com/chrome
"Selenium not installed"
pip install selenium webdriver-manager
Comments not loading
- Check your internet connection
- The petition may not have any comments yet
- Try running again (sometimes the page needs more time to load)
License
MIT License - Use responsibly and respect Change.org's terms of service.
Disclaimer
This tool is for educational and research purposes. Always respect website terms of service and use responsibly. The authors are not responsible for misuse of this tool.
