Wayback Machine Snapshot Program

I created a script to extract websites and their assets from the Wayback Machine.

It requires Python 3.7+ and the following dependencies: aiohttp and aiofiles.
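
If you are not sure whether your setup meets these requirements, a small check like the one below can tell you before you run anything. This is only a sketch: the helper names `missing_packages` and `python_is_new_enough` are made up for illustration and are not part of the script itself.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 7)
REQUIRED_PACKAGES = ("aiohttp", "aiofiles")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_new_enough(version_info=None, minimum=REQUIRED_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_new_enough():
        print("Python 3.7 or newer is required.")
    missing = missing_packages()
    if missing:
        # Suggest the same pip command used in the install step.
        print("Missing packages. Run: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Save it under any name you like and run it with `python`. It only looks the packages up; it does not install anything.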

How to Use

  1. Install Python: HERE

    ⚠️ During installation, make sure to check "Add Python to PATH"

  2. Install the required packages by running this in Command Prompt or PowerShell: pip install aiohttp aiofiles
  3. Use the Wayback Machine Snapshot Generator to get your JSON file: HERE

    This generator uses the Internet Archive’s CDX API to fetch snapshots in JSON format. It lets you extract archived websites and their assets over a specific time range.

    ⚠️ Tip: To grab an entire website, add /* to the end of the URL. Example: https://www.yahoo.com/*

    Steps:
    • Enter the website URL (include https:// or http://).
    • Optionally set a start date (YYYYMMDD).
    • Optionally set an end date (YYYYMMDD).
    • The generator produces a CDX API link.
    • Open the link in your browser and wait for it to load.
    • When the JSON data appears, right-click and choose “Save As” to download it.
  4. Go to the WebArchiveSnapshotProgram generator site: HERE
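
For anyone curious what the link from step 3 looks like, here is a rough Python sketch of how such a CDX API link could be put together and how the saved JSON could be turned into snapshot addresses. It assumes the generator uses the public CDX endpoint with the `url`, `from`, `to` and `output=json` parameters, and that the first row of the JSON is a header naming the columns. The function names `build_cdx_url` and `snapshot_urls` are hypothetical and may not match what the generator or the script actually do.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Internet Archive.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site, start=None, end=None):
    """Build a CDX API link; `start` and `end` are optional YYYYMMDD strings."""
    params = {"url": site, "output": "json"}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(rows):
    """Turn parsed CDX JSON rows into Wayback Machine snapshot addresses.

    Assumes `rows[0]` is the header row and every later row is one capture.
    """
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        f"https://web.archive.org/web/{row[ts]}/{row[original]}"
        for row in records
    ]


if __name__ == "__main__":
    # Example from the tip above: the trailing /* asks for the whole site.
    print(build_cdx_url("https://www.yahoo.com/*", start="20200101", end="20201231"))
```

The rows passed to `snapshot_urls` would come from loading the saved JSON file with `json.load`. Leaving out a date simply drops that parameter from the link.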

How to Use WebArchiveSnapshotProgram

Things to Know

Happy searching :)

Click to go back to my website