Wayback Machine Snapshot Program

I created a script to extract websites and their assets from the Wayback Machine.

It requires Python 3.7+ and the following dependencies: aiohttp and aiofiles.

How to Use

Install Python: HERE
⚠️ During installation, make sure to check "Add Python to PATH"
Install the required packages by running this in Command Prompt or PowerShell:
pip install aiohttp aiofiles
Use the Wayback Machine Snapshot Generator to get your JSON file: HERE
This generator uses the Internet Archive’s CDX API to fetch snapshots in JSON format, so you can extract archived websites and their assets over a specific time range.

⚠️ Tip: To grab an entire website, add /* to the end of the URL. Example: https://www.yahoo.com/*

Steps:
- Enter the website URL (include https:// or http://).
- Optionally set a start date (YYYYMMDD).
- Optionally set an end date (YYYYMMDD).
- The generator produces a CDX API link.
- Open the link in your browser and wait for it to load.
- When the JSON data appears, right-click and choose “Save As” to download it.
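
For reference, the link these steps produce can be approximated in a few lines of Python. This is a minimal sketch, not the generator's actual code: the function name build_cdx_url is hypothetical, and it assumes the link uses the public CDX endpoint with its url, output, from and to query parameters.

```python
from typing import Optional
from urllib.parse import urlencode

# Public CDX search endpoint of the Internet Archive.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site: str, start: Optional[str] = None,
                  end: Optional[str] = None) -> str:
    """Return a CDX API link listing snapshots of `site` as JSON.

    `start` and `end` are optional YYYYMMDD dates. The function name and
    signature are hypothetical, chosen only for this illustration.
    """
    params = {"url": site, "output": "json"}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    # Keep ':', '/' and '*' readable so a trailing /* survives in the link.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe=':/*')}"


print(build_cdx_url("https://www.yahoo.com/*", "19960101", "19991231"))
# https://web.archive.org/cdx/search/cdx?url=https://www.yahoo.com/*&output=json&from=19960101&to=19991231
```

Leaving out both dates requests every snapshot the archive holds for that URL, which is why large sites produce large JSON files.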
Go to the WebArchiveSnapshotProgram generator site: HERE

Prepare your JSON file with timestamps and original URLs.
Create a download folder (e.g., C:\download_images).
Fill in your JSON path and folder path to generate the Python script, then download WebArchiveSnapshotProgram.py.
Run the script:
python WebArchiveSnapshotProgram.py
The program downloads images asynchronously, skipping duplicates. You’ll see “Happy searching :)”.
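
To show what the script has to work with, here is a rough sketch of reading the saved JSON file. It assumes the usual CDX JSON layout, where the first row names the columns and every later row is one capture. The function name snapshot_links and the sample data are made up for illustration, and the "id_" suffix (which asks the Wayback Machine for the file as archived, without its toolbar) is one common way to build the download link, not necessarily what WebArchiveSnapshotProgram.py does.

```python
import json
from typing import List, Tuple

WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_links(cdx_json: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, original_url, wayback_link) for each capture."""
    rows = json.loads(cdx_json)
    if not rows:  # the CDX API answers [] when nothing matched
        return []
    header, *records = rows  # the first row holds the column names
    ts_col = header.index("timestamp")
    url_col = header.index("original")
    links = []
    for record in records:
        timestamp, original = record[ts_col], record[url_col]
        # Hypothetical link format: "id_" requests the raw archived file.
        link = f"{WAYBACK_PREFIX}/{timestamp}id_/{original}"
        links.append((timestamp, original, link))
    return links


# Made-up sample in the CDX JSON layout; real files have many more rows.
sample = """[
  ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
  ["com,example)/logo.gif", "19961227000000", "http://example.com/logo.gif", "image/gif", "200", "ABC", "1234"]
]"""

for timestamp, original, link in snapshot_links(sample):
    print(timestamp, original, link)
```

Looking the columns up by name rather than by position keeps the sketch working even if the JSON was requested with fewer fields.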

Extracting an entire site may take time; JSON files can be large.
If it struggles to connect and keeps retrying, exit and rerun the script; files that were already downloaded are skipped.
Be as specific as possible when entering URLs (e.g., spacejam.com/1996).
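
The skip-and-retry behaviour described in these notes can be pictured with the sketch below. It is not the code of WebArchiveSnapshotProgram.py. Every name in it is hypothetical, the file naming scheme (a SHA-1 hash of the URL) is an assumption, and the network call is replaced by a stand-in coroutine so the example runs offline. In the real script that role would presumably be played by an aiohttp request.

```python
import asyncio
import hashlib
import tempfile
from pathlib import Path
from typing import Awaitable, Callable, Dict, Iterable

Fetch = Callable[[str], Awaitable[bytes]]  # any coroutine: URL in, bytes out


def local_path(folder: Path, url: str) -> Path:
    """Hypothetical naming scheme: one file per distinct URL."""
    return folder / hashlib.sha1(url.encode("utf-8")).hexdigest()


async def download_one(url: str, folder: Path, fetch: Fetch,
                       retries: int = 3, delay: float = 1.0) -> str:
    target = local_path(folder, url)
    if target.exists():
        return "skipped"  # already on disk from an earlier run
    for attempt in range(1, retries + 1):
        try:
            data = await fetch(url)
        except Exception:
            # Broad on purpose in this sketch; with aiohttp one would likely
            # catch aiohttp.ClientError and asyncio.TimeoutError instead.
            if attempt == retries:
                return "failed"
            await asyncio.sleep(delay * attempt)  # wait longer each time
            continue
        part = target.with_name(target.name + ".part")
        part.write_bytes(data)
        part.replace(target)  # rename only after a complete write
        return "saved"
    return "failed"


async def download_all(urls: Iterable[str], folder: Path, fetch: Fetch,
                       limit: int = 5, retries: int = 3,
                       delay: float = 1.0) -> Dict[str, str]:
    folder.mkdir(parents=True, exist_ok=True)
    unique = list(dict.fromkeys(urls))  # drop repeated URLs, keep order
    gate = asyncio.Semaphore(limit)     # cap simultaneous requests

    async def guarded(url: str) -> str:
        async with gate:
            return await download_one(url, folder, fetch, retries, delay)

    statuses = await asyncio.gather(*(guarded(u) for u in unique))
    return dict(zip(unique, statuses))


async def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP request so the sketch runs offline."""
    return b"demo bytes for " + url.encode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        urls = ["http://example.com/a.gif", "http://example.com/b.gif"]
        print(asyncio.run(download_all(urls, Path(tmp), fake_fetch)))  # saves
        print(asyncio.run(download_all(urls, Path(tmp), fake_fetch)))  # skips
```

Because a file only receives its final name after it has been written completely, interrupting the program and starting it again cannot leave a half-written file that is later mistaken for a finished download.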

Happy searching :)