Spoiler alert: Firefox does not have a button labeled “Siterip.”
Even if your tool ignores robots.txt, you shouldn’t. Firefox extensions like “Ignore Robots?” exist, but using them to bypass a site’s crawl directives is bad form. The file is there for a reason: server load, paywall segmentation, or privacy.
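If you want to honor those directives in your own scripts, Python’s standard library already knows how to read robots.txt. A minimal sketch, where example.com and the “MyArchiver/1.0” user agent are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (example.com is a placeholder).
rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# Ask before you fetch: may this user agent touch this path?
if rp.can_fetch("MyArchiver/1.0", "https://example.com/gallery/"):
    print("Allowed: fetch away, politely.")
else:
    print("Disallowed: respect it and move on.")

# Some sites also publish a Crawl-delay; honor it if present.
print("Requested crawl delay:", rp.crawl_delay("MyArchiver/1.0"))
```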
Find the site’s sitemap (/sitemap.xml) or use an SEO tool like “Screaming Frog” (free for up to 500 URLs) to crawl just the URL list, not the content.
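Here is a sketch of that URL-list-only approach in Python, assuming the sitemap lives at the conventional /sitemap.xml path on a placeholder example.com:

```python
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "https://example.com/sitemap.xml"  # placeholder site
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

# A plain sitemap lists <url><loc> entries; a sitemap index nests
# <sitemap><loc> entries instead. This query picks up both.
urls = [loc.text for loc in tree.findall(".//sm:loc", NS)]

print(f"{len(urls)} URLs found")
for url in urls[:10]:
    print(url)
```

Large sites often serve a sitemap index whose entries point at further sitemaps; the same query collects those, and you would fetch them in a second pass.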
| If you need… | Use… | Firefox? |
|--------------|------|----------|
| Recursive crawl (follow every link) | `wget --mirror`, `httrack` | ❌ |
| Respect robots.txt and crawl delays | `wget` with `--wait` | ❌ (unless scripted) |
| Save 10,000+ pages efficiently | `zimit`, `archivebox`, `heritrix` | ❌ |
| Save one complex, JS-heavy page exactly as seen | Firefox + SingleFile | ✅ |
| Download all images from a gallery page | Firefox + DownThemAll! | ✅ |
| Archive pages behind a login (your own account) | Firefox + SingleFile (logged in) | ✅ |
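For the ❌ rows, the canonical tool is wget. A sketch that shells out to it from Python (assuming wget is installed; example.com is a placeholder), using its real flags for polite mirroring:

```python
import subprocess

# Mirror a site politely: follow links, wait between requests,
# and leave wget's default robots.txt handling switched on.
subprocess.run([
    "wget",
    "--mirror",           # recursive download with timestamping
    "--wait=1",           # one second between requests
    "--random-wait",      # jitter the delay to be gentler still
    "--convert-links",    # rewrite links for offline browsing
    "--page-requisites",  # grab the CSS/JS/images each page needs
    "https://example.com/",
], check=True)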
Before you install a single extension, know what Firefox already gives you.
Firefox’s cache stores every asset it downloads. With extensions like “CacheViewer,” you can browse and export cached files. This is a post-hoc siterip—you visit pages, then pull them from cache. Not efficient for large sites, but zero extra requests.
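If you would rather script that cache pull than click through an extension, the idea looks roughly like this. This is a sketch under two assumptions: that your profile’s cache2/entries directory stores one cached response body per file (with metadata appended at the end, so magic-byte sniffing at the start works), and that you replace the placeholder profile path with your own:

```python
import pathlib
import shutil

# Placeholder path: substitute your actual Firefox profile directory.
CACHE = pathlib.Path.home() / ".cache/mozilla/firefox/xxxxxxxx.default/cache2/entries"
OUT = pathlib.Path("cache_rip")
OUT.mkdir(exist_ok=True)

# A few magic-byte signatures mapped to file extensions.
SIGNATURES = {b"\xff\xd8\xff": ".jpg", b"\x89PNG": ".png",
              b"GIF8": ".gif", b"%PDF": ".pdf"}

for entry in CACHE.iterdir():
    with entry.open("rb") as f:
        head = f.read(4)
    for magic, ext in SIGNATURES.items():
        if head.startswith(magic):
            shutil.copy(entry, OUT / (entry.name + ext))
            break
```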
SingleFile has an “Auto-save” mode. Enable it, set a 2-second delay after page load, and then open all 100 tabs. Firefox will churn through them, saving each page to your Downloads folder.
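You can hand those 100 tabs to Firefox from a script instead of middle-clicking a bookmark folder. A sketch using Python’s webbrowser module, assuming a hypothetical urls.txt with one URL per line and SingleFile’s auto-save already configured:

```python
import time
import webbrowser

# May raise webbrowser.Error if Firefox isn't registered on your system.
firefox = webbrowser.get("firefox")

with open("urls.txt") as f:  # hypothetical file: one URL per line
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    firefox.open_new_tab(url)
    time.sleep(3)  # stagger tab opens so auto-save can keep up
```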
Firefox, left to its own devices, will open dozens of parallel connections. For a siterip, that looks like a DDoS. Use extensions or scripts that add delays (500ms–1s between requests). Your target site’s sysadmin will thank you.
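Concretely, the throttle is one sleep call. A minimal polite-fetch sketch, where the URL list, the “MyArchiver/1.0” user agent, and the 0.5–1 second range are illustrative:

```python
import random
import time
import urllib.request

# Placeholder URL list; in practice this comes from the sitemap step above.
urls = [f"https://example.com/page/{i}" for i in range(1, 101)]

for url in urls:
    # Identify yourself so the sysadmin knows who is knocking.
    req = urllib.request.Request(url, headers={"User-Agent": "MyArchiver/1.0"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    # ... save `body` to disk here ...
    time.sleep(random.uniform(0.5, 1.0))  # 500ms-1s between requests, jittered
```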