Source: `no_snapshots_by_codes_02_12.json`. A total of 15,379 sources have no snapshots; the breakdown by HTTP status code:

| Status Code | Count | Description |
|---|---|---|
| 404 | 7,380 | Not Found — pages removed/never existed |
| 500 | 5,519 | Internal Server Error / scraping failures |
| 599 | 983 | Custom/timeout error |
| 415 | 386 | Unsupported Media Type |
| 403 | 297 | Forbidden — access blocked |
| 402 | 238 | Payment Required — paywall |
| 503 | 102 | Service Unavailable |
| 412 | 81 | Precondition Failed |
| 410 | 80 | Gone — explicitly removed |
| 429 | 64 | Too Many Requests — rate limited |
| 418 | 63 | I'm a Teapot (bot detection) |
| 413 | 42 | Payload Too Large |
| 401 | 36 | Unauthorized |
| 423 | 22 | Locked |
| 406 | 19 | Not Acceptable |
| 521 | 17 | Cloudflare — Web Server Down |
| 502 | 13 | Bad Gateway |
| 400 | 9 | Bad Request |
| 511 | 5 | Network Authentication Required |
| 530 | 4 | Cloudflare 1xxx Error |
| 525 | 4 | SSL Handshake Failed (Cloudflare) |
| 405 | 3 | Method Not Allowed |
| 409 | 3 | Conflict |
| 444 | 3 | Nginx No Response |
| 498 | 2 | Invalid Token |
| 522 | 2 | Cloudflare — Connection Timed Out |
| 407 | 1 | Proxy Authentication Required |
| 451 | 1 | Unavailable for Legal Reasons |
| 523 | 1 | Cloudflare — Origin Unreachable |
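
A quick consistency check on the table above can be sketched as follows. The counts are transcribed from the rows as listed; the names `STATUS_COUNTS`, `STATED_TOTAL`, and `check_total` are illustrative and do not come from the source file. As transcribed, the rows sum to 15,380, one more than the stated 15,379, so either one count or the stated total may be off by one.

```python
# Counts transcribed from the status-code table above (29 rows).
STATUS_COUNTS = {
    404: 7380, 500: 5519, 599: 983, 415: 386, 403: 297, 402: 238,
    503: 102, 412: 81, 410: 80, 429: 64, 418: 63, 413: 42, 401: 36,
    423: 22, 406: 19, 521: 17, 502: 13, 400: 9, 511: 5, 530: 4,
    525: 4, 405: 3, 409: 3, 444: 3, 498: 2, 522: 2, 407: 1, 451: 1,
    523: 1,
}
STATED_TOTAL = 15379  # total quoted in the introductory sentence


def check_total(counts, stated_total):
    """Return (computed total, computed minus stated)."""
    computed = sum(counts.values())
    return computed, computed - stated_total


computed, diff = check_total(STATUS_COUNTS, STATED_TOTAL)
print(f"rows={len(STATUS_COUNTS)} computed={computed} "
      f"stated={STATED_TOTAL} diff={diff:+d}")
```
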
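
Breakdown of the status 500 entries by underlying error. Percentages are relative to the 5,494 entries listed in this table:

| Category | Count | % | Recoverable? |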
|---|---|---|---|
| Snapshot dir not created | 1,539 | 28.0% | Yes — scraping ran but directory issue |
| DNS temporary failure (EAI_AGAIN) | 573 | 10.4% | Yes — transient DNS, retry likely works |
| SSL/TLS errors | 240 | 4.4% | Yes — transient handshake failures |
| Connection reset/refused | 234 | 4.3% | Yes — server temporarily down |
| "client is not defined" | 219 | 4.0% | Yes — internal bug, fixable |
| Timeouts | 88 | 1.6% | Yes — retry with longer timeout |
| Server errors (HTTP 500) | 73 | 1.3% | Yes — server-side, may resolve |
| Unexpected end of file | 62 | 1.1% | Yes — partial response, retry |
| Redirect loops | 41 | 0.7% | Yes — may need config adjustment |
| DNS not found (ENOTFOUND) | 2,207 | 40.2% | No — domains are dead/gone |
| Other (detached frames, host unreachable, etc.) | 218 | 4.0% | Mixed |
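
One way to assign logged error messages to the categories above is a small ordered rule list. This is a minimal sketch, not the scraper's actual logic: `EAI_AGAIN`, `ENOTFOUND`, and "client is not defined" appear in the table, `ECONNRESET`, `ECONNREFUSED`, and `ETIMEDOUT` are standard system error codes, and the remaining patterns are guesses that would need adjusting to the real log messages. The names `RULES` and `classify` are hypothetical.

```python
import re

# (category, recoverable, pattern), checked in order; first match wins.
# Patterns other than EAI_AGAIN / ENOTFOUND / "client is not defined"
# are assumptions about what the logs contain.
RULES = [
    ("DNS not found", False, r"ENOTFOUND"),
    ("DNS temporary failure", True, r"EAI_AGAIN"),
    ("SSL/TLS error", True, r"SSL|TLS|certificate"),
    ("Connection reset/refused", True, r"ECONNRESET|ECONNREFUSED"),
    ("Internal bug", True, r"client is not defined"),
    ("Timeout", True, r"timeout|timed out|ETIMEDOUT"),
    ("Unexpected end of file", True, r"unexpected end of file"),
    ("Redirect loop", True, r"too many redirects|redirect loop"),
]


def classify(message):
    """Return (category, recoverable); recoverable is None when unknown."""
    for category, recoverable, pattern in RULES:
        if re.search(pattern, message, re.IGNORECASE):
            return category, recoverable
    return "Other", None


for msg in ("getaddrinfo ENOTFOUND example.com",
            "getaddrinfo EAI_AGAIN example.com",
            "ReferenceError: client is not defined"):
    print(msg, "->", classify(msg))
```

Putting `ENOTFOUND` first means a dead domain is never queued for retry even if its message also matches a later, recoverable pattern.
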

Classification of all status codes by recoverability:

| Category | Status Codes | Count |
|---|---|---|
| **Recoverable** | | |
| Rate limited | 429 | 64 |
| Bad Gateway (transient) | 502 | 13 |
| Service Unavailable (transient) | 503 | 102 |
| Cloudflare transient | 521, 522, 523, 525, 530 | 28 |
| Timeout / custom | 599 | 983 |
| Status 500 (recoverable portion) | 500 | 3,069 |
| **Recoverable subtotal** | | 4,259 |
| **Non-recoverable** | | |
| Not Found | 404 | 7,380 |
| Gone | 410 | 80 |
| Legal takedown | 451 | 1 |
| Unauthorized | 401 | 36 |
| Payment required | 402 | 238 |
| Forbidden | 403 | 297 |
| Status 500 (dead domains) | 500 | 2,207 |
| **Non-recoverable subtotal** | | 10,239 |
| **Unclear / config issues** | | |
| Bot detection | 412, 418 | 144 |
| Unsupported media type | 415 | 386 |
| Other client errors | 400, 405, 406, 407, 409, 413, 423, 444, 498, 511 | 109 |
| Status 500 uncategorized | 500 | 243 |
| **Unclear subtotal** | | 882 |
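
The grouping in the table above can be expressed as a small classifier and used to reproduce the three subtotals. This is a sketch under stated assumptions: the code sets mirror the table rows, the status 500 split (3,069 / 2,207 / 243) is taken from the table rather than derived, and the names `classify_status`, `COUNTS`, and `SPLIT_500` are illustrative.

```python
from collections import Counter

RECOVERABLE = {429, 502, 503, 521, 522, 523, 525, 530, 599}
NON_RECOVERABLE = {401, 402, 403, 404, 410, 451}


def classify_status(code, dead_domain=None):
    """Classify a failed fetch by status code.

    Status 500 depends on the underlying error: dead_domain=True means
    the domain did not resolve, False means a retryable error, and None
    means the error was not categorized.
    """
    if code == 500:
        if dead_domain is None:
            return "unclear"
        return "non-recoverable" if dead_domain else "recoverable"
    if code in RECOVERABLE:
        return "recoverable"
    if code in NON_RECOVERABLE:
        return "non-recoverable"
    return "unclear"


# Per-code counts from the status-code table, excluding 500.
COUNTS = {
    404: 7380, 599: 983, 415: 386, 403: 297, 402: 238, 503: 102,
    412: 81, 410: 80, 429: 64, 418: 63, 413: 42, 401: 36, 423: 22,
    406: 19, 521: 17, 502: 13, 400: 9, 511: 5, 530: 4, 525: 4,
    405: 3, 409: 3, 444: 3, 498: 2, 522: 2, 407: 1, 451: 1, 523: 1,
}
# Status 500 split as given in the table: (dead_domain, count).
SPLIT_500 = [(False, 3069), (True, 2207), (None, 243)]

totals = Counter()
for code, n in COUNTS.items():
    totals[classify_status(code)] += n
for dead, n in SPLIT_500:
    totals[classify_status(500, dead)] += n

for label in ("recoverable", "non-recoverable", "unclear"):
    print(f"{label}: {totals[label]:,}")
```

With the counts as listed, this yields 4,259 recoverable, 10,239 non-recoverable, and 882 unclear, matching the subtotal rows.
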

Summary:

| Classification | Count | % of all 15,379 |
|---|---|---|
| Potentially recoverable | 4,259 | 27.7% |
| Non-recoverable | 10,239 | 66.6% |
| Unclear | 882 | 5.7% |
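
The percentage column can be recomputed from the counts with a few lines. This is only an arithmetic sketch; `SUBTOTALS` and `shares` are illustrative names. The three counts sum to 15,380 rather than 15,379, but at one decimal place the shares are the same for either denominator.

```python
SUBTOTALS = {
    "Potentially recoverable": 4259,
    "Non-recoverable": 10239,
    "Unclear": 882,
}
STATED_TOTAL = 15379


def shares(subtotals, total):
    """Percentage of `total` per class, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in subtotals.items()}


by_stated = shares(SUBTOTALS, STATED_TOTAL)
by_sum = shares(SUBTOTALS, sum(SUBTOTALS.values()))
for name, pct in by_stated.items():
    print(f"{name}: {pct}%")
```
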