
wget spider to find broken links

 Tue, 24 Jun 2025 09:04 UTC

There are many online services that offer to probe a website for broken links, but I prefer to let wget do the legwork. It is slow, but it gets the job done for my audits, reporting on both internal and external broken links.

wget --spider -r -nd -nv -w 2 -o run1.log https://example.org

The command uses wget to recursively crawl the site (https://example.org) in spider mode, which requests each link without downloading anything. The -r flag enables recursion, -nd stops wget from creating a local directory tree, -nv keeps the log concise, -w 2 waits two seconds between requests so the crawl stays polite, and -o run1.log writes everything to a log file.
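
For larger sites, the same command can be reined in a little. The sketch below is one possible variation (the depth of 3 and the run2.log name are arbitrary choices): -l caps the recursion depth and --no-parent stops wget from climbing above the directory you start in.

# Cap recursion at three levels and stay below the starting directory
wget --spider -r -l 3 --no-parent -nd -nv -w 2 -o run2.log https://example.org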

Find broken links in the log file

Example output from the log:

2025-06-24 10:00:00 URL: https://example.org/good-page 200 OK
2025-06-24 10:00:02 URL: https://example.org/broken-page 404 Not Found
2025-06-24 10:00:04 URL: https://example.org/missing.jpg [following]
2025-06-24 10:00:06 URL: https://example.org/missing.jpg 404 Not Found
2025-06-24 10:00:08 URL: https://example.org/forbidden-page: 403 Forbidden
2025-06-24 10:00:10 URL: https://broken.com: Failed: Name or service not known
2025-06-24 10:00:12 URL: https://example.org/redirect-loop: Too many redirects

Apart from the first 200 OK entry, each of the lines above represents a type of broken link: a missing page (404), a forbidden resource (403), a host that no longer resolves, and a redirect loop.

Instead of trawling through the log file line by line, use grep to filter for errors.

grep -E '404|Failed|error' run1.log
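
The sample log above also contains a 403 and a redirect failure that this pattern would miss. If you want to treat those as broken links too, one option is a case-insensitive match on a few extra terms:

# Also catch 403s, server errors and redirect problems, ignoring case
grep -Ei '404|403|50[0-9]|failed|error|redirect' run1.log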

Or for a cleaner list of just broken URLs:

grep -B1 '404 Not Found' run1.log | grep 'https://'
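
If the same broken URL turns up more than once, the list can be reduced to unique entries. A minimal sketch along the same lines, extracting only the URLs and stripping any trailing colon:

# Pull out the URLs, drop trailing colons, and de-duplicate
grep -B1 '404 Not Found' run1.log | grep -o 'https://[^ ]*' | sed 's/:$//' | sort -u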