There are many online services which offer to probe a website for broken links. I prefer using wget to do the legwork for me. It is slow, but it gets the job done for my audits, reporting on both internal and external broken links.
wget --spider -r -nd -nv -w 2 -o run1.log https://example.org
The command uses wget to recursively scan a website (https://example.org) in a non-intrusive way (spider mode).
- --spider: Makes wget act like a web crawler (checks links without downloading)
- -r: Recursive download (follows links)
- -nd: Do not create a hierarchy of directories
- -nv: Non-verbose (quiet mode)
- -w 2: Wait 2 seconds between requests (so as to not flood the website)
- -o run1.log: Save output to run1.log instead of stdout
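For a large site, it can also help to cap the recursion depth and identify the audit in the request's user agent so the site owner can recognise the traffic. A minimal variation on the command above; the depth limit of 3 and the contact address are assumptions for illustration, not part of my usual audit:

# Assumed variant: limit the crawl to three levels of links and announce who is crawling
wget --spider -r -l 3 -nd -nv -w 2 \
     --user-agent="link-audit (webmaster@example.org)" \
     -o run2.log https://example.org

Here -l 3 (--level) stops wget from following links more than three levels deep, and --user-agent simply replaces the default identification string.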
Find broken links in the log file
An example of the log output:
2025-06-24 10:00:00 URL: https://example.org/good-page 200 OK
2025-06-24 10:00:02 URL: https://example.org/broken-page 404 Not Found
2025-06-24 10:00:04 URL: https://example.org/missing.jpg [following]
2025-06-24 10:00:06 URL: https://example.org/missing.jpg 404 Not Found
2025-06-24 10:00:08 URL: https://example.org/forbidden-page: 403 Forbidden
2025-06-24 10:00:10 URL: https://broken.com: Failed: Name or service not known
2025-06-24 10:00:12 URL: https://example.org/redirect-loop: Too many redirects
- The URLs with the 404 Not Found error are missing.
- The URLs with the 403 Forbidden error point to resources the server refuses to serve.
- The URLs with the Failed: Name or service not known error indicate failed DNS resolution or a failed connection.
- The URLs with the Too many redirects error indicate redirect loops (HTTP 301/302 issues).
All the above are types of broken links.
Instead of trawling through the log file line-by-line, use grep to filter for errors.
grep -E '404|Failed|error' run1.log
Or for a cleaner list of just broken URLs:
grep -B1 '404 Not Found' run1.log | grep 'https://'
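To tally each type of failure, or to pull just the offending URLs into a deduplicated list, the filters can be combined. A rough sketch that assumes the log format shown above, where the URL is the fourth whitespace-separated field; adjust the patterns and the awk column if your wget version logs differently:

# Count how many times each error type appears in the log
grep -oE '404 Not Found|403 Forbidden|Failed|Too many redirects' run1.log | sort | uniq -c

# List the unique URLs on error lines, stripping any trailing colon
grep -E '404|403|Failed|Too many redirects' run1.log | awk '{print $4}' | sed 's/:$//' | sort -u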