A CLI Website Crawler for Accessibility Checks

I just finished a little side project: website-crawler-tool. It’s a Node.js CLI tool that crawls a website and runs some basic accessibility checks on all internal pages. I built it because I needed something simple and fast for my own projects, and I thought maybe others could use it too.

What does it do?

  • It crawls your site recursively, starting from a base URL.
  • You can also give it a sitemap if you want to define the scope.
  • It checks for two things right now:
    • Skipped heading levels (like jumping from h1 to h3)
    • Images missing alt text
  • It spits out CSV reports for each check, plus a list of crawl errors.
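For illustration, here is roughly what those two checks boil down to. This is a minimal sketch in plain JavaScript, not the tool's actual implementation: it scans raw HTML with regexes, the function names are made up, and the real crawler may well use a proper HTML parser instead.

```javascript
// Sketch only: the real tool may parse HTML properly instead of using regexes.

function findSkippedHeadings(html) {
  const issues = [];
  let prev = 0;
  for (const match of html.matchAll(/<h([1-6])\b/gi)) {
    const level = Number(match[1]);
    // A jump of more than one level down the outline (e.g. h1 -> h3) is flagged.
    if (prev !== 0 && level > prev + 1) {
      issues.push(`h${prev} followed by h${level}`);
    }
    prev = level;
  }
  return issues;
}

function findImagesMissingAlt(html) {
  // Flag <img> tags that carry no alt attribute at all.
  return [...html.matchAll(/<img\b[^>]*>/gi)]
    .map((m) => m[0])
    .filter((tag) => !/\balt\s*=/i.test(tag));
}

const html = '<h1>Title</h1><h3>Oops</h3><img src="a.png"><img src="b.png" alt="B">';
console.log(findSkippedHeadings(html)); // [ 'h1 followed by h3' ]
console.log(findImagesMissingAlt(html)); // [ '<img src="a.png">' ]
```

Each flagged heading pair or tag then becomes one row in the CSV report.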

Why another crawler?

Honestly, I just wanted something that’s easy to run, doesn’t need a browser, and gives me CSVs I can actually use. Also, you can control how many requests it makes at once, so you don’t accidentally kill your own server.

How to use it

  1. git clone git@github.com:gaambo/website-crawler-tool.git
  2. cd website-crawler-tool
  3. npm install

Then start a crawl:

npm start -- --url https://example.com

There are a bunch of options, like setting concurrency, using a sitemap, or picking which checks to run. Check the README for all the flags.

A quick warning

Don’t go wild with the concurrency setting. Start low, or you might get blocked by the server. Only use this on sites you own or have permission to test.
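To show what a concurrency cap like this is doing under the hood, here is a tiny request limiter sketched in plain JavaScript. This is purely an illustration, not the tool's actual implementation, and all the names are made up: pending tasks wait in a queue, and at most `maxConcurrent` of them run at once.

```javascript
// Illustration only: a minimal concurrency limiter for polite crawling.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => {
      active -= 1;
      next(); // a finished task frees a slot for the next queued one
    });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Demo: queue six fake "requests" but never let more than two run at once.
const limit = createLimiter(2);
let inFlight = 0;
let peak = 0;
const fakeFetch = () => {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  return new Promise((r) => setTimeout(() => { inFlight -= 1; r(); }, 10));
};
Promise.all(Array.from({ length: 6 }, () => limit(fakeFetch))).then(() => {
  console.log(peak); // 2
});
```

The same idea is why starting with a low setting is safe: the queue just drains more slowly, and the target server only ever sees a handful of requests at a time.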

That’s it

If you want to try it out or contribute, the code’s here: https://github.com/gaambo/website-crawler-tool. Let me know if you find it useful or have ideas for more checks!
