HTML Proof Your Site in a CI Build Pipeline


Following up on my blog post Removing Exif Data from Images in Your Website With Rake and CI Build Pipeline, I have another nice feature that I added to this website’s CI build (for me, Rake + Netlify) that I would like to share. What I wanted was automatic detection of links in my blog posts that are dead and need to be updated. I soon found the tool HTMLProofer, which does exactly this and more, such as general sanity checks of the HTML.

I wanted to integrate this into my build flow so that a deployment is stopped if there are any broken links in a new or old blog post. Daniel Sieger wrote a nice post on how to do this in Jekyll, which I built my own integration on. It’s actually really simple. Start by adding the html-proofer gem to the project’s Gemfile:

gem 'html-proofer'

Then add a new task :htmlproof in the Rakefile:

desc "Validates HTML files with htmlproofer..."
task :htmlproof => :build do
  puts "Checking HTML with htmlproofer...".bold
  require 'html-proofer'
  options = { :assume_extension => true,
              :allow_hash_href => true,
              :alt_ignore => [%r{/assets/images/teasers/.*}],
              :check_favicon => true,
              :file_ignore => [
                               %r{/blog/\d+/(\d+/(\d+/)?)?index\.html},
                               /google.*\.html/,
                               ],
              :url_ignore => [
                              "/blog/blog",
                              "/blog/general",
                              "/blog/management",
                              "/blog/tech",
                              %r{.*erikw.me/page\d+},
                              %r{/tags/.*},
                             ],
            }
  HTMLProofer.check_directory("./_site", options).run
end

Whoa, what is going on here? A lot! Actually, not so much. Let’s break it down:

  1. A new task is created, which depends on the project already having been built by the :build task.
  2. A few options are defined for the tool. As you can see, there are quite a few files and URLs that I decided to ignore. Most of these are generated automatically by the theme that I use as a base for this site, and I’m OK with the tool ignoring them.
  3. Then we simply ask htmlproofer to check the HTML files in my _site/ directory.
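
To get a feel for what the two file_ignore patterns skip, here is a minimal sketch that runs them against a handful of made-up paths. The helper names file_ignore_patterns and ignored_file? and all sample paths are hypothetical, and the sketch assumes the tool applies each regex as an unanchored match against the file path, which is worth checking against the HTMLProofer documentation for your version.

```ruby
# The two file_ignore patterns from the Rakefile, with the dot before
# "html" escaped so that it only matches a literal ".".
def file_ignore_patterns
  [
    %r{/blog/\d+/(\d+/(\d+/)?)?index\.html},
    /google.*\.html/,
  ]
end

# Hypothetical helper: true if any pattern matches somewhere in the path.
# Assumption: this mirrors how the tool treats regex entries.
def ignored_file?(path)
  file_ignore_patterns.any? { |pattern| pattern.match?(path) }
end

# Made-up sample paths, for illustration only.
[
  "./_site/blog/2020/index.html",               # year index
  "./_site/blog/2020/05/index.html",            # month index
  "./_site/blog/2020/05/17/index.html",         # day index
  "./_site/blog/2020/05/17/my-post/index.html", # an actual post
  "./_site/google1234abcd.html",                # verification-style file
  "./_site/about/index.html",                   # a regular page
].each do |path|
  puts "#{ignored_file?(path) ? 'ignored' : 'checked'}  #{path}"
end
```

With these samples, only the post and the regular page would be checked. The date-only index pages and the google*.html file are skipped.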

Then, since the base command I’ve instructed Netlify to use for building my project is $ bundle exec rake ci, it’s as simple as making my :ci task depend on this task:

task :ci => [:build, :test, :notify, :htmlproof]
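
Since both :ci and :htmlproof list :build as a prerequisite, it may help to see what Rake does with that. The following is a small sketch with stand-in tasks. The task bodies and the helper name ci_run_order are hypothetical and only record which task ran. The real tasks obviously do more.

```ruby
require 'rake'

# Hypothetical helper: defines stand-in tasks with the same dependency
# layout as in the Rakefile, invokes :ci, and returns the run order.
def ci_run_order
  Rake::Task.clear # start from a clean task list on every call
  order = []

  # Stand-ins for the real tasks; each body only records its own name.
  Rake::Task.define_task(:build)  { order << :build }
  Rake::Task.define_task(:test)   { order << :test }
  Rake::Task.define_task(:notify) { order << :notify }
  Rake::Task.define_task(:htmlproof => :build) { order << :htmlproof }
  Rake::Task.define_task(:ci => [:build, :test, :notify, :htmlproof])

  Rake::Task[:ci].invoke
  order
end

p ci_run_order # => [:build, :test, :notify, :htmlproof]
```

Rake invokes each task at most once per run, so :build is not repeated when :htmlproof asks for it again. Prerequisites run in the order they are listed, which puts the HTML check last.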

I found it too distracting to integrate it directly into the :test task, which is run by the default rake task when I develop locally. For me it’s fine to run :ci locally if I’m unsure, or rather just push to the remote and ask for forgiveness in case the CI build complains. For this personal website this makes sense, as only I have access to the Git repository and a mistake is not going to block deployment for anyone but me.

That’s it – no more broken links or HTML tags; at least until the next CI build!
