Removing Exif Data from Images on Your Website with Rake and a CI Build Pipeline


Exif?

When you take a photo, or edit one you found on the internet with an image program, a lot of extra data gets attached to the image. This is known as Exif (Exchangeable Image File Format) data. So what might this be? Let's take an example. If you have ImageMagick installed via your standard package manager, you can query any image for Exif data like this:

$ identify -verbose funny-internet-cat.png | grep exif:
Profile-exif: 17023 bytes
    exif:BrightnessValue: -65/100
    exif:ColorSpace: 1
    exif:DateTime: 2021:07:14 20:00:19
    exif:Flash: 16
    exif:FNumber: 175/100
    exif:GPSLatitude: 52972130/1000000, 0/1, 0/1
    exif:GPSLatitudeRef: N
    exif:ImageLength: 960
    exif:ImageWidth: 2080
    exif:LightSource: 21
    exif:Make: Samsung
    exif:YCbCrPositioning: 1
    [...]

There's a lot more to the funny cat than we first thought! This output is already heavily truncated, but we can learn which camera settings were used, that some sort of Samsung device took the photo, and even the GPS coordinates where it was taken! That's pretty cool, because photo apps can then draw a map and show your photos on it, or show them to you in a timeline.
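Those GPS values are stored as degrees, minutes and seconds, each a rational number, plus a hemisphere reference (N/S or E/W). Here's a minimal Ruby sketch of how a photo app might turn them into the decimal degrees a map needs. The helper name exif_gps_to_decimal is made up for illustration, and it assumes the exact "num/den, num/den, num/den" format that identify printed above:

```ruby
# Hypothetical helper: convert an Exif GPS value as printed by ImageMagick's
# `identify` (degrees, minutes, seconds as rationals) into signed decimal degrees.
def exif_gps_to_decimal(value, ref)
  deg, min, sec = value.split(',').map { |part| Rational(part.strip) }
  decimal = deg + min / 60 + sec / 3600
  # By convention, southern latitudes and western longitudes are negative
  decimal = -decimal if %w[S W].include?(ref)
  decimal.to_f
end

puts exif_gps_to_decimal('52972130/1000000, 0/1, 0/1', 'N')
```

Using Rational keeps the arithmetic exact until the final conversion to a float.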

However, if you're publishing images to your website, you may want to remove some of this information. Maybe you need to stay anonymous? Then this data can easily be used to profile you. When you edit an image with GIMP, Photoshop etc., the editor will stamp metadata saying it was used, which you may or may not want people to know. Maybe an attacker could learn which version of an image program you use and turn that to their advantage in an attack? Who knows; there are plenty of reasons why you might not want to share a huge amount of extra data about your photos when you publish them on your website.

So what might you be able to do about this?

Creating Rake Tasks to Detect and Remove Exif Data

Task automation with great tools to the rescue! ExifTool lets you easily view, modify and remove Exif data for one or many files; check out the ExifTool manual for all the options. For my blog, this blog, I'm using Jekyll, a static site generator (SSG) built with Ruby and popularized by its adoption in GitHub Pages. Thus it's natural for me to use Rake to create some build tasks in a Rakefile, just like the good old days with Makefiles! I'll show you here how you can create rake tasks that let you detect whether any of your images have Exif data, a task for removing it all, and how you can integrate this into your CI build pipeline.

Installing the Tooling

Given the Ruby setting, the most convenient way to get ExifTool installed for my Jekyll project was the exiftool_vendored gem. Simply add something like this to your Gemfile:

group :development do
  gem 'exiftool_vendored', '~> 12.0', require: false
end

and run the usual $ bundle install afterwards.

Alright, so how can we use it? The manual tells us that we can require the gem in a Ruby file and then access the path to the exiftool binary via Exiftool.command:

require 'exiftool_vendored'

# Rake's sh helper (available in a Rakefile) runs the command and raises if it exits non-zero
sh "#{Exiftool.command} funny-internet-cat.png"

That's a great first step: we can now perform Exif operations on images from Ruby code!

Creating Rake Tasks

Now let's put these new powers to use by finding all images in the website source directory and checking each of them for Exif data. Since I'm using Jekyll, I want to scan assets/images, which is where images typically go in a Jekyll project. There happens to be a specific folder in there, favicons-gen, which I want left untouched because the favicons should stay exactly as realfavicongenerator.net produced them. I start by adding this utility function to my Rakefile, which we will use in rake tasks later on:

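require 'open3'
require 'shellwords'
require 'exiftool_vendored'

$IMAGE_PATH = 'assets/images'

# Return an array of images in $IMAGE_PATH that contain EXIF data
def find_exif_files
  exif_files = []
  # Why not regex alternation? Because BSD find uses basic (BRE) regexes by default, where `|`
  # and `?` are plain characters, and its -E switch for extended regexes is not supported by
  # GNU find... Hence one -iregex per extension, with jpg and jpeg listed separately.
  find_cmd = "find #{$IMAGE_PATH} -not \\( -path #{$IMAGE_PATH}/favicons-gen -prune \\) -type f " \
             "\\( -iregex '.*\\.png' -o -iregex '.*\\.jpg' -o -iregex '.*\\.jpeg' " \
             "-o -iregex '.*\\.gif' -o -iregex '.*\\.webp' \\)"
  lines, = Open3.capture3(find_cmd)
  lines.each_line(chomp: true) do |file|
    # ExifTool lists tags such as "Exif Version" when a file carries EXIF data;
    # grep -q exits 0 only if such a line is found.
    _, _, status = Open3.capture3("#{Exiftool.command} #{file.shellescape} | grep -v 'ExifTool Version Number' | grep -q '^Exif '")
    exif_files << file if status.success?
  end
  exif_files
end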

It can surely be done in many different ways, but for now this does the job. With this handy function, it's now a breeze to create a rake task that simply prints all images in the source tree that do have Exif data:

# Note: String#bold is not core Ruby; it needs a terminal-colour gem such as colorize (assumption)
desc "Find images under #{$IMAGE_PATH} that contain EXIF data."
task :exif_find do
  puts "Looking for EXIF data in #{$IMAGE_PATH}/...".bold
  puts find_exif_files
end

We could have made this just 3 lines if we wanted. Now it's as easy as typing $ bundle exec rake exif_find.
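If the find incantation inside find_exif_files feels brittle (shell quoting, BSD vs GNU differences), one alternative, not what this blog actually uses, is to enumerate the images in plain Ruby. Here's a minimal sketch; list_images, IMAGE_EXTENSIONS and the skip_dir parameter are hypothetical names chosen to mirror the find expression:

```ruby
# Hypothetical list mirroring the -iregex clauses of the find command
IMAGE_EXTENSIONS = %w[.png .jpg .jpeg .gif .webp].freeze

# Hypothetical helper: list image files under root (case-insensitive extension
# match), skipping everything below root/skip_dir, e.g. generated favicons.
def list_images(root, skip_dir: 'favicons-gen')
  skip_prefix = File.join(root, skip_dir) + File::SEPARATOR
  Dir.glob(File.join(root, '**', '*'))
     .select { |path| File.file?(path) }
     .select { |path| IMAGE_EXTENSIONS.include?(File.extname(path).downcase) }
     .reject { |path| path.start_with?(skip_prefix) }
     .sort
end
```

Each returned path could then go through the same ExifTool check as before. Note that Dir.glob skips dotfiles by default, which is usually what you want in an assets folder.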

Let's say that our workflow involves uploading many photos, and we always want to strip their Exif data. We can do that by once again using the utility function and then letting ExifTool remove the data from all of those images. The ExifTool manual tells us that the -all= argument deletes all writable metadata, Exif included.

desc "Remove EXIF data from all images in #{$IMAGE_PATH}/"
task :exif_clean do
  puts "Removing EXIF data for all images in #{$IMAGE_PATH}/...".bold
  files = find_exif_files
  # Separate arguments make Rake's sh bypass the shell, so file names with spaces are safe
  sh Exiftool.command, '-all=', '-overwrite_original_in_place', *files if files.any?
end

Not so hard!
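ExifTool remains the right tool for the actual cleaning. For a quick, dependency-free double check, here's a minimal pure-Ruby sketch that only handles JPEG files (PNG, GIF and WebP store metadata differently and are out of scope). The helper name jpeg_has_exif? is hypothetical. It walks the JPEG marker segments before the image data and looks for an APP1 segment (marker 0xFFE1) whose payload starts with the Exif identifier "Exif\0\0", which is where JPEG files keep their Exif block:

```ruby
# Hypothetical helper: true if the JPEG bytes contain an APP1 segment whose
# payload starts with "Exif\0\0". Only header segments before SOS are inspected,
# and length-less markers other than fill bytes are assumed not to occur there.
def jpeg_has_exif?(bytes)
  data = bytes.b
  return false unless data.start_with?("\xFF\xD8".b) # SOI: not a JPEG otherwise
  pos = 2
  while pos + 4 <= data.bytesize
    return false unless data.getbyte(pos) == 0xFF
    marker = data.getbyte(pos + 1)
    if marker == 0xFF # fill byte before a marker, skip it
      pos += 1
      next
    end
    return false if marker == 0xDA || marker == 0xD9 # SOS or EOI: headers are over
    # Big-endian segment length, which counts its own two bytes
    length = data.byteslice(pos + 2, 2).unpack1('n')
    return true if marker == 0xE1 && data.byteslice(pos + 4, 6) == "Exif\x00\x00".b
    pos += 2 + length
  end
  false
end
```

For example, jpeg_has_exif?(File.binread('some-photo.jpg')), with a hypothetical file name, should return false once exif_clean has processed that file.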

CI Integration

If you want, you could even run the exif_clean task automatically as part of the build step in your CI setup. I, however, want to review any changes locally and only commit them to my Git repo once I'm sure the files are good. So instead I created a simple task that just fails the build if any Exif data is detected, and added it to a meta rake task called ci. In my case, I let Netlify build and host my site, and I have configured their build system to simply call bundle exec rake ci. This is the relevant portion of the Rakefile:

desc "Build steps to be used by ci runner"
task :ci => %w[build test exif_find_fail]

desc "Fail build if there are images under #{$IMAGE_PATH} that contain EXIF data"
task :exif_find_fail do
  puts "Looking for EXIF data in #{$IMAGE_PATH}/...".bold
  fail "Found images containing EXIF data in #{$IMAGE_PATH}" unless find_exif_files.empty?
end
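To see why listing exif_find_fail as a prerequisite of ci is enough to break the build, here's a small self-contained sketch. It uses Rake's Rake::Task.define_task with stub build and test tasks, and a hypothetical list of files standing in for find_exif_files so it runs without ExifTool. Prerequisites run in the order listed, and the exception raised by fail aborts the ci task; when run from the command line, rake then exits with a non-zero status, which is what makes the CI build fail.

```ruby
require 'rake'

# Builds a fresh set of tasks shaped like the Rakefile above and invokes :ci.
# `exif_files` is a hypothetical stand-in for the result of find_exif_files.
# Returns the names of the tasks whose actions ran, and the error message (or nil).
def run_ci(exif_files)
  Rake.application = Rake::Application.new # start from an empty task registry
  ran = []
  Rake::Task.define_task(:build) { ran << :build } # stub
  Rake::Task.define_task(:test)  { ran << :test }  # stub
  Rake::Task.define_task(:exif_find_fail) do
    ran << :exif_find_fail
    fail "Found images containing EXIF data: #{exif_files.join(', ')}" if exif_files.any?
  end
  # Same shape as `task :ci => %w[build test exif_find_fail]`, plus an action for tracing
  Rake::Task.define_task(:ci => %w[build test exif_find_fail]) { ran << :ci }
  Rake::Task[:ci].invoke
  [ran, nil]
rescue RuntimeError => e
  [ran, e.message]
end

p run_ci(['assets/images/funny-internet-cat.png'])
p run_ci([])
```

In the first call the ci action never runs; in the second, every step completes.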

git commit && git push

and that’s it :)
