The definitive heatmap

After the interest shown in the clickmaps / heatmaps articles, I’ve decided to gather all the information into an easy-to-use system. What we are going to build is a complete solution for collecting, analyzing and displaying the click information our users give us. It now works on web pages that aren’t center-aligned and is considerably more robust. Read on…

What?

If you are a webmaster, you have probably wondered what users do on your website. Beyond the usual statistics, clickmaps let you see where your users are clicking. This is quite useful for finding areas that need changes, layouts that don’t work as intended, or anchors that aren’t being understood the way you would like.

You’re going to be able to see every single click your users make on your website, whether it lands on a link or in a blank area. We are going to do it the following way:

The process

We need to divide the full process into a few manageable steps that use open source tools. Since I work on both Windows and Linux systems, I’ll stay OS-agnostic and use only tools available on most systems, including Mac OS X.
The main steps and the tools they use are the following:

  1. The collecting (javascript and apache)
  2. The processing (ruby and ImageMagick)
  3. The showing (javascript)

The collecting

We are going to use a small snippet of unobtrusive javascript to let the client tell our server the click positions. Just place this small javascript file at the very end of your template, right before the closing </body> tag:

registerclicks.js

The code adds an onMouseDown handler to the document, runs a function for every click and returns true, since we want the user to keep navigating normally. Then, whenever the user clicks anywhere on the page, a tiny request is sent in the background to our server. The script has to calculate the offset of the first element inside the <body> tag, because most pages aren’t aligned to the top-left corner. In liquid layouts the system is not going to work at all.
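The browser-side script is javascript, but the arithmetic it performs is simple enough to show on its own. Here is a minimal Ruby sketch of that offset correction, assuming the offset is the left/top position of the first element inside <body> in document coordinates. The method names are hypothetical and not taken from registerclicks.js.

```ruby
# Hypothetical helper: converts a document-level click position into
# coordinates relative to the first element inside <body>.
def relative_click(page_x, page_y, offset_left, offset_top)
  [page_x - offset_left, page_y - offset_top]
end

# Hypothetical helper: builds the query string the browser would send
# (only x and y; any other parameters are left out of this sketch).
def click_query(page_x, page_y, offset_left, offset_top)
  x, y = relative_click(page_x, page_y, offset_left, offset_top)
  "x=#{x}&y=#{y}"
end

puts click_query(583, 132, 100, 100) # x=483&y=32
```

Because every click is stored relative to the content block rather than to the window, clicks from users with different screen widths land on the same spot of the layout, as long as that block keeps a fixed width.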

The request is sent via an HttpRequest object that calls a file on the server. In the last version, I used a small CGI written in Perl to log the request and return an empty document, but since we want to serve so many requests, there’s a better method. Using a Perl CGI on a modern server, we get the following results benchmarking with Apache Bench (100 requests, 10 concurrent):


Concurrency Level: 10
Time taken for tests: 6.537187 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 17100 bytes
HTML transferred: 0 bytes
Requests per second: 15.30 [#/sec] (mean)
Time per request: 653.719 [ms] (mean)
Time per request: 65.372 [ms] (mean, across ...)
Transfer rate: 2.45 [Kbytes/sec] received

Mod_imap

Apache has some modules that work the following way: you define a handler and what you want to do with it. Some of them are well known, like mod_perl or mod_cgi, but a lesser-known one, called mod_imap, does exactly what we want. It’s a module meant to serve server-side image maps, but if we use an empty map file, all we get is a 204 status (no content) and a logged transaction. The difference is quite significant. Using Apache Bench with the same configuration, this is what we get:


Concurrency Level: 10
Time taken for tests: 0.106316 seconds
Complete requests: 100
Failed requests: 0
Write errors: 0
Total transferred: 36464 bytes
HTML transferred: 20246 bytes
Requests per second: 940.59 [#/sec] (mean)
Time per request: 10.632 [ms] (mean)
Time per request: 1.063 [ms] (mean, across ...)
Transfer rate: 329.21 [Kbytes/sec] received

That’s about 940 requests per second versus 15 with the CGI method! We are more than sixty times faster with this approach. The only thing we have to do to use mod_imap is edit the Apache configuration file a little. Do it carefully, because a mistake there can take down your entire server. In the relevant section, add the following lines:


AddHandler imap-file .map
# Log path: modify according to your system
CustomLog /tmp/clicklog clicklog

And define the custom log format in the same file by adding this:

LogFormat "%q,%{Referer}i" clicklog

This way, everything ending in .map is treated as a server-side image map, and since the map is empty, it doesn’t send your user anywhere. But the request is logged, in the file /tmp/clicklog (YMMV).
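The empty-map trick boils down to a simple contract: write the query string and the referer to a log, and send nothing back. If you can’t touch the Apache configuration, any small handler can honor the same contract. Below is a minimal sketch that follows Rack’s call(env) convention (so a Rack server could mount it, although the rack gem is not needed to run it). ClickLogger is a hypothetical name, and this handler has not been benchmarked against mod_imap.

```ruby
require 'stringio'

# Hypothetical endpoint: appends one "%q,%{Referer}i"-style line per request
# and answers with an empty 204 response, like the empty image map does.
class ClickLogger
  def initialize(log_io)
    @log = log_io
  end

  def call(env)
    query = env['QUERY_STRING'].to_s
    referer = env['HTTP_REFERER'] || '-' # Apache logs "-" for a missing header
    # Apache's %q is empty without a query string, otherwise "?" + query
    q = query.empty? ? '' : "?#{query}"
    @log.puts("#{q},#{referer}")
    @log.flush
    [204, {}, []]
  end
end

log = StringIO.new
status, _headers, _body = ClickLogger.new(log).call(
  { 'QUERY_STRING' => 'x=483&y=32', 'HTTP_REFERER' => 'http://example.com/demo.html' }
)
puts status     # 204
puts log.string # ?x=483&y=32,http://example.com/demo.html
```

Keeping the same line format as the LogFormat directive means the parsing step does not care which of the two collectors produced the log.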

The log analysis

Since we used the LogFormat Apache directive to write our log, the format should be easy to parse. The query string is written to the log as it comes, and the lines should look like this:


?x=483&y=32&dx10&dy15,http://demo.html
?x=461&y=177&dx10&dy15,http://demo.html
?x=408&y=40&dx10&dy15,http://demo.html
(...)
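Each line is a query string, a comma and the referer, so a single regular expression is enough to pull out the coordinates and the URL. A minimal Ruby sketch, assuming every query string starts with x=<int>&y=<int>; the method and constant names are hypothetical, and any extra parameters such as dx/dy are simply ignored.

```ruby
# Sketch: parse one line written by LogFormat "%q,%{Referer}i".
# Anything after x and y in the query string is skipped, and lines
# that don't match the expected shape return nil.
CLICK_LINE = /\A\?x=(-?\d+)&y=(-?\d+)[^,]*,(.+)\z/

def parse_click_line(line)
  m = CLICK_LINE.match(line.chomp)
  return nil unless m

  { x: m[1].to_i, y: m[2].to_i, url: m[3] }
end

click = parse_click_line("?x=483&y=32&dx10&dy15,http://demo.html\n")
puts "#{click[:x]},#{click[:y]} #{click[:url]}" # 483,32 http://demo.html
```

Returning nil for malformed lines lets the caller drop them with compact instead of aborting on a single corrupt entry.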

I decided to write a Ruby script to parse the file and generate the final images, because I hadn’t used Ruby before and thought this would be a good way to approach the problem. Last time I had written a structured Perl script, but I think object orientation is the way to go in this particular situation, since the objects should be well defined and dividing the program among several coders should be easier too.

Update: Thanks to Jerret, this part has been enhanced using RMagick. Part of the code below can be replaced with an RMagick version that runs some 50 times faster. On top of that, a new SourceForge project has been started at http://sourceforge.net/projects/clickmaps/ under a GPL license. Of course, if you don’t want to install or use RMagick, you can still download the original version at the end of this post.

I’ll try to explain the model. It uses five classes:

Conf
Sets some configuration variables and returns them as a hash. This way, every configuration variable is set in this class and it’s easy to retrieve later on.

conf.rb
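A class like Conf can be as small as a constructor that fills a hash. The sketch below is an assumption about its shape, not the contents of conf.rb: only the 'logfile' key is used by the main program further down, and the other keys and all the values are hypothetical placeholders.

```ruby
# Minimal sketch of a Conf class: every setting lives in one hash.
# Only 'logfile' is confirmed by the main program; the rest is hypothetical.
class Conf
  attr_reader :data

  def initialize
    @data = {
      'logfile' => '/tmp/clicklog', # path written by the CustomLog directive
      'spot'    => 'spot.png',      # hypothetical grayscale click-spot image
      'colors'  => 'colors.png',    # hypothetical gradient used to colorize
      'width'   => 1024,            # hypothetical canvas width in pixels
      'height'  => 768              # hypothetical canvas height in pixels
    }
  end
end

puts Conf.new.data['logfile'] # /tmp/clicklog
```

Swapping this hash for one loaded from an external file would not change any caller, since they all go through data.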

Readparsefile
Reads and parses the file defined as logfile in the conf object. Each log line is stored in a Click object and appended to an array. Two methods return all the URLs in the log file (geturls) and all the information for a single URL as a Log object (coordsurl).

readparsefile.rb
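A sketch of how Readparsefile could be organized, assuming the "%q,%{Referer}i" log format. To keep the block self-contained, ClickRecord is a hypothetical stand-in for the Click class, and coordsurl returns a plain array where the original returns a Log object. Accepting either a path or any list of lines is also my own addition, to make it easy to test.

```ruby
# Stand-in for the post's Click class, so this block runs on its own.
ClickRecord = Struct.new(:x, :y, :url)

class Readparsefile
  LINE_RE = /\A\?x=(-?\d+)&y=(-?\d+)[^,]*,(.+)\z/

  # source: a file path, or any Enumerable of log lines
  def initialize(source)
    lines = source.is_a?(String) ? File.foreach(source) : source
    @clicks = []
    lines.each do |line|
      m = LINE_RE.match(line.chomp)
      @clicks << ClickRecord.new(m[1].to_i, m[2].to_i, m[3]) if m
    end
  end

  # All distinct URLs in the log, in order of first appearance
  def geturls
    @clicks.map(&:url).uniq
  end

  # All clicks recorded for one URL
  def coordsurl(url)
    @clicks.select { |c| c.url == url }
  end
end

file = Readparsefile.new(["?x=1&y=2,http://a/", "?x=3&y=4,http://b/", "?x=5&y=6,http://a/"])
p file.geturls                         # ["http://a/", "http://b/"]
p file.coordsurl('http://a/').map(&:x) # [1, 5]
```

Reading the whole log once and filtering per URL keeps the code simple; for very large logs, grouping by URL while reading would avoid scanning the array once per page.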

Click
Stores the data of each log line, including X, Y and URL. Provides a method (xy) that returns a string like «x100y200» for comparing exact coordinates, which is useful for finding the maximum number of times a single spot was clicked.

click.rb
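The xy key turns "how often was this exact pixel clicked?" into a grouping problem. A minimal sketch of such a Click class; the attribute names are assumptions, only the xy format comes from the description above.

```ruby
# Sketch of the Click class: one log line, plus a string key for exact matches.
class Click
  attr_reader :x, :y, :url

  def initialize(x, y, url)
    @x = x
    @y = y
    @url = url
  end

  # e.g. "x100y200"
  def xy
    "x#{@x}y#{@y}"
  end
end

clicks = [Click.new(100, 200, 'u'), Click.new(100, 200, 'u'), Click.new(5, 5, 'u')]
# Maximum number of times a single spot was clicked
puts clicks.group_by(&:xy).values.map(&:size).max # 2
```

That maximum is what a normalization step needs to know so the busiest spot maps to the hottest color.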

Log
Stores all the values pertinent to a single URL and provides accessors for them. There’s also a «next» method that returns the next click for the same URL.

log.rb
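A sketch of a Log class with a next method that hands out one click per call and returns nil when there are none left. The url and clicks accessors are assumptions about what the original exposes. Ruby accepts next as a method name as long as it is called with an explicit receiver, as in log.next.

```ruby
# Sketch of a Log class: all the clicks for one URL, served one at a time.
class Log
  attr_reader :url, :clicks

  def initialize(url, clicks)
    @url = url
    @clicks = clicks
    @pos = 0
  end

  # Returns the next click, or nil once every click has been handed out.
  def next
    click = @clicks[@pos]
    @pos += 1 if click
    click
  end
end

log = Log.new('http://demo.html', [[483, 32], [461, 177]])
while (c = log.next)
  puts c.inspect
end
```

The while loop stops on the first nil, so callers don’t need to know how many clicks the URL has.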

Image
Receives a log object and the conf object. It has three methods: one to normalize the spot we’re going to use as a click indicator (normalizespot), one to compose every click as a dot (iterate) and one to colorize the final image (colorize).

image.rb
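To show what the three Image steps do without needing ImageMagick installed, here is a pure-Ruby illustration that swaps image composition for a plain 2D array. This is not the post’s image.rb: the radial spot shape, the grid size and the number of color levels are all assumptions, and although the method names mirror the description, their signatures are my own.

```ruby
# Pure-Ruby stand-in for the Image steps: a density grid instead of pixels.
class DensityGrid
  attr_reader :cells

  def initialize(width, height, radius)
    @w, @h, @r = width, height, radius
    @cells = Array.new(height) { Array.new(width, 0.0) }
    @spot = normalizespot
  end

  # Hypothetical radial spot: 1.0 at the centre, fading to 0.0 at the radius.
  def normalizespot
    (-@r..@r).flat_map do |dy|
      (-@r..@r).map do |dx|
        d = Math.sqrt(dx * dx + dy * dy)
        [dx, dy, d >= @r ? 0.0 : 1.0 - d / @r]
      end
    end
  end

  # Add the spot at every click, clipping at the grid edges.
  def iterate(clicks)
    clicks.each do |x, y|
      @spot.each do |dx, dy, v|
        cx, cy = x + dx, y + dy
        next if cx < 0 || cy < 0 || cx >= @w || cy >= @h
        @cells[cy][cx] += v
      end
    end
  end

  # Map each cell to a palette index (0 = coldest) relative to the hottest cell.
  def colorize(levels = 4)
    max = @cells.flatten.max
    @cells.map do |row|
      row.map { |v| max.zero? ? 0 : ((v / max) * (levels - 1)).round }
    end
  end
end

grid = DensityGrid.new(10, 10, 2)
grid.iterate([[5, 5], [5, 5], [2, 2]])
levels = grid.colorize
puts levels[5][5] # 3
puts levels[0][9] # 0
```

Overlapping spots add up, which is why repeated clicks on the same area turn hot; dividing by the maximum makes the palette independent of how many clicks a page received.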

Then, the main program is only eight lines long. It leverages the objects’ methods to be as compact as possible; in fact, all it does is iterate over each URL to create a different image.

conf = Conf.new
file = Readparsefile.new(conf.data['logfile'])
file.geturls.each do |url|
    image = Image.new(file.coordsurl(url),conf)
    image.normalizespot
    image.iterate
    image.colorize
end

Is it better?

You can find another program (this time written in Perl) in an older post that does a similar job of making heatmaps. But there have been some modifications that make this a usable system instead of a proof of concept:

Flexible configuration
Compared with the hardcoded last version, in this one it’s quite simple to change the images used in the heatmap generation, or the log name. You only have to modify the Conf definition. It would be easy to use an external config file, but doing it this way was quicker for me.
Multiple URL support
While the last version only let you extract one image, this one makes a heatmap for every URL in your log.
Much faster execution time
Instead of composing the full image every time, we now create a single ImageMagick command that does all the composition for us. That gives us a speed advantage of a couple of orders of magnitude: the last version took about fifteen minutes for a couple hundred clicks, and now it takes about five seconds. Please note that, with many clicks, the program uses quite a bit of memory. For a production environment it would probably be necessary to divide the compose command into manageable chunks and combine them at the end to create the final heatmap.
Manual capture is not needed anymore
Since the last step is to reduce the opacity of the map, we can use a little bit of javascript to overlay the PNG image on the original page. That way, the stakeholders can review it without anyone manually capturing the screen, and we don’t need to set up an X server in the production environment.
Easier to maintain and extend
The object-oriented paradigm doesn’t give us faster code, but it does give us much more manageable code. You can extend it however you want.
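The chunking idea from the list above can be sketched by building several ImageMagick commands, each drawing a slice of the clicks on top of the previous result. This only builds argument lists; running them (for example with system(*cmd)) is left to the caller. The file names, spot size, chunk size and the Multiply operator are assumptions, not the post’s settings, and convert is the ImageMagick 6 command name.

```ruby
# Builds one "convert" argument list per chunk of clicks. Each command reads
# the previous chunk's output, so memory use is bounded by the chunk size.
def compose_commands(clicks, spot: 'spot.png', spot_size: 32,
                     base: 'blank.png', chunk: 500)
  half = spot_size / 2
  input = base
  commands = []
  clicks.each_slice(chunk).with_index do |slice, i|
    output = "partial_#{i}.png"
    cmd = ['convert', input]
    slice.each do |x, y|
      # Centre the spot on the click; "0,0" keeps the spot image's own size.
      cmd.push('-draw', "image Multiply #{x - half},#{y - half} 0,0 '#{spot}'")
    end
    cmd << output
    commands << cmd
    input = output # the next chunk draws on top of this result
  end
  commands
end

cmds = compose_commands([[100, 100], [200, 50], [300, 300]], chunk: 2)
puts cmds.size  # 2
puts cmds[1][1] # partial_0.png
```

With a multiplicative operator the order of the clicks doesn’t affect the final image, so splitting them into arbitrary chunks changes memory use but not the result; the last partial file becomes the input for colorizing.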

What you get

Now you’ll have several images. Most of them can be deleted, but the one ending in final.png is your heatmap. We’re going to overlay it on top of your web page. That image should be a semi-transparent PNG like this one:

The overlay

This is the final part of the process. We already have the overlay image, and all we need is a javascript snippet that can be called at any time to create a layer on top of your website with the click information. Just like in the first step, we’re going to position it over the very first element in the page.
The best way to do that is via a bookmarklet, that is, a small javascript snippet saved as a bookmark. This way, you can keep it in your browser and ask for the overlay image whenever you like. The javascript recalculates the offset of the first element inside the <body> tag and draws the heatmap image on top of it.

overlay.js

The result

We got a beautiful heatmap on top of our web page. We can call the overlay from wherever we want and show it to the project stakeholders. Look at the result:

The code

I made a ready-to-download package with all the code. It’s released under an MIT license, which means you can do whatever you want with it. It will probably become part of an open source release in the future; if you feel like it, start one yourself or contact me for more information.

Download code. Tar.gz file

What else?

The sky is the limit. If you want a hosted service, contact us. Our company can give you bespoke solutions for all your web intelligence needs, be it log analysis, path tracking and so on. If you’re a developer, feel free to use all the code as you wish, and please write to tell me about your experiences. Stay tuned!

By the way, there’s a post on Remy Sharp’s blog explaining how to record the clicks on a different server. Thanks.

93 responses to “The definitive heatmap”

  1. Why use imap to serve the static file? If you compare it in a benchmark against a .gif file, there’s no optimization. Without mod_imap you can strip down Apache, so it will run faster.
    See tests below:

    // gif
    Concurrency Level: 1
    Time taken for tests: 10.294763 seconds
    Complete requests: 10000
    Failed requests: 0
    Write errors: 0
    Total transferred: 3640000 bytes
    HTML transferred: 430000 bytes
    Requests per second: 971.37 [#/sec] (mean)
    Time per request: 1.029 [ms] (mean)
    Time per request: 1.029 [ms] (mean, across all concurrent requests)
    Transfer rate: 345.22 [Kbytes/sec] received

    // imap
    Concurrency Level: 1
    Time taken for tests: 10.278557 seconds
    Complete requests: 10000
    Failed requests: 0
    Write errors: 0
    Total transferred: 3540000 bytes
    HTML transferred: 180000 bytes
    Requests per second: 972.90 [#/sec] (mean)
    Time per request: 1.028 [ms] (mean)
    Time per request: 1.028 [ms] (mean, across all concurrent requests)
    Transfer rate: 336.33 [Kbytes/sec] received

  2. It’s a perfectly valid solution. You add some KB to the transfer, but that shouldn’t be a problem. Anyway, the best solution would be a handler that returns a «204: No Content» response and takes no further action.

  3. I’m trying to build the same software in Java in order to visualize eye-movement fixations on a web page. I’m having trouble rendering the heatmap with the same graphical quality because I don’t understand what kind of graphical function has been used in Ruby… Does anyone know how I can obtain the same graphical results using Java?
    thank you very much
    Carlo

  4. Very cool article. I am using it with mod_log_access, which is a drop-in replacement for mod_log_config, and the clickmap logs are very useful! Thanks again for an informative read.

  5. Hi Liesen,

    We’ve also developed a standalone version for a customer, written in C and using LibPNG for the output. It processes about ten thousand clicks per second!
    I’m glad that the ideas presented have been useful. I’ll try to write some more next month.

  6. Do you have the «formula» by which you made bolilla.png? (Also, the C person that did the LibPNG port… Can you post your code?)

  7. The pelotilla images were created with The Gimp, using just a gray gradient. For the LibPNG version, they were exported to PGM and filtered via a perl script. The colors were exported to BMP format and read by another script to output the vector information.

    The LibPNG version isn’t meant to be published, but will probably be offered as a web service. The only people using it on their own machines right now are the original customers (TNS Gallup), and we both share the code base.

  8. Hi,

    I was wondering whether it’s possible to set this up without direct access to httpd? My host, Hostgator, apparently won’t add the lines to the Apache httpd file because I’m on a shared server. I guess it might affect other customers.

    Any thoughts?

  9. Hi martin,

    You should be able to record the query string to a file easily. It can be done via PHP, Perl, Ruby… The easiest way would be (in Perl):

    #!/usr/bin/perl
    open (LOG, ">>log.txt");
    print LOG "$ENV{QUERY_STRING}\t$ENV{HTTP_REFERER}\n";
    close (LOG);
    print "Content-type: text/html\n\n";

    And that’s all (pretty basic and not production ready, but those are the basics).

Comments closed.