
Eye Tracking Without Eyes

"Participant-free eye tracking" has been around for a while but is still attracting quite a bit of attention. Websites such as EyeQuant, Feng-GUI, and Attention Wizard allow you to upload an image (e.g., a screenshot of a web page) and obtain a visualization (e.g., a heatmap) showing a computer-generated prediction of where people would look in the first five seconds of being exposed to the image. No eye tracker, participants, or lab required!

These companies claim a 75-90% correlation with real eye tracking data. Unfortunately, I couldn't find any research supporting their claims. If you know of any, I'm all ears.

To satisfy my curiosity about the accuracy of their predictions, I submitted an eBay homepage to EyeQuant, Feng-GUI, and Attention Wizard and obtained the following heatmaps:

Attention heatmap simulation by Feng-GUI

Attention heatmap simulation by Attention Wizard

Attention heatmap simulation by EyeQuant

I then compared these heatmaps to the gaze activity recorded during the first five seconds of exposure in a study with 21 participants tracked with a Tobii T60 eye tracker:

Attention heatmap based on real participants (red = 10+ fixations)

First, let me just say that I'm not a fan of comparing heatmaps just by looking at them because visual inspection is subjective and prone to error. Also, different settings can produce very different visualizations, and you can't ensure equivalent settings between a real heatmap and a simulated one.

With that in mind, let's take a look at the four heatmaps. The three simulations look rather similar, don't they? But the "real heatmap" seems to differ from the simulated ones quite a bit. For example, the simulations predict a lot of attention on images (including advertising), whereas the study participants barely even looked at many of those elements. Our participants primarily focused on the navigation and search, which is not reflected in the simulated heatmaps.

The simulations also show a fair amount of attention on areas below the page fold, but the study participants never even scrolled! In addition to a heatmap, Feng-GUI produced a gaze plot indicating the sequence in which users would scan the areas of the page. The first element to be looked at was predicted to be a small image at the bottom of the page, well below the fold:

Gaze plot simulation by Feng-GUI (the numbers in the circles indicate the order of fixations)

I wish we could compare the simulated gaze activity to the real gaze activity quantitatively but that doesn't appear to be possible. Even though Feng-GUI and EyeQuant provide some data (percentage of attention on an area of interest) in addition to data visualizations, it's unclear what measure these percentages are based on:

Percentage of attention on the navigation predicted by EyeQuant
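
For comparison, here is a minimal sketch of one measure such a percentage could be based on when working with real eye tracking data: the share of total fixation time that falls inside a rectangular area of interest. This is only an assumption for illustration, not how EyeQuant or Feng-GUI compute their numbers. The function name `aoi_share`, the fixation format (x, y, duration in milliseconds), and all values are hypothetical.

```python
def aoi_share(fixations, aoi):
    """Fraction of total fixation time that lands inside a rectangular AOI.

    fixations: list of (x, y, duration_ms) tuples (hypothetical format)
    aoi: (left, top, right, bottom) in the same pixel coordinates
    """
    left, top, right, bottom = aoi
    total = sum(duration for _, _, duration in fixations)
    inside = sum(
        duration
        for x, y, duration in fixations
        if left <= x < right and top <= y < bottom
    )
    return inside / total if total else 0.0


# Made-up fixations from the first five seconds on a page.
fixations = [(120, 40, 250), (300, 45, 180), (640, 400, 320), (90, 60, 250)]
navigation = (0, 0, 1024, 100)  # hypothetical navigation bar at the top

share = aoi_share(fixations, navigation)
print(f"{share:.0%} of fixation time on the navigation")
```

Counting fixations instead of summing durations would give a different percentage for the same data, which is exactly why the underlying measure needs to be stated.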

But even based on me just eyeballing the results, I know I wouldn't be comfortable making decisions based on the computer-generated predictions.

The simulations have limited applicability and can by no means replace real eye tracking. They make predictions mostly based on the bottom-up (stimulus-driven) mechanisms that affect our attention, failing to take into account top-down (knowledge-driven) processes, which play a huge role even during the first few seconds.

Computer-generated visualizations of human attention may work better for pages with no scrolling and under the assumption that users will be completely unfamiliar with the website and have no task/goal in mind when visiting it. How common is this scenario? Not nearly as common as the sellers of participant-free eye tracking would like us to believe.

Comments

Aga,

Interesting study.

I'd just like to know:
1. What was the participants' task for your study?
2. Do these simulation services claim anything about the tasks?

Thanks
Caroline

Great questions. Thanks, Caroline!
1. Our participants were looking for holiday gifts.
2. The simulation services’ websites don’t seem to mention anything about tasks (or other factors that may affect the applicability of their heatmaps).

I’d like to run a few more Web pages and full page ads through the simulations and compare them to the results from real eye tracking studies. Even though the companies claim high accuracy for websites, I believe their predictions will be better for ads.

I was waiting for a post like this one :) Check out VAS by 3M too if you've got the time. Their algorithms indicated "accurately" how people would look at a particular image ;)

Hi Aga,
It seems that the Tobii eye tracking results you used only show webpage information above the fold, while for the simulation services you used a larger webpage image with information below the fold. You should crop the original webpage image and run the simulation services again.

Great article - thanks. I think Caroline's question hits the nail on the head, too.

I read a related article by Susan Weinschenk which makes a similar point; worth a look:

My blog post on "What People Look At On a Picture Or Screen Depends On What You Say To Them" by Susan Weinschenk

Thanks, Alex! 3M VAS wasn't much better at predicting attention on the eBay homepage: Attention Heatmap Simulation by 3M VAS.

They claim that you can "test client-directed creative or usability decisions." Usability decisions? That's a bold statement. And their pricing is higher than their competitors'.

I liked the fact that they have a paper on their website that describes a validation study which compared their simulations to real eye tracking data. However, because scene photos and marketing material were used in the study and there were no tasks (participants were instructed to just look), the results can't be generalized to websites.

Hi Rafael,

Great to hear from someone from Feng-GUI! Actually, I used the exact same image for the simulation as I did for the real eye tracking study. The reason why the "heat" is only above the fold on the Tobii heatmap is because our participants didn't choose to scroll in the first five seconds.

I cropped the image to show only the above-the-fold content and ran it through Feng-GUI again but the prediction still doesn't seem close: Attention Heatmap Simulation by Feng-GUI (above the fold only)

Can you think of anything else I should try?

Thanks, Neil! Yes, eye movements are very task-dependent and many people bring up Yarbus' picture with the different scanpaths to demonstrate that.

What many people don't know is that there was only one person in Yarbus' study. What's more, that might have even been Yarbus himself! Regardless, there has been a lot of research since then to support task dependency of eye movements.

Some of the simulation services claim that the task doesn't matter in the initial few seconds of exposure to a visual. They say the task and other top-down processes kick in a little later. According to research, however, that's not exactly true.

Hi Aga,
Thank you for comparing. For your convenience, I have asked to add more credits to your Feng-GUI account.

The eye tracking results you used are clearly task dependent, as you can see that the users were browsing through the left menu, the categories dropdown and the contact us link.

Attention simulation services are mostly task independent, and this leads to the differences in the results. Attention services help the designer choose between alternative layouts, colors, brightness levels and compositions of page and advertisement elements, in order to come up with the most engaging design for the call to action and the key messaging.

Thanks for the extra credits, Rafael!

I think what we've established thus far is that because the simulation is task-independent, it may not be applicable to websites because people don't visit websites just to look at them. Instead, they typically have a (more or less specific) goal in mind.

I'm going to run some task-free stimuli and compare with my eye tracking data next week. So, stay tuned!

Hi,

I did the same a long time ago and published a few results - in French :) - and do agree with your comments...

http://www.catcheye.fr/confusion/

BR. Loic / Catcheye

Thanks for sharing, Loic!

Hi Aga!

EyeQuant co-founder here. Thanks for this article and thanks for checking us out!

Firstly, as to the heatmap comparisons in your example, I am a little surprised by your assessment of the differences and similarities of the heatmaps: while AttentionWizard's, FengGui's and 3M's maps do in fact look conspicuously similar, EyeQuant's predictions come very close to the empirical heatmap, don't they?

Mainly I'd just love to chime in with the *research answers* you requested and a few comments on the science, validity and restrictions of attention prediction services.

I saw you also have a background in cognitive science (cool!), so you may even have heard of some of the neuroscientists on EyeQuant's team - e.g. Prof. Christof Koch from Caltech and Prof. Laurent Itti from USC, who are both quite well known for their work on computational attention modeling and saliency maps.

Let's start with the most important thing: do these services actually deliver their claimed accuracy? I absolutely agree that the only way to know is to quantitatively measure them against eye-tracking results, which is something we do quite a bit at EyeQuant! While EyeQuant itself is based on hundreds of eye-tracking studies, we evaluate every new version with a dedicated eye-tracking study on its own - typically with a mixed group of around 50 subjects. Specifically, we examine how well EyeQuant predicts the outcome of such a study using the standard ROC / AUC approach - that means we first check how well the study predicts its 'own' results (i. e.: how well does one half of the subjects predict the other half's fixations?) and compare EyeQuant directly with this Gold Standard. In a typical 'content awareness' task, where subjects try to quickly scan the content of a website, EyeQuant consistently achieves over 90% of the Gold Standard's accuracy.

The evaluation method is described in a bit more detail in one of our publications here: http://cogsci.uni-osnabrueck.de/~NBP/PDFs_Publications/Betz-2010-jov-10-3-15.pdf

The model used in the paper *is not* the EyeQuant model, since the latter is patent-pending in the USA. We do not publish data or algorithms that are directly related to EyeQuant, but I'd be happy to send you a more detailed document on the evaluation in an email!

We have a ton of internal comparisons, but understand that external eye-tracking data delivers a more objective basis - e. g. check out the following comparison of an eye-tracking study by Tobii vs. EyeQuant and the other tools: http://hq.whitematter.de/~jss/pub/baby_com.png

On the right are EyeQuant's results, compared to Tobii (Gold Standard), Feng-Gui and 3M VAS. The comparison speaks volumes, and we're glad to back it up with data.

PS: more EyeQuant credits for you, too! :)

I didn't think this discussion could get any better but it just did! Thanks so much, Fabian, for all the additional information (and the extra credits, of course!).

I found the EyeQuant heatmap to be the most subtle/cautious in terms of its coloring. Because of that and because the blue shading covers a lot of areas that the other heatmaps leave blank, the EyeQuant visualization does look less inaccurate to me.

It's actually less of interest to me which simulation service is better. I'm more interested in defining the applicability of the simulations in general. It seems that many people either believe that the simulations can completely replace eye tracking or they consider the simulations to be a joke. I think there is a place for them but perhaps more on the marketing side than UX.

But the first step is to review the research that aims to determine how well the simulations predict real eye tracking data. Ideally, the research would be done with various types of stimuli (e.g., full-page ad, website) and in various contexts (e.g., specific vs. general task, stimulus-related vs. stimulus-unrelated task).

I really like the fact that you do your own research - please always feel free to email me any additional information if you can't post it here.

Great article Aga, and definitely one that has sparked some lively debate on this! Been keeping an eye on the discussion since you first posted.

Like it or lump it, the simulated attention analysis results do appear quite impressive to the untrained eye, especially to those who sign off marketing budgets and like to see a pretty picture, especially when their logos and product seem to have high attention. (Speaking from experience!)

Maybe I'm biased, but I'm still very much convinced that you get what you pay for. Even if these attention simulations are 100% accurate, there is rarely a situation in real life where you just go to a website and just “look at it”.

We are motivated to “do” something on a site, and the secret sauce of real user studies is to determine whether users “get it”. That means looking at visual attention in the context of what they are trying to achieve, visual cognition and mental models (i.e., familiarity), what gets stored in working memory (i.e., learning how it works), how they prioritise visual elements, how they react to or hesitate at key actions, and how they achieve their task (even on a page level).

Once simulation services can do all of the above accurately, I’ll be more than happy to get rid of my Eye Tracker on eBay. It’d be an easy sale as EyeQuant would’ve helped users find it quickly!

Dan, I could not have said it better myself. Thank you! :)

Many thanks for your analysis and your passion.

It's really important to understand what's behind these tools ...

Visual Attention is used to select important areas of our visual field (alerting) and to search for a target in cluttered scenes (searching).

There are two kinds of Visual Attention:
• Overt Visual Attention: involving eye movements
• Covert Visual Attention: without eye movements (covert fixations aren't observable). Attention can be voluntarily focussed on a peripheral part of the visual field (as when looking out of the corner of one's eye). Covert attention is the act of mentally focusing on one of several possible sensory stimuli.

Indeed, eye tracking gathers Overt Visual Attention, whereas Feng-GUI, AttentionWizard and 3M VAS try to model Covert Visual Attention.

Overt Visual Attention data depends on the task that people want to accomplish on a site, whereas Covert Visual Attention relates to the characteristics of the graphical components on the screen, independently of the task.

That means that we can't compare these two complementary kinds of data: in a behavioral sequence, preattentive processing (Overt Visual Attention) occurs before attentive processing (Covert Visual Attention).

Thanks for your comment, Marc.

It is true that overt attention involves direct gaze, while covert attention is the ability to attend to something without actually looking at it. However, it is not true that overt attention depends on the task and covert attention on the stimulus characteristics. What you are referring to are top-down processes (task- and experience-dependent) and bottom-up processes (stimulus-dependent). These processes play a role in both attention types – covert and overt.

I also disagree with your claim that the simulation services are trying to simulate covert attention. Their websites indicate they are simulating overt attention. For example:

  • “Feng-GUI attention map reaches over 80% of AOI (Areas of interest) similarities to Eye and Mouse Tracking. […] We compare and measure the results with eye-tracking sessions.”
  • "AttentionWizard simulates human vision during the first 5 seconds of exposure to visuals, and creates an eye-tracking heatmap based on an algorithm that predicts what a real human would be most likely to look at. […] AttentionWizard results are 75%+ correlated with eye tracking and mouse tracking approaches."
  • “EyeQuant combines a decade of neuroscientific research in attention psychophysics with the most current computer models of human attention. The models are based on empirical Eye-Tracking data that has been gathered in numerous studies with over 400 human subjects.”
  • “3M™ Visual Attention Service is a cutting-edge scanning tool that lets you scientifically analyze design effectiveness, based on how the average human eye responds.”

If you meant that the simulations can, for the most part, only simulate bottom-up processes, I agree with that completely.

Hi Aga, we are 100% aligned ;) Simulation services can only simulate covert bottom-up processes.

In other words, they can predict which component of a screen will attract more attention than another, but not where people will look during a real task. That is valuable information, and it can help many designers fine-tune their creations.

It's also strange to see that all these models talk about eye fixations over 3 to 5 seconds. Indeed, it never takes 3 seconds for the brain to analyse the landscape of a page.

Have a nice day...

Hi Aga,

Thanks for the post - very helpful. The follow-up discussion has been good too. A few quick points (some of them have been touched on already):

1) All algorithmic approaches focus primarily on the visual interest in the scene. When this is applied to websites, that means focusing on contrast, line orientation, areas of interesting detail, skin texture, faces, etc. Many sites include moving elements or animation, so of course static screenshots will not be able to take those into account. Motion draws huge amounts of attention.

2) In my opinion, comparisons of accuracy are not very helpful. Eye tracking is very heavily person-specific and task/intent dependent. Also, as discussed by others above, algorithms focus on initial first-impressions and not any kind of volitional actions (e.g. scrolling or reading text).

3) You need to find the right tool for the right job. Eye tracking is expensive and time consuming (equipment, recruiting of subjects, analysis of the data) but it yields very rich information. Mouse tracking is helpful too, but requires a live landing page with a decent amount of traffic to it. Algorithmic approaches are less accurate but give you instant results. They can also be used on mockups before you deploy a page live. This allows you to fine tune a design and eliminate the obvious visual attention leaks first. They are also relatively cheap. You should be using all three approaches (sometimes in parallel depending on the circumstances).

Hope this helps. I don't have any free credits for you. But I hope that the one-cent trial for ten heatmaps will not deter you from signing up on http://AttentionWizard.com.

Thanks for your comments, Tim. I agree with your points #1 and #3 but I'm not sure if I agree with #2. If the simulations claim to predict how the human eye responds to visual stimuli, why can't we test their accuracy by comparing their predictions to real eye tracking data collected for the first 5 seconds of the participant's exposure to a stimulus (which is what they claim to be predicting)?

I'd like to chime in on #2: You really *have* to be able to measure accuracy by evaluating your tool against the Gold Standard of eye-tracking - and if your tool really works, why wouldn't you? :)

We at EyeQuant have published extensively on this topic via our associated labs at Caltech, USC and the University of Osnabrück - there's actually a very standardized way of assessing predictive performance (at least in the scientific community). Here's how it works:

You've got a prediction and would like to know how well it actually predicts real behavior. First, you need a Gold Standard: how well does one eye-tracking study predict another one?

The way any serious lab will do this is to use a so-called "Receiver operating characteristic" or ROC. Bear with me, it's pretty straightforward:

In this context, a ROC curve (more specifically, the area under the ROC curve or AUC) can tell you how well one set of data (the prediction set) discriminates between fixated and non-fixated points in a second set of data (the eye-tracking set): if it's 0.5, your algorithm performs *at chance level* (not good, you're better off guessing yourself). The closer you get to 1, the better.

Now, eye-tracking itself achieves around 0.9 - 0.95 with 25 subjects predicting another 25. Obviously very good - that's why it's the Gold Standard. Also note how it interestingly doesn't get better when using more subjects!

EyeQuant achieves over 0.85 for websites - i. e., over 90% of the Gold Standard's performance.
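
To make the ROC / AUC procedure described above concrete, here is a minimal sketch, not EyeQuant's actual code or model. It treats the AUC as the chance that a fixated point gets a higher predicted score than a non-fixated one, builds the Gold Standard by letting one half of the subjects "predict" the other half, and then expresses a simulated map's AUC as a share of that Gold Standard. The function names, the 3x3 grid, and every number are hypothetical.

```python
def roc_auc(fixated_scores, other_scores):
    """AUC as the chance that a fixated point outscores a non-fixated one
    (ties count half). 0.5 is chance level, 1.0 is perfect."""
    wins = 0.0
    for f in fixated_scores:
        for o in other_scores:
            if f > o:
                wins += 1.0
            elif f == o:
                wins += 0.5
    return wins / (len(fixated_scores) * len(other_scores))


def map_auc(score_map, fixations):
    """Score a 2-D prediction map against a list of (row, col) fixations."""
    fixated = set(fixations)
    hit, miss = [], []
    for r, row in enumerate(score_map):
        for c, value in enumerate(row):
            (hit if (r, c) in fixated else miss).append(value)
    return roc_auc(hit, miss)


def count_map(fixations, rows, cols):
    """Fixation counts per cell: the 'prediction' made by one half of the subjects."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in fixations:
        grid[r][c] += 1
    return grid


# Toy 3x3 "page"; all data below is made up for illustration.
half_a = [(0, 0), (0, 1), (0, 0), (1, 0)]  # fixations from one half of the subjects
half_b = [(0, 0), (0, 1), (1, 1)]          # fixations from the other half
simulated = [[0.2, 0.9, 0.1],
             [0.3, 0.4, 0.1],
             [0.1, 0.7, 0.6]]              # hypothetical simulated heatmap

gold_auc = map_auc(count_map(half_a, 3, 3), half_b)  # Gold Standard
model_auc = map_auc(simulated, half_b)               # simulation vs. real fixations
print(gold_auc, model_auc, model_auc / gold_auc)
```

A real evaluation would use far finer grids, many images, and smoothed fixation maps; the sketch only shows how the two AUC values and their ratio relate to each other.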

The big difference to algorithmic approaches is that EyeQuant uses a data-driven model that is constantly fed with new eye-tracking data - the current model is based on data from over 500 subjects.

After all, developing technologies like this isn't a pet project but takes considerable resources and research. At EyeQuant we're glad to be working very closely with the three leading attention researchers in the world, and we pride ourselves on our strong focus on the actual accuracy of our predictions.
