FEATURE, 1 February 2010

A head for numbers

Data analytics is becoming an increasingly mainstream way to squeeze more value from information. Doug Edmonds, managing director of 2CV, offers some tips for those who feel daunted.

The recent purchases of leading analytics companies Omniture and SPSS by tech giants Adobe and IBM show that data analytics is continuing its move towards the mainstream. As a result, researchers can expect to have more and more contact with the field.

As a topic of conversation, analytics divides the world into two groups: those who know a lot about it and those who know very little. It will come as no surprise that the first group is much smaller than the second. But there are always a few exceptions – those people who know enough to separate them from the larger group but not quite enough to gain membership of the elite analytical brains that reside in the smaller group. I belong in this middle ground, and as such I have made it one of my professional aims to try to bring analytics to the research masses.

Where there’s muck there’s brass
The expression ‘Where there’s muck there’s brass’ epitomises analytics. While we don’t mean ‘muck’ in the traditional sense, it does have a parallel in the digital era. As people go about their daily lives in the 21st century they cast off a rich trail of data. When captured and analysed, it can deliver value to businesses – in the pricing of insurance products, the mix of goods that a supermarket stocks on its shelves or the advertising that you get served on a website.

Networks and distributed computing power have enabled businesses to become adept at capturing customer data at every consumer touchpoint. In creating an ongoing dialogue with their customers, businesses now have swathes of data at their disposal. With this come challenges and opportunities. How do you manipulate such quantities of information? And how do you mine that data for commercially useful insights?

In an economic climate where businesses are under severe financial pressure, using existing customer data to identify key relationships and develop marketing hypotheses is a way of making the data that you’ve already obtained work harder – and thus to stretch the research budget. Analytics is the way to turn data into brass, uncovering hidden patterns from which valuable insights can be derived.

Understanding of how analytics can be applied has been clouded by the use of the word as a catch-all term, covering anything from the analysis of web-usage data to advanced statistical modelling. Both uses are legitimate, but they describe very different activities, and the ambiguity hasn’t helped the broader audience grasp some of the fundamentals of analytics.

A simple definition would be that analytics is the process by which a data set is explored to look for underlying relationships (which can then be exploited for future commercial gain). It’s the process of taking raw data and squeezing out the interesting bits – and it is important to think of it as a process rather than as a department, toolkit or piece of software. It demands rigour and attention to detail as well as careful forethought, in order to have a meaningful outcome.

Identifying your objective
The starting point for analytics is to define what we are attempting to do. Is it to identify the purchase drivers of the most valuable customers? Is it to understand the impact of product choice on churn? Either way, we need to decide on a concrete end point, so that we can identify how to start, what hypotheses we have, and what tools and techniques we need to verify or reject those hypotheses.

To be worthwhile, analytics must produce something of meaning and value to a business. Our aim is to link information that we already know about customers with critical business performance indicators. Ideally we will select hard business measures such as sales, but this can bring with it a whole new set of complexities, so to make the process easier we will often select a measure that is a proxy for a hard business success measure – typically primary measures from survey research that relate to customer experience.

One such metric is customer satisfaction, but even this is not without its problems. The relationship between business metrics and satisfaction will never be cut and dried since stated intentions change over time and some customers do not follow through with their claimed behaviour. This can be overcome to some extent by recalibrating a proxy measure to account for this lack of follow-through on the part of some customers, and to account for those external factors that often play a vital role in consumer decision-making.

Having set the appropriate objective, we now have a framework with which to start designing the analytics.

Set out on the right foot
To know where best to start we must create a set of hypotheses that we will either accept or reject as a result of our work. Failure to start any analysis in this way will lead us into trouble. George Soros, the billionaire businessman and philanthropist, is adamant that if you search for evidence that supports what you believe to be true you will find it eventually – but that doesn’t necessarily mean it is true. Creating a framework of hypotheses helps avoid this analytical pitfall.

To get from start to end we need to open up our analytics toolkit and pick an approach that is most appropriate to the objectives of the analytics. At this point we enter a world that is by its nature very technical. We may have more power to analyse data than ever before, but we also have more data to analyse. The technical details of all this are important to practitioners of analytics, but we shouldn’t let them distract us from the fundamentals.

Analytics: Making it all add up

1. Less is more
The most important principle is that less is more: the fewer explanatory variables that go into creating an analytical model the better. Starting off with too many variables can cause models to become too precise as a result of aberrant statistical effects. One example is the portfolio effect, where if you have, say, twenty different metrics, statistical error and natural variance are likely to mean that one of them will move in line with customer satisfaction. These random movements can result in models that are built on too many metrics. Instead of looking at whether variables correlate, look at whether they add anything to the analysis. If they only add complication rather than explanation, it’s best not to include them.
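The portfolio effect described above is easy to demonstrate in a few lines of code. The sketch below (a minimal illustration using simulated data – the variable names and figures are invented) fits a model to one genuine driver of satisfaction, then to the same driver plus 19 metrics of pure noise. Because adding columns to a least-squares model can never lower its in-sample fit, the bigger model always looks at least as precise, even though the extra metrics explain nothing real.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100

# One genuine driver plus random noise in the target measure
driver = rng.normal(size=n)
satisfaction = 2.0 * driver + rng.normal(size=n)

def r_squared(X, y):
    """Fit ordinary least squares and return the in-sample R-squared."""
    X = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Model 1: the single real driver
r2_small = r_squared(driver.reshape(-1, 1), satisfaction)

# Model 2: the real driver plus 19 metrics that are pure noise
noise = rng.normal(size=(n, 19))
r2_big = r_squared(np.column_stack([driver, noise]), satisfaction)

print(f"1 variable:   R^2 = {r2_small:.3f}")
print(f"20 variables: R^2 = {r2_big:.3f}")
# The 20-variable fit is never worse in-sample -- it only looks
# more precise, which is exactly the trap 'less is more' warns against.
```

The practical test, as the principle says, is not whether a variable correlates but whether it adds explanation rather than complication.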

2. Seeing is believing
Visualising data is arguably the bedrock of analytics. At its simplest a graph is a visual exploration of the relationship between two different pieces of data. It also allows us to use the most advanced analytical tool – our own brains. One neat example of this is the control chart, which plots the average and the variation from the average. By showing values that are significantly higher and lower than the average, our eyes can then see which movements are important and which are not. Using this simple approach we can identify a relationship between product and marketing activity and people’s attitudes towards the brand. This creates a robust platform to conduct more advanced analysis.
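The control-chart idea can be reduced to a short calculation: plot each value against the average and flag those that sit outside a band around it. The sketch below is a simplified illustration with made-up weekly brand-attitude scores, using two-standard-deviation limits rather than the moving-range limits a statistician would normally use for a proper control chart.

```python
import numpy as np

# Hypothetical weekly brand-attitude scores from a tracking survey
scores = np.array([62, 64, 61, 63, 65, 62, 85, 63, 61, 64, 60, 44])

mean = scores.mean()
sigma = scores.std(ddof=1)
# Simplified control limits: two standard deviations either side of the mean
upper, lower = mean + 2 * sigma, mean - 2 * sigma

# Flag the weeks whose movement is big enough to matter
signals = [(week, int(s)) for week, s in enumerate(scores, start=1)
           if s > upper or s < lower]

print(f"mean = {mean:.1f}, limits = ({lower:.1f}, {upper:.1f})")
print("weeks worth investigating:", signals)
```

Weeks 7 and 12 stand out; the rest is ordinary variation. That separation – signal from noise – is what lets the eye link spikes and dips to product or marketing activity before any advanced modelling begins.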

3. The importance of variance
If analytics is all about uncovering hidden relationships in data, then how a metric varies in value between different audiences or over time is key. When we find patterns in data it’s because the variance in the value of one metric follows a similar pattern to the variance of another. However, not all variance is equal and those metrics that move up and down steadily in value such as temperature are a lot easier to incorporate into a piece of analytics than a binary metric such as gender, which will require special treatment by the analyst.
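The "special treatment" a binary metric needs is typically dummy coding: the two categories are converted to an indicator of 0 or 1 before they can sit alongside smoothly varying metrics in a model. A minimal sketch with invented numbers shows why this works – with a 0/1 dummy, the fitted effect is simply the difference between the two group means.

```python
import numpy as np

# Invented survey records: a binary metric and a satisfaction score
genders = np.array(["F", "M", "M", "F", "F", "M", "F", "M"])
satisfaction = np.array([7.2, 6.1, 5.8, 7.9, 6.8, 6.4, 7.5, 5.9])

# Dummy-code the binary metric as a 0/1 indicator variable
is_female = (genders == "F").astype(float)

# For a single 0/1 predictor, the regression slope equals the
# difference between the two group means
slope = (satisfaction[is_female == 1].mean()
         - satisfaction[is_female == 0].mean())
print(f"female mean minus male mean = {slope:.2f}")
```

A metric like temperature goes into the model as it is; a binary metric only becomes usable once it has been recoded this way.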

4. Accurate enough, but not too accurate
Ironically, a prediction based on analytics can sometimes be ‘too good’. The term used to measure the accuracy of predictions in analytics is r2 (r-squared). The r-squared value tells us how much of the change in a target metric, such as customer satisfaction, is explained by other measures included in the analytics. It ranges from zero to one: an r-squared of zero means that there is no relationship between the target measure and other measures used in the analysis. Conversely, an r-squared of one (or 100%) means that all changes in the target measure are explained within the analytical process. But be careful – too high an r-squared may also attract suspicion, because it suggests that the model is falling foul of the ‘less is more’ principle above, with too many variables included.

In these financially constrained times, squeezing every last drop of insight out of research is key to the health of the industry, and analytics gives us the capability to do that. Understanding the power of analytics in helping to identify the value in customer databases and survey data should be high on the research industry’s agenda.

1 Comment

14 years ago

Well written Doug. I have always believed that imagining the solution and then setting out to prove it is far better than the approach of 'throwing it in the pot' and seeing what comes out. I always use a 3 S rule - sense, sensitivity and statistics. I place them in that order. If it makes sense, if it explains variations between groups over time and holds up as a reasonably valid statistical model, I will take that over a strong r-squared from a marketing sciences team any day.
