Thursday, 02 September 2010

The science of Web 2.0 and what it means for research

In the world of Web 2.0 everybody gets the chance to rate and evaluate a vast range of products and services. What does this mean for the research world? Beehive Research's Paul Kavanagh tries to find a formula for valuable insight

Whatever your view of Research 2.0, you would have to be a recluse not to have spotted the rise in popularity of wikis, blogs, forums, online auctions, e-commerce, virtual worlds, word-of-mouth advertising, ratings and social networking sites. Interest and debate have heightened among researchers worldwide in a quest to understand how to tap into consumer opinion and behaviour in a new and exciting way, and it appears the trend is set to continue. Not surprisingly, it is the younger generation that seems to have taken to Web 2.0 in greater numbers.

For my two sons, computers have always been the norm and life couldn't exist without the daily visit to Miniclip, Facebook, Bebo or YouTube. However if, like me, you are of the generation who didn't even have a computer at school, interacting with friends, acquaintances or even strangers in a virtual world may seem a little more distant. That said, I, like many others, created a Facebook profile in the true spirit of research and swiftly discovered, to my Goddaughter's glee, that I am perhaps a demographic or two above the norm! It also became readily apparent that my motivation, engagement and usage of the site and interaction with others were also considerably different. So what lessons can we draw from Web 2.0 to apply in Research 2.0 communities and panels?

These were some of the questions Screwfix Direct wanted to answer, with the aim of applying any findings to the future development of their online customer panel. Lloyd Viney, research manager at Screwfix, explained: "We have a separate talk forum and online panel, our customer base is extremely loyal and our panellists like to provide feedback. However, we would like to explore how we can provide a better environment to gain customer feedback, whilst managing the common business issues of how to effectively engage with customers, how to manage diversity and how to deliver reliable results cost-effectively."

Panellists were engaged in a research programme to understand:

• their use of social networking sites

• which Web 2.0 engagement features interested them

• their own suggestions of things they would like to see or do


Some of the results raised interesting questions for Research 2.0 application.


Demographics and diversity
Respondent skew has always been a research issue, and not surprisingly the adoption rate of social network sites, in particular, is heavily skewed to the younger generation. Amongst Screwfix panellists, the older you were, the less likely you were to have visited or interacted with a social network. Over 70% of 15- to 34-year-olds had visited YouTube compared to fewer than 35% of those aged 55+, and the story with Facebook was similar, with around 55% of 15- to 34-year-olds compared to fewer than 15% of those aged 55+.

When looking at panellists who had visited Facebook, the pattern was repeated for their frequency of use. The younger generation was again significantly more likely to use the site daily or weekly than the 55+ year olds, whose usage was more likely to be monthly or less, which was remarkably similar to my own pattern of behaviour.

The significance of this to Research 2.0 appears to be that if we can build really engaging communities, there is likely to be a heavy skew towards the lower age groups. In some ways that will be a good thing, since the younger generation is often one of the harder audiences to reach. But the question is then: can we engage participants sufficiently in a research community in the first place, or do we just find ways of utilising these sites more?


Engagement and motivation
This leads on to engagement and motivation. Most Web 2.0 communities have a high level of participant engagement to create stickability, the desire of the participant to return to the site day in, day out. Engagement may involve blogs, chat, 'friendships', news, fun, forums, information, games and product reviews, and it is these that are the motivating factors behind whether a member will utilise the community or not. But which of these can be used in a Research 2.0 community, and are the two compatible?

A common feature on community sites is ratings, which often appear as product ratings, ratings of other community members' comments or satisfaction ratings. Examples are widespread, from price comparison to hotel recommendation sites. Similarly, research uses ratings all the time to understand levels of satisfaction, interest and importance. Perhaps a natural synergy is evident?

This was an interesting subject for Screwfix, since many panellists had already fed back the comment that they would like to be able to rate products or statements from other panel members. "We liked the idea of panellists leaving ratings. However, we wanted to know how reliable results were and where we could use them effectively," Viney explained.

Concerns had been expressed over the validity of research conducted in this way – not only because respondent samples are often very small, but also over how representative the response is of the population. From an observational point of view I would also add motivation into the mix, since in some instances the results can appear accurate and yet at other times can be more confusing than helpful. An example I would draw on is hotel ratings. Recently I was looking to use one of my local hotels to put some colleagues up and, having only ever eaten there, I thought I'd see if there were any reviews. Of seven reviews, five gave a perfect 5/5 rating but two gave a very low 2/5. If looking for an average you'd see 4.1, but who should I believe? Can this hotel be so good one week and so bad the next? Is it just that people have different expectations, and thus one person's 5 is another's 3? Have the 5-star reviews from one person set the expectation for the next, so that there was disappointment, and could that person overcompensate in their rating? Also, with only seven to choose from, how reliable are the results?
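The arithmetic behind that 4.1, and how little seven reviews pin it down, can be checked with a short sketch. It uses only the scores quoted above; the 95% interval is a rough normal approximation chosen here for illustration, not a method taken from the research.

```python
from math import sqrt
from statistics import mean, stdev

# The seven review scores described above: five 5/5 and two 2/5.
ratings = [5, 5, 5, 5, 5, 2, 2]

average = mean(ratings)                    # 29 / 7
spread = stdev(ratings)                    # sample standard deviation
std_error = spread / sqrt(len(ratings))    # standard error of the mean

# Rough 95% interval (assumption: normal approximation, 1.96 standard errors).
# With only seven reviews this is indicative at best; a t-based interval is wider.
low = average - 1.96 * std_error
high = average + 1.96 * std_error

print(f"mean={average:.2f} sd={spread:.2f} interval=({low:.2f}, {high:.2f})")
```

The interval runs from roughly 3 to beyond the top of the 5-point scale, which is another way of saying that seven reviews tell us very little about where the 'true' rating sits.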

Here are four of the ratings:


***** Fantastic

I was slightly concerned when I read a couple of the reviews for this hotel, but I shouldn't have been!! Everything from the decor, the staff, the location and the atmosphere were...

** Grossly overpriced. Very Disappointing

Working in the area, away from home, on and off since last year, I have had mixed experiences in the various B&B and hotel establishments in which I have stayed in the area...

** Three star hotel with five star prices

I was very disappointed with this hotel having read previous reviews. I arrived with my wife on a Sunday evening for two nights. We were shown to our room which was adequate in...

***** Hidden Gem

I booked this Hotel for a group of friends who were having a short weekend away visiting Kew Gardens and Windsor Castle. I was looking for a small quality establishment within easy...


From a Screwfix perspective this led to some key questions we wanted to explore in order to evaluate whether we could reliably use ratings in their community:

• Does previous knowledge of other people's ratings cause a sheep effect or over-compensation in their answers?

• What are the motivations for people to leave rating feedback?

• What type of people leave ratings?

• What are the profiles of these responders?

• Can we trust the results or does it just provide good engagement?


The ratings experiment
Firstly, one of the qualitative feedback comments from a number of panellists was "Why not give panel members the option to score other panel members suggestions?" So being part of the co-creative process of getting panellists to create features in their own panel, we decided to test whether this was something that all panellists would want. However, rather than just obtain quantitative results for this, we decided to expand the research to also assess some other community related suggestions and to see whether there was any impact in showing an average rating (i.e. was there a sheep effect, an over-compensation effect, or no effect at all).

An initial test was set up with the sample of 1,127 respondents split into three similar groups, A, B and C. Each group was shown the same three statements, taken from qualitative feedback, to rate on an agreement scale of 1 to 5. The only difference between the groups was the average score that they were shown. Groups A and B were shown a rating score for each statement, which was fixed at a set level throughout, whereas Group C was never shown an average score, to act as a control.

The rating scores shown to groups A and B were varied across the three statements, sometimes high, sometimes low, again as a cross test. The hypothesis was that if showing an average score had no effect on subsequent participants, the average scores for groups A and B would match each other and mirror those of group C.

The three statements were:

1. I would like to see feedback or a summary of the survey, and the actions you are taking as a result of our inputs.

2. Having purchased items such as power tools it would be useful to leave feedback.

3. Why not give panel members the option to score other panel members suggestions.
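As a rough illustration of the split-sample design described above, the allocation might be sketched as follows. Random round-robin assignment, the function name `assign_groups` and the displayed averages in `SHOWN_AVERAGE` are all assumptions made for this sketch; the article does not state how respondents were allocated or which values were shown.

```python
import random

def assign_groups(respondent_ids, seed=0):
    """Shuffle respondents and deal them round-robin into groups A, B and C."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    groups = {"A": [], "B": [], "C": []}
    for position, respondent in enumerate(ids):
        groups["ABC"[position % 3]].append(respondent)
    return groups

# Hypothetical displayed averages per statement number (not the values used in
# the trial). High and low are swapped between A and B as a cross test;
# None means no average is shown, i.e. the control group.
SHOWN_AVERAGE = {
    "A": {1: 4.5, 2: 2.5, 3: 4.5},
    "B": {1: 2.5, 2: 4.5, 3: 2.5},
    "C": {1: None, 2: None, 3: None},
}

groups = assign_groups(range(1127))
print({name: len(members) for name, members in groups.items()})
```

Dealing shuffled respondents in turn keeps the three groups within one respondent of each other in size, which is one simple way of getting "similar" groups.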


The initial findings
The top-level results appear to show the following:


• Comparing group A and B for each statement shown, the group shown the higher average score always scored higher than the other group, whichever way round the high or low score was allocated.

• The control group average score always appeared in between the average scores for Group A and B for each statement, which suggested that showing a higher or lower rating did have some influence on respondents' behaviour.

• Statements 1 and 2 were significantly more important to participants than statement 3. The difference here was marked: for both 1 and 2 the average scores were around 3.9, but for statement 3 the average was closer to 3.1. Rating other members' suggestions was therefore not as popular as the other features.
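One way to make comparisons of this kind concrete is sketched below: check whether the control mean falls between the two primed groups, and compute Welch's t statistic for the gap between them. The article does not say which significance test was used, so Welch's test is a choice made here, and the scores are toy values invented for illustration, not the panel data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_x, sample_y):
    """Welch's t statistic for the difference in means of two independent samples."""
    var_x, var_y = variance(sample_x), variance(sample_y)
    standard_error = sqrt(var_x / len(sample_x) + var_y / len(sample_y))
    return (mean(sample_x) - mean(sample_y)) / standard_error

def control_lies_between(scores_high, scores_low, scores_control):
    """True when the control mean sits between the two primed groups' means."""
    return mean(scores_low) <= mean(scores_control) <= mean(scores_high)

# Toy 1-to-5 agreement scores, invented for illustration only.
shown_high = [5, 4, 4, 5, 4, 3, 5, 4]   # group shown a high average
shown_low = [3, 4, 3, 2, 4, 3, 3, 4]    # group shown a low average
control = [4, 4, 3, 4, 5, 3, 4, 3]      # group shown no average

print(round(welch_t(shown_high, shown_low), 2))
print(control_lies_between(shown_high, shown_low, control))
```

A larger absolute t value means the gap between the two groups is harder to explain by chance alone; turning it into a p-value would also need the Welch degrees of freedom, which this sketch leaves out.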


In addition to the top-level results there were also some interesting findings when looking at the data broken down by age category. The key findings here were:


• Younger people (aged 15 to 34) were significantly more likely to be affected by the ratings shown than the older age group (55+).

• On all three statements, the younger people (aged 15 to 34) were significantly more likely to rate these engagement-style community features higher than the older age group (55+).


Considerations and conclusions
It is accepted that in a normal rating situation the average score shown does vary from one participant to the next, whereas in this test it was fixed. It is also accepted that the audience used was the Screwfix customer panel which, while representative of their customers, may not be representative of the UK population as a whole. Nevertheless, the results showed that there could be some impact in showing participants the average score from other people.

It is possible and perhaps likely that if we had continued this experiment beyond the 1,127 respondents, the average scores may have eventually come together for groups A, B and C, but the question is at what critical mass? And as we saw in the hotel ratings earlier, the variance of scores can be quite significant in the first few participants, so there could be a see-saw effect which eventually balances itself out?
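The see-saw can be illustrated with the seven hotel scores from earlier. The sketch below tracks the average a visitor would see after each successive review under two arrival orders; both orders are hypothetical, since the order in which the reviews were actually posted is not known.

```python
from itertools import accumulate

def running_average(scores):
    """Average displayed after each successive rating has been added."""
    totals = accumulate(scores)
    return [total / count for count, total in enumerate(totals, start=1)]

# The same seven scores (five 5s, two 2s) in two hypothetical arrival orders.
low_first = [2, 2, 5, 5, 5, 5, 5]
high_first = [5, 5, 5, 5, 5, 2, 2]

print([round(value, 2) for value in running_average(low_first)])
print([round(value, 2) for value in running_average(high_first)])
```

Both sequences finish on the same average, but the early visitors see very different numbers along the way, and if a displayed average influences the next rating, as the trial suggests, the starting order could matter.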

Whatever the supposition, in this initial trial differences were seen, and they have raised the question of whether, or how, we would use ratings on the Screwfix panel site. In particular, the trial has raised more concern as to how reliably we can use this information for business decisions rather than as purely an engagement feature. It is also interesting to see that being able to rate other people's comments was, for Screwfix panel members anyway, far lower scoring than the other community-type statements, which even raises the question of whether it is sufficiently engaging.

The findings would also suggest that the younger age groups are more easily swayed by other people's opinion, whereas the older age group seemed more set in their opinion. With social network sites being heavily biased towards the younger generation, would a rating system in such an environment provide robust results at all?

Next steps: we accept this is a trial test and more research is required to explore, understand and substantiate this further. We are now embarking on a series of additional tests and other hypotheses, which will include a closer look at how this is reflected in a wider UK audience, whether there is a critical mass at which ratings are totally reliable, and other questions that have been raised by this trial.

October | 2008
