FEATURE · 1 May 2011
Companies are turning to text analytics tools in an effort to make sense of what customers are saying about them online. Paul Golden investigates.
Those who bemoan the impact of email, text messaging and Facebook on our ability to write tend to forget one thing: that the people who have grown up with these tools write far more than any generation before them. They might not have much respect for the conventional rules of writing, but they’re making the language their own, and adapting it to different formats and audiences.
In the process they’re producing vast amounts of data, which marketers are keen to make sense of. That means understanding the meaning and sentiment of text – and numerous tech companies are queuing up to help researchers do this, each with grander claims than the last about their tools’ ability to accurately interpret and categorise millions of online comments. These text analytics tools are adept at picking up brand references from the massive volumes of online communication generated every day. But opinion is widely divided on their ability to determine whether a comment is positive, negative or neither.
Rights and wrongs
A study conducted last year by FreshMinds found that automated sentiment analysis tools were only accurate about 30% of the time – in other words you’d be much better off flipping a coin. However, providers of such tools argued that it wasn’t fair to test their software without first training it on language relating to the topic in question.
FreshMinds’ lead research manager Anna Tomkowicz told Research: “In our experience there is no single leading text analytics tool and it is impossible to check whether each claimed product improvement actually makes a difference – although in the early days they did not identify re-tweets and spam as accurately as they do now.”
Social media monitoring firm Radian6 recently announced plans to incorporate text analytics capabilities from six different providers into its service – indicating that users are still looking for a range of answers rather than trusting just one. But that doesn’t seem to be dampening interest in the area – Radian6 was bought by CRM firm Salesforce in March for $326m (see below).
There is, of course, always a trade-off between cost, timing and accuracy when trying to analyse text. But if you are going to base decisions on research, the tools have to be robust, says Tomkowicz. “Sentiment is evaluated by proximity of positive or negative words to the brand. Mixed sentiments are highlighted as neutral by the analytics tools, but we distinguish between mixed and neutral data because this means the person is happy about some features of the brand but unhappy about others, and this is very valuable data for the client.”
Other important considerations include the length of the text you're looking at. Sentiment detection tends to work better on Twitter than on other social media sources, she says, because of the 140-character limit on tweets.
Mark Westaby of Spectrum Consulting is confident that automated analysis can do the job. Spectrum uses semantic association to measure sentiment – a technique based on the associations made by the brain between words, and which does not require software to be trained. “This method is academically proven and we have carried out our own, extensive research which reveals a high correlation of meaning,” says Westaby.
Whether or not analytics tools can recognise irony or sarcasm is a point often raised by advocates of human analysis, but Westaby says it's a “red herring [that] never arises in our work”. “Given the volumes of text we analyse, errors created by irony or sarcasm are very small and fall well within acceptable limits,” he says. False positives and negatives are not a big issue either, he adds, for the same reason.
“We have developed our own version of the Likert scale, which is very effective for tracking and analysing sites such as Twitter where the restricted number of characters means people typically use words such as ‘love’ or ‘hate’ to express tonal sentiment.”
“Irony, sarcasm and slang are close to impossible for computers to understand, so there needs to be human intervention to guarantee accuracy”
Jessica Whittaker, BuzzNumbers
Mixed feelings
But not everyone is satisfied with the tools on offer. Jessica Whittaker of social media monitoring firm BuzzNumbers is not alone in saying that her company has struggled to find technology that could produce accurate measurements of sentiment. “Irony, sarcasm and slang are close to impossible for computers to understand, so there needs to be human intervention to guarantee accuracy,” she said.
Sentiment measurement is usually based either on dictionary definitions of words, or on natural language processing. Dictionary systems are as accurate and up to date as the dictionary they’re based on – emerging language can be added, but only once someone notices it’s missing.
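To make the dictionary approach concrete, here is a minimal sketch in Python; the lexicon and the scoring rule are invented for illustration, not drawn from any vendor's product. Note how a slang usage the dictionary hasn't caught up with simply scores zero.

```python
# A minimal sketch of dictionary-based sentiment scoring. The lexicon and
# scoring rule are invented for illustration, not any vendor's method.
LEXICON = {
    "love": 1, "great": 1, "happy": 1, "thanks": 1,
    "hate": -1, "bad": -1, "horrible": -1,
}

def score(text: str) -> int:
    """Sum the polarity of every lexicon word found in the text."""
    return sum(LEXICON.get(word.strip(".,!?"), 0)
               for word in text.lower().split())

print(score("I love this brand"))          # 1  -> coded positive
print(score("I hate these horrible fees")) # -2 -> coded negative
print(score("this phone is sick!"))        # 0  -> slang the dictionary missed
```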
Natural language processing systems are more flexible – using algorithms that learn rules by analysing large sets of example documents, in which certain terms have been tagged with information about what they mean or how they relate to the rest of the text. These systems are quicker to catch slang and new language, but anomalies do arise. For example, if a system finds a group of comments saying things like ‘I hate Christmas because my family are horrible’, it will code Christmas as negative – probably a mistake. It may take a while for enough new positive mentions of Christmas to come through and change it back.
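The Christmas anomaly is easy to reproduce with a toy classifier. The sketch below uses scikit-learn's naive Bayes on a handful of hypothetical training examples (real systems learn from far larger tagged corpora): because 'Christmas' has only ever appeared in negative examples, a perfectly neutral mention gets coded negative.

```python
# A toy reproduction of the 'Christmas anomaly' with a learned classifier.
# The training examples are hypothetical; real systems train on much
# larger tagged corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "I hate Christmas because my family are horrible",  # tagged negative
    "Christmas shopping is such a nightmare",           # tagged negative
    "I love this phone",                                # tagged positive
    "what a great present",                             # tagged positive
]
train_labels = ["neg", "neg", "pos", "pos"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

# 'Christmas' has only ever appeared in negative examples, so a neutral
# mention is coded negative until positive mentions accumulate.
print(model.predict(vectorizer.transform(["Christmas is coming"])))  # ['neg']
```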
Both these approaches have flaws, but then so do people, says Annie Pettit, head of research at Conversition. “The important thing to remember is that we don’t care if individual messages are coded correctly. We care that the aggregate coding across large numbers of messages validates well. We aren’t trying to prove whether I hate or love Pop-Tarts, we are trying to prove whether a million people hate or love Pop-Tarts. A reading that is 70% accurate is actually a very accurate reading.”
The scale offered by automated solutions also means you can get away with less accuracy, says Pettit. “If humans can score hundreds of messages and validate at 85%, and computers can score millions of records and validate at 70%, then computers are a valuable option for large datasets.”
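Pettit's arithmetic can be checked with a quick simulation. In the sketch below (the 60% true-positive share and the symmetric error model are assumptions for illustration, not figures from the article), a coder that is right only 70% of the time still recovers the aggregate sentiment share almost exactly: at a million messages the error is systematic rather than random, so it can be measured and corrected for.

```python
# A back-of-envelope check on the 'accurate in aggregate' argument.
# Simulated data; the 60% share and symmetric error model are assumptions.
import random

random.seed(42)
N, TRUE_POS_SHARE, ACCURACY = 1_000_000, 0.60, 0.70

truth = [random.random() < TRUE_POS_SHARE for _ in range(N)]
# Each message is coded correctly with 70% probability, flipped otherwise.
coded = [t if random.random() < ACCURACY else not t for t in truth]

observed = sum(coded) / N
# With a known, symmetric error rate the bias is invertible:
# observed = p*acc + (1-p)*(1-acc)  =>  p = (observed - (1-acc)) / (2*acc - 1)
recovered = (observed - (1 - ACCURACY)) / (2 * ACCURACY - 1)
print(f"observed positive share:  {observed:.3f}")   # ~0.540
print(f"recovered positive share: {recovered:.3f}")  # ~0.600, near the true 60%
```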
Text analytics providers are set to benefit as social media monitoring catches the eye of business. Customer relationship management firms have shown particular interest in the area, with Salesforce recently paying $326m for social media monitor Radian6, and Overtone getting snapped up by Kana Software.
The tools seem to be of particular interest to companies looking to get an ongoing view of what their customers are up to, and an ability to respond quickly when there are problems. Salesforce said that integrating Radian6 into its services will allow it to provide “real-time social intelligence”.
For Radian6 customers, there will be no single text analysis solution. The firm revealed it was partnering with OpenAmplify, Klout and OpenCalais to incorporate their text and sentiment analytics capabilities into its platform, and it also plans to build in analysis from Clarabridge, PeekYou and Lexalytics. CEO Marcel Lebrun said the partnerships had allowed the firm to “amass the largest index of semantically enriched social data in the world”.
Another social media tracker, Visible Technologies, secured a further $6m of investment shortly after the Radian6 acquisition, with CEO Kelly Pennock saying the buy had “raised the stakes” for other social media monitoring providers.
The written word
In order to handle the nuances of language, the software needs to have really good coding that takes account of slang, as well as the capability of understanding words in context and learning over time, says Theo Downes-LeGuin, chief research officer at Market Strategies International. He acknowledges that sarcasm is one of the toughest things to code because the tone and emphasis often rely on broader contextual cues that go beyond the available text.
So a good text analysis tool will recognise ‘Thanks, NatWest, for charging me usurious rates’ as sarcasm because the usual positive affect of ‘thanks’ is outweighed by the strong negative of ‘usurious’. But a sentence like ‘Thanks, NatWest, for charging me such wonderfully high rates’ might be wrongly coded as positive, unless it existed as part of a larger block of text that provided more context.
Many social media measurement solutions base their sentiment analysis on whether predetermined ‘good’ and ‘bad’ terms crop up in the same phrases as the term that’s being looked for. Lisa Joy Rosner, chief marketing officer at NetBase, says the problem with these approaches is that understanding language is much more complex than simply looking for the presence and proximity of certain words. False positives and negatives crop up, and no tool on the market does a perfect job of getting sentiment right. Then again, says Rosner, it’s all relative. “In many studies on human annotation to the task of sentiment analysis, it was still found that accuracy never really exceeds 85%.”
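A minimal sketch of the proximity approach shows both behaviours described above; the word weights and window size here are invented for illustration, not any vendor's values. A strongly weighted 'usurious' outweighs 'thanks', but the sarcastic second sentence contains only positive words near the brand and is wrongly coded positive.

```python
# A sketch of proximity-based scoring: weighted polarity words within a
# window of the brand term drive the verdict. Weights and window size are
# illustrative assumptions.
WEIGHTS = {"thanks": 1, "wonderfully": 1, "love": 2,
           "usurious": -2, "hate": -2, "horrible": -2}

def proximity_sentiment(text: str, brand: str, window: int = 6) -> int:
    """Sum weighted polarity words within `window` words of the brand term."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = 0
    for i, w in enumerate(words):
        if w == brand.lower():
            score += sum(WEIGHTS.get(v, 0)
                         for v in words[max(0, i - window): i + window + 1])
    return score

# The strong negative 'usurious' outweighs 'thanks': correctly coded negative.
print(proximity_sentiment("Thanks, NatWest, for charging me usurious rates", "NatWest"))
# Every polarity word near the brand is positive, so the sarcasm is missed
# and the comment is wrongly coded positive.
print(proximity_sentiment("Thanks, NatWest, for charging me such wonderfully high rates", "NatWest"))
```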
Scott Blacker, senior director of product management at survey software maker Vovici, says most of these tools “work out of the box”. “The bad ones are bad and even the best rarely go above 80%, but then humans also disagree. The sentiment engines can be tuned to improve the accuracy of the systems so the more investment in tuning the categories, the greater the accuracy will be. The [size of the] initial data set for tuning these tools is also vital since the larger the data set, the larger the sample size to teach the engine.”
Vovici co-founder Jeffrey Henning adds that some vendors have attempted to compare human coding with machine coding, “but not many have done it with scientific accuracy. Most leading vendors claim their solutions work in other major languages, particularly European languages, and they tend to test well in Spanish, for example. The vendors who take more of a natural language approach tend to be stronger in English or French.” However, Henning says he has yet to see a really good solution for comparing a survey across numerous languages.
“If humans can score hundreds of messages and validate at 85%, and computers can score millions of messages and validate at 70%, then computers are a valuable option”
Annie Pettit, Conversition
Details, details
The first step in any automated sentiment detection strategy is to define what you want to measure, says Seth Grimes, founder of analytics consultancy Alta Plana Corporation. “Once you have detected, you can create aggregate measures, plot and compare trends and so on. Tackle these not-so-basic basics before you take a shot at complexities such as irony and sarcasm because that stuff is very difficult to decode systematically. If you do want to automate, you are almost certainly going to need linguistic techniques that match word use and patterns to vocabulary and phrases that indicate irony and sarcasm.”
If you’re dealing with languages other than English, don’t fall into the trap of thinking you can just translate into English, analyse and then translate back again – expressions of sentiment are particularly tricky to translate. For these situations, Grimes suggests conducting simple analysis and leaving the rest to people. “For a less-used language, start by creating a lexicon of sentiment-bearing words (‘like’, ‘love’, ‘hate’, ‘bad’) in the target language and use it to detect sentiment for further analysis by a person. A partially automated solution of this nature will surely be better than human-only analysis, if only in its reach.”
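Grimes's partially automated triage might look something like the sketch below, assuming a small hand-built lexicon for a less-used language (the Polish words shown are illustrative assumptions): the software flags only the posts that contain sentiment-bearing words, and a human analyst codes the flagged subset.

```python
# A sketch of partially automated triage for a less-used language, assuming
# a small hand-built lexicon (the Polish words below are illustrative).
SENTIMENT_WORDS = {"kocham", "uwielbiam", "nienawidzę", "okropny", "świetny"}

def triage(posts):
    """Split posts into those flagged for human review and the rest."""
    flagged, ignored = [], []
    for post in posts:
        words = {w.strip(".,!?").lower() for w in post.split()}
        (flagged if words & SENTIMENT_WORDS else ignored).append(post)
    return flagged, ignored

posts = ["Kocham tę markę!", "Jutro jadę do pracy", "Okropny serwis klienta"]
to_review, rest = triage(posts)
print(to_review)  # sentiment-bearing posts go to a human analyst for coding
```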
Another variable that makes text analysis challenging is that not enough is understood about how people behave, communicate and interact online. For one thing, the language they use is distinct from that used in other areas, and is evolving fast. In theory, slang, vernacular and abbreviations shouldn’t cause a problem as long as they are correctly labelled in the texts used to train the software. But these things are changing constantly, so the task of building text analytics tools is never finished.
Campbell Keegan director Rosie Campbell says: “There is a way of interacting and communicating online that has its own rules, so regular social media contributors, particularly young people, almost never write anything that is not ironic and I am not sure text analytics has the capacity to tease this out.”
Market research isn’t the only business that has shown an interest in text analytics.
The techniques are widely used in military intelligence, IT security and legal research, and even to identify potential problems in manufacturing processes. They also have a role in ‘information governance’ – the growing corporate need to manage records and ensure compliance.
“Text analytics has become in some organisations such a popular thing to do, it’s actually changed the nature of surveys”
Nick Patience, 451 Group
Within the field of market research, the use of text analytics isn’t confined to social media. In a webinar hosted by Attensity, Nick Patience of the 451 Group described how it has allowed companies to change the way they write questionnaires by allowing them to quickly and effectively analyse open-ended questions.
“It’s become in some organisations such a popular thing to do, it’s actually changed the nature of surveys. We’ve seen organisations that have had, say, a survey of 30 questions, the first 29 of which have been structured… then the last question would be, ‘Do you have any other comments?’ Text analytics has changed that, such that some of the more forward-thinking organisations have actually gone for all open-ended questions. You couldn’t do that before text analytics if you had any decent number of responses. That’s one area where text analytics has quite fundamentally changed things.”
Judgement day
According to the Terminator franchise, 19 April this year was the day that the world’s computers became sentient, triggering nuclear Armageddon and the destruction of most of the human race. Fortunately, machines have yet to take over from people in the real world, and there seems to be a consensus that insight professionals still have a vital role in making sense of data from automated analysis – and coming up with strategic insights.
Grimes and Henning agree that a hybrid approach is the way forward. Machines can be used for their speed, reach, scalability and consistency, but people are needed to train, guide and oversee automated systems and interpret findings.
In fact, a huge benefit of automated analysis is that it shifts spending away from data collection, allowing much more time to be invested in insight analysis – which is as it should be.
“In some ways automated analysis creates more opportunities for research,” suggests Henning. “Some firms outsource data collection for reasons of cost, but these tools could allow them to bring this in-house.”
Clearly it’s not just the ability of text analytics software to learn that will make it more useful, but the ability of researchers to learn how best to deploy these tools in their work.
10 Comments
Anon
14 years ago
I think this article is sick, bad and wicked.....do I mean this positively or negatively? That "probably" depends on my age but text analytics is never going to know.
Mark Westaby
14 years ago
This soooooo misses the point and here's why: You'd NEVER use automation for a single article, just as you'd never use a sample of one for a piece of market research. So using the above piece or the Natwest example is just nonsense. Take 10,000 pieces of coverage and I guarantee the automated system would win. Also, remember this needs to be done in what is effectively real-time for which humans are useless and automation very good (and accurate). Also, why do people get so hot under the collar about analysing sarcasm?! That's fine if you're an academic who wants to produce an automated system that "understands sarcasm" but I've never ever come across a client who is remotely interested in this; and for the very good reason that doing so adds no value whatever to insight about a brand. Let humans do what they're best at, ie analysing small volumes of text, for which automation isn't suitable; and computers do what they're best at, which is analysing large volumes of text extremely quickly, for which humans are not suitable.
Annie Pettit
14 years ago
Let the debate begin! “If humans can score hundreds of messages and validate at 85%, and computers can score millions of messages and validate at 70%, then computers are a valuable option” Annie Pettit, Conversition Strategies (an e-Rewards company)
Tom H. C. Anderson
14 years ago
@Mark Westaby above is completely right. Sounds like this 'test' was rather limited in scope and tools used. I also question their degree of knowledge around actual use of the tool/s, based on comments such as “Sentiment is evaluated by proximity of positive or negative words to the brand. Mixed sentiments are highlighted as neutral by the analytics tools.” In fact there are many, many ways to approach text analytics depending on the situation. An analogy we like to use is that, much like a carpenter, the researcher needs to understand which tool to choose for the job at hand. Secondly, they need to know how to use that tool. No one talks about the accuracy of a hammer or a saw! Text analytics is the same: with the exception of some of the simpler tools, to get real value you do need some knowledge and best practices in place. No one seems to think all numeric data should be treated the same way, so why do people think that text from Twitter should be treated the same way as Google financial news or focus group transcripts? Ridiculous! There are domain and use-case differences, to name a few. Our use-case expertise obviously is marketing research. We have less experience in the PR use case (Radian6). Re the "sarcasm" argument: I often say I can't detect my own sarcasm in something I've written just a week ago. The way human coding is compared to text analytics is often a bit absurd. Next time you guys do an article on text analytics in marketing research, call us. I've been working on that use case longer than anyone else in our industry, happy to give you some of our learnings pro bono – NO sarcasm intended ;) -Tom Anderson, Anderson Analytics
Tom H. C. Anderson
14 years ago
PS. Curious, the Author, Paul Golden, is that Paul Golden of Market Tools or someone else?
Mark Westaby
14 years ago
Perfectly valid point, Annie, though we know that our automated system can achieve significantly higher accuracy than 70%; and, crucially, much, much better than humans over the large volumes now required, which is really the point you're making. I can understand why people don't want to accept that automated systems are better than humans for some very important processes but whether they like it or not the fact is that they are; and the sooner this ridiculous thing about sarcasm is knocked on the head the better because it is such a red herring. Ironically, I suspect 10 different humans would most probably interpret a piece of sarcasm in different ways in any event.
Seth Grimes
14 years ago
For the full interview author Paul Golden did with me, please visit http://www.b-eye-network.com/view/15276 . And if you're a current or prospective user of these technologies, please take a survey I'm running: https://www.surveymonkey.com/s/Text11. I'll be reporting findings later this summer in a free report, Text Analytics Perspectives 2011. Seth, http://twitter.com/SethGrimes
Samir Batla
14 years ago
This article (which I enjoyed reading) reminds me of one of my favorite movies – talk about analyzing discourse!
[Conversation between Product Manager and Industry Analyst]
IA: So, Mr. Product Manager, when will your new product be available?
PM: I can't tell.
IA: You can tell me. I'm an Analyst.
PM: No, I mean I'm just not sure.
IA: Well, can't you take a guess?
PM: Well, not for another two months.
IA: You can't take a guess for another two months!?
Regarding sarcasm: In reality, is the number of sarcastic comments high enough to matter? And how much stock should we put into sarcastic comments anyway? My wife stops listening to me when I get sarcastic. :)
Dmitri
14 years ago
Excellent article - goes much deeper into the substance compared to most writeups on the matter. One point I'd like to add to the discussion: it's not only the overall sentiment ratio that matters, but also WHAT drives the sentiment trend in a positive or negative direction. Semantic analysis and text analytics can help to uncover such issues - we do it on OpinionCrawl.com. See the weekly/monthly trend charts and semantic clouds accompanying them.
Huw Hepworth
13 years ago
On the topic of measuring sarcasm, a fantastic example might be the #QANTASLuxury event that happened late in 2011. You can read a summary here (http://theconversation.edu.au/qantasluxury-a-qantas-social-media-disaster-in-pyjamas-4421) but the short of it was the appearance of a lot of sarcastic replies to a QANTAS promotion. One organisation did some analysis and came up with everything looking fine (http://igo2group.com.au/blog/demistfying-qantas-and-social-media/) but it would be an interesting case study to run with some alternative text analytics software and see the results.