FEATURE 1 June 2009

PASW Text Analytics for Surveys (SPSS) reviewed

Technology

One of the greatest logistical issues with online research is handling the deluge of open-ended responses that often arrive. While much of the rest of the survey process can be automated, analysing verbatim responses to open questions remains laborious and costly. If anything, the problem gets worse with Web 2.0-style research. A lot of good data gets wasted simply because it takes too long and costs too much to analyse – which is where this ingenious software comes in.

PASW Text Analytics for Surveys (TAfS) operates as either an add-on to the PASW statistical suite – the new name for the entire range of software from SPSS (see overleaf) – or as a standalone module. It is designed to work with case data from quantitative surveys containing a mixture of open and closed questions, and will help you produce a dazzling array of tables and charts directly on your verbatim data, or provide you with automatically coded data.

A wizard helps you to start a new project. First you specify a data source, which can be data directly from PASW Statistics or PASW Data Collection (the new name for Dimensions), an ODBC database or an Excel file (via PASW Statistics). Next you select the variables you wish to work with, which can be a combination of verbatim questions, for text analysis, and ‘reference questions’, which are any other closed questions you would like to use in comparisons, to classify responses or to discover latent relationships between text and other answers. Another early decision in the process is the selection of a ‘text analysis package’ or TAP.

SPSS designed TAfS around the natural language processing (NLP) method of text analysis. This is based on recognising words or word stems, and uses their proximity to other word fragments to infer concepts. The method has been developed and researched extensively in the field of computer-based linguistics, and can perform as well as, if not better than, human readers and classifiers if used properly.

A particular disadvantage of using NLP with surveys is the amount of set-up that must be done. It needs a lexicon of words or phrases and also a list of synonyms so that different ways of expressing the same idea converge into the same concept for analysis. If you wish to then turn all the discovered phrases and synonyms into categorised data, you need to have classifiers. The best way to think of an individual classifier is as a text label that describes a concept – and behind it, the set of computer rules used to determine whether an individual verbatim response falls into that concept.
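
To make the three ingredients concrete, here is a minimal sketch in Python of a synonym list feeding a pair of classifiers. It is not how TAfS works internally: the dictionary contents, the names SUBSTITUTIONS, CLASSIFIERS, extract_concepts and classify, and the simple "any matching term" rule are all assumptions made for illustration.

```python
import re

# Hypothetical synonym list: each variant maps to one canonical concept term.
SUBSTITUTIONS = {
    "pricey": "expensive",
    "costly": "expensive",
    "helpful": "friendly",
    "polite": "friendly",
}

# Hypothetical classifiers: a text label plus the rule behind it.
# Here the rule is simply "the response mentions any of these terms".
CLASSIFIERS = {
    "Price": {"expensive", "cheap"},
    "Staff attitude": {"friendly", "rude"},
}


def extract_concepts(verbatim):
    """Split a response into lower-case words and fold synonyms together."""
    words = re.findall(r"[a-z']+", verbatim.lower())
    return {SUBSTITUTIONS.get(word, word) for word in words}


def classify(verbatim):
    """Return the labels of every classifier whose rule the response meets."""
    concepts = extract_concepts(verbatim)
    return sorted(
        label for label, terms in CLASSIFIERS.items() if concepts & terms
    )


if __name__ == "__main__":
    print(classify("The staff were polite but it was pricey"))
```

Even in this toy form, the set-up burden is visible: every new subject area needs its own synonyms and its own rules before any response can be coded.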

TAfS overcomes this disadvantage by providing you with ready-built lexicons (it calls them ‘type’ dictionaries), not only in English, but in Dutch, French, German, Spanish and Japanese. It also provides synonym dictionaries (called substitution dictionaries) in all six supported languages, and three pre-built sets of classifiers – one for customer satisfaction surveys, another for employee surveys and a third for consumer product research. It has developed these by performing a meta-analysis of verbatim responses in hundreds of actual surveys.

Out of the box, these packages may not do a perfect job, but you can use the software’s analytical tools to identify answers that are not being classified, or that appear to be misclassified, and use these to fine-tune the packages or even develop your own domain-specific ones. Selecting dictionaries and classifiers takes just a couple more clicks in the wizard, after which the software processes your data and you are ready to start analysing the verbatims.

The main screen is divided into different regions. One lets you select categories into which the answers have been grouped, another lets you review the ‘features’ or words and phrases identified, and in the largest region there appears a long scrolling list of all your verbatim responses to the currently selected category or feature. All the extracted phrases are highlighted and colour-coded. The third panel shows the codeframe or classifiers, which is a hierarchical list. As you click on any section of it, the main window is filtered to show just those responses relating to that item. However, it also shows you all the cross-references to the other answers, which is very telling. There is much to be learned about your data just from manipulating this screen, but TAfS has much more up its sleeve.

One potentially useful feature is sentiment analysis, in which each verbatim is analysed according to whether it is a positive or a negative comment. Interface was not able to test the practical reliability of this, but SPSS claim that it works particularly well with customer satisfaction type studies. In this version, sentiment analysis is limited to the positive/negative dichotomy, though the engine SPSS uses is capable of other kinds of sentiment analysis too.

The software also lets you use ‘semantic networks’ to uncover connections within the data and build prototype codeframes from your data, simply by analysing the frequency of responses to words and phrases and combinations of words and phrases, rather like performing a cluster analysis on your text data – except it is already working at the conceptual level, having sorted out the words and phrases into concepts.

You can build codeframes with or without help from semantic networks. It’s a fairly straightforward process, but it does involve building some rules using some syntax. I was concerned about how transparent and maintainable these would be as you handed projects from one researcher to another.

Another very useful feature, which takes you beyond anything you would normally consider doing with verbatim data, is a tool to look for latent connections between different answers, and even between textual answers and closed data, such as demographics.

This may be a tool for coding data, but it is not something you can hand over to the coding department. If you put in a little effort, though, this tool not only has the potential to save hours and hours of work, but to let you dig up those elusive nuggets of insight you probably long suspected were in the heaps of verbatims, if only you could get at them.

THE VERDICT: PASW Text Analytics for Surveys from SPSS
Text analysis software that uses the natural language processing method to handle verbatim responses from surveys. It will categorise or group responses, find latent associations and, if required, perform classification or coding.

Ease of use – 3.5 out of 5
Cross-platform compatibility – 4 out of 5
Value for money – 3.5 out of 5

Cost
One-off costs: standalone user £2,794; optional annual maintenance £559. Single concurrent network user: £6,985 software, plus maintenance £1,397

Pros
• Flexible – can use it to discover and review your verbatims individually, or to produce coded data automatically under your supervision
• User interface is simple and productive to use, once you are familiar with the concepts
• Lets you relate your open-ended data to closed data, other questions or demographics
• Easy import and exports from SPSS data formats or Microsoft Excel

Cons
• This is an expert system which requires time and effort to understand
• System relies on dictionaries, which need to be adjusted for different subject domains
• Rules-based approach for defining coded data requires learning and using some syntax

Further info
spss.com


What’s in a name? Tim Macer on the SPSS product strategy
It’s ten years since SPSS announced its vision for the future of research software in 1999: its ‘Vision 2000’. Its dedicated MR division, SPSS MR, was tasked with turning this vision into reality and the product was named Dimensions. A stream of products eventually started to appear for customers using the firm’s legacy products like Quancept, Surveycraft and In2quest.

In acquiring this family of products, SPSS had become the undisputed global number one supplier of MR software.

There is no doubt that Dimensions was among the most technically advanced platforms for MR when it emerged, and it has allowed customers to build ingenious software solutions of their own in a way they had only dreamed of before. But the project has been dogged with problems too, with customers criticising the software for being over-complex – increasing, not decreasing, the skill level of those required to run and manage the software – and for being slow. Some IT managers have needed to cluster unprecedented numbers of servers in order to deliver performance, while several rival packages still seem to operate satisfactorily as single-box solutions.

No happy anniversary celebrations have been announced by SPSS. Instead the Dimensions name is being dropped. SPSS Inc. wants to see its extensive product family of some 50 programs united under a new name: PASW. It stands for Predictive Analytics Software.

The iconoclastic new product names seem to require exceptional powers of recall. mrInterview becomes ‘PASW Data Collection Interviewer Web’. mrStudio becomes ‘PASW Data Collection Base’. Even the venerable SPSS becomes ‘PASW Statistics Base’. SPSS will live on only as a company name.

The firm denies that this name change has any connection with the approach from SPSS founder Norman Nie last year, who has offered to sell the SPSS name to the firm for $20 million. It is also adamant that its commitment to the market research community is undiminished. However, ten years on, there is no longer a specialist MR division and the firm’s focus is clearly on predictive analytics and modelling based on business intelligence.

Let’s hope that the new SPSS remembers we do much more than that in market research. There’s only so much you can learn by looking in the rear-view mirror, no matter how cleverly.

3 Comments

15 years ago

where's the link to the application?


14 years ago

Can you clarify on the costs of this product? Is the base price 2794 with each additional user at 6985? Not sure I'm clear on this...


14 years ago

Melissa - there are two different price models. The £2,794 price is for a single licence tied to a single user on their PC. The £6,985 price is for a single network licence that anyone in the organisation can use, but the number of people using it at once is limited to the number of concurrent user licences you buy, and £6,985 buys just one such licence. These are the prices I was given in April last year, so they may have varied since then, especially since the underlying prices are in US Dollars. I don't have prices for add-on users, but these were less than the base price.
