FEATURE 17 January 2024

Learning the language of terrorism


Terrorism prevention needs new approaches, and Prof Harvey Whitehouse, Dr Julia Ebner and Dr Chris Kavanagh from Oxford University have devised a novel and potentially more reliable diagnostic tool. 

Winners of 2023 President's Medal receive their award at MRS Awards

Pictured are Prof Harvey Whitehouse and Dr Julia Ebner collecting the President’s Medal 2024 from Saj Arshad at the MRS Awards last December. The medal was awarded in recognition of their work.

Intelligence services rely on outdated approaches to identify and assess online discourse that is likely to lead to real-world violence. Extremist behaviour online is increasingly fragmented, unpredictable and coded – it’s a long way from the lists of proscribed terrorist organisations that have been the focus of surveillance in the past.

The Oxford University team’s approach builds on the pre-existing concept of ‘identity fusion’ – a very strong form of group bonding in which members of a group feel they share the same essence – whether that comes from sharing genes in the case of close relatives or sharing life-defining experiences in the case of frontline fighters in a military organisation. Members of extremist groups – from QAnon and far-right groups to suicide bombers – are often fused with each other in much the same way. This can provide the core motivation for carrying out acts of violence when they believe their groups are threatened.

Using Natural Language Processing (NLP), the researchers analysed manifestos and tracts from violent and non-violent groups, and compiled a lexicon of hate speech. The highest risk arose when individuals used language that indicated a potent combination called the ‘fusion + threat’ model. An example of this is the use of kinship words like ‘brother’, ‘sister’ and ‘motherland’, which indicate fusion, combined with language that suggests an existential threat to the group (espousing theories like ‘The Great Replacement’). Dehumanising and demonising language applied to the out-group and violence-condoning norms among members of the in-group were additional statistically significant markers in the language of violent extremists.
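The co-occurrence idea described above can be illustrated with a minimal Python sketch. It is not the researchers' tool: the word lists are tiny and hypothetical (the kinship terms come from this article, the threat terms are placeholders), and the function names `marker_counts` and `fusion_plus_threat` are invented for illustration. It only shows how a text might be flagged when fusion and threat markers appear together.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only.
# The researchers' actual vocabulary lists are not given in the article.
LEXICON = {
    "fusion": {"brother", "brothers", "sister", "sisters", "family", "motherland"},
    "threat": {"replacement", "extinction", "invasion"},  # placeholder terms
}


def marker_counts(text: str) -> Counter:
    """Count how many tokens in `text` fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, words in LEXICON.items():
        counts[category] = sum(1 for token in tokens if token in words)
    return counts


def fusion_plus_threat(text: str) -> bool:
    """Return True only when fusion and threat markers co-occur in the text."""
    counts = marker_counts(text)
    return counts["fusion"] > 0 and counts["threat"] > 0


if __name__ == "__main__":
    print(marker_counts("Our brothers face replacement"))
    print(fusion_plus_threat("My sister baked a cake"))
```

A plain keyword match like this ignores context entirely, so anything it flags would still need the kind of manual review the researchers describe.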

In future, this linguistic framework could enable intelligence services and technology companies to anticipate violent intent more accurately than looking for explicit declarations of violence.

How do you recognise identity fusion in linguistic terms – what are the symptoms?

Identity fusion manifests itself linguistically in the use of kinship terms or metaphors of shared blood. For example, fused individuals would refer to other members of their in-group as “family”, “brothers” or “sisters”.

Trolling and satire are defining aspects of internet discourse – how do you take that into account?

Trolling and satire have made the work of intelligence agencies and tech firms more difficult because the lines have become blurry between actual threats of violence and satirical posts or trolling activities. Experts have called this phenomenon the “gamification” of radicalisation and terrorism. Our framework offers a solution to the challenges of gamified radicalisation, as it doesn’t focus on threatening language as such but looks instead at deeper psychological patterns that are indicative of a deadly mindset.

Does this help identify potential violent behaviour from individuals not associated (fused) with groups?

Our definition of groups draws on group psychology to explain violent extremism. It’s important to emphasize that there may well be other factors that come into play, but which fall outside our framework. For example, some people who carry out murderous attacks that result in their own deaths may be motivated by individual pathologies rather than strong forms of group bonding. But since so many forms of intergroup violence seem to be best explained by the ‘fusion plus threat’ model, it’s important to understand how that works and to use that knowledge to create more effective tools for early detection and prevention.

It’s also important to understand that we don’t view groups as being defined by the spaces or communication modes through which they operate but rather by the types of bonds between their members. For example, a loose community of online users may be a group (or in-group). As soon as there is a perceived in-group, identity fusion can occur and pave the way towards violence. We should note, however, that our analysis indicates that some individuals may move towards violent behaviour via different pathways. For example, violent misogynist Incel terrorists might be an important exception to our framework, as they don’t tend to hold the same “us versus them” mindset but are often “loners” who are in a perceived battle against humanity.

Are there geographical or cultural limitations to this approach?

The ‘fusion plus threat’ model has been tested by Whitehouse’s team in a great range of geographical and cultural settings – from insurgent groups in Libya to violent football fans in Brazil, and from Muslim fundamentalists in Indonesia to farmer-herder conflicts in Cameroon. The same fundamental group psychology appears to be universal and rooted in our evolutionary past.

An important advantage of our new framework is that the key variables examined are not determined by ideological or cultural factors. As the relevant variables are revealed unconsciously in language, they also reach beyond strategically chosen words of escalation or de-escalation and are a more reliable predictor than explicit threats of violence. So far, our research has relied on English language material but we would like to see replication efforts in other languages.

Language changes fast on the internet – how does this approach keep up?

Today, entire think tanks and research institutions are dedicated to studying online subcultures, encompassing their communication and linguistic specificities. For example, today’s extreme right has built its own lexicon of hate speech and uses a range of subculture references, memes and insider jokes. Our approach is to look at psychological patterns (specifically, identity fusion) that are largely independent of these language changes in online subcultures. However, our vocabulary lists used for the natural language processing should also be updated to reflect emerging subculture terms used to express threats to the in-group, to dehumanise or demonise the enemy, or to condone violence.

What about access to hidden or private forums – does more need to be done to open these up to research?

Limited access to data is one of the biggest challenges for developing rigorous models that can trace patterns in pathways to violence. Technology companies should make their datasets available to academic researchers. This is not only true for hidden fringe forums and private chats in end-to-end encrypted messaging apps but even for the more public platforms. For example, Twitter (now X) has made it much harder under Elon Musk’s leadership to legally gather and analyse its data.

Have there been any real world applications of this approach – where would you expect to see them first?

The intelligence community and big tech firms have expressed interest in the approach and started to reconsider their approaches to violence risk assessments in online environments. We presented the findings of the research project to over 200 employees of the German domestic intelligence agency, Bundesamt für Verfassungsschutz, at the Cologne and Berlin headquarters and gave briefings to YouTube’s executives and policy teams at the firm’s headquarters in San Bruno, California. We have also advised a range of intelligence and security agencies, governmental units and tech firms in Europe and North America based on the research. Some of them have taken steps to integrate the new socio-psychological framework into their workstreams.

Does AI offer an opportunity to extend your work – how?

AI offers many opportunities for our work. One next step could be to integrate the violence risk assessment model with AI tools that can further refine the linguistic markers and test them in different areas of application. Today, AI-based predictive policing is primarily used for geographic hot-spot mapping and spatial risk calculations. But risk assessments of individual offenders have seen significant growth in recent years. Any use of AI tools will need to be complemented with manual reviews and should carefully consider ethical challenges. For example, human bias can be replicated and even amplified by AI-supported predictive policing.

What challenges did you run into in building your framework – technical or other?

The project was marked by challenges related to data availability and access conditions for the gathering of violent and non-violent extremist group datasets. The selection of online groups we used to test our framework is not a representative sample that reflects the entire far-right extremism landscape in cyberspace. It is rather a convenience sample of virtual groups from across the violence spectrum that were suitable for the purpose of this analysis. Moreover, the data collection for this research project was challenging due to the widely differing architectures of the online platforms that needed to be scraped to obtain the relevant datasets. Instead of using a self-written Python script for each platform, we therefore made use of a commercial web scraping service.

What are the limitations of this approach in predicting violent behaviour?

The new violence risk assessment framework carries inevitable limitations in its design. While the framework was developed based on previous evidence concerning psychological and linguistic violence predictors found in offline settings, as well as a comparative language-based analysis of manifestos of authors with varying levels of real-world violence records, it was subsequently applied to the analysis of content from online groups across the violence spectrum. However, neither offline group dynamics nor the process of manifesto writing can be fully equated with the message exchanges in online groups. There are clear differences in the communication modes, as well as the timeframe and intentionality of the texts produced by manifesto authors as opposed to members of online groups.

Another notable limitation stems from the highly contextual nature of online messages. To address this limitation, we carried out manual reviews to remove false positives from our NLP analysis results. These reviews were informed by the coding framework, which was previously tested in an inter-coder reliability (ICR) analysis with the help of two expert coders and 24 non-expert coders. However, the messages in the manual sample reviews were often ambiguous and subject to interpretation. For example, “plague” could be read as either demonisation or dehumanisation, depending on the context. The sentence “I am fighting for what I believe is right, not dreaming of some goofy revolution” could be interpreted as a physical or a metaphorical fight. We sought to address these challenges in the additional qualitative assessment that explored the nature and context of messages in more depth.
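To make the idea of checking agreement between coders concrete, here is a minimal Python sketch of Cohen's kappa, one common agreement statistic for two coders. Taking ICR to mean inter-coder reliability, and choosing this statistic, are both assumptions: the article does not say which measure the team used. The function name `cohen_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(labels_a)
    # Share of items on which the two coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels from two coders for the same four messages.
    coder_1 = ["dehumanisation", "demonisation", "dehumanisation", "none"]
    coder_2 = ["dehumanisation", "dehumanisation", "dehumanisation", "none"]
    print(round(cohen_kappa(coder_1, coder_2), 3))
```

A value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. An ambiguous term such as “plague” would tend to pull the score down, because coders can reasonably assign it to different categories.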

Can this approach be used to analyse content in other formats, for example video or audio?

It would be possible to create a coding framework for images or audio messages to trace proxies for the variables in our framework (i.e. identity fusion, existential threat, dehumanisation, demonisation or violence-condoning norms). We haven’t actually done this yet so it would need some discussion with the team, but we would likely begin by simply transcribing verbal content in audio or video formats and coding the content in much the same way as we did for written materials (e.g. manifestos). Automating such processes is something we would consider later. As computational analytics for visual/audio formats is not yet as sophisticated as text-based NLP, there might be additional challenges.

What is the next stage / evolution in this research topic?

We would welcome efforts from other researchers to replicate and build on our findings. We are also seeking closer collaboration with the policy community to develop more sophisticated diagnostic tools and early interventions/preventative measures. Our next project will use the framework to investigate the risk that despotic heads of state resort to extreme forms of violence, for example in the form of genocides or wars of aggression, at high risk to self. We will start by analysing the communication materials of historical leaders who have been convicted as war criminals for carrying out violent atrocities against their own populations or other countries’ populations, before moving on to contemporary leaders to assess their risk of using extreme violence.

This article was first published in the Research Live Industry Report 2024.