FEATURE 18 November 2022
Mirella Lapata in seven
In the latest of a regular Impact series, Mirella Lapata, professor at the School of Informatics at the University of Edinburgh, discusses natural language processing and the explosion of data.
Mirella Lapata is a professor at the School of Informatics at the University of Edinburgh, and affiliated with the Institute for Communication and Collaborative Systems and the Edinburgh Natural Language Processing Group. She is also a fellow of the Royal Society of Edinburgh, the Association for Computational Linguistics, and Academia Europaea.
My research focuses on natural language processing (NLP): specifically, the representation, extraction and generation of semantic information from structured and unstructured data, involving text, images, video and large-scale knowledge bases. My goal is to get computers to understand, reason with and generate natural language.
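To make the idea of extracting semantic information from unstructured text concrete, here is a minimal sketch using the open-source spaCy library; it is purely illustrative and not the specific tooling used in Lapata's research:

```python
# A minimal, illustrative sketch: turning unstructured text into
# structured (entity, label) pairs with spaCy's pretrained pipeline.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("Mirella Lapata is a professor at the School of Informatics "
          "at the University of Edinburgh.")

# Named-entity recognition extracts a small piece of semantic structure
# from free-running text.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
# Approximate output (depends on the model version):
#   Mirella Lapata -> PERSON
#   the School of Informatics -> ORG
#   the University of Edinburgh -> ORG
```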
I was recently awarded a UK Research and Innovation Turing AI World-Leading Researcher Fellowship to work on reasoning, a skill that machines still lack but that humans are perfectly capable of. Humans have mastered the art of correlating and integrating different types of information from different sources, and of reusing acquired experience and expertise to transfer it to radically different challenges and domains. I am hoping to build new models that can do exactly that.
As data gets bigger, processing and storing it will become more challenging. Computers will have to get faster, computing will have to become cheaper, and our models will have to get better.
Over the past few years, deep learning has brought a revolution in NLP, producing remarkable results. Tasks such as machine translation and sentiment analysis have made huge leaps forward with respect to earlier state-of-the-art systems. There has also been substantial global commercial activity in the deployment of digital assistants (such as Amazon’s Alexa) and smart home devices (such as Google Nest). In the future, we will see more efforts to analyse very long documents, such as books: answering questions based on their content, and creating summaries for each chapter, for a book as a whole, or for an entire book series. We have already seen developments that I did not think would have been possible 10 years ago.
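As a hedged illustration of the tasks mentioned above, the sketch below runs sentiment analysis and machine translation with pretrained models from the open-source Hugging Face transformers library; commercial assistants such as Alexa rest on far larger proprietary stacks:

```python
# Illustrative only: off-the-shelf deep-learning NLP with pretrained models.
# Requires: pip install transformers torch
from transformers import pipeline

# Sentiment analysis: classify the polarity of a sentence.
sentiment = pipeline("sentiment-analysis")
print(sentiment("Deep learning has brought a revolution in NLP."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Machine translation: English to French with a default pretrained model.
translator = pipeline("translation_en_to_fr")
print(translator("Computers will have to get faster.")[0]["translation_text"])
```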
Most data on the web, and within individual organisations, is in an unstructured format. The biggest barrier will be coming up with algorithms and tools that can jointly process different types of unstructured data (such as images together with text or video) and draw conclusions from incomplete and noisy data.
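One concrete direction for jointly processing text and images is a contrastively trained vision-language model such as CLIP. The sketch below follows the standard usage of OpenAI's openly released CLIP checkpoint via the transformers library, scoring an image against candidate captions; it is an illustration of the idea, not a solution to the barrier described above:

```python
# Illustrative sketch: scoring one image against candidate captions with CLIP,
# a model that embeds images and text in a shared space.
# Requires: pip install transformers torch pillow requests
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A sample image from the public COCO dataset (two cats on a sofa).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Softmax over image-text similarity scores gives a probability per caption;
# the "cat" caption should dominate for this image.
print(outputs.logits_per_image.softmax(dim=1))
```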
There has been an explosion of data recently, which has made the job of the insights industry more challenging – especially as the data is not just text, but involves multiple modalities, including audio, video and images. In addition, consumer behaviour is evolving faster nowadays than in the past, and changes in behaviour mean that analyses become obsolete quite fast, as the consumers and the data move on to the next trend.
Covid-19 has had a huge impact on universities and, by extension, on NLP research. It is fair to say that universities have not yet returned to a pre-pandemic normality. Interacting with colleagues and students, attending workshops and seminars, and exchanging ideas have all been hampered by the pandemic. It is not easy to know what different labs are working on and to keep abreast of new developments. The entire community has been compartmentalised, and the collective momentum has slowed down significantly.
This article was first published in the October 2022 issue of Impact.