University of Bath press release
DECOTA transforms open-ended survey responses into clear themes — helping policymakers make better use of underutilised public feedback
- AI tool DECOTA analyses free-text data rapidly, affordably, and with human-like accuracy
- Free-text data is rich in insight, but is often underused due to the time and cost of analysing it manually
- Research team at the University of Bath say DECOTA could help ensure more public voices are included in policy decisions
A powerful new AI tool, published today, offers a fast, low-cost way to understand public attitudes – by automatically identifying common themes in open-ended responses to surveys and policy consultations.
DECOTA – the Deep Computational Text Analyser – is the first open-access method for analysing free-text responses to surveys and consultations at scale. Detailed in a research paper published in Psychological Methods today (Monday 7 April), the tool delivers insights around 380 times faster and over 1,900 times more cheaply than human analysis, while achieving 92% agreement with human-coded results.
It uses fine-tuned large language models to identify key themes and sub-themes in open-ended responses – where people share their views in their own words. While rich in insight, this type of qualitative data is notoriously time-consuming to analyse – meaning it often goes unused.
Developed by a multidisciplinary team at the University of Bath – led by recent PhD graduates Dr Lois Player and Dr Ryan Hughes, with support from Professor Lorraine Whitmarsh – the tool is designed to help governments and organisations better understand the people they serve.
The tool was initially developed to better understand opinions about climate policies, but it can be applied in a wide range of settings. It has already attracted interest from four UK government bodies, academic institutions, and global think tanks.
Dr Lois Player, who completed her PhD in Behavioural Science within Bath's IAAPS Doctoral Training Centre, explains: "When thousands of people respond to surveys or consultations, it's often impossible to analyse all that free-text data by hand. DECOTA makes it possible to summarise which themes are most common in large populations – in a way that simply wouldn't be feasible otherwise."
Detailed, human-like accuracy
DECOTA is grounded in a well-established qualitative analysis technique known as thematic analysis, in which researchers manually group free-text data into common themes. Mirroring this, DECOTA uses a six-step process that combines two fine-tuned large language models with a clustering step to identify the themes and sub-themes underlying the data.
The team compared DECOTA's performance with that of human analysts on four example datasets. DECOTA detected 92% of the sub-themes found by analysts, and 90% of the broader themes. Remarkably, DECOTA generated insights in just 10 minutes, compared to an average of 63 hours for the human analysts – around 380 times faster.
These time savings have huge cost implications – with DECOTA analysing responses from around 1,000 participants for just $0.82, compared to approximately $1,575 using a human research assistant paid $25 per hour. DECOTA is even 240 times faster and 1,220 times cheaper than existing state-of-the-art computational methods, such as topic modelling.
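The headline ratios follow directly from the figures quoted above. The short Python check below reproduces them. The variable names are illustrative only and are not taken from the DECOTA code.

```python
# Figures quoted in this release
human_hours = 63         # average analysis time for the human analysts
decota_minutes = 10      # time DECOTA took to generate insights
hourly_rate_usd = 25     # assumed research assistant pay per hour
decota_cost_usd = 0.82   # DECOTA cost for around 1,000 participants

# Speed-up: convert human time to minutes, then divide by DECOTA's time
speedup = human_hours * 60 / decota_minutes

# Human cost: hours worked multiplied by the hourly rate
human_cost_usd = human_hours * hourly_rate_usd

# Cost ratio: human cost divided by DECOTA cost
cost_ratio = human_cost_usd / decota_cost_usd

print(f"Speed-up: {speedup:.0f}x")            # 378x, i.e. around 380 times faster
print(f"Human cost: ${human_cost_usd:,.0f}")  # $1,575
print(f"Cost ratio: {cost_ratio:,.0f}x")      # 1,921x, i.e. over 1,900 times cheaper
```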
"Importantly, DECOTA is not designed to replace human thematic analysis, but rather complement it," explains Dr Player. "We want it to unlock the huge volumes of data going unanalysed, allowing more voices to be heard in policy and decision-making settings, and freeing up valuable researcher time for deeper, more interpretative work."
Going beyond thematic analysis, the tool also determines which demographic groups are more likely to mention certain themes. For example, it can ascertain if women are more likely than men to mention a specific issue, or whether younger people are more likely than older people to highlight certain themes. It also draws out representative quotes for each sub-theme, aiding interpretation of results.
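As a rough illustration of this kind of group comparison, the Python sketch below works out what share of each demographic group mentions a given theme. It is not DECOTA's own method: the function name `mention_rates`, the data layout, and the toy responses are all made up for the example, and a real analysis would also test whether any difference is statistically meaningful.

```python
from collections import defaultdict

def mention_rates(responses, theme):
    """Return {group: share of that group's respondents who mention `theme`}."""
    totals = defaultdict(int)    # respondents per group
    mentions = defaultdict(int)  # respondents per group who mention the theme
    for group, themes in responses:
        totals[group] += 1
        if theme in themes:
            mentions[group] += 1
    return {group: mentions[group] / totals[group] for group in totals}

# Made-up toy data: (demographic group, themes coded for that response)
toy_responses = [
    ("women", {"cost", "fairness"}),
    ("women", {"fairness"}),
    ("women", {"cost"}),
    ("men", {"cost"}),
    ("men", {"fairness"}),
    ("men", set()),
]

rates = mention_rates(toy_responses, "fairness")
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} mention 'fairness'")
```

In this toy data, two of the three women and one of the three men mention the theme, so the comparison reports a higher mention rate among women.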
Transparency built in
Dr Ryan Hughes, whose PhD focused on Mechatronics and Data Science, adds: "DECOTA doesn't just summarise data. It also provides depth, showing who said what, and how often. It's also transparent by design. It doesn't hide how it processes data: researchers can inspect and edit each stage of the pipeline, and all the code is openly available on the Open Science Framework."
Professor Lorraine Whitmarsh says: "DECOTA offers a huge leap forward in the analysis of open-ended questionnaire data. Applying machine learning to analyse large volumes of text will save time and money for researchers and policymakers wanting to understand public attitudes, allowing for a stronger role of public engagement in policy design."
Openly accessible online, the tool is detailed in the research paper "The Use of Large Language Models for Qualitative Research: the Deep Computational Text Analyser (DECOTA)", published today in the journal Psychological Methods (DOI: 10.1037/met0000753).
The team say that DECOTA will continue to be developed over time, with plans for a user-friendly web application, accessible to those unfamiliar with code.
Parties interested in receiving updates about DECOTA or participating in the initial rollout can express their interest via a contact form at: https://tinyurl.com/DECOTAform