Every day, millions of people engage with and seek information from ChatGPT and other large language models (LLMs). But how are the responses given by these models shaped by the language in which they are asked? Does it make a difference whether the same question is asked in English or German, Arabic or Hebrew? Christoph Steinert, a postdoc at the Department of Political Science of the University of Zurich (UZH), and physicist Daniel Kazenwadel from the University of Konstanz, Germany, have now conducted a systematic analysis of this question.
Information shapes armed conflicts
The researchers explored the issue in the contentious context of the Israeli-Palestinian and Turkish-Kurdish conflicts. They used an automated query procedure to ask ChatGPT the same questions in different languages. For example, the researchers repeatedly prompted ChatGPT in Hebrew and Arabic about the number of people killed in 50 randomly chosen airstrikes, including the Israeli attack on the Nuseirat refugee camp on 21 August 2014.
"We found that ChatGPT systematically provided higher fatality numbers when asked in Arabic compared to questions in Hebrew. On average, fatality estimates were 34% higher," Steinert says. When asked about Israeli airstrikes on Gaza, ChatGPT mentioned civilian casualties more than twice as often, and children killed six times more often, in the Arabic version. The same pattern emerged when the researchers queried the chatbot about Turkish airstrikes against Kurdish targets, asking the same questions in Turkish and Kurdish.
The phrase "The first casualty when war comes is truth" is often attributed to US senator Hiram Johnson (1866–1945). Throughout history, selective information policies, propaganda and misinformation have influenced numerous armed conflicts. What sets current conflicts apart is the availability of an unprecedented number of information sources – including ChatGPT.
Exaggerated in one language, downplayed in the other
The results show that ChatGPT provides higher casualty figures when asked in the language of the attacked group. In addition, ChatGPT is more likely to report on children and women killed in the language of the attacked group, and to describe the airstrikes as indiscriminate. "Our results also show that ChatGPT is more likely to deny the existence of such airstrikes in the language of the attacker," adds Steinert.
The researchers believe this has profound social implications, as ChatGPT and other LLMs play an increasingly important role in how information is disseminated. Integrated into search engines such as Microsoft Bing and AI assistants such as Google Gemini, they fundamentally shape the information that search queries return on a wide range of topics.
"If people who speak different languages obtain different information through these technologies, it has a crucial influence on their perception of the world," Christoph Steinert says. Such language biases could lead people in Israel to perceive airstrikes on Gaza as causing fewer casualties, based on information provided by LLMs, than Arabic speakers do.
Biases and information bubbles
Unlike traditional media, which may also distort the news, the systematic language-related biases of LLMs are difficult for most users to detect. "There is a risk that the increasing implementation of large language models in search engines reinforces different perceptions, biases and information bubbles along linguistic divides," says Steinert. He believes this could in the future fuel armed conflicts such as the one in the Middle East.