Google Translate's Flaws Reveal AI's Future

The computer scientists Rich Sutton and Andrew Barto have been recognised for a long track record of influential ideas with this year's Turing Award, the most prestigious in the field. Sutton's 2019 essay The Bitter Lesson, for instance, underpins much of today's feverishness around artificial intelligence (AI).

Author: Adam Lopez, Reader in Informatics, University of Edinburgh

He argues that methods for improving AI that rely on heavy-duty computation rather than human knowledge are "ultimately the most effective, and by a large margin". This idea has been proven right many times in AI's history. Yet that history holds another important lesson, from some 20 years ago, that we ought to heed.

Today's AI chatbots are built on large language models (LLMs), which are trained on huge amounts of data that enable a machine to "reason" by predicting the next word in a sentence using probabilities.
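To make that concrete, here is a minimal, hypothetical sketch in Python. The probabilities below are invented for illustration; a real LLM computes such a distribution over its entire vocabulary from the preceding context, then picks or samples the next word from it.

```python
import random

# Hypothetical probabilities for the word following "the cat sat on the".
# A real LLM would compute a distribution like this over its whole vocabulary.
next_word_probs = {
    "mat": 0.40,
    "sofa": 0.25,
    "roof": 0.15,
    "grass": 0.10,
    "moon": 0.10,
}

# Greedy decoding: always take the single most probable word.
print(max(next_word_probs, key=next_word_probs.get))  # mat

# Sampling in proportion to probability gives more varied output.
words, weights = zip(*next_word_probs.items())
print(random.choices(words, weights=weights)[0])
```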

Useful probabilistic language models were formalised by the American polymath Claude Shannon in 1948, citing precedents from the 1910s and 1920s. Language models of this form were then popularised in the 1970s and 1980s for use by computers in translation and speech recognition, in which spoken words are converted into text.

The first language model on the scale of contemporary LLMs was published in 2007 and was a component of Google Translate, which had been launched a year earlier. Trained on trillions of words using over a thousand computers, it is the unmistakeable forebear of today's LLMs, even though it was technically different.

It relied on probabilities computed from word counts, whereas today's LLMs are based on what are known as transformers. First developed in 2017 - also originally for translation - these are artificial neural networks that make it possible for machines to better exploit the context of each word.
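As a rough illustration of the count-based approach, here is a simplified sketch - not the actual Google Translate system, which used much longer n-grams, smoothing and trillions of words. A bigram model estimates the probability of a next word as a ratio of counts observed in the training text:

```python
from collections import Counter

# Tiny illustrative corpus; 2007-era systems did the same kind of counting
# over trillions of words, with longer n-grams and smoothing.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count each adjacent word pair, and how often each word appears as context.
bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus[:-1])

def next_word_prob(prev: str, word: str) -> float:
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    return bigram_counts[(prev, word)] / context_counts[prev]

print(next_word_prob("the", "cat"))  # 0.5: "the" is followed by "cat" in 2 of 4 cases
```

A transformer estimates the same kind of next-word distribution, but with a neural network that can take the entire preceding context into account rather than just the previous word or two.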

The pros and cons of Google Translate

Machine translation (MT) has improved relentlessly over the past two decades, driven not only by technological advances but also by the size and diversity of training data sets. Whereas Google Translate started by offering translations between just three languages in 2006 - English, Chinese and Arabic - today it supports 249. That may sound impressive, but it is still less than 4% of the world's estimated 7,000 languages.

Between a handful of those languages, like English and Spanish, translations are often flawless. Yet even in these languages, the translator sometimes fails on idioms, place names, legal and technical terms, and various other nuances.

Between many other languages, the service can help you get the gist of a text, but its output often contains serious errors. The largest annual evaluation of machine translation systems - which now includes translations done by LLMs that rival those of purpose-built translation systems - bluntly concluded in 2024 that "MT is not solved yet".

Machine translation is widely used in spite of these shortcomings: as far back as 2021, the Google Translate app reached 1 billion installs. Yet users still appear to understand that they should use such services cautiously: a 2022 survey of 1,200 people found that they mostly used machine translation in low-stakes settings, like understanding online content outside of work or study. Only about 2% of respondents' translations involved higher-stakes settings, including interacting with healthcare workers or police.

Sure enough, there are high risks associated with using machine translation in these settings. Studies have shown that machine-translation errors in healthcare can cause serious harm, and there are reports that it has harmed credible asylum cases. It doesn't help that users tend to trust machine translations that are easy to understand, even when they are misleading.

Knowing the risks, the translation industry overwhelmingly relies on human translators in high-stakes settings like international law and commerce. Yet these workers' marketability has been diminished by the fact that the machines can now do much of their work, leaving them to focus more on assuring quality.

Many human translators are freelancers in a marketplace mediated by platforms with machine-translation capabilities. It's frustrating to be reduced to wrangling inaccurate output, not to mention the precarity and loneliness endemic to platform work. Translators also have to contend with the real or perceived threat that their machine rivals will eventually replace them - researchers refer to this as automation anxiety.

Lessons for LLMs

The recent unveiling of the Chinese AI model DeepSeek, which appears to come close to the capabilities of market leader OpenAI's latest GPT models at a fraction of the price, signals that very sophisticated LLMs are on a path to being commoditised. They will be deployed by organisations of all sizes at low cost - just as machine translation is today.

Of course, today's LLMs go far beyond machine translation, performing a much wider range of tasks. Their fundamental limitation is data: they have already exhausted most of what is available on the internet. For all its scale, their training data is likely to underrepresent most tasks, just as it underrepresents most languages for machine translation.

Indeed, the problem is worse with generative AI: unlike with languages, it is difficult to know which tasks are well represented in an LLM. There will undoubtedly be efforts to improve training data in ways that make LLMs better at some underrepresented tasks. But the scope of the challenge dwarfs that of machine translation.

Tech optimists may pin their hopes on machines being able to keep increasing the size of the training data by generating their own synthetic versions, or on learning from human feedback through chatbot interactions. These avenues have already been explored in machine translation, with limited success.

So the foreseeable future for LLMs is one in which they are excellent at a few tasks, mediocre at others, and unreliable elsewhere. We will use them where the risks are low, while they may harm unsuspecting users in high-risk settings - as has already happened to lawyers who trusted ChatGPT output containing citations to non-existent case law.

These LLMs will aid human workers in industries with a culture of quality assurance, like computer programming, while making the experience of those workers worse. We will also have to deal with new problems, such as their threat to human artistic works and to the environment. The urgent question: is this really the future we want to build?

The Conversation

Adam Lopez previously received funding from Google in 2015.

Courtesy of The Conversation. This material from the originating organization/author(s) may be of a point-in-time nature and may have been edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).