Photo: Marc Kolle

ChatGPT works best for languages of economically prosperous countries

Sija van den Beukel,
30 May 2023 - 16:02

How well ChatGPT works depends on the language you use. The algorithm excels in languages of economically prosperous countries, while languages of developing countries often still fall by the wayside. Professor Christof Monz is working on ways to make the algorithm language-independent.

If you ask ChatGPT-4 in Dutch what the most expensive painting by Karel Appel is, the answer is, “Woman Children Animals worth $750,000.” Ask the same question in English and the chatbot will say, “Two birds and a flower, which was sold for $1.1 million.” In Swahili, the algorithm answers: “Vive la France, worth €4.8 million.” That last work turns out not to be a painting by Karel Appel at all.

With this example, Professor of Language Technology Christof Monz wants to illustrate two things. One: ChatGPT is by no means always reliable. And two: reliability depends on the language you use.

Photo: Kirsten van Santen

This is because ChatGPT needs a huge amount of data for its training. The more data it has at its disposal, the smarter it gets. Monz says: “For a language for which little data is available, it’s like having a very big brain while only reading children’s books. So in those languages, the algorithm is less intelligent.”

 

“Big” and “small” languages

In this way ChatGPT creates inequality, Monz argues. The algorithm is trained primarily on English texts, followed by Chinese, Spanish, and other languages of economically prosperous countries. “Of these ‘major’ languages, seen from an AI perspective, there are at most two hundred, while there are 7,000 languages spoken in the world.”

So “smaller” languages such as Bengali or various African languages are at a disadvantage when it comes to the quality of ChatGPT, even though some of them have more speakers than better-served languages. Monz comments: “Language independence would also help provide equal access to information. Then we would not depend on sources like NOS (the Dutch public broadcaster) to obtain information about other countries. You could read news from other countries directly through newspapers or social media.”


Monz received a Vici research grant in 2020 for his research and is now developing models that can “see across language barriers” and thus produce language-independent answers. To do so, Monz uses human translations and language pairs. Can an Arabic-French translation, combined with a French-Dutch translation, also say something about an Arabic-Dutch translation? “We are trying that now for about 10 to 20 languages. Ideally, we will find a universal pattern for sentences in different languages with the same meaning.”
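The idea of relating two language pairs through a shared third language can be illustrated with a toy sketch. It is not Monz's model, which the article does not describe in detail. It only shows, at word level, how an Arabic-French and a French-Dutch resource together say something about Arabic-Dutch. The dictionaries, the transliterated Arabic entries, and the function name `pivot` are all assumptions made up for this illustration.

```python
# Toy word-level dictionaries. All entries are illustrative assumptions;
# the Arabic words are given in Latin transliteration.
AR_TO_FR = {"kitab": "livre", "bayt": "maison", "qamar": "lune"}
FR_TO_NL = {"livre": "boek", "maison": "huis", "chat": "kat"}


def pivot(src_to_pivot, pivot_to_tgt):
    """Compose two dictionaries through a shared pivot language.

    A source word gets a target translation only if its pivot
    translation is also covered by the second dictionary.
    """
    return {
        src: pivot_to_tgt[mid]
        for src, mid in src_to_pivot.items()
        if mid in pivot_to_tgt
    }


# Arabic-Dutch entries derived without any direct Arabic-Dutch data.
AR_TO_NL = pivot(AR_TO_FR, FR_TO_NL)
print(AR_TO_NL)
```

In this sketch "qamar" gets no Dutch translation because "lune" is missing from the French-Dutch dictionary, which hints at why coverage in the intermediate language matters.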

 

Depressing

That research will not produce a new version of ChatGPT; big tech companies are too far ahead for that in terms of budget and available data. “PhD students sometimes get depressed about that, asking themselves: How can we ever compete with that? And aren’t all problems already being solved?” Yet there are still open research questions, because ChatGPT is far from perfect. Monz adds: “We can hopefully make a small contribution to better translation systems that ChatGPT can integrate.”

Christof Monz delivers his talk, “Meaningful Language Technology: From Patterns to Meaning and Back,” on Thursday, June 1, at 4:30 p.m., in the Aula (Old Lutheran Church). Attendance is free.