How LatAm-GPT is building culturally relevant AI for the region – Context News


David Feliba
Published: 8 hours and 21 mins ago
An illustration of a mobile phone showing a chat between a user and an AI chatbot. Thomson Reuters Foundation/Nura K. Ali
Latin America is set to launch the region's first GPT-style AI language model in September, after two years of development led by Chile.
BUENOS AIRES – “Tell me the most recent relevant books and novels from Chile.”
That was the prompt Chilean engineers at the state-run National Center for Artificial Intelligence (CENIA) gave OpenAI’s ChatGPT to test its grasp of Latin American culture.
But when the chatbot replied with only titles by renowned Chilean poet Pablo Neruda and not much else, researchers were not impressed.
“It seemed like it only knew Neruda’s work,” said Carlos Aspillaga, a computer science engineer at CENIA, who worked on the project.
“The model lacked diversity and wasn’t locally accurate. Worse, some of the books it mentioned didn’t even exist, or had factual errors.”
The exchange highlighted a key limitation of mainstream AI, revealing that it struggles with regional precision and nuance.
That means its responses can lack accuracy or contain mistakes when addressing highly localised matters, particularly in small countries with languages other than English.
Mainstream large language models, though equipped with multilingual capabilities, are predominantly trained on English-language content that still dominates the internet.
Much of the data related to Latin America comes from Spain, or is translated from texts originally written in English, which could explain why the models often fail to produce content that feels authentic or culturally grounded to Latin American users.
That realisation sparked CENIA’s two-year effort to create a GPT-style language model that is rooted in Latin America and reflects the region’s diverse cultures and languages.
The result is LatAm-GPT, set to launch this September as the first large language model in the region.
Built with input from more than 30 regional institutions, and developed specifically for Latin America, it is a milestone in a global AI race that could leave emerging economies behind.
“What’s crucial for Latin America is to jump on this technology now,” Aspillaga told Context.
“We’re at a point where it’s still feasible to adopt and adapt existing techniques. Maybe in five years, that window would close. This lets us start building our own know-how,” he said.
Unlike widely known models like OpenAI’s ChatGPT or Meta’s LLaMA, LatAm-GPT flips the dynamic of relying heavily on English-language and Global North datasets.
It is exposed heavily to Latin American data, with an important focus on local languages, idioms and expressions.
The team is also working on preserving Indigenous languages.
The first version of LatAm-GPT includes around 70 billion tokens – words and word fragments – in Spanish, Portuguese and English.
It draws from more than 8 terabytes of regional data and nearly 3 million documents, including books, Wikipedia entries, and a myriad of texts obtained through partnerships with libraries and universities across Latin America and Spain.
The largest economies in the region – Brazil and Mexico – contributed the bulk of this material.
LatAm-GPT is smaller in scale than the most advanced global models, with an architecture closer to GPT-2 than to the current GPT-4, but its edge lies elsewhere: not in the quantity of its data, but in its quality and relevance.
“We’re feeding the model concentrated knowledge about Latin America,” said Aspillaga.
“Global models aim to cover all the world’s knowledge. We’re focused on a niche where we can actually outperform them.”
CENIA researchers believe LatAm-GPT could be especially useful in schools and other local applications that need accuracy on regional affairs.
“Right now, the available models aren’t accurate or complete when it comes to local issues. They don’t understand how locals speak or think,” Aspillaga said.
“It shouldn’t be the person who adapts to the technology, it should be the technology adapting to them.”
Another goal of the centre is to help preserve endangered Indigenous languages.
One of its most prominent projects has taken place 3,700 km off the Chilean coast, on Easter Island (Rapa Nui), often described as the most remote inhabited place on Earth.
There, researchers worked with the local community to build an AI-powered translator for the Rapa Nui language, part of a broader strategy to revitalise and digitally preserve it.
“Rapa Nui is currently at risk because there are very few fluent speakers,” said Jackeline Rapu, who leads the Rapa Nui Language Academy.
“This digital repository is really important. It supports all the linguistic revitalisation efforts we’ve been working on and helps young people reconnect with the language.”
Latin America is not alone in building home-grown models and AI-powered tools.
Around the world, governments are racing to create AI systems tailored to local languages and needs.
The United Arab Emirates government, for example, has launched Falcon and Jais to advance Arabic AI.
India is developing BharatGPT to support more than 14 regional languages, led by public universities and backed by the Department of Science & Technology, in partnership with AI firm CoRover.
In South Korea, tech giant Naver Corporation has introduced HyperCLOVA for Korean, while AI Singapore, a national programme funded by Singapore’s government, is building SEA-LION to serve Southeast Asia.
LatAm-GPT’s development also reflects growing political momentum for AI cooperation in the region.
In April, Chile and Brazil signed a Memorandum of Understanding to jointly advance AI research, with Brazil officially joining the LatAm-GPT initiative.
Last year, Brazil’s President Luiz Inácio Lula da Silva unveiled a national AI strategy that includes investing more than $4 billion by 2028 to boost the industry, while Argentina has expressed ambitions to become a global AI and data hub.
Today, major platforms offer slightly more nuanced answers to prompts about Chilean literature.
The same query today no longer just mentions Neruda, but also brings up other renowned authors like Chilean poet Gabriela Mistral or Chilean writer José Donoso.
“When it comes to AI, we’re always going to be behind countries like the United States, but that doesn’t mean we can’t do something useful, and that’s our ultimate goal,” Aspillaga said.
(Reporting by David Feliba; Editing by Anastasia Moloney and Jonathan Hemming.)
Context is powered by the Thomson Reuters Foundation Newsroom.
