The Yẽgatu Digital project was the result of a process that began with the development of basic technologies by IBM Research and the University of São Paulo, and their initial exploration at a Guarani Mbya community school in São Paulo. After this initial engagement, the tools were ported to the Nheengatu language. Following the protocols of the Federation of Indigenous Organizations of the Rio Negro (FOIRN), a partnership was then established with two local communities in the Amazon to pilot the tools in their schools and foster the digital use of Nheengatu.
For the last 3 years, the IBM Research laboratory in Brazil, in partnership with the University of São Paulo (USP), has been developing AI-based reading and writing tools for Indigenous languages. This work is part of the PROLIND project of the Center for Artificial Intelligence (C4AI), which USP, IBM, and FAPESP set up in 2020. Writing-support tools are essential for making these languages easier to use on the Internet and in social media.
IBM Research has developed translators, orthographic correctors, electronic dictionaries, and word completers for Indigenous languages. These tools, built with state-of-the-art AI/LLM technologies, have been embedded in easy-to-use mobile and web apps. They were developed from existing linguistic data, such as dictionaries, following ethical guidelines and without collecting any data from Indigenous groups.
Since the start of the PROLIND project in 2022, we have engaged with many communities in Brazil, including the Guarani Mbya, Guarani Kaiowá, Guarani Nhandewa, Tupi, Terena, Baré, Wassu, Tukano, Pankararu, Zoé, Baniwa, and Mehinako peoples. At the beginning of 2023, after a series of meetings with the Tenondé Porã, a Guarani Mbya community on the outskirts of São Paulo city, the community invited us to explore the use of writing assistants by Indigenous high school students. They also invited us to conduct activities fostering community-led linguistic documentation and analysis.
The invitation from this Guarani Mbya community led to weekly 2-hour workshops at a local Indigenous bilingual high school. In these workshops, various technologies and writing-assistant prototypes were introduced to the students, who then used them. We treated these prototypes as "technology probes", a variation of the idea of "cultural probes", also known as "design probes". The main idea was to introduce a technological artifact into the classroom that could elicit responses from the students during actual writing tasks.
In total, 14 workshops were conducted over three months. During these workshops, students used and discussed different versions of the writing-assistant prototype and its components within various writing activities. This first engagement produced limited outcomes in terms of high-quality technology or deployed writing tools. However, it demonstrated the need for good writing tools and methods to support a generation of students who are fluent in their native language but are still learning to write it. These young people actively wrote messages to each other, read social media, and shared content. Yet Guarani Mbya text seemed almost absent from their digital lives. We concluded that there is a pressing need for tools that support digital writing among young people, who struggle to put into writing concepts and ideas they can easily express orally.
Following our engagement with the Guarani Mbya, we decided to focus on another Indigenous language, Nheengatu. It is spoken by approximately 20,000 people across three different areas of the Amazon and in the Northeast of Brazil. The language is used by various peoples and ethnicities, including groups such as the Baré, who adopted it after losing their original language. In 2024, the IBM Research team built initial prototypes of translators to and from Portuguese, spell-checkers, and next-word completers for Nheengatu. These components were packaged into three writing-assistant prototypes. Only data from linguistic sources available on the Internet were used.
We considered that a good starting point would be to use the AI tools in Indigenous middle and high schools of the Baré people. These schools are in communities near São Gabriel da Cachoeira, Brazil, in one of the most linguistically diverse areas of the Amazon. The process started in April 2024, when one member of our team presented the project at an assembly of CAIBARNX. At that assembly, the community approved exploring a partnership and allowed researchers and technicians to enter the Indigenous land.
With the support of FOIRN and CAIBARNX, a team visited two of those communities, Juruti and Tabocal dos Pereira, in September 2024. These are typical Indigenous villages on the banks of the Rio Negro. They are accessible only by small boat, about a 5-hour ride from São Gabriel da Cachoeira.
The Juruti community is home to about 20 families and has a local K-9 school. It has only recently obtained high-speed Internet access through a low-cost satellite provider. We held three meetings with leaders and teachers of the community, about 15 people in total. They shared their concerns about young people's excessive use of the Internet and explained that strengthening the use of their language is essential to building a strong sense of identity. At one point, we showed them a video of the Nheengatu writing assistant we were developing. In the discussion that followed, most participants agreed that it could be tested with students in the school as a way to get them to write more and better. We suggested holding weekly workshops on digital writing tools as part of the school's language classes. They agreed, provided that we could also supply the infrastructure.
The second community we visited, Tabocal dos Pereira, is larger, with about 100 families and a well-established bilingual K-12 school program. We were received in the traditional community meeting place, where students demonstrated their skills in writing Nheengatu. We then held a series of meetings. In the first of these, local teachers described their concerns about the impact of satellite Internet's arrival. We also talked to the high school students, who confirmed the teachers' perceptions. Although the students prefer Nheengatu for spoken interactions within the community, they never use it on the Internet, not even in spoken segments on social media. The teachers also welcomed the proposal of a weekly workshop with both middle and high school students. They wanted a strong focus on facilitating the use of Nheengatu on the Internet and on creating content in the language to attract and motivate young people. However, this had to be done in the particular Nheengatu orthography used in the area, which prompted important modifications to the tools being developed.
In May of 2025, a cooperation agreement was signed between FOIRN (Federation of Indigenous Organizations of the Rio Negro) and the University of São Paulo (USP), through the CIAAM (Center for Artificial Intelligence and Machine Learning), with the participation of the USP Artificial Intelligence Center (C4AI) and IBM Research Brasil.
For more than three decades, FOIRN has been the main institutional reference for the Indigenous peoples of the Rio Negro. It has taken a leading role in the process, articulating community actions and coordinating the policy to strengthen the Yẽgatu language in the territory. USP, through CIAAM, began a pioneering participation in efforts to vitalize Indigenous languages that combine advanced digital technologies with linguistic research methods.
The agreement established bilateral commitments in digital education, applied research, and the development of language technologies. These commitments are grounded in protocols of listening, co-responsibility, and Indigenous autonomy. They are carried out through the construction of two digital classrooms for distance education and collaboration, one in the Baré de Juruti community and one in Tabocal dos Pereira. C4AI contributes the technical and scientific basis for developing natural language and artificial intelligence solutions adapted to Indigenous languages, first explored in the PROLIND project.
This agreement represents a milestone in the construction of institutional alliances centered on Indigenous linguistic sovereignty and the intercultural production of knowledge, in which science and ancestry go hand in hand.