Google Research Africa unveils WAXAL: A 21-Language Breakthrough for African Voice Technology with Kikuyu, Luo and Swahili Taking Lead

In a major step toward digital inclusivity, Google has unveiled a massive new speech dataset named WAXAL, designed to ensure that technology can finally understand millions of speakers of African languages.

In Kenya, Dholuo, Kikuyu and Swahili are among the first indigenous languages to have a voice in Artificial Intelligence (AI).

Writing in a joint article, Aisha Walcott-Bryant, Head of Google Research Africa, and Perry Nelson, Google Ghana Site Lead, highlighted that while talking to devices is “second nature” for much of the world, this convenience often vanishes in Sub-Saharan Africa.

The region boasts over 2,000 distinct languages, yet the development of helpful voice technology has long been stalled by a lack of high-quality speech data.

To address this barrier, researchers have introduced WAXAL—a name derived from the Wolof word for “speak.”

Developed over a three-year period, the dataset is intended to empower researchers to build more inclusive tools for a continent that has historically been overlooked by global tech giants.

The WAXAL collection features over 11,000 hours of speech data, comprising nearly 2 million individual voice recordings.

This extensive dataset specifically includes approximately 1,250 hours of transcribed speech designed for automatic speech recognition (ASR) alongside over 20 hours of high-quality studio recordings intended for text-to-speech (TTS) voice synthesis.
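For a rough sense of scale, the quoted figures can be turned into two back-of-the-envelope numbers: the average length of a recording and the share of the audio that is transcribed. The sketch below is illustrative only. It treats the article's rounded figures ("over 11,000 hours", "nearly 2 million", "approximately 1,250 hours") as exact, so the results are estimates, and the function names are hypothetical helpers rather than part of any WAXAL tooling.

```python
# Figures quoted in the article; "over", "nearly" and "approximately"
# mean these are rounded, so the results below are only estimates.
TOTAL_HOURS = 11_000          # total speech audio
TOTAL_RECORDINGS = 2_000_000  # individual voice recordings
ASR_HOURS = 1_250             # transcribed speech for recognition


def average_clip_seconds(total_hours: float, recordings: int) -> float:
    """Average recording length in seconds, given total hours and clip count."""
    return total_hours * 3600 / recordings


def share_percent(part: float, whole: float) -> float:
    """Percentage of `whole` that `part` represents."""
    return 100 * part / whole


avg_clip = average_clip_seconds(TOTAL_HOURS, TOTAL_RECORDINGS)
asr_share = share_percent(ASR_HOURS, TOTAL_HOURS)

print(f"Average recording length: about {avg_clip:.1f} seconds")
print(f"Transcribed share of total audio: about {asr_share:.1f}%")
```

On these rounded inputs, a typical recording comes out at roughly 20 seconds, and a little over a tenth of the audio is transcribed.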

Crucially, the project was not built in isolation. Google collaborated with leading African academic and technical organizations to ensure the data was both authentic and ethically sourced.

Partners at Makerere University in Uganda and the University of Ghana spearheaded collection efforts for 13 languages, while Rwanda’s Digital Umuganda managed five major languages.

Furthermore, the African Institute for Mathematical Sciences (AIMS) assisted with multilingual data for future releases. Under this collaborative framework, these local partners retain ownership of the data they collected while sharing the goal of making it globally accessible.

For a smartphone to “understand” a language, its speech models must be trained on thousands of hours of recorded audio. WAXAL provides this through:

Feature: Scale and Detail

Total Audio: Over 11,000 hours of speech.

Individual Recordings: Nearly 2 million unique snippets.

Recognition (ASR): ~1,250 hours of transcribed speech for AI to “hear”.

Synthesis (TTS): 20+ hours of studio audio for AI to “speak”.

In an effort to capture how people truly communicate, participants were asked to describe various pictures in their native tongues rather than simply reading scripts.

This was balanced with professional studio sessions featuring voice actors to provide the precision required for high-fidelity voice synthesis.

The dataset spans 21 languages, including Kikuyu, Swahili, Dholuo, Acholi, Akan, Dagaare, Dagbani, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga) and Yoruba.

By releasing the complete WAXAL collection today under an open license on Hugging Face, Google hopes to foster a new era of innovation and aid in the digital preservation of Africa’s rich linguistic heritage.
