By: Nana Appiah Acquaye
Google has announced the launch of WAXAL, a new open dataset designed to support the development of inclusive speech technology across 21 African languages.
The dataset, whose name is derived from the Wolof word for “speak,” aims to address the data scarcity that has historically limited the representation of African languages in artificial intelligence models. WAXAL contains more than 11,000 hours of speech data compiled from nearly two million individual recordings.
Languages included in the dataset range from Acholi and Hausa to Luganda and Yoruba, among others. The initiative is intended to improve the performance and inclusivity of speech recognition systems while contributing to the digital preservation of African linguistic heritage.
The project was developed in collaboration with Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda. Through these partnerships, Google seeks to foster innovation in speech technology and expand access to AI tools for communities across the continent.
Company leadership noted that initiatives such as WAXAL are part of broader efforts to enhance diversity in AI models and ensure that emerging technologies reflect the linguistic and cultural richness of the regions they serve.