SPEECH
2024 | December 09

Constantly improving speech recognition systems

Within the framework of the European SERMAS project, we have developed a system that recognises Basque, Spanish and English in noisy environments and can differentiate between speakers.

Speech recognition is one of Orai’s main lines of work, which is why it continually explores ASR (Automatic Speech Recognition) technology in order to keep improving its systems. Speech recognition offers a whole range of possibilities: oral communication with virtual assistants, automatic subtitling of videos, transcription of lectures, development of reading support tools, etc. As part of the European SERMAS project, Orai is developing a system that will work in five languages, be suitable for noisy environments and be able to differentiate between speakers.

SERMAS (Socially-acceptable Extended Reality Models and Systems) is a research and innovation project funded by the Horizon Europe programme and run by a consortium of several partners. It aims to develop socially acceptable extended reality (XR) models and systems. Extended reality encompasses all immersive technologies that combine the physical and digital worlds, including augmented reality, virtual reality and mixed reality.

SERMAS sets out to build advanced intelligent or virtual assistants for various fields through the research and development of state-of-the-art artificial intelligence. Within the framework of the SERMAS project, Orai is working on the LANGSWITCH subproject (Multilingual Automatic Speech Recognition in Noisy Environments).

“We are creating voice recognition technology in five languages (Basque, Spanish, French, English and Italian), developed in such a way that it can be used in very noisy environments and distinguish between speakers. It will be possible to apply these speech recognisers in virtual assistant systems, avatars, robots, collaborative robots and augmented reality systems to facilitate human-machine interaction. It will also allow the voice to be used to increase the degree of personalisation, thanks to a user differentiation system. For example, it will be possible to take the machine's usage history or the preferences of a specific user into consideration,” explained Iñigo Morcillo, a researcher at Orai.

In the first phase Orai validated the ASR system for English. The result was very satisfactory, according to Igor Leturia, head of speech technologies at Orai: “We managed to improve the Whisper model.” (Whisper is a machine learning model for speech recognition and transcription created by OpenAI.) The Orai team has also made this ASR system available to the SERMAS project via an API (application programming interface).

In the second phase of the project the Orai team developed recognition systems for Basque and Spanish in noisy environments. “In the third phase we developed a system to differentiate between speakers. It is a system to find out whether there is a change of user in the virtual actors or who is speaking at a given moment among various potential users, and it works in several languages,” said Leturia. “We have now embarked on the fourth phase, in which we will be developing the speech recognition system for noisy environments for French and Italian, too, thus completing the system.” The end result will be an ASR system that works in five languages (eu, es, fr, it, en), is suitable for noisy environments and can distinguish between speakers. All of this will be made available to the SERMAS systems.

Further information

The SERMAS consortium is made up of the Università Degli Studi di Modena e Reggio Emilia (UNIMORE, Italy), the Technische Universität Darmstadt (Germany), King’s College London (U.K.) and the University of Applied Sciences and Arts of Southern Switzerland (SUPSI, Switzerland), in addition to Deutsche Welle (the German broadcasting service), Poste Italiane (the Italian postal service), F6S (a global network that assists public sector organisations across the world in promoting, communicating and disseminating technical and research projects) and Spindox Labs (an innovation centre in Trento, Italy, devoted to technological exploration and the creation of prototypes). Orai joined the project through a SERMAS consortium call aiming to attract innovative organisations, high-tech start-ups, SMEs and industrial players.