Research

Research lines

The emergence of neural language models has brought about a paradigm shift in natural language processing. Neural language models are trained on massive collections of text, from which they learn generic knowledge of languages. This generic knowledge can then be reused successfully, enabling the models to learn specific language-processing tasks: they need little task-specific training data and achieve very good results. In addition, multilingual neural language models can be trained with examples from a single language, and the resulting model will be capable of processing many more languages.
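
As a minimal sketch of this transfer-learning idea, the code below adapts a pretrained multilingual encoder to a small classification task; the model name, example sentences, labels and single gradient step are illustrative stand-ins for a real fine-tuning setup, not our actual pipeline.

```python
# Minimal fine-tuning sketch: reuse a pretrained multilingual encoder
# for a specific task (all task data here is hypothetical).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # pretrained on roughly 100 languages
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# A handful of labelled examples adapts the generic language knowledge
# already stored in the pretrained weights.
texts = ["This film was wonderful.", "A complete waste of time."]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # one fine-tuning step (optimizer omitted for brevity)
```

Because the encoder was pretrained on text from about a hundred languages, the same recipe can transfer across languages with little or no target-language data.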

The main research lines we have in progress are: 

  • Assessment of neural language models.
  • Transfer learning so that neural language models can learn specific tasks.
  • Transfer learning between languages.
  • Neural language models for low-resource languages.

In the digital age, the ability to extract structured information from sources encoded in human language is hugely important. Extracting that knowledge from today's massive volumes of information (big data) opens up new possibilities for conducting macro-level analyses, offering innovative ways of consuming information and facilitating decision-making processes. Our research focuses on Natural Language Understanding (NLU) tasks such as text classification, entity extraction, opinion mining and question answering. In recent years, neural approaches have been applied very successfully to NLU tasks, and they are in fact the techniques we use on a day-to-day basis.
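
As a hedged illustration, the sketch below runs two of the NLU tasks mentioned above with off-the-shelf neural pipelines; the models are library defaults rather than our own, and the example sentences are invented.

```python
# Illustrative NLU sketch using default pretrained pipelines
# (models and sentences are placeholders, not our systems).
from transformers import pipeline

# Entity extraction: tag persons, organisations, locations, etc.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Orai is a research centre based in the Basque Country."))

# Question answering: find the answer span inside a given context.
qa = pipeline("question-answering")
print(qa(question="Where is Orai based?",
         context="Orai is a research centre based in the Basque Country."))
```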

The main research lines we have in progress are: 

  • Multilingual search systems.
  • Question-answering systems.
  • Emotion analysis.
  • Semantic metadata extraction.
  • Big data surveillance systems.

In this multilingual, global context, machine translation systems are going from strength to strength. The growth of neural networks in recent years has produced an unprecedented qualitative leap in translation quality, opening up opportunities to develop more intelligent systems capable of detecting shades of meaning.

That is why our research aims to develop state-of-the-art machine translation systems. To do this, we use the latest neural paradigms to create both monolingual and multilingual systems. These neural paradigms need large quantities of data during training, so data extraction, filtering and cleaning are essential for exploiting quality data. We are aware of the importance of personalising systems so that they can be adapted to users' needs; that is why domain specialisation and specialised terminology are among our priorities. Moreover, most of today's systems translate each sentence in isolation, without taking into consideration the general context in which the sentence appears, so we are also working on whole-document translation.
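
To make two of these points concrete, the sketch below shows (a) a simple length-ratio heuristic of the kind used when filtering noisy parallel corpora and (b) translation with a pretrained neural model; the threshold, model choice and sentences are illustrative assumptions, not our production setup.

```python
# Sketch: cheap parallel-data filtering plus neural translation
# (ratio threshold and model choice are illustrative).
from transformers import pipeline

def keep_pair(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Keep a sentence pair only if source/target token counts are
    roughly balanced -- a common heuristic against misaligned data."""
    ls, lt = len(src.split()), len(tgt.split())
    return ls > 0 and lt > 0 and max(ls, lt) / min(ls, lt) <= max_ratio

print(keep_pair("The house is big.", "La maison est grande."))  # True
print(keep_pair("Hello.", "Bonjour, merci beaucoup et bonne nuit."))  # False

translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("The house is wonderful."))
```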

The main research lines we have in progress are:

  • Gender bias analysis
  • Whole document translation
  • Integration of specialised terminology
  • Data filtering and cleaning
  • Domain specialisation
  • Multilingual translation

Dialogue assistants come in two types: those that aim to hold as natural a conversation as possible and those that aim to carry out commands and operations. The former tend to be used for leisure purposes. The latter, by contrast, help people perform specific tasks, for example completing administrative formalities, making purchases or answering questions. Companies and public administrations are offering more and more dialogue assistants of the latter type to provide their customers or the general public with a better service.

Dialogue systems are built around several components: user-intention detection, dialogue-context tracking, and language understanding and generation. Today, neural architectures are being used successfully to implement these components.
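
As an illustration of intent detection when labelled data is scarce, the sketch below uses zero-shot classification, one common strategy under limited training data; the utterance and the candidate intents are hypothetical, and the model is the library default.

```python
# Zero-shot intent detection sketch (labels and utterance are invented).
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
utterance = "I'd like to renew my library card."
intents = ["complete an administrative formality",
           "make a purchase",
           "ask for information"]
result = classifier(utterance, candidate_labels=intents)
print(result["labels"][0])  # highest-scoring intent
```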

The main research lines we have in progress are: 

  • User intention detection.
  • Strategies based on limited training data.
  • Transfer learning between languages.

Speech processing is about making computers capable of handling speech, and one of its main tasks is Automatic Speech Recognition (ASR).

In speech recognition we explore automatic transcription and subtitling systems that go beyond those that only achieve good results under favourable conditions. So, in two languages, we work on methods to develop ASR systems designed to transcribe audio material in local forms of speech or in informal registers, as well as on systems that work in noisy environments (for example, for interacting with Industry 4.0 machines via speech).

We are also working on personalisation, so that transcription systems can correctly transcribe the local words, place names and proper names they are given. We are likewise working on live transcription and subtitling, which are tremendously useful in all kinds of sessions, video calls and courses. Another aim is to enable people with mobility disabilities to take advantage of ASR as a dictation tool; this work is geared in particular towards education and children. Finally, we are working on speaker identification, so that the speaker of each utterance can be automatically tagged in subtitles or transcriptions.
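
By way of illustration, the sketch below transcribes a recording with a pretrained ASR model and requests timestamps, the raw material for subtitling; the model choice and the audio file name are placeholders, and real systems add personalisation, diarisation and streaming on top.

```python
# Minimal ASR sketch with a pretrained model ("meeting.wav" is a
# placeholder for a real recording).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Plain transcription.
print(asr("meeting.wav")["text"])

# Timestamped chunks, the starting point for live subtitling.
print(asr("meeting.wav", return_timestamps=True)["chunks"])
```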

The main research lines we have in progress are:

  • Personalised speech recognition
  • Speech recognition in local forms of speech 
  • Speech recognition in non-formal registers
  • Speech recognition in noisy, industrial environments
  • Infant speech recognition
  • Systems geared towards dictation (for accessibility)
  • Live transcription and subtitling
  • Identification of speakers

Speech processing is about making computers capable of processing speech. One of its main tasks is speech synthesis, or Text-to-Speech (TTS).

We have various lines of research on speech synthesis up and running. One of our aims is to achieve voice cloning with less and less material by using multi-speaker network systems; one of the main challenges right now is to achieve high-quality synthesis of a speaker's voice from a single sentence uttered by that speaker. We are also exploring cross-lingual techniques that allow us to change the language of any voice: from a few sentences of a voice in one language, we aim to synthesise that voice speaking a different language. To tackle the gender bias of virtual assistants, we have created a prototype voice of ambiguous gender, and one of our challenges is to improve its quality. Finally, we aim to incorporate emotion into synthesis systems. Most current systems use a neutral style, which limits their use in dubbing; we want to prevent the loss of style in dubbing by conveying emotion and expressiveness.
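
To make the multi-speaker idea concrete, the sketch below conditions a pretrained TTS model on a speaker embedding, the mechanism that makes voice cloning from small samples possible; the model is an off-the-shelf example, and the random embedding stands in for one extracted from a target speaker's recording.

```python
# Multi-speaker synthesis sketch: the speaker embedding selects the
# voice (a random vector stands in for a real speaker's embedding).
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="Hello from a multi-speaker model.", return_tensors="pt")
speaker_embedding = torch.randn(1, 512)  # placeholder for an embedding from audio

speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)
sf.write("demo.wav", speech.numpy(), samplerate=16000)
```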

The main research lines we have in progress are:

  • Personalised speech synthesis
  • Neutral speech synthesis
  • Speech synthesis with emotion
  • Voice imitation using small samples

The process of producing text has been changing significantly in recent years, and computer tools that assist in writing are being used more and more. Automatic checkers are among these tools: they detect errors in a text and present possible corrections to the user. Checkers can operate at various levels: spelling, lexis, grammar or style. They are very effective tools in the text-production process, in particular for ensuring high-quality text.
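
As a sketch of the synthetic-data approach listed below, clean sentences can be corrupted with simple rule-based noise to produce (erroneous, corrected) training pairs for a neural checker; the corruption rules here are deliberately crude stand-ins for real error models.

```python
# Generate synthetic (erroneous, corrected) pairs by corrupting clean
# text -- the corruption operations are illustrative, not our real ones.
import random

def corrupt(sentence: str) -> str:
    words = sentence.split()
    op = random.choice(["drop", "swap", "duplicate"])
    i = random.randrange(len(words))
    if op == "drop" and len(words) > 1:
        del words[i]                                      # missing word
    elif op == "swap" and i < len(words) - 1:
        words[i], words[i + 1] = words[i + 1], words[i]   # word-order error
    else:
        words.insert(i, words[i])                         # repeated word
    return " ".join(words)

clean = "The results of the experiment were very promising."
pairs = [(corrupt(clean), clean) for _ in range(3)]  # (noisy, clean) pairs
print(pairs)
```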

The main research lines we have in progress are: 

  • Neural grammar checking based on synthetic data.
