Algorithms for prosodic discourse feature interpretation in case of its processing using low-speed codecs

In this article we propose two algorithms for interpreting prosodic discourse features. The first algorithm is based on wide phonetic categories; the second is based on cross-correlation functions of the audio signal's melodic (pitch) contour and its short-time energy series. Both algorithms, together with methodological recommendations for their use, are proposed as part of the problem of identifying the language of an audio signal with a prosodic approach. An experimental evaluation of both algorithms is presented. Neural networks are used as the decision rule. The initial wide phonetic categories were pause, pitch, and noise. We expanded them to pause, pitch, noise, five levels of pitch, sites of decreasing energy, main maximum, and adverse maximum. The total number of categories was 14. These algorithms can be applied to language identification or speaker identification. The speech signal does not need to be restored after it has been processed by a low-speed codec, provided that the frames of the speech codec contain parameters such as pitch, a tone-noise parameter, and energy. The speech database covers 10 languages with 10 speakers per language. The total speech time per speaker is 100 minutes, a duration that takes into account the statistical regularities of the languages. Tests for evaluating the algorithms were carried out with a multilayer perceptron. © 2018 ASSA.
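
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the energy threshold, and the per-frame input layout (pitch, tone-noise flag, energy) are assumptions made for illustration, and only the three basic categories (pause, pitch, noise) are covered, not the full set of 14.

```python
import numpy as np


def label_frames(pitch_hz, voiced, energy, pause_threshold=0.01):
    """Assign each codec frame one of three broad categories.

    pause_threshold is a hypothetical value; a real system would tune it
    to the codec's energy scale.
    """
    labels = []
    for f0, is_voiced, e in zip(pitch_hz, voiced, energy):
        if e < pause_threshold:
            labels.append("pause")   # very low energy: treat as silence
        elif is_voiced and f0 > 0:
            labels.append("pitch")   # tonal frame with a valid pitch value
        else:
            labels.append("noise")   # energetic but not tonal
    return labels


def normalized_cross_correlation(x, y):
    """Cross-correlate two mean-removed series, scaled to [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if denom == 0:
        # A constant series carries no shape information to correlate.
        return np.zeros(len(x) + len(y) - 1)
    return np.correlate(x, y, mode="full") / denom
```

In this sketch, the label sequence or the correlation values between a pitch contour and an energy series would serve as input features for a classifier such as a multilayer perceptron; because both functions work on per-frame parameters, no waveform reconstruction is needed.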

Authors
Bessonov M.A.1, Bessonova N.A.1, Farkhadov M.P.2
Publisher
International Institute for General Systems Studies
Issue
1
Language
English
Pages
1-11
Status
Published
Volume
18
Year
2018
Affiliations
  • 1 Peoples' Friendship University of Russia, Moscow, Russian Federation
  • 2 V.A. Trapeznikov Institute for Control Sciences of Russian Academy of Sciences, Moscow, Russian Federation
Keywords
Discourse prosodic feature; Language identification; Neural networks; Wide phonetic categories