To address the problem of identifying the language of an audio message with a prosodic approach, two algorithms for interpreting prosodic speech features are proposed, together with a method for applying them: an algorithm based on broad phonetic categories and an algorithm based on the cross-correlation function of the speech melody and the sequence of short-term energies. The algorithms are evaluated experimentally. Neural networks are used as the decision rule.
We study the language identification problem using prosodic features. Prosodic features such as melody, rhythm, and timbre are difficult to formalize mathematically. This paper proposes two algorithms for a comprehensive description of prosodic features. The first is based on broad phonetic categories, and the second on the cross-correlation of the speech melody and the short-term energy sequence. The fundamental frequency was estimated with the MELP algorithm. The performance of the proposed algorithms was evaluated experimentally on a database of speech recordings in ten languages, obtained from the Internet and therefore encoded by low-bitrate vocoders. The proposed algorithms produce the feature description, and a multi-layer neural network serves as the language classifier. Both algorithms show satisfactory classification performance, with the broad phonetic categories approach performing slightly better than the cross-correlation approach. Both algorithms can be applied to a speech signal processed by low-bitrate vocoders without decoding it back to the original signal.
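The cross-correlation feature mentioned above can be illustrated with a minimal sketch. It assumes the melody (fundamental frequency contour) is already available as one value per frame, so the MELP pitch estimator is not reproduced here. The function names, the frame length, the hop size, and the lag range are hypothetical choices for illustration, not the settings used in the paper.

```python
import numpy as np


def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame (frame_len and hop are assumed values)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop:i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])


def normalized_cross_correlation(f0, energy, max_lag):
    """Cross-correlation of two equal-length frame sequences for lags -max_lag..max_lag.

    Both sequences are mean-centred and the result is scaled so that
    two identical, non-constant sequences give 1.0 at lag 0.
    """
    a = np.asarray(f0, dtype=float)
    b = np.asarray(energy, dtype=float)
    if len(a) != len(b):
        raise ValueError("f0 and energy must have the same number of frames")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    lags = range(-max_lag, max_lag + 1)
    if denom == 0.0:
        # A constant sequence carries no correlation information.
        return np.zeros(len(lags))
    values = []
    for lag in lags:
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        values.append(np.dot(x, y) / denom)
    return np.array(values)
```

In this sketch the resulting vector of correlation values over a small lag range would serve as the input to a classifier. How the paper actually forms the feature vector from the cross-correlation function is not stated in the abstract, so that step is left open.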