Automated language detection in the global monitoring systems of the information space with big data

The paper studies some aspects of building a global information monitoring system for big data using automated language identification. A language identification technology is described that significantly reduces resource consumption when processing incoming data in systems that analyze large amounts of multilingual information, and thereby increases their efficiency. The technology is based on identified, thematically independent syntactic markers, which make it possible not only to highlight the grammatical basis of a text but also to identify the language in which the text is written. A functional mathematical model is presented that captures the essence and main components of the developed technology, and a system for assessing its effectiveness is proposed.
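
The marker-based idea in the abstract can be illustrated with a minimal sketch. The record does not list the actual syntactic markers or the mathematical model, so everything below is an assumption made for illustration: the `MARKERS` table uses a handful of common function words per language as stand-in markers, and `identify_language` is a hypothetical helper that picks the language with the most marker hits.

```python
import re
from collections import Counter

# Hypothetical marker sets: a few common function words per language.
# Illustrative only; these are NOT the markers used in the paper.
MARKERS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "ru": {"и", "в", "не", "на", "что", "с"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def identify_language(text, markers=MARKERS):
    """Return the language whose markers occur most often, or None if none occur."""
    # Split into lowercase word tokens; \w is Unicode-aware for str in Python 3.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for lang, words in markers.items():
        # Count every token that is a marker for this language.
        counts[lang] = sum(1 for t in tokens if t in words)
    lang, hits = max(counts.items(), key=lambda kv: kv[1])
    return lang if hits > 0 else None


if __name__ == "__main__":
    print(identify_language("The cat is in the house and the dog is out"))
    print(identify_language("Кошка сидит на окне и не хочет в дом"))
```

Because such markers are function words rather than topic vocabulary, the check does not depend on the subject of the text, and a monitoring pipeline could run it first so that only the matching language-specific processing is applied to each document.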

Authors
Farkhadov M.1, Kalegin S.2, Farkhadova M.3
Publisher
IEEE
Language
English
Pages
24-27
Status
Published
Year
2018
Organizations
  • 1 RAS, VA Trapeznikov Inst Control Sci, Automated Queuing Syst & Signal Proc, Moscow 117997, Russia
  • 2 RAS, VA Trapeznikov Inst Control Sci, ICS, Moscow Res Inst Televis, Moscow 117997, Russia
  • 3 RUDN Univ, Fac Russian Language & Gen Educ Disciplines, Moscow 117198, Russia
Keywords
Big data processing; module of Big data processing systems; language marker; syntactic marker; language identification of information; linguistic affiliation of the unstructured text
Date created
24.12.2019
Date modified
24.12.2019
Permanent link
https://repository.rudn.ru/ru/records/article/record/55913/