Automated language detection in the global monitoring systems of the information space with big data

The paper studies aspects of building a global information monitoring system for big data using automated language identification. It describes a language identification technology that significantly reduces the resources consumed when processing incoming data in systems that analyze large amounts of multilingual information, and thereby increases their efficiency. The technology is based on identified, thematically independent syntactic markers, which make it possible not only to highlight the grammatical basis of a text, but also to identify the language in which its information was recorded. A functional mathematical model is presented that captures the essence and main components of the developed technology, and a system for assessing its effectiveness is proposed.
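
As a rough illustration of the marker-based idea described in the abstract, the sketch below scores a text against small sets of topic-independent function words and picks the language with the most hits. The marker lists, the `MARKERS` table, and the `identify_language` function are hypothetical and chosen only for this example; they are not the markers, model, or scoring scheme from the paper.

```python
import re
from collections import Counter

# Hypothetical marker sets: a few common function words per language.
# Illustrative assumption only, not the markers used in the paper.
MARKERS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "with"},
    "ru": {"и", "в", "не", "на", "что", "с", "по", "это"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "von"},
}


def identify_language(text, markers=MARKERS):
    """Return the language code whose markers occur most often, or None."""
    # Lowercase and split into word tokens (\w is Unicode-aware in Python 3).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    # Score each language by the total number of marker occurrences.
    scores = {
        lang: sum(counts[word] for word in words)
        for lang, words in markers.items()
    }
    best = max(scores, key=scores.get)
    # If no marker matched at all, report the language as undetermined.
    return best if scores[best] > 0 else None


print(identify_language("The cat is in the house and the dog is with it"))
print(identify_language("Это не то, что я видел в городе"))
```

Because only a short list of function words is checked per language, such a filter can run before heavier analysis and route each document to the appropriate language-specific pipeline. A real system would need far larger marker sets and a rule for ties and mixed-language texts.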

Authors
Farkhadov M.1, Kalegin S.2, Farkhadova M.3
Publisher
IEEE
Language
English
Pages
24-27
Status
Published
Year
2018
Organizations
  • 1 RAS, VA Trapeznikov Inst Control Sci, Automated Queuing Syst & Signal Proc, Moscow 117997, Russia
  • 2 RAS, VA Trapeznikov Inst Control Sci, ICS, Moscow Res Inst Televis, Moscow 117997, Russia
  • 3 RUDN Univ, Fac Russian Language & Gen Educ Disciplines, Moscow 117198, Russia
Keywords
Big data processing; module of Big data processing systems; language marker; syntactic marker; language identification of information; linguistic affiliation of the unstructured text