Automated language detection in the global monitoring systems of the information space with big data

The paper studies some aspects of building a global information monitoring system for big data that uses automated language identification. It describes a language identification technology that significantly reduces resource consumption when processing incoming data in systems that analyze large amounts of multilingual information, and thereby increases their efficiency. The technology is based on thematically independent syntactic markers identified by the authors, which make it possible not only to highlight the grammatical basis of a text but also to identify the language in which its information is recorded. A functional mathematical model is presented that captures the essence and main components of the developed technology, and a system for assessing its effectiveness is proposed. © 2018 IEEE.
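
The abstract does not list the markers or give the model, so the following is only a minimal sketch of the general idea of marker-based language identification, not the authors' method. It assumes that topic-independent function words can stand in for syntactic markers. The MARKERS table, its contents, and the identify_language function are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical marker sets: a few high-frequency function words per language.
# These lists are illustrative assumptions, not the markers from the paper.
MARKERS = {
    "en": {"the", "and", "of", "is", "that", "in", "to", "with"},
    "ru": {"и", "в", "не", "на", "что", "это", "с", "по"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "ein"},
}


def identify_language(text):
    """Return the language code whose markers occur most often, or None."""
    # Split into lowercase word tokens; \w is Unicode-aware for str patterns.
    tokens = re.findall(r"\w+", text.lower())
    scores = Counter()
    for token in tokens:
        for lang, markers in MARKERS.items():
            if token in markers:
                scores[lang] += 1
    if not scores:
        # No marker matched, so the language is left undetermined.
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in (
        "The cat is in the house with the dog",
        "Это не то, что я видел в кино",
        "Der Hund ist nicht mit der Katze",
    ):
        print(identify_language(sample), "<-", sample)
```

Because the lookup only needs set membership tests on a short marker list, it touches each token once and needs no topic vocabulary, which is one plausible reading of why such markers would lower resource consumption on large multilingual streams.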

Authors
Farkhadov M.1, Kalegin S.2, Farkhadova M.3
Publisher
Institute of Electrical and Electronics Engineers Inc.
Language
English
Status
Published
Number
8747043
Year
2018
Organizations
  • 1 Automated Queuing Systems and Signal Processing, V.A. Trapeznikov Institute of Control Sciences of RAS, Moscow, 117997, Russian Federation
  • 2 Moscow Research Institute of Television, V.A. Trapeznikov Institute of Control Sciences of RAS (ICS RAS), Moscow, 117997, Russian Federation
  • 3 Faculty of Russian Language and General Educational Disciplines, RUDN University, Moscow, 117998, Russian Federation
Keywords
Big data processing; language identification of information; language marker; linguistic affiliation of the unstructured text; module of Big data processing systems; syntactic marker