Modern Approaches to the Language Data Analysis. Using Language Analysis Methods for Management and Planning Tasks

The article discusses promising directions for applying modern automatic methods of natural language data analysis to a wide range of practical problems. The technology of creating electronic text corpora (collections) is considered as a tool for the transition from model-based linguistics to the linguistics of tagged data. The principles of creating annotated text corpora, and the possibilities and limitations of their use, are considered. The creation of an annotated text corpus is described, in which language data downloaded from the Internet is processed sequentially before the results are delivered to users. The pipeline consists of the following steps: downloading data from the Internet; identifying the language in which the text is written; extracting metadata; splitting the texts into paragraphs and sentences; deduplication; tokenization; automatic linguistic annotation; uploading the cleaned and annotated data to the network. The prospects for the development of language data analysis systems are presented. Requirements for creating corpora for public administration and strategic planning tasks are developed. The properties that such corpora should have are considered: corpus format, corpus size, depth of linguistic analysis, and corpus manager structure. The annotated text corpora developed at the Artificial Intelligence Research Center (AIReC) of the Ailamazyan Program Systems Institute of the Russian Academy of Sciences are described, with reference to the tasks of extracting information about persons, events and situations from news reports. A retrospective review of the development of systems for automatic natural language text processing in the areas of machine translation and human-machine interaction is given. © 2020, Springer Nature Switzerland AG.
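The sequential processing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, the language check is a toy Cyrillic-versus-Latin letter count, sentence splitting is a naive regular expression, and the download, metadata, annotation and upload stages are omitted because the abstract gives no details about them.

```python
import hashlib
import re


def detect_language(text: str) -> str:
    """Toy heuristic (an assumption): compare Cyrillic and Latin letter counts."""
    cyrillic = len(re.findall(r"[\u0400-\u04FF]", text))
    latin = len(re.findall(r"[A-Za-z]", text))
    if cyrillic == 0 and latin == 0:
        return "unknown"
    return "ru" if cyrillic > latin else "en"


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [s.strip() for s in parts if s.strip()]


def deduplicate(sentences: list[str]) -> list[str]:
    """Keep the first occurrence of each sentence, ignoring case and spacing."""
    seen = set()
    unique = []
    for sentence in sentences:
        normalized = " ".join(sentence.lower().split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


def tokenize(sentence: str) -> list[str]:
    """Return words (Unicode-aware) and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def process_document(raw_text: str) -> dict:
    """Run the middle stages of the pipeline on one already-downloaded text."""
    sentences = []
    for paragraph in split_paragraphs(raw_text):
        sentences.extend(split_sentences(paragraph))
    sentences = deduplicate(sentences)
    return {
        "language": detect_language(raw_text),
        "sentences": [{"text": s, "tokens": tokenize(s)} for s in sentences],
    }
```

Calling `process_document` on a downloaded text returns the detected language together with the deduplicated sentences and their tokens. A production corpus pipeline would likely replace each toy stage with a dedicated tool and add the annotation step before publishing the data.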

Vinogradov A.N. 1, Vlasova N. 2, Kurshev E.P. 2, Podobryaev A. 2
Collection of articles
  • 1 Peoples’ Friendship University of Russia (RUDN University), Moscow, Russian Federation
  • 2 Ailamazyan Program Systems Institute of RAS (PSI RAS), Pereslavl-Zalessky, Yaroslavl Region, Russian Federation
Keywords
Artificial intelligence; Digital economy; Machine translation; Natural language processing; Strategic management