Classification models for RST discourse parsing of texts in Russian [КЛАССИФИКАЦИЯ РИТОРИЧЕСКИХ ОТНОШЕНИЙ ДЛЯ ДИСКУРСИВНОГО АНАЛИЗА ТЕКСТОВ НА РУССКОМ ЯЗЫКЕ]

The paper addresses automatic discourse parsing of Russian-language texts. Discourse parsing is a well-known approach to capturing text semantics beyond the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus has made it possible to apply supervised machine learning techniques to build such parsers for Russian. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory (RST). In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, neural networks, and ensembling of several models by soft voting. © 2019 ABBYY PRODUCTION LLC. All rights reserved.
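The soft voting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three example models, and the probability values are hypothetical, and the relation labels are used only for illustration. Soft voting averages the class probabilities produced by several classifiers and picks the label with the highest average, so a single confident model can outweigh two weakly confident ones.

```python
from typing import Dict, List


def soft_vote(predictions: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-label probabilities across models (unweighted soft voting)."""
    if not predictions:
        raise ValueError("at least one model prediction is required")
    # Collect every label that any model assigns a probability to.
    labels = set()
    for dist in predictions:
        labels.update(dist)
    # A label missing from a model's output is treated as probability 0.0.
    return {
        label: sum(dist.get(label, 0.0) for dist in predictions) / len(predictions)
        for label in labels
    }


def predict_label(predictions: List[Dict[str, float]]) -> str:
    """Return the label with the highest averaged probability."""
    averaged = soft_vote(predictions)
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three classifiers for one pair of discourse units.
example = [
    {"Elaboration": 0.90, "Contrast": 0.10},  # e.g. a gradient boosting model
    {"Elaboration": 0.40, "Contrast": 0.60},  # e.g. a neural network
    {"Elaboration": 0.45, "Contrast": 0.55},  # e.g. a linear model
]
print(predict_label(example))
```

In this made-up example a majority (hard) vote would choose "Contrast", since two of the three models prefer it, while soft voting chooses "Elaboration" because the averaged probability for that label is higher.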

Authors
Chistova E.V. (1, 2), Shelmanov A.O. (1, 3), Kobozeva M.V. (1), Pisarevskaya D.B. (1), Smirnov I.V. (1), Toldova S.Yu. (4)
Publisher
Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet
Issue number
18
Language
English
Pages
163-176
Status
Published
Volume
2019-May
Year
2019
Organizations
  • 1 FRC CSC RAS, Moscow, Russian Federation
  • 2 RUDN University, Moscow, Russian Federation
  • 3 Skoltech, Moscow, Russian Federation
  • 4 NRU Higher School of Economics, Moscow, Russian Federation
Keywords
Discourse parsing; Feature selection; Machine learning on annotated corpus; RST; Word embedding
Date created
02.11.2020
Date modified
02.11.2020
Permanent link
https://repository.rudn.ru/ru/records/article/record/65814/