Classification models for RST discourse parsing of texts in Russian [КЛАССИФИКАЦИЯ РИТОРИЧЕСКИХ ОТНОШЕНИЙ ДЛЯ ДИСКУРСИВНОГО АНАЛИЗА ТЕКСТОВ НА РУССКОМ ЯЗЫКЕ]

Chistova, E.V.; Shelmanov, A.O.; Kobozeva, M.V.; Pisarevskaya, D.B.; Smirnov, I.V.; Toldova, S.Yu.

The paper considers the task of automatic discourse parsing of texts in Russian. Discourse parsing is a well-known approach to capturing text semantics across the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus made it possible to apply supervised machine learning techniques to building such parsers for the Russian language. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory (RST). In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, a neural network, and ensembling of several models by soft voting. © 2019 ABBYY PRODUCTION LLC. All rights reserved.

Authors

Chistova E.V. ¹,², Shelmanov A.O. ¹,³, Kobozeva M.V. ¹, Pisarevskaya D.B. ¹, Smirnov I.V. ¹, Toldova S.Yu. ⁴

Journal

Komp'juternaja Lingvistika i Intellektual'nye Tehnologii

Publisher

Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet

Issue number

Language

English

Pages

163-176

Status

Published

Volume

2019-May

Year

2019

Organizations

¹ FRC CSC RAS, Moscow, Russian Federation
² RUDN University, Moscow, Russian Federation
³ Skoltech, Moscow, Russian Federation
⁴ NRU Higher School of Economics, Moscow, Russian Federation

Keywords

Discourse parsing; Feature selection; Machine learning on annotated corpus; RST; Word embedding
