The Hybrid Method for Accurate Patent Classification

This article is devoted to stacking two approaches to patent classification. The first is a linguistically supported k-nearest neighbors (kNN) algorithm that finds topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based classifier in which a sentence (or document) vector is obtained by averaging n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. Experiments on Russian and English datasets show that the stacking classifier outperforms each of the individual classifiers. © 2019, Pleiades Publishing, Ltd.
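The stacking scheme summarized in the abstract can be illustrated with the minimal Python sketch below. It runs on toy data under several stated assumptions and is not the authors' implementation. The fastText-style classifier averages unigram and bigram embeddings and feeds the result to a multinomial logistic regression, but the embeddings here are fixed pseudo-random stand-ins rather than vectors learned jointly, as fastText does. The kNN classifier compares raw term-count vectors by cosine similarity, which is a simplification of the paper's linguistically supported lexical descriptors. The class probabilities of both base classifiers are concatenated and passed to a second logistic regression (the meta-classifier), trained on a held-out split; cross-validated out-of-fold predictions are the more common choice. All function names, parameters (`DIM`, `k`, `lr`, `epochs`) and documents are hypothetical.

```python
import math
import random
from collections import Counter

DIM = 16  # hypothetical embedding size

def ngrams(tokens):
    # Word unigrams plus word bigrams, the fastText-style n-gram features.
    return tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]

def embed(gram):
    # ASSUMPTION: a fixed pseudo-random vector seeded by the n-gram string stands
    # in for the embedding fastText would learn during training.
    rng = random.Random(gram)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def doc_vector(tokens):
    # Document vector = average of the document's n-gram embeddings.
    vecs = [embed(g) for g in ngrams(tokens)]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class SoftmaxRegression:
    """Multinomial logistic regression trained with plain SGD on cross-entropy."""

    def __init__(self, n_features, n_classes, lr=0.5, epochs=200):
        self.W = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, x):
        return softmax([sum(w * v for w, v in zip(row, x)) + bias
                        for row, bias in zip(self.W, self.b)])

    def fit(self, X, y):
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                p = self.predict_proba(x)
                for c, row in enumerate(self.W):
                    g = p[c] - (1.0 if c == target else 0.0)  # dLoss/dlogit_c
                    self.b[c] -= self.lr * g
                    for j, v in enumerate(x):
                        row[j] -= self.lr * g * v
        return self

def cosine(a, b):
    # Cosine similarity of two sparse term-weight vectors (Counters).
    # ASSUMPTION: raw term counts stand in for the paper's lexical descriptors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_proba(query, vecs, labels, n_classes, k=3):
    # Similarity-weighted vote of the k most similar indexed documents.
    sims = sorted(((cosine(query, v), y) for v, y in zip(vecs, labels)), reverse=True)[:k]
    votes = [0.0] * n_classes
    for s, y in sims:
        votes[y] += s
    total = sum(votes)
    return [v / total for v in votes] if total else [1.0 / n_classes] * n_classes

def train_stack(base_docs, base_y, meta_docs, meta_y, n_classes):
    # Level 0: fit the fastText-style classifier and build the kNN index on base_docs.
    ft = SoftmaxRegression(DIM, n_classes).fit([doc_vector(d) for d in base_docs], base_y)
    index = [Counter(d) for d in base_docs]

    def features(tokens):
        # Meta-features: both base classifiers' class probabilities, concatenated.
        return (knn_proba(Counter(tokens), index, base_y, n_classes)
                + ft.predict_proba(doc_vector(tokens)))

    # Level 1: fit the meta-classifier on held-out documents only.
    meta = SoftmaxRegression(2 * n_classes, n_classes).fit(
        [features(d) for d in meta_docs], meta_y)

    def predict(tokens):
        p = meta.predict_proba(features(tokens))
        return max(range(n_classes), key=lambda c: p[c])

    return predict

# Toy usage with made-up documents: class 0 = chemistry-like, class 1 = mechanics-like.
base_texts = ["polymer resin catalyst synthesis", "catalyst reaction polymer compound",
              "resin compound synthesis reaction", "gear shaft bearing assembly",
              "bearing housing gear motor", "shaft motor housing assembly"]
meta_texts = ["polymer catalyst compound", "resin reaction synthesis",
              "gear bearing motor", "shaft housing assembly"]
predict = train_stack([d.split() for d in base_texts], [0, 0, 0, 1, 1, 1],
                      [d.split() for d in meta_texts], [0, 0, 1, 1], n_classes=2)
print(predict("catalyst polymer resin".split()))
```

Training the meta-classifier only on documents the base classifiers never saw keeps it from simply trusting whichever base model fits its own training data best; this is the usual reason stacking relies on held-out or out-of-fold predictions.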

Authors

Yadrintsev V.V.¹,², Sochenkov I.V.¹,³

Journal

Lobachevskii Journal of Mathematics

Publisher

Pleiades Publishing

Language

English

Pages

1873-1880

Status

Published

DOI

10.1134/S1995080219110325

Year

2019

Affiliations

¹ Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, 119333, Russian Federation
² Peoples’ Friendship University of Russia (RUDN University), Moscow, 117198, Russian Federation
³ Lomonosov Moscow State University, Moscow, 119991, Russian Federation

Keywords

fastText; KNN; patent classification; similarity search; stacking; word embeddings
