The Hybrid Method for Accurate Patent Classification

This article addresses the stacking of two approaches to patent classification. The first is a linguistically supported k-nearest neighbors algorithm that searches for topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based method in which a sentence (or document) vector is obtained by averaging n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. We show on Russian and English datasets that the stacking classifier achieves better results than either single classifier. © 2019, Pleiades Publishing, Ltd.
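
The stacking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each base classifier (the kNN search and fastText) already returns a class-probability vector per document, and it assumes the meta-classifier is a small multinomial logistic regression trained on the concatenated vectors. The function names (`meta_features`, `train_meta`, `predict_meta`) and the toy probabilities are hypothetical and serve only as illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def meta_features(knn_probs, fasttext_probs):
    """Concatenate the two base classifiers' class-probability vectors."""
    return list(knn_probs) + list(fasttext_probs)

def class_scores(model, x):
    """Linear score of every class for one meta-feature vector."""
    W, b = model
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]

def train_meta(X, y, n_classes, epochs=2000, lr=0.5):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = softmax(class_scores((W, b), x))
            for c in range(n_classes):
                # Cross-entropy gradient: predicted probability minus one-hot target.
                err = probs[c] - (1.0 if c == label else 0.0)
                grad_b[c] += err
                for j, v in enumerate(x):
                    grad_W[c][j] += err * v
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / len(X)
            for j in range(n_features):
                W[c][j] -= lr * grad_W[c][j] / len(X)
    return W, b

def predict_meta(model, x):
    """Return the index of the highest-scoring class."""
    scores = class_scores(model, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy, made-up base-classifier outputs for four documents and two classes.
# In practice these should be out-of-fold predictions, so that the
# meta-classifier is not trained on outputs the base models have memorized.
knn_out = [[0.90, 0.10], [0.45, 0.55], [0.20, 0.80], [0.10, 0.90]]
fasttext_out = [[0.60, 0.40], [0.90, 0.10], [0.30, 0.70], [0.55, 0.45]]
labels = [0, 0, 1, 1]

X = [meta_features(k, f) for k, f in zip(knn_out, fasttext_out)]
model = train_meta(X, labels, n_classes=2)
stacked = [predict_meta(model, x) for x in X]
print(stacked)
```

In the toy data, the kNN output alone misclassifies the second document and the fastText output alone misclassifies the fourth, so the example shows the kind of complementary errors a meta-classifier can exploit. The values are invented and do not reproduce any result from the paper.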

Authors
Publisher
Pleiades Publishing
Issue number
11
Language
English
Pages
1873-1880
Status
Published
Volume
40
Year
2019
Organizations
  • 1 Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, 119333, Russian Federation
  • 2 Peoples’ Friendship University of Russia (RUDN University), Moscow, 117198, Russian Federation
  • 3 Lomonosov Moscow State University, Moscow, 119991, Russian Federation
Keywords
fastText; KNN; patent classification; similarity search; stacking; word embeddings