The Hybrid Method for Accurate Patent Classification

This article is devoted to stacking two approaches to patent classification. The first is a linguistically supported k-nearest neighbors (kNN) algorithm. It searches for topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based method. In fastText, a sentence (or document) vector is obtained by averaging n-gram embeddings, and this vector is then used as the feature vector for a multinomial logistic regression. On Russian and English datasets, we show that the stacking classifier outperforms each of the single classifiers. © 2019, Pleiades Publishing, Ltd.
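The abstract gives no implementation details. The following is a minimal pure-Python sketch of the three ingredients it names, written under explicit assumptions; it is not the authors' code.

- All function names (`cosine`, `knn_scores`, `avg_embedding`, `softmax_scores`, `blend`, `fit_alpha`) are hypothetical.
- Lexical descriptors are modelled as sparse `{term: weight}` dicts.
- The fastText-style model is reduced to two steps: averaging word embeddings (no subword or n-gram vectors, no training), then applying a softmax over fixed, hypothetical weight vectors.
- The meta-learner is a deliberately simple stand-in for stacking: a convex blend of the two models' class scores, with the mixing weight chosen by held-out accuracy. The abstract does not say which meta-classifier the paper uses.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {descriptor: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_scores(query, train, labels, k):
    """kNN base model (assumed scheme): sum the similarities of the k most
    similar training documents per class, normalised to sum to 1."""
    sims = sorted(((cosine(query, doc), lab) for doc, lab in zip(train, labels)),
                  key=lambda p: p[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, lab in sims:
        scores[lab] += sim
    total = sum(scores.values())
    return {c: (scores.get(c, 0.0) / total if total else 0.0)
            for c in sorted(set(labels))}


def avg_embedding(tokens, emb, dim):
    """Document vector = mean of the known token embeddings (zeros if none).
    Real fastText also averages subword/word n-gram vectors; omitted here."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax_scores(x, weights):
    """Multinomial logistic regression output for feature vector x.
    weights maps class -> weight vector (hypothetical, untrained, no bias)."""
    logits = {c: sum(a * b for a, b in zip(w, x)) for c, w in weights.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {c: math.exp(z - m) for c, z in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def blend(p1, p2, alpha):
    """Simplified meta-learner: convex combination of two class-score dicts.
    Classes are sorted so that argmax tie-breaking is deterministic."""
    classes = sorted(set(p1) | set(p2))
    return {c: alpha * p1.get(c, 0.0) + (1 - alpha) * p2.get(c, 0.0) for c in classes}


def fit_alpha(val_p1, val_p2, val_y, grid=tuple(i / 10 for i in range(11))):
    """Pick the mixing weight with the best accuracy on a held-out set
    (the first grid value wins ties)."""
    def accuracy(alpha):
        hits = 0
        for p1, p2, y in zip(val_p1, val_p2, val_y):
            b = blend(p1, p2, alpha)
            hits += max(b, key=b.get) == y
        return hits / len(val_y)
    return max(grid, key=accuracy)
```

In use, `knn_scores` runs on a document's descriptor vector. `softmax_scores(avg_embedding(...), weights)` runs on its tokens. `fit_alpha` is fitted once on validation predictions from both models, and `blend(..., alpha)` combines the two score dicts at prediction time; the argmax of the result is the predicted class. A full stacking setup would instead train a classifier on the base models' outputs, for example with out-of-fold predictions.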

Authors
Publisher
Pleiades Publishing
Number of issue
11
Language
English
Pages
1873-1880
Status
Published
Volume
40
Year
2019
Organizations
  • 1 Federal Research Center Computer Science and Control of the Russian Academy of Sciences, Moscow, 119333, Russian Federation
  • 2 Peoples’ Friendship University of Russia (RUDN University), Moscow, 117198, Russian Federation
  • 3 Lomonosov Moscow State University, Moscow, 119991, Russian Federation
Keywords
fastText; KNN; patent classification; similarity search; stacking; word embeddings