Fast and Accurate Patent Classification in Search Engines

This article presents a new approach to large-scale patent classification. The need to classify documents often arises in professional information retrieval systems. In this paper we describe our approach, based on linguistically-supported k-nearest neighbors (KNN). We experimentally evaluate it on Russian and English datasets and compare it with fastText, a modern classification technique. We show that KNN is a viable alternative to traditional text classifiers, achieving comparable accuracy while using fewer additional hardware resources. © Published under licence by IOP Publishing Ltd.
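The general idea of k-nearest-neighbor text classification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' linguistically-supported method, which the abstract does not detail: it assumes plain TF-IDF weighting, cosine similarity and similarity-weighted voting, and the toy documents, the labels and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for tokenized documents."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF, so that every known term gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_tokens, train_vectors, train_labels, idf, k=3):
    """Return the label with the largest summed similarity among the k nearest documents."""
    # Terms unseen during training are ignored (an assumption of this sketch).
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    scored = [(cosine(query, vec), label) for vec, label in zip(train_vectors, train_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: two documents per class.
train_docs = [
    ("rotor blade turbine engine".split(), "mechanics"),
    ("engine piston cylinder valve".split(), "mechanics"),
    ("antibody protein peptide compound".split(), "chemistry"),
    ("polymer compound catalyst reaction".split(), "chemistry"),
]
labels = [label for _, label in train_docs]
vectors, idf = tfidf_vectors([tokens for tokens, _ in train_docs])

print(knn_classify("turbine engine valve".split(), vectors, labels, idf, k=3))
```

A classifier of this kind needs no separate training step beyond indexing the documents, which is one reason it can sit naturally on top of an existing search index; at realistic scale the brute-force similarity loop would be replaced by an index lookup.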

Authors
Yadrintsev V. 1, 2, Bakarov A. 1, 3, Suvorov R. 1, Sochenkov I. 1, 4
Conference proceedings
Publisher
Institute of Physics Publishing
Issue number
1
Language
English
Status
Published
Number
012004
Volume
1117
Year
2018
Organizations
  • 1 Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russian Federation
  • 2 Peoples Friendship University of Russia, RUDN University, Moscow, Russian Federation
  • 3 National Research University Higher School of Economics, Moscow, Russian Federation
  • 4 Skolkovo Institute of Science and Technology, Moscow, Russian Federation
Keywords
Big data; Classification (of information); Information retrieval; Information retrieval systems; Nearest neighbor search; Patents and inventions; Tellurium compounds; Classification technique; Hardware resources; K-nearest neighbors; New approaches; Patent classifications; Text classifiers; Search engines
Date created
04.02.2019
Date modified
04.02.2019
Permanent link
https://repository.rudn.ru/ru/records/article/record/36232/