On methods for improving the accuracy of multiclass classification on imbalanced data [О МЕТОДАХ ПОВЫШЕНИЯ ТОЧНОСТИ МНОГОКЛАССОВОЙ КЛАССИФИКАЦИИ НА НЕСБАЛАНСИРОВАННЫХ ДАННЫХ]

This paper studies methods for overcoming class imbalance in order to achieve higher classification accuracy than applying classification algorithms directly to imbalanced data. A scheme for improving classification accuracy is proposed: classification algorithms are combined with feature selection methods such as RFE (Recursive Feature Elimination), Random Forest, and Boruta, after the classes are first balanced by random sampling methods, SMOTE (Synthetic Minority Oversampling TEchnique), and ADASYN (ADAptive SYNthetic sampling). Computer experiments on skin disease data showed that using sampling algorithms to eliminate class imbalance, together with selecting the most informative features, significantly increases classification accuracy. The highest classification accuracy was achieved by the Random Forest algorithm on data resampled with ADASYN. © 2020 Federal Research Center "Computer Science and Control" of Russian Academy of Sciences. All rights reserved.
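
To illustrate the balancing step named in the abstract, the following is a minimal sketch of SMOTE-style oversampling, written with the Python standard library only. It is not the authors' implementation: the function name `smote_oversample`, its parameters `k` and `seed`, and the toy data are assumptions made for illustration. Each synthetic point is placed on the segment between a minority-class sample and one of its nearest same-class neighbours. ADASYN follows the same interpolation idea but generates more synthetic points around samples that have many other-class neighbours; that weighting is not shown here. In practice a tested library implementation (for example the imbalanced-learn package) would likely be preferable.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=5, seed=0):
    """Balance classes by SMOTE-style interpolation (simplified sketch)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        minority = [x for x, lab in zip(X, y) if lab == label]
        if count == target or len(minority) < 2:
            continue  # largest class, or too few points to interpolate
        for _ in range(target - count):
            base = rng.choice(minority)
            # Up to k nearest same-class neighbours of the base point.
            neighbours = sorted(
                (p for p in minority if p is not base),
                key=lambda p: math.dist(p, base),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy three-class data set (made-up values, not taken from the paper).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.1, 2.3], [2.2, 1.9],
     [5.0, 5.0], [5.2, 5.1]]
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2

X_bal, y_bal = smote_oversample(X, y, k=2)
print(Counter(y_bal))  # every class now has as many samples as the largest one
```

The balanced data would then be passed to feature selection and to a classifier such as Random Forest. Resampling should be applied to the training split only, so that synthetic points do not leak into the evaluation data.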

Authors
Sevastianov L.A. 1, Shchetinin E.Yu. 2
Publisher
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences
Issue
1
Language
Russian
Pages
63-70
Status
Published
Volume
14
Year
2020
Organizations
  • 1 Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Str., Moscow, 117198, Russian Federation
  • 2 Financial University under the Government of the Russian Federation, 49 Leningradsky Prospekt, Moscow, 125993, Russian Federation
Keywords
ADASYN; Classification; Imbalanced data; Random forest; Sampling; SMOTE
Date created
02.11.2020
Date modified
02.11.2020
Permanent link
https://repository.rudn.ru/ru/records/article/record/65188/