Detecting Near-duplicates in Russian Documents through Using Fingerprint Algorithm Simhash

Plagiarism is one of the major problems of the communication age. For many languages, such as English, the problem is taken very seriously, and many powerful tools have been developed to prevent it. This article aims to detect plagiarism in Russian texts using a fingerprint algorithm. Fingerprint algorithms detect plagiarism quickly because they create compact features and compare only these features between the original and the suspicious documents. To increase the power and accuracy of plagiarism detection, stop words must be removed and words stemmed before pre-processing steps such as word separation, number replacement, and homogenization are applied. Four Simhash algorithms are used in this article. Tested on 800 scientific articles, the implementation of these algorithms produced satisfactory results. © 2017 The Authors.
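The abstract does not describe the four Simhash variants, so the following is only a minimal Python sketch of the classic Simhash fingerprint (Charikar's scheme) combined with the pre-processing steps the abstract names. None of the following comes from the paper; all are assumptions of the sketch: the tiny stop-word list `STOP_WORDS`, six-character truncation standing in for a real Russian stemmer, lowercasing plus `ё`→`е` as normalization, the placeholder `<num>` for numbers, MD5 as the token hash, 64-bit fingerprints, equal token weights, and the order of the steps. The function names `preprocess`, `simhash`, and `hamming` are hypothetical.

```python
import hashlib
import re

# Hypothetical, deliberately tiny Russian stop-word list (illustration only).
STOP_WORDS = {"и", "в", "на", "с", "по", "не", "что", "это", "как", "для"}


def preprocess(text: str) -> list[str]:
    """Normalize, split into words, replace numbers, drop stop words, 'stem'."""
    text = text.lower().replace("ё", "е")   # normalization (assumed rules)
    tokens = re.findall(r"\w+", text)        # word separation (Unicode-aware)
    result = []
    for tok in tokens:
        if tok.isdigit():
            tok = "<num>"                    # number replacement
        if tok in STOP_WORDS:
            continue                         # stop-word removal
        # Crude truncation as a stand-in for a real Russian stemmer (assumption).
        result.append(tok[:6])
    return result


def simhash(tokens: list[str], bits: int = 64) -> int:
    """Classic Simhash: every token hash votes +1/-1 on each bit position."""
    weights = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")  # first `bits` bits
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the summed vote is positive.
    fingerprint = 0
    for i in range(bits):
        if weights[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # These texts differ only in case, a stop word, and a number,
    # so pre-processing makes them identical and the distance is 0.
    a = simhash(preprocess("Алгоритм Simhash и отпечаток 2017"))
    b = simhash(preprocess("алгоритм SIMHASH в отпечаток 1999"))
    print(hamming(a, b))  # → 0
```

In this scheme two documents are treated as near-duplicates when the Hamming distance between their fingerprints falls below a chosen threshold; the abstract does not state which threshold, hash function, or token weighting the authors used.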

Authors
Conference proceedings
Publisher
Elsevier B.V.
Language
English
Pages
421-425
Status
Published
Volume
103
Year
2017
Organizations
  • 1 RUDN University, 6 Miklukho-Maklaya str., Moscow, 117198, Russian Federation
Keywords
fingerprint algorithm; plagiarism; Simhash
Date created
19.10.2018
Date modified
19.10.2018
Permanent link
https://repository.rudn.ru/ru/records/article/record/5688/