The ParaPlag: Russian dataset for paraphrased plagiarism detection

This paper presents ParaPlag, a large Russian-language text dataset for evaluating and comparing the quality metrics of plagiarism detection approaches that operate on big data. The PlagEvalRus-2017 competition, which aimed to evaluate plagiarism detection methods, uses ParaPlag as its main dataset for the source retrieval and text alignment tasks. ParaPlag is open and available on the Web. We propose a guide for writers who want to contribute to ParaPlag and extend it. We also analyze the text rewriting techniques used by unscrupulous authors.

Authors
Publisher
Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet
Issue number
16
Language
English
Pages
284-296
Status
Published
Volume
1
Year
2017
Organizations
  • 1 RUDN University, Moscow, Russian Federation
  • 2 Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russian Federation
  • 3 Institute for Systems Analysis, Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russian Federation
Keywords
Dataset for plagiarism detection evaluation; Paraphrased plagiarism detection; Text reuse detection