The ParaPlag: Russian dataset for paraphrased plagiarism detection

Sochenkov, I.V.; Zubarev, D.V.; Smirnov, I.V.

The paper presents ParaPlag, a large Russian-language text dataset for evaluating and comparing the quality metrics of different plagiarism detection approaches that deal with big data. The PlagEvalRus-2017 competition, which aimed to evaluate plagiarism detection methods, uses ParaPlag as its main dataset for the source retrieval and text alignment tasks. ParaPlag is open and available on the Web. We propose a guide for writers who want to contribute to ParaPlag and extend it. We also present an analysis of the text rewriting techniques used by unscrupulous authors.

Authors

Sochenkov I.V.¹,², Zubarev D.V.¹,², Smirnov I.V.¹,²

Journal

Komp'juternaja Lingvistika i Intellektual'nye Tehnologii

Publisher

Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet

Issue

Language

English

Pages

284-296

Status

Published

Volume

Year

2017

Affiliations

¹ RUDN University, Moscow, Russian Federation
² Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russian Federation
³ Institute for Systems Analysis, Federal Research Center Computer Science and Control, Russian Academy of Sciences, Moscow, Russian Federation

Keywords

Dataset for plagiarism detection evaluation; Paraphrased plagiarism detection; Text reuse detection
