Showing posts with label Sorokina Daria. Show all posts
Showing posts with label Sorokina Daria. Show all posts

February 1, 2007

Plagiarism Detection in arXiv (2007)

Sorokina Daria, Gehrke Johannes, Warner Simeon, Ginsparg Paul

Abstract
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14 year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger>>>

Random Posts



.
.

Popular Posts