October 11, 2007

Plagiarism: text-matching program offers an answer - Correspondence: NATURE

John Bechhoefer
The removal of almost 70 papers from the arXiv server on suspicion of plagiarism is dismaying (Nature 449, 8; doi:10.1038/449008b 2007). But the search technology that led to this removal could be used to reduce future problems, in much the same way as the system currently being tested by CrossRef, the publishers' cooperative group ('Academic accused of living on borrowed lines' Nature 448, 632–633; doi:10.1038/448632b 2007).
Every paper submitted to arXiv could be examined by a search engine that looks for overlap or correlation with all previous arXiv submissions. If enough of a match is found, a message could be sent to the submitter, listing the work(s) in which similarities have been detected. Should the submitter wish to proceed with their submission, the program would notify the editorial board and trigger an automatic review. The submitter would also be given the chance to explain that the flagged papers were not copied or that the copying was for some reason legitimate.
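The screening step described above might be sketched as follows. This is a minimal illustration only, not the search technology arXiv actually uses: the word-shingle comparison, the function names (`shingles`, `screen_submission`, `next_step`), the five-word window and the 0.3 threshold are all assumptions made for the example.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(new, earlier):
    """Fraction of the new paper's shingles that also occur in an earlier paper."""
    if not new:
        return 0.0
    return len(new & earlier) / len(new)


def screen_submission(text, archive, threshold=0.3, k=5):
    """Compare a submission with every earlier paper in the archive.

    `archive` maps a paper identifier to its full text. The result lists
    (identifier, score) pairs at or above the threshold, best match first,
    which is the list that would be sent to the submitter.
    """
    new = shingles(text, k)
    matches = []
    for paper_id, earlier_text in archive.items():
        score = overlap(new, shingles(earlier_text, k))
        if score >= threshold:
            matches.append((paper_id, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


def next_step(matches, submitter_proceeds):
    """Decide what happens after screening (hypothetical status labels)."""
    if not matches:
        return "accept"
    if submitter_proceeds:
        # The editorial board is notified and a review is triggered.
        return "editorial_review"
    return "withdrawn"
```

A real system would need an index rather than a pairwise scan of the whole archive, and the threshold would have to be tuned so that legitimate reuse, such as an author's own earlier abstract, does not flood the editorial board with reviews.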
Such a system would address the problem of plagiarism only among papers posted on arXiv, but apparently that would already be an improvement. And although plagiarists might opt to copy and translate from foreign-language journals, or simply alter the wording enough to pass muster, making copying more difficult would at least discourage the lazier offenders.
Because journals should welcome the elimination of plagiarism at the preprint stage, before publication, they could support the effort by giving the arXiv site search access to their own full-text databases.


