December 11, 2014

Study of massive preprint archive hints at the geography of plagiarism (ScienceInsider)

New analyses of the hundreds of thousands of technical manuscripts submitted to arXiv, the repository of digital preprint articles, are offering some intriguing insights into the consequences—and geography—of scientific plagiarism. It appears that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don’t get cited much.
Since its founding in 1991, arXiv has become the world's largest venue for sharing findings in physics, math, and other mathematical fields. It publishes hundreds of papers daily and is fast approaching its millionth submission. Anyone can send in a paper, and submissions don’t get full peer review. However, the papers do go through a quality-control process. The final check is a computer program that compares the paper's text with the text of every other paper already published on arXiv. The goal is to flag papers that have a high likelihood of having plagiarized published work.
"Text overlap" is the technical term, and sometimes it turns out to be innocent. For example, a review article might quote generously from a paper the author cites, or the author might recycle and slightly update sentences from their own previous work. The arXiv plagiarism detector gives such papers a pass. "It's a fairly sophisticated machine learning logistic classifier," says arXiv founder Paul Ginsparg, a physicist at Cornell University. "It has special ways of detecting block quotes, italicized text, text in quotation marks, as well as statements of mathematical theorems, to avoid false positives."
Only when there is no obvious reason for an author to have copied significant chunks of text from already published work, particularly if that previous work is not cited and shares no authors with the new paper, does the software affix a "flag" to the article, including links to the papers with which it overlaps. That standard "is much more lenient" than those used by most scientific journals, Ginsparg says.
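The article does not describe the internals of arXiv's classifier beyond the features Ginsparg lists. As a rough illustration of the kind of raw signal such a system might start from, here is a minimal Python sketch that measures text overlap using word 5-grams ("shingles"). The function names, the 5-word window, and the crude removal of quoted passages are assumptions made for illustration; this is not arXiv's code, and it leaves out the machine-learning classifier entirely.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercased word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def strip_quotations(text: str) -> str:
    """Blank out passages inside straight or curly double quotes.

    Hypothetical, crude stand-in for the special handling of quoted text
    that the article attributes to arXiv's detector.
    """
    return re.sub(r'"[^"]*"|“[^”]*”', " ", text)


def overlap_fraction(new_paper: str, earlier_paper: str, n: int = 5) -> float:
    """Fraction of the new paper's shingles that also occur in the earlier paper."""
    new = shingles(strip_quotations(new_paper), n)
    old = shingles(strip_quotations(earlier_paper), n)
    if not new:
        return 0.0
    return len(new & old) / len(new)


a = "We prove that the spectrum of the operator is discrete and bounded below."
b = "In this note we prove that the spectrum of the operator is discrete and bounded below."
print(overlap_fraction(a, b))  # 1.0: every 5-gram of a also appears in b
print(overlap_fraction(b, a))  # 0.75: 9 of b's 12 5-grams appear in a
```

A score like this only says how much text two documents share; deciding whether the overlap is innocent (self-citation, quoted material, a cited source) is the harder part that the article credits to arXiv's classifier.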
To explore some of the consequences of "text reuse," Ginsparg and Cornell physics Ph.D. student Daniel Citron compared the text from each of the 757,000 articles submitted to arXiv between 1991 and 2012. The headline finding of that study, published Monday in the Proceedings of the National Academy of Sciences (PNAS), is that the more text a paper poaches from already published work, the less frequently that paper tends to be cited. (The full paper is also available for free on arXiv.) It also found that text reuse is surprisingly common. After filtering out review articles and legitimate quoting, about one in 16 arXiv authors was found to have copied long phrases and sentences from their own previously published work that add up to about the same amount of text as this entire article. More worryingly, about one in every 1000 submitting authors copied the equivalent of a paragraph's worth of text from other people's papers without citing them.
So where in the world is all this text reuse happening? Conspicuously missing from the PNAS paper is a global map of potential plagiarism. Whenever an author submits a paper to arXiv, the author declares his or her country of residence. So it should be possible to reveal which countries have the highest proportion of plagiarists. The reason no map was included, Ginsparg told ScienceInsider, is that not all the text overlap detected in their study is necessarily plagiarism.
Ginsparg did agree, however, to share arXiv's flagging data with ScienceInsider. Since 1 August 2011, when arXiv began systematically flagging for text overlap, 106,262 authors from 151 nations have submitted a total of 301,759 articles. (These counts cover submitting authors only; each paper can have many more co-authors.) Overall, 3.2% (9591) of the papers were flagged. It's not just papers submitted en masse by a few bad apples, either. Those flagged papers came from 6% (6737) of the submitting authors. Put another way, one out of every 16 researchers who have submitted a paper to arXiv since August 2011 has been flagged by the plagiarism detector at least once.
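The percentages in the paragraph above follow directly from the raw counts. This short Python check uses only the figures quoted in the article; the variable names are illustrative.

```python
# Counts quoted in the article for submissions since 1 August 2011.
papers_total = 301_759
papers_flagged = 9_591
authors_total = 106_262   # submitting authors only
authors_flagged = 6_737

paper_rate = papers_flagged / papers_total
author_rate = authors_flagged / authors_total

print(f"flagged papers:  {paper_rate:.1%}")          # 3.2%
print(f"flagged authors: {author_rate:.1%}")         # 6.3%, reported as "6%"
print(f"about 1 author in {1 / author_rate:.0f}")    # about 1 author in 16
```

The author rate is roughly double the paper rate because a single flag marks the submitting author, while most authors submit several papers.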
The map above, prepared by ScienceInsider, takes a conservative approach. It shows only the incidence of flagged authors for the 57 nations with at least 100 submitted papers, to minimize distortion from small sample sizes. (In Ethiopia, for example, there are only three submitting authors and two of them have been flagged.)
Researchers from countries that submit the lion's share of arXiv papers—the United States, Canada, and a small number of industrialized countries in Europe and Asia—tend to plagiarize less often than researchers elsewhere. For example, more than 20% (38 of 186) of authors who submitted papers from Bulgaria were flagged, more than eight times the proportion from New Zealand (five of 207). In Japan, about 6% (269 of 4759) of submitting authors were flagged, compared with over 15% (164 out of 1054) from Iran.
Such disparities may be due in part to different academic cultures, Ginsparg and Citron say in their PNAS study. They chalk up scientific plagiarism to "differences in academic infrastructure and mentoring, or incentives that emphasize quantity of publication over quality."
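The country comparisons above can be reproduced from the flagged and total author counts the article gives. This Python sketch covers only those four countries; it does not apply the map's 100-paper threshold, because the article reports per-country author counts rather than paper counts.

```python
# (flagged authors, submitting authors) as quoted in the article.
flagged_authors = {
    "Bulgaria": (38, 186),
    "New Zealand": (5, 207),
    "Japan": (269, 4759),
    "Iran": (164, 1054),
}

rates = {country: flagged / total for country, (flagged, total) in flagged_authors.items()}

for country, rate in rates.items():
    print(f"{country:<12} {rate:.1%}")
# Bulgaria 20.4%, New Zealand 2.4%, Japan 5.7%, Iran 15.6%

ratio = rates["Bulgaria"] / rates["New Zealand"]
print(f"Bulgaria / New Zealand: {ratio:.1f}x")  # 8.5x, i.e. "more than eight times"
```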
*Correction, 11 December, 4:57 p.m.: The map has been corrected to reflect current national boundaries.
