ABSTRACT:
SUBMITTER: Gienapp L
PROVIDER: S-EPMC9879940 | biostudies-literature | 2023 Jan
REPOSITORIES: biostudies-literature
Lukas Gienapp, Wolfgang Kircheis, Bjarne Sievers, Benno Stein, Martin Potthast
Scientific Data, 2023-01-26, 1
We present the Webis-STEREO-21 dataset, a massive collection of Scientific Text Reuse in Open-access publications. It contains 91 million cases of reused text passages found in 4.2 million unique open-access publications. Cases range from overlap of as few as eight words to near-duplicate publications and include a variety of reuse types, ranging from boilerplate text to verbatim copying to quotations and paraphrases. Featuring a high coverage of scientific disciplines and varieties of reuse, as ...