This website collects all the material from our research in one central place.
The Wikidata Vandalism Corpus 2016 (WDVC-16) is a corpus for the evaluation of automatic vandalism detectors for Wikidata.
The dataset is available as part of the WSDM Cup 2017.
Stefan Heindorf, Martin Potthast, Hannah Bast, Björn Buchhold, and Elmar Haussmann. WSDM Cup 2017: Vandalism Detection and Triple Scoring. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 17), pages 827-828, February 2017. ACM. [BibTex] [Paper]
The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. It contains 24 million Wikidata revisions, of which 103 thousand were labeled as vandalism.
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus using [BibTex].
Download (4.5 GB)
Wikidata Vandalism Corpus 2015 by Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels is licensed under a Creative Commons Attribution 4.0 International License.
Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In Ricardo Baeza-Yates, Mounia Lalmas, Alistair Moffat, and Berthier Ribeiro-Neto, editors, 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), pages 831-834, August 2015. ACM. ISBN 978-1-4503-3621-5 [BibTex] [Paper] [Poster]
The Wikidata Vandalism Detector 2016 (WDVD-2016) is a machine learning-based approach for automatic vandalism detection in Wikidata.
The source code is available here:
The data is available here:
Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Vandalism Detection in Wikidata. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 16), pages 327-336, October 2016. ACM. [BibTex] [Paper] [Slides] [Code]