Wikidata Vandalism Detection

This website collects all material from our research on Wikidata vandalism detection in one central place.

Vandalism Corpora

Vandalism Corpus WDVC-16

The Wikidata Vandalism Corpus 2016 (WDVC-16) is a corpus for the evaluation of automatic vandalism detectors for Wikidata.

The dataset is available as part of the WSDM Cup 2017.

Publications

Stefan Heindorf, Martin Potthast, Gregor Engels, and Benno Stein. Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017. In WSDM Cup 2017 Notebook Papers (WSDMCUP 17), 2017. [BibTex] [Paper]

Stefan Heindorf, Martin Potthast, Hannah Bast, Björn Buchhold, and Elmar Haussmann. WSDM Cup 2017: Vandalism Detection and Triple Scoring. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 17), pages 827-828, February 2017. ACM. [BibTex] [Paper]

Vandalism Corpus WDVC-15

The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for the evaluation of automatic vandalism detectors for Wikidata. It contains 24 million Wikidata revisions of which 103 thousand were labeled as vandalism.

If you use the dataset in your research, please send us a copy of your publication and cite the corpus using the [BibTex] entry.

Download (4.5 GB)
(SHA1: ce416496054b06c50bec6790fad17e2cd5137a4b)

Wikidata Vandalism Corpus 2015 by Stefan Heindorf, Martin Potthast, Benno Stein, Gregor Engels is licensed under a Creative Commons Attribution 4.0 International License.

Publication

Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In Ricardo Baeza-Yates, Mounia Lalmas, Alistair Moffat, and Berthier Ribeiro-Neto, editors, 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), pages 831-834, August 2015. ACM. ISBN 978-1-4503-3621-5 [BibTex] [Paper] [Poster]

Vandalism Detector

Vandalism Detector WDVD-2017

The Wikidata Vandalism Detector 2017 (WDVD-2017) is a machine learning-based approach for automatic vandalism detection in Wikidata. It uses the same features and hyperparameters as its predecessor, the Wikidata Vandalism Detector 2016 (WDVD-2016).

The source code is available here:

The data is available here:

Publications

Stefan Heindorf, Martin Potthast, Gregor Engels, and Benno Stein. Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017. In WSDM Cup 2017 Notebook Papers (WSDMCUP 17), 2017. [BibTex] [Paper] [Code]

Stefan Heindorf, Martin Potthast, Hannah Bast, Björn Buchhold, and Elmar Haussmann. WSDM Cup 2017: Vandalism Detection and Triple Scoring. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM 17), pages 827-828, February 2017. ACM. [BibTex] [Paper]

Vandalism Detector WDVD-2016

The Wikidata Vandalism Detector 2016 (WDVD-2016) is a machine learning-based approach for automatic vandalism detection in Wikidata.

The source code is available here:

The data is available here:

Publication

Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Vandalism Detection in Wikidata. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 16), pages 327-336, October 2016. ACM. [BibTex] [Paper] [Slides] [Code]