# Wikidata Research Articles Dataset

## Dataset Overview

The "Wikidata Research Articles Dataset" comprises peer-reviewed full research papers about Wikidata from its first decade of existence (2012-2022). The dataset was curated to provide insight into the focus of research on Wikidata, identify gaps, and highlight the institutions actively involved in researching Wikidata.

### Dataset Contents

- `WD_Research_Articles_Dataset.csv`: CSV file containing the raw data with features and labels.
- `LICENSE`: Creative Commons license file.

### Data Source

The contents of this dataset were collected from the following digital libraries: ACM Digital Library, Springer Link, DBLP, and Google Scholar.

### Data Preprocessing

The raw data was cleaned by removing duplicates, non-English entries, non-papers (posters, presentations), short papers (fewer than 5 pages), and papers not focused on Wikidata.

### Data Dictionary

| Column Name | Data Type | Description |
|-------------|-----------|-------------|
| item_type | String | Type of article |
| publication_year | Integer | The year the article was published |
| country | String | The country where the research was performed |
| institution | String | The institution where the research was performed |
| author | String | The author(s) of the article |
| title | String | The title of the article |
| category | String | The research focus under which the article is categorised |
| publication_title | String | Title of the journal or publication containing the article |
| conference_name | String | Name of the conference (for conference papers) |
| publisher | String | The online publisher of the article |

### License

This dataset is released under the [CC BY-SA 4.0](LICENSE) license.

### Contact Information

For any questions or feedback, please contact the dataset maintainer:

- Name: Mariam Farda-Sarbas
- Email: mariam.fs@fu-berlin.de
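
The columns listed in the data dictionary can be read with Python's standard `csv` module. The following is a minimal sketch, not part of the dataset itself: the helper names `load_articles` and `count_by` are hypothetical, and it assumes the CSV is UTF-8 encoded, has a header row matching the data dictionary, and leaves cells empty where a value does not apply (for example `conference_name` for journal articles).

```python
import csv
from collections import Counter
from pathlib import Path

# Column names taken from the data dictionary.
EXPECTED_COLUMNS = [
    "item_type", "publication_year", "country", "institution", "author",
    "title", "category", "publication_title", "conference_name", "publisher",
]


def _read(handle):
    """Parse an open text file and check that the expected header is present."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return list(reader)


def load_articles(source):
    """Return the dataset rows as a list of dicts (hypothetical helper).

    `source` is either a file path or an already opened text file.
    """
    if isinstance(source, (str, Path)):
        # Assumption: the file is UTF-8; "utf-8-sig" also tolerates a BOM.
        with open(source, newline="", encoding="utf-8-sig") as handle:
            return _read(handle)
    return _read(source)


def count_by(rows, column):
    """Count articles per distinct value of `column`, skipping empty cells."""
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts
```

With the CSV file in the working directory, `count_by(load_articles("WD_Research_Articles_Dataset.csv"), "publication_year")` would give the number of articles per year, and passing `"category"` or `"institution"` instead gives the corresponding breakdown. All values are read as strings, so `publication_year` needs an explicit `int(...)` conversion before numeric comparison.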