dc.contributor.author
Bizer, Christian
dc.contributor.author
Mühleisen, Hannes
dc.date.accessioned
2018-06-08T07:17:02Z
dc.date.available
2012-10-25
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/17576
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-21460
dc.description.abstract
More and more websites embed structured data describing, for instance, products,
people, organizations, places, events, resumes, and cooking recipes into their
HTML pages using encoding standards such as Microformats, Microdata, and RDFa.
The Web Data Commons project extracts all Microformat, Microdata, and RDFa data
from the Common Crawl web corpus, the largest and most up-to-date web corpus
that is currently available to the public, and provides the extracted data for
download in the form of RDF quads. In this paper, we give an overview of the
project and present statistics about the popularity of the different encoding
standards as well as the kinds of data that are published using each format.
en
dc.rights.uri
http://www.fu-berlin.de/sites/refubium/rechtliches/Nutzungsbedingungen
dc.subject.ddc
000 Computer science, information & general works::000 Computer science, knowledge & systems::004 Data processing & computer science
dc.title
Web Data Commons – Extracting Structured Data from Two Large Web Corpora
dc.type
Scientific article
dcterms.bibliographicCitation
LDOW2012, April 16, 2012, Lyon, France
dcterms.bibliographicCitation.url
http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/MuehleisenBizerWebDataCommonsLdow2012.pdf
refubium.affiliation
Economics
refubium.affiliation.other
Business Informatics

refubium.mycore.fudocsId
FUDOCS_document_000000014832
refubium.resourceType.isindependentpub
no
refubium.mycore.derivateId
FUDOCS_derivate_000000002126
dcterms.accessRights.openaire
open access