dc.contributor.author
Maier, Daniel
dc.contributor.author
Baden, Christian
dc.contributor.author
Stoltenberg, Daniela
dc.contributor.author
De Vries-Kedem, Maya
dc.contributor.author
Waldherr, Annie
dc.date.accessioned
2022-03-31T12:29:00Z
dc.date.available
2022-03-31T12:29:00Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/32510
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-32235
dc.description.abstract
The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method’s potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages (English, Hebrew, Arabic), covering the ongoing local conflicts between Israeli authorities, settlers, and Palestinian Bedouins in the West Bank. We find that both methods produce a large share of equivalent topics, especially in the context of fairly homogeneous news discourse, yet show limited but systematic differences when applied to highly heterogeneous social media discourse. While the MD model delivers a more nuanced picture of conflict-related topics, it misses several more peripheral topics, especially those unrelated to the dictionary’s focus, which are picked up by the MT model. Our study is a first step toward instrument validation, indicating that both methods yield valid, comparable results, while method-specific differences remain.
en
dc.format.extent
20 pages
dc.rights.uri
https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
machine translation
en
dc.subject
multilingual dictionary
en
dc.subject
multilingual document collections
en
dc.subject.ddc
000 Computer science, information science, general works::000 Computer science, knowledge, systems::004 Data processing; computer science
dc.title
Machine Translation Vs. Multilingual Dictionaries: Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections
dc.type
Scholarly article
dcterms.bibliographicCitation.doi
10.1080/19312458.2021.1955845
dcterms.bibliographicCitation.journaltitle
Communication Methods and Measures
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.pagestart
19
dcterms.bibliographicCitation.pageend
38
dcterms.bibliographicCitation.volume
16
dcterms.bibliographicCitation.url
https://doi.org/10.1080/19312458.2021.1955845
refubium.affiliation
Politik- und Sozialwissenschaften
refubium.affiliation.other
Institut für Publizistik- und Kommunikationswissenschaft / Arbeitsstelle Kommunikationstheorie/Medienwirkungsforschung
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1931-2466
refubium.resourceType.provider
WoS-Alert