D. Maier, C. Baden, Daniela Stoltenberg, Maya De Vries-Kedem, A. Waldherr
{"title":"机器翻译与多语言词典:评估多语言文本集主题建模的两种策略","authors":"D. Maier, C. Baden, Daniela Stoltenberg, Maya De Vries-Kedem, A. Waldherr","doi":"10.1080/19312458.2021.1955845","DOIUrl":null,"url":null,"abstract":"ABSTRACT The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method’s potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages (English, Hebrew, Arabic), covering the ongoing local conflicts between Israeli authorities, settlers, and Palestinian Bedouins in the West Bank. We find that both methods produce a large share of equivalent topics, especially in the context of fairly homogenous news discourse, yet show limited but systematic differences when applied to highly heterogenous social media discourse. While the MD model delivers a more nuanced picture of conflict-related topics, it misses several more peripheral topics, especially those unrelated to the dictionary’s focus, which are picked up by the MT model. Our study is a first step toward instrument validation, indicating that both methods yield valid, comparable results, while method-specific differences remain.","PeriodicalId":47552,"journal":{"name":"Communication Methods and Measures","volume":"16 1","pages":"19 - 38"},"PeriodicalIF":6.3000,"publicationDate":"2021-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Machine Translation Vs. Multilingual Dictionaries Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections\",\"authors\":\"D. Maier, C. Baden, Daniela Stoltenberg, Maya De Vries-Kedem, A. Waldherr\",\"doi\":\"10.1080/19312458.2021.1955845\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method’s potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages (English, Hebrew, Arabic), covering the ongoing local conflicts between Israeli authorities, settlers, and Palestinian Bedouins in the West Bank. We find that both methods produce a large share of equivalent topics, especially in the context of fairly homogenous news discourse, yet show limited but systematic differences when applied to highly heterogenous social media discourse. While the MD model delivers a more nuanced picture of conflict-related topics, it misses several more peripheral topics, especially those unrelated to the dictionary’s focus, which are picked up by the MT model. Our study is a first step toward instrument validation, indicating that both methods yield valid, comparable results, while method-specific differences remain.\",\"PeriodicalId\":47552,\"journal\":{\"name\":\"Communication Methods and Measures\",\"volume\":\"16 1\",\"pages\":\"19 - 38\"},\"PeriodicalIF\":6.3000,\"publicationDate\":\"2021-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Communication Methods and Measures\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1080/19312458.2021.1955845\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMMUNICATION\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication Methods and Measures","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/19312458.2021.1955845","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMMUNICATION","Score":null,"Total":0}
Machine Translation Vs. Multilingual Dictionaries Assessing Two Strategies for the Topic Modeling of Multilingual Text Collections
ABSTRACT The goal of this paper is to evaluate two methods for the topic modeling of multilingual document collections: (1) machine translation (MT), and (2) the coding of semantic concepts using a multilingual dictionary (MD) prior to topic modeling. We empirically assess the consequences of these approaches based on both a quantitative comparison of models and a qualitative validation of each method’s potentials and weaknesses. Our case study uses two text collections (of tweets and news articles) in three languages (English, Hebrew, Arabic), covering the ongoing local conflicts between Israeli authorities, settlers, and Palestinian Bedouins in the West Bank. We find that both methods produce a large share of equivalent topics, especially in the context of fairly homogenous news discourse, yet show limited but systematic differences when applied to highly heterogenous social media discourse. While the MD model delivers a more nuanced picture of conflict-related topics, it misses several more peripheral topics, especially those unrelated to the dictionary’s focus, which are picked up by the MT model. Our study is a first step toward instrument validation, indicating that both methods yield valid, comparable results, while method-specific differences remain.
期刊介绍:
Communication Methods and Measures aims to achieve several goals in the field of communication research. Firstly, it aims to bring attention to and showcase developments in both qualitative and quantitative research methodologies to communication scholars. This journal serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches.
Additionally, Communication Methods and Measures seeks to improve research design and analysis practices by offering suggestions for improvement. It aims to introduce new methods of measurement that are valuable to communication scientists or enhance existing methods. The journal encourages submissions that focus on methods for enhancing research design and theory testing, employing both quantitative and qualitative approaches.
Furthermore, the journal is open to articles devoted to exploring the epistemological aspects relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
In summary, Communication Methods and Measures strives to advance the field of communication research by showcasing and discussing innovative methodologies, improving research practices, and introducing new measurement methods.