Chung-hong Chan, Jing Zeng, Hartmut Wessler, Marc Jungblut, Kasper Welbers, Joseph W. Bajjalieh, Wouter van Atteveldt, Scott L. Althaus
{"title":"Reproducible Extraction of Cross-lingual Topics (rectr)","authors":"Chung-hong Chan, Jing Zeng, Hartmut Wessler, Marc Jungblut, Kasper Welbers, Joseph W. Bajjalieh, Wouter van Atteveldt, Scott L. Althaus","doi":"10.1080/19312458.2020.1812555","DOIUrl":null,"url":null,"abstract":"ABSTRACT With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and proposes a new method – Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source-aligned word embeddings to understand the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied in tracking news topics across time and languages.","PeriodicalId":47552,"journal":{"name":"Communication Methods and Measures","volume":null,"pages":null},"PeriodicalIF":6.3000,"publicationDate":"2020-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/19312458.2020.1812555","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Communication Methods and Measures","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/19312458.2020.1812555","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMMUNICATION","Score":null,"Total":0}
引用次数: 18
Abstract
ABSTRACT With global media content databases and online content being available, analyzing topical structures in different languages simultaneously has become an urgent computational task. Some previous studies have analyzed topics in a multilingual corpus by translating all items into a single language using a machine translation service, such as Google Translate. We argue that this method is not reproducible in the long run and proposes a new method – Reproducible Extraction of Cross-lingual Topics Using R (rectr). Our method utilizes open-source-aligned word embeddings to understand the cross-lingual meanings of words and has a mechanism to normalize residual influence from language differences. We present a benchmark that compares the topics extracted from a corpus of English, German, and French news using our method with methods used in the literature. We show that our method is not only reproducible but can also generate high-quality cross-lingual topics. We demonstrate how our method can be applied in tracking news topics across time and languages.
期刊介绍:
Communication Methods and Measures aims to achieve several goals in the field of communication research. Firstly, it aims to bring attention to and showcase developments in both qualitative and quantitative research methodologies to communication scholars. This journal serves as a platform for researchers across the field to discuss and disseminate methodological tools and approaches.
Additionally, Communication Methods and Measures seeks to improve research design and analysis practices by offering suggestions for improvement. It aims to introduce new methods of measurement that are valuable to communication scientists or enhance existing methods. The journal encourages submissions that focus on methods for enhancing research design and theory testing, employing both quantitative and qualitative approaches.
Furthermore, the journal is open to articles devoted to exploring the epistemological aspects relevant to communication research methodologies. It welcomes well-written manuscripts that demonstrate the use of methods and articles that highlight the advantages of lesser-known or newer methods over those traditionally used in communication.
In summary, Communication Methods and Measures strives to advance the field of communication research by showcasing and discussing innovative methodologies, improving research practices, and introducing new measurement methods.