{"title":"德国政治话语中的框架检测:没有大规模人工语料库标注,我们还能走多远?","authors":"Qi Yu, Anselm Fliethmann","doi":"10.21248/jlcl.35.2022.227","DOIUrl":null,"url":null,"abstract":"Automated detection of frames in political discourses has gained increasing attention in natural language processing (NLP). Earlier studies in this area however focus heavily on frame detection in English using supervised machine learning approaches. Addressing the difficulty of the lack of annotated data for training and/or evaluating supervised models for low-resource languages, we investigate the potential of two NLP approaches that do not require large-scale manual corpus annotation from scratch: 1) LDA-based topic modelling, and 2) a combination of word2vec embeddings and handcrafted framing keywords based on a novel, expert-curated framing schema. We test these approaches using a novel corpus consisting of German-language news articles on the “Eu-ropean Refugee Crisis” between 2014-2018. We show that while topic modelling is insufficient in detecting frames in a dataset with highly homogeneous vocabulary, our second approach yields intriguing and more humanly interpretable results. This approach offers a promising opportunity to incorporate domain knowledge from political science and NLP techniques for bottom-up, explorative political text analyses.","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Frame Detection in German Political Discourses: How Far Can We Go Without Large-Scale Manual Corpus Annotation?\",\"authors\":\"Qi Yu, Anselm Fliethmann\",\"doi\":\"10.21248/jlcl.35.2022.227\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automated detection of frames in political discourses has gained increasing attention in natural language processing (NLP). Earlier studies in this area however focus heavily on frame detection in English using supervised machine learning approaches. Addressing the difficulty of the lack of annotated data for training and/or evaluating supervised models for low-resource languages, we investigate the potential of two NLP approaches that do not require large-scale manual corpus annotation from scratch: 1) LDA-based topic modelling, and 2) a combination of word2vec embeddings and handcrafted framing keywords based on a novel, expert-curated framing schema. We test these approaches using a novel corpus consisting of German-language news articles on the “Eu-ropean Refugee Crisis” between 2014-2018. We show that while topic modelling is insufficient in detecting frames in a dataset with highly homogeneous vocabulary, our second approach yields intriguing and more humanly interpretable results. 
This approach offers a promising opportunity to incorporate domain knowledge from political science and NLP techniques for bottom-up, explorative political text analyses.\",\"PeriodicalId\":137584,\"journal\":{\"name\":\"Journal for Language Technology and Computational Linguistics\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal for Language Technology and Computational Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21248/jlcl.35.2022.227\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal for Language Technology and Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21248/jlcl.35.2022.227","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Frame Detection in German Political Discourses: How Far Can We Go Without Large-Scale Manual Corpus Annotation?
Automated detection of frames in political discourses has gained increasing attention in natural language processing (NLP). Earlier studies in this area however focus heavily on frame detection in English using supervised machine learning approaches. Addressing the difficulty of the lack of annotated data for training and/or evaluating supervised models for low-resource languages, we investigate the potential of two NLP approaches that do not require large-scale manual corpus annotation from scratch: 1) LDA-based topic modelling, and 2) a combination of word2vec embeddings and handcrafted framing keywords based on a novel, expert-curated framing schema. We test these approaches using a novel corpus consisting of German-language news articles on the “Eu-ropean Refugee Crisis” between 2014-2018. We show that while topic modelling is insufficient in detecting frames in a dataset with highly homogeneous vocabulary, our second approach yields intriguing and more humanly interpretable results. This approach offers a promising opportunity to incorporate domain knowledge from political science and NLP techniques for bottom-up, explorative political text analyses.
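The sketch below illustrates, under stated assumptions, what the second approach described in the abstract (word2vec embeddings combined with expert-curated framing keywords) might look like in practice. It is not the authors' implementation: the frame names, the German seed keywords, the keyword-expansion step via nearest neighbours, and the relative-frequency scoring heuristic are all illustrative placeholders chosen here, and the toy corpus stands in for the preprocessed news articles.

```python
# Minimal sketch (assumptions, not the paper's released code):
# train word2vec on the domain corpus, expand expert-curated framing keywords
# with nearest neighbours in the embedding space, and score each article per frame.
from collections import Counter
from gensim.models import Word2Vec

# Tokenised corpus: one list of lowercased tokens per news article (assumed preprocessing).
corpus = [
    ["flüchtlinge", "kosten", "steuerzahler", "milliarden"],
    ["flüchtlinge", "schutz", "menschenrechte", "asyl"],
]

# Expert-curated framing schema: frame -> seed keywords (illustrative placeholders).
framing_schema = {
    "economic_burden": ["kosten", "steuerzahler"],
    "humanitarian": ["schutz", "menschenrechte"],
}

# 1) Train word2vec embeddings on the domain corpus.
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, workers=1, seed=42)

# 2) Expand each frame's keyword list with nearest neighbours in embedding space.
def expand_keywords(seeds, topn=5):
    seeds_in_vocab = [w for w in seeds if w in model.wv]
    neighbours = [w for w, _ in model.wv.most_similar(positive=seeds_in_vocab, topn=topn)]
    return set(seeds_in_vocab) | set(neighbours)

expanded = {frame: expand_keywords(seeds) for frame, seeds in framing_schema.items()}

# 3) Score each article: relative frequency of frame keywords among its tokens.
def frame_scores(tokens):
    counts = Counter(tokens)
    total = sum(counts.values())
    return {frame: sum(counts[w] for w in words) / total
            for frame, words in expanded.items()}

for i, article in enumerate(corpus):
    print(i, frame_scores(article))
```

The LDA baseline mentioned as approach 1 could be sketched analogously with gensim's LdaModel over the same tokenised corpus; as the abstract notes, however, topic modelling proved insufficient for frame detection on this highly homogeneous vocabulary, which is why the keyword-plus-embedding route is the more instructive one to illustrate.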