Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS): Latest Publications

OLALA: Object-Level Active Learning for Efficient Document Layout Annotation
Zejiang Shen, Jian Zhao, Melissa Dell, Yaoliang Yu, Weining Li
DOI: 10.18653/v1/2022.nlpcss-1.19
Abstract: Layout detection is an essential step for accurately extracting structured contents from historical documents. The intricate and varied layouts present in these document images make it expensive to label the numerous layout regions that can be densely arranged on each page. Current active learning methods typically rank and label samples at the image level, where the annotation budget is not optimally spent due to the overexposure of common objects per image. Inspired by recent progress in semi-supervised learning and self-training, we propose OLALA, an Object-Level Active Learning framework for efficient document layout Annotation. OLALA aims to optimize the annotation process by selectively annotating only the most ambiguous regions within an image, while using automatically generated labels for the rest. Central to OLALA is a perturbation-based scoring function that determines which objects require manual annotation. Extensive experiments show that OLALA can significantly boost model performance and improve annotation efficiency, facilitating the extraction of masses of structured text for downstream NLP applications.
Citations: 4
An Analysis of Acknowledgments in NLP Conference Proceedings
Winston Wu
DOI: 10.18653/v1/2022.nlpcss-1.17
Abstract: While acknowledgments are often overlooked and sometimes entirely missing from publications, this short section of a paper can provide insights on the state of a field. We characterize and perform a textual analysis of acknowledgments in NLP conference proceedings across the last 17 years, revealing broader trends in funding and research directions in NLP as well as interesting phenomena including career incentives and the influence of defaults.
Citations: 0
Utilizing Weak Supervision to Create S3D: A Sarcasm Annotated Dataset
Jordan Painter, H. Treharne, Diptesh Kanojia
DOI: 10.18653/v1/2022.nlpcss-1.22
Abstract: Sarcasm is prevalent in all corners of social media, posing many challenges within Natural Language Processing (NLP), particularly for sentiment analysis. Sarcasm detection remains a largely unsolved problem in many NLP tasks due to its contradictory and typically derogatory nature as a figurative language construct. With recent strides in NLP, many pre-trained language models exist that have been trained on data from specific social media platforms, i.e., Twitter. In this paper, we evaluate the efficacy of multiple sarcasm detection datasets using machine and deep learning models. We create two new datasets: a manually annotated gold-standard Sarcasm Annotated Dataset (SAD) and a Silver-Standard Sarcasm-annotated Dataset (S3D). Using a combination of existing sarcasm datasets with SAD, we train a sarcasm detection model over a social-media domain pre-trained language model, BERTweet, which yields an F1-score of 78.29%. Using an ensemble model with an underlying majority-voting technique, we further label S3D to produce a weakly supervised dataset containing over 100,000 tweets. We publicly release all the code, our manually annotated and weakly supervised datasets, and fine-tuned models for further research.
Citations: 1
To Prefer or to Choose? Generating Agency and Power Counterfactuals Jointly for Gender Bias Mitigation
Maja Stahl, Maximilian Spliethöver, Henning Wachsmuth
DOI: 10.18653/v1/2022.nlpcss-1.6
Abstract: Gender bias may emerge from an unequal representation of agency and power, for example, by portraying women frequently as passive and powerless ("She accepted her future") and men as proactive and powerful ("He chose his future"). When language models learn from respective texts, they may reproduce or even amplify the bias. An effective way to mitigate bias is to generate counterfactual sentences with opposite agency and power to the training. Recent work targeted agency-specific verbs from a lexicon to this end. We argue that this is insufficient, due to the interaction of agency and power and their dependence on context. In this paper, we thus develop a new rewriting model that identifies verbs with the desired agency and power in the context of the given sentence. The verbs' probability is then boosted to encourage the model to rewrite both connotations jointly. According to automatic metrics, our model effectively controls for power while being competitive in agency to the state of the art. In our main evaluation, human annotators favored its counterfactuals in terms of both connotations, also deeming its meaning preservation better.
Citations: 2
Fine-Grained Extraction and Classification of Skill Requirements in German-Speaking Job Ads
A. Gnehm, Eva Bühlmann, Helen Buchs, S. Clematide
DOI: 10.18653/v1/2022.nlpcss-1.2
Abstract: Monitoring the development of labor market skill requirements is an information need that is more and more approached by applying text mining methods to job advertisement data. We present an approach for fine-grained extraction and classification of skill requirements from German-speaking job advertisements. We adapt pre-trained transformer-based language models to the domain and task of computing meaningful representations of sentences or spans. By using context from job advertisements and the large ESCO domain ontology we improve our similarity-based unsupervised multi-label classification results. Our best model achieves a mean average precision of 0.969 on the skill class level.
Citations: 2
Detecting Dissonant Stance in Social Media: The Role of Topic Exposure
Vasudha Varadarajan, Nikita Soni, Weixi Wang, C. Luhmann, H. A. Schwartz, Naoya Inoue
DOI: 10.18653/v1/2022.nlpcss-1.16
Abstract: We address dissonant stance detection, classifying conflicting stance between two input statements. Computational models for traditional stance detection have typically been trained to indicate pro/con for a given target topic (e.g., gun control) and thus do not generalize well to new topics. In this paper, we systematically evaluate the generalizability of dissonant stance detection to situations where examples of the topic have not been seen at all or have only been seen a few times. We show that dissonant stance detection models trained on only 8 topics, none of which are the target topic, can perform as well as those trained only on a target topic. Further, adding non-target topics boosts performance further, up to approximately 32 topics, where accuracies start to plateau. Taken together, our experiments suggest dissonant stance detection models can generalize to new unanticipated topics, an important attribute for the social scientific study of social media, where new topics emerge daily.
Citations: 2
Conspiracy Narratives in the Protest Movement Against COVID-19 Restrictions in Germany: A Long-term Content Analysis of Telegram Chat Groups
Manuel Weigand, Maximilian Weber, Johannes B. Gruber
DOI: 10.18653/v1/2022.nlpcss-1.8
Abstract: From the start of the COVID-19 pandemic in Germany, different groups have been protesting measures implemented by different government bodies in Germany to control the pandemic. It was widely claimed that many of the offline and online protests were driven by conspiracy narratives disseminated through groups and channels on the messenger app Telegram. We investigate this claim by measuring the frequency of conspiracy narratives in messages from open Telegram chat groups of the Querdenken movement, set up to organize protests against COVID-19 restrictions in Germany. We furthermore explore the content of these messages using topic modelling. To this end, we collected 822k text messages sent between April 2020 and May 2022 in 34 chat groups. By fine-tuning a DistilBERT model, using self-annotated data, we find that 8.24% of the sent messages contain signs of conspiracy narratives. This number is not static, however, as the share of conspiracy messages grew while the overall number of messages shows a downward trend since its peak at the end of 2020. We further find that a mix of known conspiracy narratives makes up the topics in our topic model. Our findings suggest that the Querdenken movement is getting smaller over time, but its remaining members focus even more on conspiracy narratives.
Citations: 0
Understanding Narratives from Demographic Survey Data: a Comparative Study with Multiple Neural Topic Models
Xiao Xu, Gert Stulp, Antal van den Bosch, Anne Gauthier
DOI: 10.18653/v1/2022.nlpcss-1.4
Abstract: Fertility intentions as verbalized in surveys are a poor predictor of actual fertility outcomes, the number of children people have. This can partly be explained by the uncertainty people have in their intentions. Such uncertainties are hard to capture through traditional survey questions, although open-ended questions can be used to get insight into people's subjective narratives of the future that determine their intentions. Analyzing such answers to open-ended questions can be done through Natural Language Processing techniques. Traditional topic models (e.g., LSA and LDA), however, often fail to do so since they rely on co-occurrences, which are often rare in short survey responses. The aim of this study was to apply and evaluate topic models on demographic survey data. In this study, we applied neural topic models (e.g., BERTopic, CombinedTM) based on language models to responses from Dutch women on their fertility plans, and compared the topics and their coherence scores from each model to expert judgments. Our results show that neural models produce topics more in line with human interpretation compared to LDA. However, the coherence score could only partly reflect this, depending on the corpus used for calculation. This research is important because, first, it helps us develop more informed strategies on model selection and evaluation for topic modeling on survey data; and second, it shows that the field of demography has much to gain from adopting NLP methods.
Citations: 0
Conditional Language Models for Community-Level Linguistic Variation
Bill Noble, Jean-Philippe Bernardy
DOI: 10.18653/v1/2022.nlpcss-1.9
Abstract: Community-level linguistic variation is a core concept in sociolinguistics. In this paper, we use conditioned neural language models to learn vector representations for 510 online communities. We use these representations to measure linguistic variation between communities and investigate the degree to which linguistic variation corresponds with social connections between communities. We find that our sociolinguistic embeddings are highly correlated with a social network-based representation that does not use any linguistic input.
Citations: 0
Can Contextualizing User Embeddings Improve Sarcasm and Hate Speech Detection?
Kim Breitwieser
DOI: 10.18653/v1/2022.nlpcss-1.14
Abstract: While implicit embeddings so far have been mostly concerned with creating an overall representation of the user, we evaluate a different approach. By only considering content directed at a specific topic, we create sub-user embeddings, and measure their usefulness on the tasks of sarcasm and hate speech detection. In doing so, we show that task-related topics can have a noticeable effect on model performance, especially when dealing with intended expressions like sarcasm, but less so for hate speech, which is usually labelled as such on the receiving end.
Citations: 1