Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval: Latest Articles

iGrade: an automated short answer grading system
Dina H Alhamed, Aljawharah Mohammad Alajmi, Y. Alali, T. A. Alqahtani, M. R. Alnassar, Dina A. Alabbad
Abstract: During the COVID-19 pandemic, most countries relied on e-learning to enforce social-distancing policies, which affected the exam evaluation process. This project aimed to assist instructors in grading short-answer questions for CCSIT courses by implementing a web application to which instructors can upload students' answers for grading by the "iGrade" software model. The system is also intended to reduce the workload of faculty members by saving time and effort, and to guarantee objective grading for students. The model is a state-of-the-art BERT neural network combined with BiLSTM layers, trained on a dataset collected from previous midterm and final exams of the CIS 211 course. The dataset contains around 1,128 instances in three categories (0, 0.5, 1). In testing, "iGrade" obtained an accuracy of 85.4%, demonstrating BERT's superiority and independence from features as a default NLP method for short answer grading.
CCS Concepts: Computing methodologies; Artificial intelligence; Natural language processing
DOI: https://doi.org/10.1145/3582768.3582790
Published: 2022-12-16
Citations: 0
Analyzing the Impact of User Behaviors on the Popularity of Tweets: A Use Case from Masking Conversations During the Covid-19 Pandemic
Julia Warnken, S. Gokhale
Abstract: The Covid-19 pandemic has unleashed an infodemic of misinformation, especially about important health measures such as vaccines and masks. Social media companies have struggled to keep up with identifying these falsehoods and separating them from the volume of information shared over their platforms. Because automated detection approaches reach only moderate accuracy (about 80%), some manual examination of the content to separate misinformation becomes necessary. This manual assessment can be efficient if it is limited to those posts that are likely to gain popularity. The future popularity of a post is certainly a function of its content, but it also depends on the actions of the users. In this paper, we analyze which user actions are significantly correlated with the popularity of their tweets, where popularity is assessed using the numbers of likes and retweets. The investigation is conducted on a year-long dataset gathered by sampling Twitter conversations on the controversial issue of face masks during the acute first year of the pandemic. User parameters are grouped into two sets: those that involve including various artifacts in tweets to boost their popularity, and those that represent how users interact with other users and their content. After providing the context by which these short- and long-term actions build social relationships that help drive popularity, we compute Pearson's correlation coefficients between these parameters and the numbers of likes and retweets, along with their statistical significance.
Our results indicate that the artifacts users incorporate into their tweets, including hashtags, mentions, URLs, and media, have no significant influence on popularity compared to how users interact with other users. Moreover, users may like other users' tweets when they share follower-followee (impersonal) relationships, but they look for stronger, trusted friendships before actively retweeting other users' content. Thus, "liking" a tweet may be considered a much more casual endorsement than "retweeting". These findings contradict observations from the pre-Covid era, perhaps suggesting that online behaviors changed fundamentally during the pandemic and underscoring the need for further research.
DOI: https://doi.org/10.1145/3582768.3582779
Published: 2022-12-16
Citations: 0
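The correlation step mentioned in the abstract above (Pearson's coefficient between a user parameter and the number of likes or retweets, plus a significance check) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy data are hypothetical, and the sketch stops at the t statistic, which would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for the null hypothesis of no linear correlation (n - 2 d.o.f.)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Hypothetical toy data: hashtags used per tweet and likes received.
    hashtags = [0, 1, 2, 3, 4, 5]
    likes = [3, 1, 4, 1, 5, 9]
    r = pearson_r(hashtags, likes)
    print(f"r = {r:.3f}, t = {t_statistic(r, len(hashtags)):.3f}")
```

In practice a statistics library would normally be used to obtain the p-value directly; the hand-written version is shown only to make the computation explicit.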
CWITR: A Corpus for Automatic Complex Word Identification in Turkish Texts
B. Ilgen, Chris Biemann
Abstract: The Complex Word Identification (CWI) task aims to help resolve accessibility barriers for people who experience difficulties related to cognitive, language, and learning disabilities. The task is concerned with detecting and identifying complex words that are unusual and difficult for certain target groups to understand. CWI systems have a large impact on the output of Text Simplification (TS) systems. This paper revisits the CWI task by extending the available datasets with a new CWI corpus. In this study, we collect a new CWI dataset (CWITR) of complex single- and multi-token words drawn from different text genres in Turkish, and prepare it for investigating computational methods for discriminating between complex and non-complex word forms.
DOI: https://doi.org/10.1145/3582768.3582802
Published: 2022-12-16
Citations: 0
Contextualised Modelling for Effective Citation Function Classification
Xiaorui Jiang, Chaoxiang Cai, Wenwen Fan, Tong Liu, Jingqiang Chen
Abstract: Citation function classification is an important task in scientific text mining. The past two decades have witnessed many computerised algorithms working on various citation function datasets tailored to various annotation schemes. Recently, deep learning has pushed the state of the art by a large margin, but several pitfalls exist. Due to annotation difficulty, data sizes, especially for the minority classes, are often not big enough for training effective deep learning models. A less discussed issue is that most state-of-the-art deep learning solutions in fact generate a feature representation for the citation sentence or context, instead of modelling individual in-text citations. This is conceptually flawed, as it is common to see multiple in-text citations with different functions in the same citation sentence. In addition, existing deep learning studies have explored only a rather limited design space for encoding a citation and its surrounding context. This paper explored a wide range of modelling options based on SciBERT, the popular cross-disciplinary pre-trained scientific language model, and their performance on citation function classification, in order to determine the most effective way of modelling a citation and its context. To deal with the data size issue, we created a large-scale citation function dataset by mapping, merging and re-annotating six publicly available datasets from the computational linguistics domain, adapting Teufel et al.'s 12-class scheme.
The best F1 scores we achieved were around 66.16%, 71.39% and 73.56% on, respectively, an 11-class annotation scheme slightly adapted from Teufel et al.'s 12-class scheme, a reduced 7-class scheme obtained by merging comparison functions, and Jurgens et al.'s 6-class scheme. A useful observation is that no single model is superior for all functions; the trained model variants therefore allow for applications that emphasise a specific type or a specific group of citation functions.
DOI: https://doi.org/10.1145/3582768.3582769
Published: 2022-12-16
Citations: 1
Towards Measuring the Cognitive Loads of Different Dialog Acts through Dependency Distance
Dang Qi, Haitao Liu
Abstract: Although relevance theory has called attention to the analysis of cognitive aspects of pragmatic phenomena, few investigations have explored whether distinct dialog acts (DAs) require different degrees of cognitive load, let alone examined them with objective indices. This paper therefore adopted a syntactic cognitive index, dependency distance, to analyze whether distinct categories of DAs differ in cognitive load. Specifically, it used mean dependency distance (MDD), mean hierarchical distance (MHD), and normalized dependency distance (NDD) to examine the language data in the Switchboard Dialog Act Corpus (SwDA). The results showed that MDD, MHD and NDD are all effective in differentiating four types of DAs: Information Request (IR), Agreement (Ag), Understanding (Un), and Answering (An). Among these, IR has the highest values on the three indicators, Un has the lowest, and Ag and An fall in between. A follow-up ANOVA further corroborated that the forward DA (IR) differed significantly from the backward ones (Ag, Un, and An).
With these results, this paper may shed light on the relationship between DAs and cognitive resources, providing a new perspective for research under the paradigm of pragmatics.
DOI: https://doi.org/10.1145/3582768.3582775
Published: 2022-12-16
Citations: 0
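The abstract above names mean dependency distance (MDD) without defining it. As a rough illustration, the sketch below assumes the commonly used definition: the mean absolute difference in linear position between each non-root word and its governor. The function name, the input encoding, and the hand-annotated parse are hypothetical and are not taken from the paper; MHD and NDD are not covered.

```python
def mean_dependency_distance(heads):
    """Mean dependency distance of one sentence.

    heads[i] is the 1-based position of the governor of word i + 1,
    and 0 marks the root, which has no governor and is skipped.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        raise ValueError("sentence has no dependency relations")
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hand-annotated parse of "The student has a book" (an assumption):
    # The -> student, student -> has, has = root, a -> book, book -> has
    heads = [2, 3, 0, 5, 3]
    print(mean_dependency_distance(heads))  # distances 1, 1, 1, 2 -> 1.25
```

Averaging this value over all utterances of a given dialog act would give a per-category figure that can be compared across categories.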
Named Entity Recognition on COVID-19 Scientific Papers
A. Dao, Akiko Aizawa, Yuji Matsumoto
Abstract: Text mining techniques, especially named entity recognition (NER), play a vital role in helping researchers keep track of hundreds of thousands of papers in the COVID-19 literature. Although a few studies have performed NER on COVID-19 scientific papers, very little is currently known about how current entity recognition models behave in this new domain. This ongoing study therefore attempts to analyze the performance and limitations of current NER models on the CORD-19 dataset. By examining three NER models, this study showed that NER performance improves with the similarity between the testing and pretraining data. When few manually annotated resources for COVID-19 NER exist, our analysis suggested that, for training purposes, enhancing the dictionary used for seed annotation is effective and does not necessarily require costly human annotation.
DOI: https://doi.org/10.1145/3582768.3582786
Published: 2022-12-16
Citations: 0
Enhancing BERT Performance with Contextual Valence Shifters for Panic Detection in COVID-19 Tweets
Sandra Mitrovic, Vani Kanjirangat
Abstract: Panic is one of the main challenges of the current pandemic. In this work, we explore approaches to detecting panic-related COVID-19 tweets. To this end, we propose an unsupervised clustering approach that takes negation cues as an extracted feature input to the pre-trained model. This task cannot be done by simply applying state-of-the-art transformer models, since we observed that they occasionally fail to handle negations. Hence, we propose to use features based on Contextual Valence Shifters (CVS) along with pre-trained BERT embeddings. We evaluate and compare the approaches in an unsupervised setup, using standard clustering metrics on a large set of COVID-19 tweets. The results show that CVS effectively facilitates negation handling (positive/negative tweet discrimination).
DOI: https://doi.org/10.1145/3582768.3582801
Published: 2022-12-16
Citations: 0
Sentimental Analysis on Social Media Comments with Recurring Models and Pretrained Word Embeddings in Portuguese
Cristian Muñoz Villalobos, Leonardo Mendoza Forero, Harold De Mello, Cesar Valencia, Alvaro Orjuela, R. Tanscheit, Marco Pacheco Cavalcanti
Abstract: Natural Language Processing (NLP) techniques are increasingly powerful for interpreting a person's feelings about and reactions to a product or service. Sentiment analysis has become a fundamental tool for this interpretation, and it has been studied in languages other than English. This type of application is uncommon and unheard of in Portuguese. This article presents a sentiment analysis classification based on Portuguese social media comments. Word embedding representations were generated with both pre-trained GloVe and Word2Vec models from a corpus entirely in Portuguese. The article presents a set of results with different pre-trained layers and deep learning models exclusive to the Portuguese language on social networks. Two classification models were used and compared: (i) Bidirectional Long Short-Term Memory (BI-LSTM) and (ii) Bidirectional Gated Recurrent Unit (BI-GRU), achieving accuracy results of 99.1
DOI: https://doi.org/10.1145/3582768.3582805
Published: 2022-12-16
Citations: 0
Extracting Source Information From News Articles: Information Extraction
Tabassum Sultana, Eric R. Harley, Gavin Adamson, Asmaa Malik
Abstract: One of the factors influencing the credibility of news is source attribution. Ideally, news would be based on a balanced variety of sources. In this work we use spaCy and Python to identify sources of information cited in news articles and assign the sources to categories, as a first step in building software that assesses the balance and breadth of the sourcing in news articles. Preliminary testing of the software indicates that identification of the sources has a recall of 73% and an accuracy of 95%, and that the sources are categorized with an overall accuracy of 78%.
DOI: https://doi.org/10.1145/3582768.3582774
Published: 2022-12-16
Citations: 0
Sanitization of Sepsis News Sentences with the help of Paraphrasing
Soma Das, S. Chatterji
Abstract: The arrival of the internet in the late twentieth century, followed by social media in the twenty-first century, greatly increased the hazards of misinformation, disinformation, propaganda, and hoaxes. New ways of writing news have emerged that insert bias intelligently without turning the news into fake news. Correct news is usually manipulated to benefit a person, a group of individuals, a political party, or other interests, or changed to reflect sentiment or prominence. It is a challenging task to Sanitize such news content before presenting it to the reader. In this paper, we deal with problematic English news sentences, which we define as Septic sentences. We have identified the Septic sentences and their Septic phrases using Machine Learning algorithms. Sanitization is the process of converting a Septic sentence into a Pure sentence. We illustrate the process of Sanitization with the help of paraphrasing. The model is able to Sanitize 76% of Septic sentences.
DOI: https://doi.org/10.1145/3582768.3582773
Published: 2022-12-16
Citations: 0