{"title":"iGrade: an automated short answer grading system","authors":"Dina H Alhamed, Aljawharah Mohammad Alajmi, Y. Alali, T. A. Alqahtani, M. R. Alnassar, Dina A. Alabbad","doi":"10.1145/3582768.3582790","DOIUrl":"https://doi.org/10.1145/3582768.3582790","url":null,"abstract":"During the COVID-19 pandemic, most countries relied on e-learning to apply social distancing policies, which affected the exam evaluation process. This project aimed to assist instructors in grading short answer questions for CCSIT courses by implementing a web application through which instructors upload the students' answers, which the \"iGrade\" software model then grades. Moreover, the system reduces the workload on faculty members by saving time and effort, and guarantees objective grading for students. The model used in this project is a state-of-the-art BERT neural network model combined with BiLSTM layers, trained on a dataset collected from previous midterm and final exams of the CIS 211 course. The dataset consists of around 1,128 instances in three categories (0, 0.5, 1). The \"iGrade\" test obtained an accuracy score of 85.4%, demonstrating BERT's superiority and independence from hand-crafted features during short answer grading as a default method in NLP. CCS CONCEPTS • Computing methodologies • Artificial intelligence • Natural language processing","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132170747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the Impact of User Behaviors on the Popularity of Tweets: A Use Case from Masking Conversations During the Covid-19 Pandemic","authors":"Julia Warnken, S. Gokhale","doi":"10.1145/3582768.3582779","DOIUrl":"https://doi.org/10.1145/3582768.3582779","url":null,"abstract":"The Covid-19 pandemic has unleashed an infodemic of misinformation, especially about important health measures such as vaccines and masks. Social media companies have struggled to keep up with separating these falsehoods from the volumes of information shared over their platforms. Because automated detection approaches can only reach moderate accuracy (∼80%), some manual examination of the content to separate misinformation becomes necessary. This manual assessment can be efficient if it is limited to only those posts that are likely to gain popularity. Predicting the future popularity of posts is certainly a function of their content, but it also depends on the actions of the users. In this paper, we analyze which users’ actions are significantly correlated with the popularity of their tweets, where popularity is assessed using the numbers of likes and retweets. The investigation is conducted on year-long data gathered by sampling Twitter conversations on the controversial issue of face masks during the acute first year of the pandemic. User parameters are grouped into two categories: those that involve including various artifacts in the tweets to boost their popularity, and those that represent how users interact with other users and their content. After providing the context by which these short- and long-term actions build social relationships that help drive popularity, we compute Pearson's correlation coefficients between these parameters and the numbers of likes and retweets, along with their statistical significance. Our results indicate that the artifacts users incorporate into their tweets, including hashtags, mentions, URLs, and media, have no significant influence on their popularity compared to how they interact with other users. Moreover, users may like other users’ tweets when they share follower-followee (impersonal) relationships, but they look for stronger, trusted friendships to actively retweet other users’ content. Thus, “liking” a tweet may be considered a much more casual endorsement than “retweeting”. These findings contradict observations from the pre-Covid era, perhaps suggesting that online behaviors changed fundamentally during the pandemic, and underscore the need for further research.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114569936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CWITR: A Corpus for Automatic Complex Word Identification in Turkish Texts","authors":"B. Ilgen, Chris Biemann","doi":"10.1145/3582768.3582802","DOIUrl":"https://doi.org/10.1145/3582768.3582802","url":null,"abstract":"The Complex Word Identification (CWI) task aims to help remove accessibility barriers for people with cognitive, language, and learning disabilities. The task is concerned with detecting and identifying complex words that are unusual and difficult for certain target groups to understand. CWI systems have a large impact on the output of Text Simplification (TS) systems. This paper revisits the CWI task and extends the available datasets with a new CWI corpus. In this study, we collect a new CWI dataset (CWITR) of complex single- and multi-token words from different text genres for Turkish, and prepare it for investigating computational methods that discriminate between complex and non-complex word forms.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129553602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextualised Modelling for Effective Citation Function Classification","authors":"Xiaorui Jiang, Chaoxiang Cai, Wenwen Fan, Tong Liu, Jingqiang Chen","doi":"10.1145/3582768.3582769","DOIUrl":"https://doi.org/10.1145/3582768.3582769","url":null,"abstract":"Citation function classification is an important task in scientific text mining. The past two decades have witnessed many computerised algorithms working on various citation function datasets tailored to various annotation schemes. Recently, deep learning has pushed the state of the art by a large margin, but several pitfalls exist. Due to annotation difficulty, data sizes, especially for the minority classes, are often not big enough for training effective deep learning models. Less discussed is that most state-of-the-art deep learning solutions in fact generate a feature representation for the citation sentence or context, instead of modelling individual in-text citations. This is conceptually flawed, as it is common to see multiple in-text citations with different functions in the same citation sentence. In addition, existing deep learning studies have only explored a rather limited design space for encoding the citation and its surrounding context. This paper explored a wide range of modelling options based on SciBERT, the popular cross-disciplinary pre-trained scientific language model, and their performance on citation function classification, to determine the most effective way of modelling a citation and its context. To deal with the data size issue, we created a large-scale citation function dataset by mapping, merging and re-annotating six publicly available datasets from the computational linguistics domain, adapting Teufel et al.’s 12-class scheme. The best F1 scores we achieved were around 66.16%, 71.39% and 73.56% on an 11-class annotation scheme slightly adapted from Teufel et al.’s 12-class scheme, a reduced 7-class scheme obtained by merging comparison functions, and Jurgens et al.’s 6-class scheme, respectively. A useful observation is that no single model is superior for all functions; therefore, the trained model variants allow for applications that emphasise a specific type or group of citation functions.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132330768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Measuring the Cognitive Loads of Different Dialog Acts through Dependency Distance","authors":"Dang Qi, Haitao Liu","doi":"10.1145/3582768.3582775","DOIUrl":"https://doi.org/10.1145/3582768.3582775","url":null,"abstract":"Although relevance theory has called attention to the analysis of cognitive aspects of pragmatic phenomena, few investigations have explored whether distinct dialog acts (DAs) require different degrees of cognitive load, let alone examined them with objective indices. The current paper therefore adopted a syntactic cognitive index, dependency distance, to analyze whether distinct categories of DAs differ in cognitive load. Specifically, this paper adopted mean dependency distance (MDD), mean hierarchical distance (MHD), and normalized dependency distance (NDD) to examine the language data in the Switchboard Dialog Act Corpus (SwDA). The results showed that MDD, MHD and NDD are all effective in differentiating four genres of DAs, namely Information Request (IR), Agreement (Ag), Understanding (Un), and Answering (An): IR has the highest values of the three indicators, Un has the lowest, and Ag and An fall in between. A follow-up ANOVA further corroborated that the forward DA (IR) differed significantly from the backward ones (Ag, Un, and An). With these results, this paper may shed light on the relationship between DAs and cognitive resources, providing a new perspective for research under the paradigm of pragmatics.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115148190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named Entity Recognition on COVID-19 Scientific Papers","authors":"A. Dao, Akiko Aizawa, Yuji Matsumoto","doi":"10.1145/3582768.3582786","DOIUrl":"https://doi.org/10.1145/3582768.3582786","url":null,"abstract":"Text mining techniques, especially named entity recognition (NER), play a vital role in helping researchers keep track of the hundreds of thousands of papers in the COVID-19 literature. Although some research has applied NER to COVID-19 scientific papers, very little is currently known about the behavior of current entity recognition models in this new domain. Therefore, this ongoing study attempts to analyze current NER models’ performance and limitations on the CORD-19 dataset. By examining three NER models, this study showed that NER performance improves with the similarity between the testing and pretraining data. When few manually annotated resources for COVID-19 NER exist, our analysis suggests that, for training purposes, enhancing the dictionary used for seed annotation is effective (and does not necessarily require costly human annotation).","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129988199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing BERT Performance with Contextual Valence Shifters for Panic Detection in COVID-19 Tweets","authors":"Sandra Mitrovic, Vani Kanjirangat","doi":"10.1145/3582768.3582801","DOIUrl":"https://doi.org/10.1145/3582768.3582801","url":null,"abstract":"Panic is one of the main challenges of the current pandemic. In this work, we aim to explore approaches for detecting panic-related COVID-19 tweets. To this end, we propose an unsupervised clustering approach that feeds negation cues, as extracted features, to the pre-trained model. This task cannot be done simply by applying state-of-the-art transformer models, since we observed that they occasionally fail to handle negations. Hence, we propose to use features based on Contextual Valence Shifters (CVS) along with the pre-trained BERT embeddings. We evaluate and compare the approaches in an unsupervised setup, using standard clustering metrics on a large set of COVID-19 tweets. The obtained results show that CVS effectively facilitates negation handling (positive/negative tweet discrimination).","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121740036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentimental Analysis on Social Media Comments with Recurring Models and Pretrained Word Embeddings in Portuguese","authors":"Cristian Muñoz Villalobos, Leonardo Mendoza Forero, Harold De Mello, Cesar Valencia, Alvaro Orjuela, R. Tanscheit, Marco Pacheco Cavalcanti","doi":"10.1145/3582768.3582805","DOIUrl":"https://doi.org/10.1145/3582768.3582805","url":null,"abstract":"Natural Language Processing (NLP) techniques are increasingly powerful for interpreting a person’s feelings and reactions to a product or service. Sentiment analysis has become a fundamental tool for this interpretation, and it has been studied in languages other than English; however, such applications are uncommon in Portuguese. This article presents a sentiment analysis classification of Portuguese social media comments. Word embedding representations were generated with both pre-trained GloVe and Word2Vec models through a corpus entirely in Portuguese. This article presents a set of results with different pre-trained layer models and deep learning models exclusive to the Portuguese language on social networks. Two classification models were used and compared: (i) Bidirectional Long Short-Term Memory (BI-LSTM) and (ii) Bidirectional Gated Recurrent Unit (BI-GRU), achieving accuracy results of 99.1","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131829785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Source Information From News Articles: Information Extraction","authors":"Tabassum Sultana, Eric R. Harley, Gavin Adamson, Asmaa Malik","doi":"10.1145/3582768.3582774","DOIUrl":"https://doi.org/10.1145/3582768.3582774","url":null,"abstract":"One of the factors influencing the credibility of news is source attribution. Ideally, news would be based on a balanced variety of sources. In this work we use spaCy and Python to identify sources of information cited in news articles and assign the sources to categories, as a first step in building software that assesses the balance and breadth of the sourcing in news articles. The preliminary testing of the software indicates that identification of the sources has a recall of 73% and accuracy of 95%, and the sources are categorized with overall accuracy of 78%.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133144251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sanitization of Sepsis News Sentences with the help of Paraphrasing","authors":"Soma Das, S. Chatterji","doi":"10.1145/3582768.3582773","DOIUrl":"https://doi.org/10.1145/3582768.3582773","url":null,"abstract":"The arrival of the internet in the late twentieth century, followed by social media in the twenty-first century, greatly increased the hazards of misinformation, disinformation, propaganda, and hoaxes. New ways of writing news have emerged that insert bias subtly without turning the news into outright fake news. Correct news is usually manipulated to benefit a person, a group of individuals, a political party, or other interests, or changed to reflect sentiment or prominence. Sanitizing such news content before presenting it to the reader is a challenging task. In this paper, we deal with problematic English news sentences, defined as Septic sentences. We have identified the Septic sentences and their Septic phrases using machine learning algorithms. Sanitization is the process of converting a Septic sentence into a Pure sentence. We illustrate the process of Sanitization in this paper with the help of paraphrasing. The model is able to Sanitize 76% of Septic sentences.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121473750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}