Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval: Latest Articles

Explaining Math Word Problem Solvers
Abby Newcomb, J. Kalita
{"title":"Explaining Math Word Problem Solvers","authors":"Abby Newcomb, J. Kalita","doi":"10.1145/3582768.3582777","DOIUrl":"https://doi.org/10.1145/3582768.3582777","url":null,"abstract":"Automated math word problem solvers based on neural networks have successfully managed to obtain 70-80% accuracy in solving arithmetic word problems. However, it has been shown that these solvers may rely on superficial patterns to obtain their equations. In order to determine what information math word problem solvers use to generate solutions, we remove parts of the input and measure the model’s performance on the perturbed dataset. Our results show that the model is not sensitive to the removal of many words from the input and can still manage to find a correct answer when given a nonsense question. This indicates that automatic solvers do not follow the semantic logic of math word problems, and may be overfitting to the presence of specific words.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122721757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hate Speech Detection on Indonesian Social Media: A Preliminary Study on Code-Mixed Language Issue
Endang Wahyu Pamungkas, A. Fatmawati, Farah Danisha Salam
{"title":"Hate Speech Detection on Indonesian Social Media: A Preliminary Study on Code-Mixed Language Issue","authors":"Endang Wahyu Pamungkas, A. Fatmawati, Farah Danisha Salam","doi":"10.1145/3582768.3582771","DOIUrl":"https://doi.org/10.1145/3582768.3582771","url":null,"abstract":"Nowadays, social media becomes an important media for online communication, facilitating its users to publish content and providing a medium to express their opinions and feelings about anything. At the same time, abusive language is becoming a relevant problem on social media platforms such as Facebook and Twitter. Geographically, Indonesia consists of several regions with their own local languages. A recent report shows 718 local languages used by different regions and tribes in Indonesia. Indonesian tend to use a mix of their own local language and Bahasa to communicate on social media platforms, such as Twitter. Similar to other languages, code-mixed is also becoming the main issue and challenge of detecting hate speech in Indonesian social media. In this study, we conduct a preliminary experiment to study the detection of hate speech in Indonesian social media, specifically Twitter. Our experiment used 6,115 tweets in Indonesian-Javanese code-mixed and 2,945 tweets in Indonesian-Sundanese code-mixed. The overall results show that the traditional machine learning model with lexical-based features obtained the best performance in Javanese-Indonesian, while the LSTM network achieved the best performance in Sundanese-Indonesian. 
We also found that translating the code-mixed data into more resource-rich languages could not help to improve the classification performance.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116920989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Task-specific pre-training improves models for paraphrase generation
O. Skurzhanskyi, O. Marchenko
{"title":"Task-specific pre-training improves models for paraphrase generation","authors":"O. Skurzhanskyi, O. Marchenko","doi":"10.1145/3582768.3582791","DOIUrl":"https://doi.org/10.1145/3582768.3582791","url":null,"abstract":"Paraphrase generation is a fundamental and longstanding problem in the Natural Language Processing field. With the huge success of transfer learning, the pre-train → fine-tune approach has become a standard choice. At the same time, popular task-agnostic pre-trainings usually require gigabyte datasets and hundreds of GPUs, while available pre-trained models are limited by fixed architecture and size (i.e. base, large). We propose a simple and efficient pre-training approach specifically for paraphrase generation, which noticeably boosts model quality and matches the performance of general-purpose pre-trained models. We also investigate how this procedure influences the scores across different architectures and show that it works for all of them.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124227496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Community Asset Ontology for Modeling Community Data using Information Extraction
Towhid Chowdhury, Naveen Sharma
{"title":"Community Asset Ontology for Modeling Community Data using Information Extraction","authors":"Towhid Chowdhury, Naveen Sharma","doi":"10.1145/3582768.3582778","DOIUrl":"https://doi.org/10.1145/3582768.3582778","url":null,"abstract":"In this paper, we analyze some data-related challenges to building resilient and sustainable communities, particularly how to computationally model the social and economical dynamic that exists within a community. To that end, we propose the Community Asset Ontology (CAO) for a knowledge graph that can encapsulate community data as modeled in existing social science literature. We utilize existing information extraction paradigms to map natural language community data to CAO and evaluate the usefulness of such an ontology-based approach compared to a baseline open information extraction approach.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116615655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Detection and Visualization of Information Structure in English
J. Blake, Evgeny Pyshkin, Šimon Pavlík
{"title":"Automatic Detection and Visualization of Information Structure in English","authors":"J. Blake, Evgeny Pyshkin, Šimon Pavlík","doi":"10.1145/3582768.3582784","DOIUrl":"https://doi.org/10.1145/3582768.3582784","url":null,"abstract":"This paper describes the design and development of an online tool that identifies and visualizes information structure in user-submitted texts written in English. Non-native users of English find it difficult to distinguish between structures that are marked and unmarked. Markedness is evaluated based on acceptability and frequency of a sequence of word tokens. Marked sentences stand out as being unnatural to native speakers, but few native speakers can explain why. Information structure can, however, frequently explain markedness. The tool detects the three principles of information structure: information focus, information flow and end weight. Information focus explains the sequence of elements within sentences. Information flow explains the sequence of elements within paragraphs. End weight explains the relative position of phrases and clauses within a sentence. Through exposure to these principles in context, this tool aims to help writers of English understand which structural language features may be judged as marked.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122694949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Responding to customer queries automatically by customer reviews’ based Question Answering
Kunal Moharkar, Kartik Kshirsagar, Suruchi Shrey, Neha Pasine, Rishu Kumar, Mansi A. Radke
{"title":"Responding to customer queries automatically by customer reviews’ based Question Answering","authors":"Kunal Moharkar, Kartik Kshirsagar, Suruchi Shrey, Neha Pasine, Rishu Kumar, Mansi A. Radke","doi":"10.1145/3582768.3582780","DOIUrl":"https://doi.org/10.1145/3582768.3582780","url":null,"abstract":"The entire world has been undergoing its own digital transformation over the past few decades as technology has advanced in leaps and bounds. Following this, an increase in the number of people using digital platforms for buying products online likewise increases the number of questions or enquiries posted about a product on an online shopping platform like Amazon on a day to day basis. Though we have gone completely digital in posting these questions, the answering of these questions is still manual. The forums are rarely active. By the time the user gets an answer to his question, either he has bought that product already through offline means or has lost interest in buying that product since it is time consuming. Moreover, the questions which are asked are mostly repetitive. At times the answers are already out there since they have already been given to some other user who had asked the same question. Also, lot of answers are embedded in the user reviews. Therefore, the answers can be extracted from the existing product reviews. This may lead to increase in sale and greater customer satisfaction as his query is resolved in much lower response time. We have review-based question answering systems that aim at answering the questions from the reviews given on the product by other customers. However, the existing systems have certain drawbacks due to the use of RNN, like missing attention mechanism etc. 
In this work, we enhance the performance of the existing review based QA systems by carrying out some prototypical experiments with the basic models of NLP and then moving towards more advanced Language Models while identifying and rectifying the shortcomings of the existing model. Further, in this work a thorough comparative analysis of the models and approaches that have been worked on is presented. We have enhanced the current state of the art existing review QA systems by using BERT, BART and also applied various heuristics for comparison. We achieved the best BLEU score of 0.58 by using BERT, which is an improvement of 0.19 on the current existing system.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131896030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Word Embedding in Nepali Language using Word2Vec
Bipesh Subedi, Prakash Poudyal
{"title":"Word Embedding in Nepali Language using Word2Vec","authors":"Bipesh Subedi, Prakash Poudyal","doi":"10.1145/3582768.3582799","DOIUrl":"https://doi.org/10.1145/3582768.3582799","url":null,"abstract":"Word embedding is a technique for understanding the relationship among words by mapping words to numbers. Several kinds of research have been carried out in this field in different languages such as English, Hindi, Bengali etc. but very few works are available in the Nepali language domain. In this work, the word embedding technique using Word2Vec is implemented for Nepali news data. The methodology involved in this work includes Dataset preparation and Word2Vec modelling. Gensim package is used for implementing the Word2Vec model and its output shows the similarity between Nepali words. The work mainly focuses on developing word embedding on Nepali words generated by scraping the health section of Nepali news portals and has shown promising results.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Natural Language Processing of COVID-19 Reports Involving China in New York Times —a Machine-based Framing Study of Media Language
Zhixian Yang, Haiyan Men
{"title":"Natural Language Processing of COVID-19 Reports Involving China in New York Times —a Machine-based Framing Study of Media Language","authors":"Zhixian Yang, Haiyan Men","doi":"10.1145/3582768.3582785","DOIUrl":"https://doi.org/10.1145/3582768.3582785","url":null,"abstract":"Natural Language Processing (NLP) is a most promising and powerful method for big data analysis. It is gaining increasing attention from language researchers with its potentiality in information extraction, automatic indexing, textual framing, topic modeling, sensitivity analysis and other machine analytics studies. Through employing the LDA topic modeling and NLTK (Natural Language Toolkit) Vader SentimentAnalyser, this research makes a contrastive study of the overall news coverage in New York Times (NYT) against the backdrop of Covid-19 and its China-specific reports, with the aim of addressing what areas of concern were respectively selected and foregrounded to the public in these two types, what sensitivities were revealed and how linguistic devices were used to frame China's response to Covid-19. Analysis of metaphorical expressions in NYT shows that metaphors tended to be employed as a device to realize the dominant negative polarity latent in the reports and thus establish unfavourable images of China. 
This study deepens the methodological endeavors in media and linguistic studies through combining content analysis and machine-based analysis.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123248268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
False Positive Intent Detection Framework for Chatbot Annotation
L. Lim, Samarth Agarwal, Xuejie Zhang, John Jianan Lu
{"title":"False Positive Intent Detection Framework for Chatbot Annotation","authors":"L. Lim, Samarth Agarwal, Xuejie Zhang, John Jianan Lu","doi":"10.1145/3582768.3582798","DOIUrl":"https://doi.org/10.1145/3582768.3582798","url":null,"abstract":"For chatbots answering thousands of user queries daily, it requires huge annotation efforts or explicit signals from users to identify incorrect chatbot predictions. Identification of such False Positives is key to improving chatbot accuracy and is a challenging problem due to the high cost and limited explicit signals from users. In this paper, we present a framework for automatically detecting False Positive intents in an employee chatbot through implicit feedback by capturing specific user behavior using techniques such as detection of repeated queries and leveraging on active learning sampling strategies to find cases where the chatbot might have provided an incorrect response. Using this approach within the bank, annotators can prioritize their efforts and detect False Positive intent approximately three times better than manual screening of random chatbot dialogues. This framework can be reused across different chatbot applications.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126014240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Measuring Text-to-SQL Semantic Parsing Model on the Question Generalizability
Thanakrit Julavanich, Akiko Aizawa
{"title":"Measuring Text-to-SQL Semantic Parsing Model on the Question Generalizability","authors":"Thanakrit Julavanich, Akiko Aizawa","doi":"10.1145/3582768.3582782","DOIUrl":"https://doi.org/10.1145/3582768.3582782","url":null,"abstract":"One of the challenges in NLP tasks, such as text-to-SQL semantic parsing, is generalization. In the text-to-SQL task, having separate training and testing data can measure one aspect of the generalization: how well the model generalizes to unseen databases. Other aspects, however, remain unaccounted for. We propose a new dataset and a more challenging and thorough evaluation process that focuses on the two challenges of generalizing the text-to-SQL model: database content references and question patterns. We create SPIDER-QG, an augmented dataset that employs three techniques, to assess generalizability. First, we replace the set of values in the existing test set with other values from the same column in the same database. Second, we use the synonym of each value as a replacement instead. Third, we generate new questions for the existing SQL query by back-translating the original question. Our evaluation setup demonstrates the generalization challenges and struggles of the current models.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126766438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0