Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

Dependency Graphs for Summarization and Keyphrase Extraction: We present a real-time long document summarization and key-phrase extraction algorithm that utilizes a unified dependency graph.
Yifan Guo, David Brock, Alicia Lin, Tam Doan, Ali Khan, Paul Tarau
DOI: https://doi.org/10.1145/3582768.3582792 | Published: 2022-12-16
Abstract: We introduce a graph-based summarization and keyphrase extraction system that uses dependency trees as inputs for building a document graph. The document graph is built by connecting nodes containing lemmas and sentence identifiers after redirecting dependency links to emphasize semantically important entities. After applying a ranking algorithm to the document graph, we extract the highest-ranked sentences as the summary. At the same time, the highest-ranked lemmas are aggregated into keyphrases using their context in the dependency graph. Our algorithm specializes in handling long documents, including scientific, technical, legal, and medical documents.
Citations: 0
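
The graph construction and ranking described in this abstract can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, not the authors' implementation: spaCy (with the en_core_web_sm model assumed installed) stands in for their dependency parser, networkx's PageRank for their ranking algorithm, and the summarize function and its parameters are illustrative names.

    # Minimal sketch (not the authors' implementation): spaCy supplies the
    # dependency parses and networkx's PageRank ranks lemma and sentence
    # nodes that share one document graph.
    import networkx as nx
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    def summarize(text, top_sentences=3, top_keywords=5):
        doc = nlp(text)
        sents = list(doc.sents)
        graph = nx.DiGraph()
        for sent_id, sent in enumerate(sents):
            sent_node = ("SENT", sent_id)
            for token in sent:
                if not token.is_alpha or token.is_stop:
                    continue
                lemma = token.lemma_.lower()
                # Lemmas point at the sentence they occur in and at their
                # syntactic head, so dependency structure shapes the ranking.
                graph.add_edge(lemma, sent_node)
                if token.head is not token and token.head.is_alpha:
                    graph.add_edge(lemma, token.head.lemma_.lower())
        ranks = nx.pagerank(graph)
        top_sents = sorted((n for n in ranks if isinstance(n, tuple)),
                           key=ranks.get, reverse=True)[:top_sentences]
        summary = " ".join(sents[i].text
                           for _, i in sorted(top_sents, key=lambda n: n[1]))
        keywords = sorted((n for n in ranks if isinstance(n, str)),
                          key=ranks.get, reverse=True)[:top_keywords]
        return summary, keywords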

Classification of advertisement articles using sentiment analysis: (Research based on Korean natural language processing and deep learning technology)
Yongjun Kim, Y. Byun
DOI: https://doi.org/10.1145/3582768.3582800 | Published: 2022-12-16
Abstract: We live in a flood of big data and information delivered through computers, communications, social media, and mass media. We can obtain the information we want quickly and easily, but the accuracy and reliability of that information are often in question. Advertisement articles provided by online newspapers, in particular, make it harder for individuals to find precise information and reports, and the resulting distrust of Internet newspapers and advertisement avoidance threaten their very foundation. To address this problem, this study uses sentiment analysis from natural language processing to classify general and advertisement articles. Existing similar studies have mainly treated the task like spam-mail classification and relied on general natural language processing. This paper instead analyzes the text data to better capture the meaning of words, sentences, and phrases, and adds a sentiment-analysis step to provide the more accurate information that individuals want.
Citations: 0
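
To give a feel for the general approach, the sketch below augments a bag-of-words representation with a sentiment score before training a binary advertisement-vs-general classifier. It is an illustration only: the paper works on Korean text with deep learning models, whereas this sketch uses English toy data, scikit-learn, and the VADER sentiment analyzer.

    # Illustration only (the paper uses Korean NLP and deep learning models):
    # a TF-IDF representation is augmented with a VADER sentiment score before
    # training a binary advertisement-vs-general classifier on toy data.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    def featurize(texts, vectorizer, fit=False):
        tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
        sia = SentimentIntensityAnalyzer()
        polarity = np.array([[sia.polarity_scores(t)["compound"]] for t in texts])
        return np.hstack([tfidf.toarray(), polarity])  # lexical + sentiment features

    train_texts = ["Huge discount!! Buy now and transform your life!",
                   "The city council approved the transit budget on Tuesday."]
    train_labels = [1, 0]  # 1 = advertisement article, 0 = general article

    vectorizer = TfidfVectorizer()
    clf = LogisticRegression().fit(featurize(train_texts, vectorizer, fit=True), train_labels)
    print(clf.predict(featurize(["Limited offer: amazing results guaranteed!"], vectorizer)))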

Extraction of Common Physical Properties of Everyday Objects from Structured Sources
Viktor Losing, J. Eggert
DOI: https://doi.org/10.1145/3582768.3582772 | Published: 2022-12-16
Abstract: Commonsense knowledge is essential for the reasoning of AI systems, particularly in the context of action planning for robots. The focus of this paper is on commonsense object properties, which are especially useful for restricting the search space of planning algorithms. Popular sources of such knowledge are commonsense knowledge bases that provide the information in structured form. However, the utility of the provided object-property pairs is limited, as they can be incorrect, subjective, unspecific, or relevant only to a narrow context. In this paper, we suggest a methodology to create a highly accurate dataset of object properties related to common physical attributes. The approach filters out non-physical properties within commonsense knowledge bases and improves the accuracy of the remaining object-property pairs using supervised machine learning on annotated data. We evaluate different types of features and models and significantly increase the correctness of object-property pairs compared to the original sources.
Citations: 1
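
A minimal sketch of the two-stage idea (filter, then learn) follows. The physical-attribute list, the features, and the toy annotations are illustrative assumptions rather than the paper's actual filter or feature set, and scikit-learn stands in for whichever models the authors evaluate.

    # Sketch of the two-stage idea: filter out non-physical properties, then
    # train a supervised model on annotated object-property pairs. The filter
    # list, features, and toy annotations are illustrative, not the paper's.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction import DictVectorizer

    PHYSICAL_ATTRIBUTES = {"heavy", "light", "soft", "hard", "large", "small",
                           "fragile", "rigid", "smooth", "rough"}

    def keep_physical(pairs):
        # Stage 1: discard properties that are not common physical attributes.
        return [(obj, prop) for obj, prop in pairs if prop in PHYSICAL_ATTRIBUTES]

    def features(obj, prop):
        # Stage 2 features; the paper evaluates richer feature types and models.
        return {"prop": prop, "obj_head": obj.split()[-1], "obj_len": len(obj.split())}

    annotated = [(("glass bottle", "fragile"), 1), (("stone", "soft"), 0),
                 (("pillow", "soft"), 1), (("feather", "heavy"), 0)]
    vec = DictVectorizer()
    X = vec.fit_transform([features(o, p) for (o, p), _ in annotated])
    y = [label for _, label in annotated]
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    candidates = keep_physical([("ceramic mug", "fragile"), ("ceramic mug", "popular")])
    X_new = vec.transform([features(o, p) for o, p in candidates])
    print(list(zip(candidates, clf.predict(X_new))))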

Vietnamese Text Summarization Based on Elementary Discourse Units
Khang Nhut Lam, Tai Ngoc Nguyen, J. Kalita
DOI: https://doi.org/10.1145/3582768.3582793 | Published: 2022-12-16
Abstract: This paper presents text summarization models based on elementary discourse units (EDUs) for constructing extractive and abstractive summaries of Vietnamese documents. First, we introduce algorithms that use POS information to construct EDUs in Vietnamese. The resulting EDUs are then fed into an extractive summarization model using a pointer network and an abstractive summarization model using a pointer-generator model. A reinforcement learning method is used to improve the quality of the models. We perform experiments on the CTUNLPSUM dataset, comprising 1,053,702 Vietnamese documents extracted from online magazines. The extractive summarization models based on EDUs outperform extractive summarization models based on words or sentences. The ROUGE-1, ROUGE-2, and ROUGE-L scores of the best extractive and abstractive summarization models are 0.567, 0.241, and 0.461, and 0.530, 0.213, and 0.394, respectively.
Citations: 0
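
The first step, segmenting sentences into EDUs from POS information, can be sketched roughly as below. The sketch uses an English spaCy model and a crude split-at-conjunctions rule purely for illustration; the paper's Vietnamese-specific POS rules, pointer networks, and reinforcement learning are not reproduced here.

    # Illustration of POS-based EDU segmentation with a crude rule: start a new
    # unit at coordinating or subordinating conjunctions. The paper defines
    # Vietnamese-specific rules; an English spaCy model is used here only to
    # show the shape of the step.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    SPLIT_POS = {"CCONJ", "SCONJ"}  # conjunction tags that open a new unit

    def split_into_edus(sentence):
        edus, current = [], []
        for token in nlp(sentence):
            if token.pos_ in SPLIT_POS:
                if current:
                    edus.append(" ".join(current))
                    current = []
            elif not token.is_punct:
                current.append(token.text)
        if current:
            edus.append(" ".join(current))
        return edus

    print(split_into_edus("The model ranks discourse units because they are "
                          "shorter than sentences, and it rewrites the selected units."))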

A Semantic Approach to Negation Detection and Word Disambiguation with Natural Language Processing
Izunna Okpala, Guillermo Romera Rodriguez, Andrea Tapia, S. Halse, Jessica Kropczynski
DOI: https://doi.org/10.1145/3582768.3582789 | Published: 2022-12-16
Abstract: This study demonstrates methods for detecting negations in a sentence by evaluating the lexical structure of the text via word-sense disambiguation. The proposed framework examines the unique features of the various expressions within a text to resolve the contextual usage of all tokens and decipher the effect of negation on sentiment analysis. Popular expression detectors skip this important step, thereby neglecting the root words caught in the scope of negation and making text classification difficult for machine learning and sentiment analysis. This study adopts a Natural Language Processing (NLP) approach to discover and antonymize words that were negated, for better accuracy in text classification, using a knowledge base provided by an NLP library called WordHoard. Early results show that our initial analysis improved on traditional sentiment analysis, which sometimes neglects negations or assigns an inverse polarity score. The SentiWordNet analyzer was improved by 35%, the VADER analyzer by 20%, and TextBlob by 6%.
Citations: 1
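
The antonymization step can be illustrated as below. This is not the paper's WordHoard-based implementation: NLTK's WordNet is substituted for the antonym lookup and VADER for the sentiment score, and the negation-cue list is a deliberate simplification.

    # Sketch of the antonymization idea, not the paper's WordHoard-based
    # implementation: NLTK's WordNet supplies the antonym lookup and VADER the
    # sentiment score; the negation-cue list is a simplification.
    import nltk
    from nltk.corpus import wordnet
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    nltk.download("wordnet", quiet=True)
    NEGATION_CUES = {"not", "no", "never", "n't"}

    def antonym(word):
        for synset in wordnet.synsets(word):
            for lemma in synset.lemmas():
                if lemma.antonyms():
                    return lemma.antonyms()[0].name()
        return None

    def antonymize_negations(tokens):
        out, i = [], 0
        while i < len(tokens):
            if tokens[i].lower() in NEGATION_CUES and i + 1 < len(tokens):
                replacement = antonym(tokens[i + 1].lower())
                if replacement:
                    out.append(replacement)  # e.g. "not good" -> an antonym of "good"
                    i += 2
                    continue
            out.append(tokens[i])
            i += 1
        return out

    sia = SentimentIntensityAnalyzer()
    rewritten = " ".join(antonymize_negations("the service was not good at all".split()))
    print(rewritten, sia.polarity_scores(rewritten)["compound"])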

Preventing RNN from Using Sequence Length as a Feature
Jean-Thomas Baillargeon, Hélène Cossette, Luc Lamontagne
DOI: https://doi.org/10.1145/3582768.3582776 | Published: 2022-12-16
Abstract: Recurrent neural networks are deep learning topologies that can be trained to classify long documents. However, in our recent work, we found a critical problem with these cells: they can use the length differences between texts of different classes as a prominent classification feature. This produces models that are brittle and fragile under concept drift, can report misleading performance, and are trivially explainable regardless of text content. This paper illustrates the problem using synthetic and real-world data and provides a simple solution using weight decay regularization.
Citations: 1
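
The proposed mitigation is plain weight decay regularization. A minimal PyTorch sketch follows; the LSTM classifier and hyperparameters are illustrative choices, not the paper's experimental setup.

    # Sketch of the proposed mitigation (weight decay regularization) in
    # PyTorch; the architecture and hyperparameters here are illustrative.
    import torch
    import torch.nn as nn

    class LSTMClassifier(nn.Module):
        def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, token_ids):
            _, (h_n, _) = self.lstm(self.embed(token_ids))
            return self.fc(h_n[-1])  # classify from the final hidden state

    model = LSTMClassifier(vocab_size=10_000)
    # Weight decay penalizes large recurrent weights, discouraging the network
    # from accumulating a "step counter" that effectively encodes sequence length.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)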

Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches
Tim Schopf, Daniel Braun, F. Matthes
DOI: https://doi.org/10.1145/3582768.3582795 | Published: 2022-11-29
Abstract: Text classification of unseen classes is a challenging Natural Language Processing task that is mainly attempted using two types of approaches. Similarity-based approaches classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have investigated individual approaches in these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses that gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE [7] and SBERT-based [26] baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.
Citations: 14
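
A similarity-based classifier of the kind benchmarked here can be sketched with SBERT embeddings and cosine similarity, as below. This follows the general recipe rather than the paper's Lbl2TransformerVec method; the model name and class descriptions are illustrative.

    # Similarity-based classification of unseen classes: embed class
    # descriptions and documents with SBERT and pick the nearest class.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    class_descriptions = {
        "sports": "articles about sports, athletes, matches, and tournaments",
        "medicine": "articles about diseases, treatments, and clinical studies",
    }
    labels = list(class_descriptions)
    class_emb = model.encode(list(class_descriptions.values()), convert_to_tensor=True)

    def classify(document):
        doc_emb = model.encode(document, convert_to_tensor=True)
        scores = util.cos_sim(doc_emb, class_emb)[0]  # cosine similarity to each class
        return labels[int(scores.argmax())]

    print(classify("A randomized trial evaluated the new vaccine in adults."))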

Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval
DOI: https://doi.org/10.1145/3582768
Citations: 0