Yifan Guo, David Brock, Alicia Lin, Tam Doan, Ali Khan, Paul Tarau
{"title":"Dependency Graphs for Summarization and Keyphrase Extraction: We present a real-time long document summarization and key-phrase extraction algorithm that utilizes a unified dependency graph.","authors":"Yifan Guo, David Brock, Alicia Lin, Tam Doan, Ali Khan, Paul Tarau","doi":"10.1145/3582768.3582792","DOIUrl":"https://doi.org/10.1145/3582768.3582792","url":null,"abstract":"We introduce a graph-based summarization and keyphrase extraction system that uses dependency trees as inputs for building a document graph. The document graph is built by connecting nodes containing lemmas and sentence identifiers after redirecting dependency links to emphasize semantically important entities. After applying a ranking algorithm to the document graph, we extract the highest ranked sentences as the summary. At the same time, the highest ranked lemmas are aggregated into keyphrases using their context in the dependency graph. Our algorithm specializes in handling long documents, including scientific, technical, legal, and medical documents.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121824932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of advertisement articles using sentiment analysis: (Research-based on Korean natural language processing and deep learning technology)","authors":"Yongjun Kim, Y. Byun","doi":"10.1145/3582768.3582800","DOIUrl":"https://doi.org/10.1145/3582768.3582800","url":null,"abstract":"We live in a flood of big data and information through computers, communications, social media, and mass media. In other words, we can get the information we want quickly and easily, but we have many questions about the accuracy and reliability of this information. That is, there are many problems in trying to obtain accurate knowledge of such reckless details, and in particular, advertisement articles provided by online newspapers need to be clearer and more manageable when individuals try to find precise information and reports. Such experiences are threatened even to the foundation of existence due to distrust of Internet newspapers and advertisement evasion. To solve this problem, this study used emotion analysis of natural language processing to classify general and advertisement articles. Getting going Existing similar studies have mainly been undertaken to classify such advertisement articles, such as spam mail classification, and most of these studies used general natural language processing. However, this paper is a study that analyzes text data to understand further the meaning of the words, sentences, and phrases and adds steps to explore emotions to provide more accurate information that individuals want.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126939010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of Common Physical Properties of Everyday Objects from Structured Sources","authors":"Viktor Losing, J. Eggert","doi":"10.1145/3582768.3582772","DOIUrl":"https://doi.org/10.1145/3582768.3582772","url":null,"abstract":"Commonsense knowledge is essential for the reasoning of AI systems, particularly in the context of action planning for robots. The focus of this paper is on common-sense object properties, which are especially useful to restrict the search space of planning algorithms. Popular sources for such knowledge are commonsense knowledge bases that provide the information in a structured form. However, the utility of the provided object-property pairs is limited as they can be simply incorrect, subjective, unspecific, or relate only to a narrow context. In this paper, we suggest a methodology to create a highly accurate dataset of object properties that are related to common physical attributes. The approach is based on filtering non-physical properties within commonsense knowledge bases and improving the accuracy of the remaining object-property pairs based on supervised machine learning using annotated data. Thereby, we evaluate different types of features and models and significantly increase the correctness of object-property pairs compared to the original sources.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127528384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vietnamese Text Summarization Based on Elementary Discourse Units","authors":"Khang Nhut Lam, Tai Ngoc Nguyen, J. Kalita","doi":"10.1145/3582768.3582793","DOIUrl":"https://doi.org/10.1145/3582768.3582793","url":null,"abstract":"This paper presents text summarization models based on elementary discourse units (EDUs) to construct extractive and abstractive summarization for Vietnamese documents. First, we introduce algorithms using the POS information for constructing EDUs in Vietnamese. Then, the EDUs created are fed into an extractive summarization model using a pointer network and an abstractive summarization model using a pointer generator model. A reinforcement learning method is used to improve the quality of the models. We perform experiments on the CTUNLPSUM dataset, including 1,053,702 Vietnamese documents extracted from online magazines. The extractive summarization models based on EDUs outperform other extractive summarization models based on words or sentences. The ROUGE-1, ROUGE-2, and ROUGE-L of the best extractive and abstractive summarization models are 0.567, 0.241, 0.461; and 0.530, 0.213, 0.394, respectively.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121263618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Izunna Okpala, Guillermo Romera Rodriguez, Andrea Tapia, S. Halse, Jessica Kropczynski
{"title":"A Semantic Approach to Negation Detection and Word Disambiguation with Natural Language Processing","authors":"Izunna Okpala, Guillermo Romera Rodriguez, Andrea Tapia, S. Halse, Jessica Kropczynski","doi":"10.1145/3582768.3582789","DOIUrl":"https://doi.org/10.1145/3582768.3582789","url":null,"abstract":"This study aims to demonstrate the methods for detecting negations in a sentence by uniquely evaluating the lexical structure of the text via word-sense disambiguation. The proposed framework examines all the unique features in the various expressions within a text to resolve the contextual usage of all tokens and decipher the effect of negation on sentiment analysis. The application of popular expression detectors skips this important step, thereby neglecting the root words caught in the web of negation and making text classification difficult for machine learning and sentiment analysis. This study adopts the Natural Language Processing (NLP) approach to discover and antonimize words that were negated for better accuracy in text classification using a knowledge base provided by an NLP library called WordHoard. Early results show that our initial analysis improved on traditional sentiment analysis, which sometimes neglects negations or assigns an inverse polarity score. The SentiWordNet analyzer was improved by 35%, the Vader analyzer by 20% and the TextBlob by 6%.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122952877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preventing RNN from Using Sequence Length as a Feature","authors":"Jean-Thomas Baillargeon, Hélène Cossette, Luc Lamontagne","doi":"10.1145/3582768.3582776","DOIUrl":"https://doi.org/10.1145/3582768.3582776","url":null,"abstract":"Recurrent neural networks are deep learning topologies that can be trained to classify long documents. However, in our recent work, we found a critical problem with these cells: they can use the length differences between texts of different classes as a prominent classification feature. This has the effect of producing models that are brittle and fragile to concept drift, can provide misleading performances and are trivially explainable regardless of text content. This paper illustrates the problem using synthetic and real-world data and provides a simple solution using weight decay regularization.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"37 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113937966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches","authors":"Tim Schopf, Daniel Braun, F. Matthes","doi":"10.1145/3582768.3582795","DOIUrl":"https://doi.org/10.1145/3582768.3582795","url":null,"abstract":"Text classification of unseen classes is a challenging Natural Language Processing task and is mainly attempted using two different types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches to these categories, the experiments in literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE [7] and SBERT-based [26] baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130036007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","authors":"","doi":"10.1145/3582768","DOIUrl":"https://doi.org/10.1145/3582768","url":null,"abstract":"","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121728963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}