Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

NLP And IR Applications For Financial Reporting And Non-Financial Disclosure. Framework Implementation And Roadmap For Feasible Integration With The Accounting Process
A. Faccia, P. Petratos
DOI: https://doi.org/10.1145/3582768.3582796
Abstract: Corporations produce financial and non-financial reports containing structured and unstructured data. In general, all organisations report information of some kind. Natural Language Processing (NLP) and Information Retrieval (IR) are fields that have developed since approximately the 1950s and have yielded important applications, especially in the last three decades. Nevertheless, applications in accounting and finance have not developed accordingly, and a comprehensive framework is missing from the existing literature. This paper examines how NLP and IR can facilitate both financial and non-financial reporting and disclosure. The paper provides a brief literature review of NLP/IR applications in accounting and finance. It informs and expands the discussion of NLP/IR applications in academic research, professional organisations (e.g., IFRS), and industry. It explores some innovative applications of NLP/IR to unstructured data and their use in reporting, disclosure, and FinTech applications. The main contribution is the definition of a complete framework that consistently analyses the possible NLP/IR applications in the accounting process. We find that there can be many more applications of NLP/IR in accounting and finance, and we suggest future directions for research.
Citations: 0
DIFFSTRACT: distinguishing the content of texts
Yanakorn Ruamsuk, A. Mingkhwan, H. Unger
DOI: https://doi.org/10.1145/3582768.3582787
Abstract: Nowadays, generating summaries of texts automatically is almost a standard task. In contrast, identifying the differences between the statements of two publications is still a problem. For the most part, this requires a human to read and evaluate at least excerpts of the relevant passages. Text differentiation with appropriate tools is therefore becoming an increasingly interesting and important task for coping effectively with the daily flood of information on the WWW. For years, co-occurrence graphs have been a proven means of deriving statements of various kinds from texts, and so-called text-representing centroids (TRCs) have often been an effective tool for identifying, comparing, and categorizing texts or sections. The present article examines how differences between co-occurrence graphs can be computed and put to use. First, co-occurrence graphs are built from a larger corpus and from various individual texts or text groups. Subsequently, the calculated difference graphs can be used to create summaries that precisely characterize the differences between texts. Experimental results show that this new method works well.
Citations: 0
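The difference-graph idea from the abstract can be sketched in a few lines: build a co-occurrence graph per text as weighted word-pair edges, then keep the edges of one graph that are absent from (or weaker in) the other. The window size, weighting, and example texts below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def cooccurrence_graph(tokens, window=2):
    """Count co-occurring word pairs within a sliding window."""
    graph = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[frozenset((tokens[i], tokens[j]))] += 1
    return graph

def difference_graph(g_a, g_b):
    """Keep edges of g_a that are absent from (or weaker in) g_b."""
    return Counter({edge: w - g_b.get(edge, 0)
                    for edge, w in g_a.items() if w > g_b.get(edge, 0)})

text_a = "solar power plants convert sunlight into electricity".split()
text_b = "coal power plants convert fuel into electricity".split()
diff = difference_graph(cooccurrence_graph(text_a), cooccurrence_graph(text_b))
# Edges involving "solar"/"sunlight" survive; structure shared by both
# texts (e.g. "power"-"plants") cancels out.
```

The surviving edges characterize what the first text says that the second does not, which is exactly the material a difference summary would be built from.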
Graph Based Pattern Classification for NLU and Slot Filling: Approach and Analysis
J. Eggert, Johane Takeuchi
DOI: https://doi.org/10.1145/3582768.3582770
Abstract: In Natural Language Understanding, semantic parsing refers to the task of extracting meaningful words from text, with slot filling being a special case. At the same time, syntactic parsing deals with the identification of syntactic structure in the text. In this paper, we analyze to which extent syntactic structure can be used for semantic parsing. For this purpose, we represent the syntactic structure of sentences from an annotated database as graphs, incrementally store a large number of graph prototypes in a knowledge base, and approach the slot filling problem as a graph matching problem. We analyze how this approach scales with the number of prototypes and show that it provides a lightweight framework that can be efficiently used to support semantic parsing tasks.
Citations: 1
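Slot filling as graph matching can be illustrated with a toy prototype store: each prototype is a set of dependency-style triples in which slot positions are variables, and matching a sentence graph against a prototype binds those variables. The triple representation, relation names, and intents below are hypothetical, chosen only to make the matching mechanics concrete.

```python
# Each prototype is a set of (head, relation, dependent) triples;
# slot positions are marked with variables like "?dest".
PROTOTYPES = {
    "book_flight": {("book", "dobj", "flight"), ("flight", "prep_to", "?dest")},
    "play_music": {("play", "dobj", "?song")},
}

def match(prototype, sentence_graph):
    """Try to bind every prototype triple to a sentence triple.

    Returns the slot bindings on success, or None if any triple
    fails to match.
    """
    bindings = {}
    for head, rel, dep in prototype:
        for h, r, d in sentence_graph:
            if h == head and r == rel and (d == dep or dep.startswith("?")):
                if dep.startswith("?"):
                    bindings[dep] = d
                break
        else:
            return None  # this prototype triple has no counterpart
    return bindings

sent = {("book", "dobj", "flight"), ("flight", "prep_to", "Tokyo")}
for intent, proto in PROTOTYPES.items():
    slots = match(proto, sent)
    if slots is not None:
        print(intent, slots)
```

A real system would of course need inexact matching and a scalable index over thousands of prototypes, which is precisely the scaling question the paper analyzes.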
A Stylometric Dataset for Bengali Poems
M. K. B. Shuhan, Rupasree Dey, Sourav Saha, Md Shafa Ul Anjum, T. S. Zaman
DOI: https://doi.org/10.1145/3582768.3582788
Abstract: Poetry is a form of literature that conveys feelings using different styles, aesthetics, and rhythms. The Bengali language has an enriched collection of poems, and every poet has an individual style of expressing their thoughts and emotions. However, stylometric research in this branch of the Bengali language is still in its early stage of development. In this paper, we present a stylometric dataset of 6,070 poems by 137 poets, stored in textual format. To the best of our knowledge, this is the first stylometric dataset for Bengali poems, and it will add an extra dimension to the expanding research arena of the Bengali language. To explore the usability of this dataset, we developed poem genre classifiers using deep learning. In addition to the classification results, we present a performance analysis of the classifiers, which include a GRU and a CNN. Of the two, the GRU performed better, achieving an F1-score of 91.48. The dataset will be made publicly available at https://github.com/shuhanmirza/Bengali-Poem-Dataset after this article is published.
Citations: 0
A corpus of drafts of NLP papers from non-native English speakers
Haotong Wang, Liyan Wang, Lepage Yves
DOI: https://doi.org/10.1145/3582768.3582797
Abstract: We created an English parallel corpus of 3,005 sentence pairs, each containing a well-polished text from the ACL Anthology Reference Corpus (ACL-ARC) [1] and corresponding restated drafts collected from 26 non-native writers. The purpose of this paper is to explore the writing features of drafts from non-native English speakers, so as to benefit research on academic writing aid systems. We present a feature analysis of the corpus based on handcrafted features. To assess utility, we formulate a draft identification task to automatically recognize drafts from ground-truth texts based on hybrid features. We show that combining deep semantic features with the optimal handcrafted features improves identification accuracy on the collected data, up to 84.57%.
Citations: 0
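Handcrafted stylistic features of the kind used for such a draft-identification task can be sketched as simple per-text statistics. The three features below (average sentence length, type-token ratio, average word length) are common stylometric choices and are assumptions for illustration; the paper's actual feature set is not reproduced here.

```python
import re

def handcrafted_features(text):
    """A few simple stylistic features of the kind that can help
    separate rough drafts from polished text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Drafts often use shorter, choppier sentences.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Lexical diversity: unique words over total words.
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        # Polished academic prose tends toward longer words.
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }

draft = "We done the experiment. It work good. We think result is nice."
polished = "We conducted the experiment, which produced encouraging results."
print(handcrafted_features(draft))
print(handcrafted_features(polished))
```

Feature vectors like these would then be concatenated with deep semantic embeddings to form the hybrid representation the abstract describes.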
Changing Topics for Changing Times: Thematic and Temporal-Based Analysis of the Philippine Senatorial and Midterm Elections
Lamar Clarence Cruz, Jessica Nicole Dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez
DOI: https://doi.org/10.1145/3582768.3582781
Abstract: Elections in the Philippines are among the most prominent events in the country: a time when people choose their leaders through the democratic election process. During the electoral campaign, candidates try several strategies to win. However, the victors that emerge are those who are familiar to voters and have mass-based support. In this paper, we investigate the perceptions of netizens in the Philippines over the last two election seasons, 2019 and 2022, based on Twitter data. Thematic analyses were conducted to determine the themes related to winning candidates and to examine how these themes may have affected their electoral success. The study found that the top candidates (Cynthia Villar in 2019 and Robinhood Padilla in 2022) mainly received criticism and opposition towards their campaigns within these themes. Moreover, the findings suggest that candidates with more publicity, whether negative or positive, emerge as winners. Long-time politicians and organized groups endorsed most winners during the 2019 and 2022 elections.
Citations: 0
An Analogy based Approach for Solving Target Sense Verification
Georgios Zervakis, Emmanuel Vincent, Miguel Couceiro, Marc Schoenauer, Esteban Marquer
DOI: https://doi.org/10.1145/3582768.3582794
Abstract: Contextualized language models have emerged as a de facto standard in natural language processing due to the vast amount of knowledge they acquire during pretraining. Nonetheless, their ability to solve tasks that require reasoning over this knowledge is limited. Certain tasks can be improved by analogical reasoning over concepts, e.g., understanding the underlying relations in "Man is to Woman as King is to Queen". In this work, we propose a way to formulate target sense verification as an analogy detection task by transforming the input data into quadruples. We present AB4TSV (Analogy and BERT for TSV), a model that uses BERT to represent the objects in these quadruples, combined with a convolutional neural network that decides whether they constitute valid analogies. We test our system on the WiC-TSV evaluation benchmark and show that it can outperform existing approaches. Our empirical study shows the importance of the input encoding for BERT; this dependence is alleviated by integrating the axiomatic properties of analogies during training, while preserving performance and improving interpretability.
Citations: 3
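The "axiomatic properties of analogies" mentioned in the abstract can be made concrete: an analogy a:b :: c:d is standardly taken to remain valid under symmetry (c:d :: a:b) and exchange of the means (a:c :: b:d). A sketch of how such equivalent quadruple forms could be enumerated, e.g. for training-time augmentation, follows; the function name and the exact set of forms used by AB4TSV are assumptions.

```python
def analogy_forms(a, b, c, d):
    """Equivalent forms of the analogy a:b :: c:d under the usual axioms:
    symmetry (swap the two pairs) and exchange of the means (a:c :: b:d),
    plus the combination of both."""
    return [
        (a, b, c, d),  # original
        (c, d, a, b),  # symmetry
        (a, c, b, d),  # exchange of the means
        (b, d, a, c),  # both applied
    ]

for quad in analogy_forms("man", "woman", "king", "queen"):
    print("%s : %s :: %s : %s" % quad)
```

Training on all equivalent forms of each positive quadruple is one natural way to push a model toward respecting these axioms regardless of input encoding.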
E-VAN: Enhanced Variational AutoEncoder Network for Mitigating Gender Bias in Static Word Embeddings
Swati Tyagi, Jiaheng Xie, Rick Andrews
DOI: https://doi.org/10.1145/3582768.3582804
Abstract: Recent research has shown that pre-trained context-independent word embeddings display biases such as racial bias, gender bias, etc. Using a novel, tunable algorithm, this study attempts to mitigate the hidden gender bias in static embeddings. To train the model, an enhanced variational autoencoder network (E-VAN) is used to learn the latent space of the embedding. The latent distributions are then used while adaptively resampling and re-weighting the rare/under-represented data. While the word embeddings retain semantic information, E-VAN effectively mitigates unwanted biased gendered associations. Our method outperforms previous state-of-the-art methods in both quantitative and human evaluation.
Citations: 0
Random Division: An Effective Method for Chinese Text Classification
Gao Mo
DOI: https://doi.org/10.1145/3582768.3582783
Abstract: As a fundamental part of natural language processing, text classification is the backbone of tasks and applications such as machine translation and classification. Among the text classification tasks of all languages, Chinese appears to be one of the most challenging, owing to the language's complex structures and expressions. Researchers generally require a significant amount of data for model training and tuning, yet that amount of data is often unavailable. Given these circumstances, we propose an effective data enhancement technique to lower the demand for data. The central principle is as follows: randomize the word vectors and tokens obtained from tokenizing the text according to a certain density level (e.g., every group contains five words), then use the randomized results as data input. This process generates a considerable number of data variations, easing the demand for data. In our experiments, we tested this approach on multiple Chinese natural language processing datasets and observed improvements in model performance across all datasets used, supporting the validity of the method.
Citations: 0
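One reading of the described augmentation, split the token sequence into fixed-size groups ("density level" of five) and randomize, can be sketched as follows. Shuffling the group order, the number of variants, and the seed are all assumptions for illustration; the paper does not specify these details here.

```python
import random

def random_division(tokens, group_size=5, n_variants=3, seed=0):
    """Split the token sequence into fixed-size groups and shuffle the
    group order to generate augmented training samples (one possible
    reading of the Random Division method)."""
    groups = [tokens[i:i + group_size]
              for i in range(0, len(tokens), group_size)]
    rng = random.Random(seed)  # seeded for reproducible augmentation
    variants = []
    for _ in range(n_variants):
        shuffled = groups[:]
        rng.shuffle(shuffled)
        variants.append([tok for grp in shuffled for tok in grp])
    return variants

# Character-level tokens, as is common for Chinese text.
tokens = list("这是一个用于文本分类的简单增强示例")
for variant in random_division(tokens, group_size=5):
    print("".join(variant))
```

Each variant preserves local word order inside a group while varying the global arrangement, which multiplies the effective training data without new labels.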
Building term hierarchies using graph-based clustering
Mark Hloch, Markus Van Meegen, M. Kubek, H. Unger
DOI: https://doi.org/10.1145/3582768.3582807
Abstract: Classical tasks of a librarian, such as screening and categorizing new documents based on their content, are increasingly replaced by search engines or through the use of cataloguing software. A first overview of a corpus's topical orientation can be achieved by combining graph-based search engines and clustering methods. Existing classical clustering methods, however, often require an a priori specification of the desired number of clusters and do not consider term relationships in graphs, which is deficient from a practical point of view. Fully unsupervised graph-based clustering approaches at the term level therefore offer new possibilities that mitigate these shortcomings. Within this work, a set of novel graph-based clustering algorithms has been developed. The hierarchical clustering algorithm (HCA) forms term hierarchies by iteratively isolating nodes of a given co-occurrence graph based on an evaluation of the edge weights between the nodes. Based on the relationships inherent in the co-occurrence graph, a new graph is built agglomeratively, forming individual clusters of related terms. The feasibility of the outlined methods for text analysis is shown.
Citations: 0
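The idea of iteratively isolating nodes by edge weight can be sketched as divisive clustering over a weighted co-occurrence graph: raise a weight threshold step by step, drop the edges that fall below it, and record the connected components at each level. This threshold-based scheme and the toy edge list are illustrative assumptions, not the HCA algorithm itself.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, via iterative DFS."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            m = stack.pop()
            if m not in comp:
                comp.add(m)
                stack.extend(adj[m] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def term_hierarchy(weighted_edges):
    """Divisive clustering: drop the weakest edges one threshold at a
    time and record the resulting components, moving from one broad
    cluster down toward isolated terms."""
    nodes = {n for e in weighted_edges for n in e[:2]}
    levels = []
    for threshold in sorted({w for _, _, w in weighted_edges}):
        kept = [(a, b) for a, b, w in weighted_edges if w >= threshold]
        levels.append(components(nodes, kept))
    return levels

edges = [("nlp", "parsing", 3), ("nlp", "retrieval", 3), ("parsing", "syntax", 1)]
for level in term_hierarchy(edges):
    print(level)
```

Reading the levels top-down gives a term hierarchy: weakly connected terms split off first, while strongly co-occurring terms stay clustered the longest.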