{"title":"NLP And IR Applications For Financial Reporting And Non-Financial Disclosure. Framework Implementation And Roadmap For Feasible Integration With The Accounting Process","authors":"A. Faccia, P. Petratos","doi":"10.1145/3582768.3582796","DOIUrl":"https://doi.org/10.1145/3582768.3582796","url":null,"abstract":"Corporations produce financial and non-financial reports containing structured and unstructured data. In general, all organisations report information of some kind. Natural Language Processing (NLP) and Information Retrieval (IR) are fields that have developed since approximately the 1950s and have yielded important applications, especially in the last three decades. Nevertheless, applications in accounting and finance have not developed accordingly, and a comprehensive framework is missing in the existing literature. This paper examines how NLP and IR can facilitate reporting and disclosure, both financial and non-financial. The paper provides a brief literature review on NLP/IR applications in accounting and finance. It better informs and expands on the discussion of NLP/IR applications in academic research, professional organisations (e.g., IFRS), and industry. It explores some innovative applications of NLP/IR to unstructured data and their use in reporting, disclosure, and FinTech. The main contribution is the definition of a complete framework that consistently analyses the possible NLP/IR applications in the accounting processes. 
We find that there can be many more applications of NLP/IR in accounting and finance and suggest future directions for research.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130385843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DIFFSTRACT: distinguishing the content of texts","authors":"Yanakorn Ruamsuk, A. Mingkhwan, H. Unger","doi":"10.1145/3582768.3582787","DOIUrl":"https://doi.org/10.1145/3582768.3582787","url":null,"abstract":"Nowadays, it is almost standard to generate summaries of texts automatically. In contrast, it is still a problem to identify the differences between the statements of two publications. For the most part, this still requires a human being to read and evaluate at least excerpts of the relevant passages. Finding a so-called text differentiation with appropriate tools is becoming an increasingly interesting and important task to effectively cope with the daily flood of information on the WWW. For years, co-occurrence graphs have been a proven means of deriving statements of various kinds from texts. So-called text-representing centroids (TRCs) have often been an effective tool for identifying, comparing and categorizing texts or sections. The present article examines how a different form of co-occurrence graph can be applied and be helpful. First, different co-occurrence graphs are built from a larger corpus and various individual texts or text groups. Subsequently, the calculated difference graphs can be used to create summaries that precisely characterize the differences between texts. 
Experimental results show that this new method works well.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132624157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph Based Pattern Classification for NLU and Slot Filling: Approach and Analysis","authors":"J. Eggert, Johane Takeuchi","doi":"10.1145/3582768.3582770","DOIUrl":"https://doi.org/10.1145/3582768.3582770","url":null,"abstract":"In Natural Language Understanding, semantic parsing refers to the task of extracting meaningful words from text, with slot filling being a special case. At the same time, syntactic parsing deals with the identification of syntactic structure in the text. In this paper, we analyze to what extent syntactic structure can be used for semantic parsing. For this purpose, we represent the syntactic structure of sentences from an annotated database as graphs, incrementally store a large number of graph prototypes in a knowledge base, and approach the slot filling problem as a graph matching problem. We analyze how this approach scales with the number of prototypes and show that it provides a lightweight framework that can be efficiently used to support semantic parsing tasks.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Stylometric Dataset for Bengali Poems","authors":"M. K. B. Shuhan, Rupasree Dey, Sourav Saha, Md Shafa Ul Anjum, T. S. Zaman","doi":"10.1145/3582768.3582788","DOIUrl":"https://doi.org/10.1145/3582768.3582788","url":null,"abstract":"Poetry is a form of literature that conveys feelings using different styles, aesthetics, and rhythms. The Bengali language has a rich collection of poems. Every poet has an individual style of expressing their thoughts and emotions. However, stylometric research in this branch of the Bengali language is still at an early stage of development. In this paper, we present a stylometric dataset of 6,070 poems by 137 poets, stored in text format. To the best of our knowledge, this is the first stylometric dataset for Bengali poems, and it will add an extra dimension to the expanding research arena of the Bengali language. To explore the usability of this dataset, we developed poem genre classifiers using deep learning that can classify these poems. A performance analysis of the deep learning classifiers, GRU and CNN, is also presented. Of the two, GRU showed the better performance, with an F1-score of 91.48. 
The dataset will be publicly available at https://github.com/shuhanmirza/Bengali-Poem-Dataset after publishing this article.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115951729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A corpus of drafts of NLP papers from non-native English speakers","authors":"Haotong Wang, Liyan Wang, Lepage Yves","doi":"10.1145/3582768.3582797","DOIUrl":"https://doi.org/10.1145/3582768.3582797","url":null,"abstract":"We created an English parallel corpus of 3,005 sentence pairs, each containing a well-polished text from ACL Anthology Reference Corpus (ACL-ARC) [1] and corresponding restated drafts collected from 26 non-native writers. The purpose of this paper is to explore the writing features of the drafts from non-native English speakers, so as to benefit research in Academic Writing Aid Systems. We present a feature analysis of the corpus based on handcrafted features. To assess utility, we formulate a draft identification task to automatically recognize drafts from ground truth texts based on hybrid features. We show that the combination of deep semantic features with the optimal handcrafted features improves identification accuracy on the collected data, up to 84.57%.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128702519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Changing Topics for Changing Times: Thematic and Temporal-Based Analysis of the Philippine Senatorial and Midterm Elections","authors":"Lamar Clarence Cruz, Jessica Nicole Dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez","doi":"10.1145/3582768.3582781","DOIUrl":"https://doi.org/10.1145/3582768.3582781","url":null,"abstract":"Elections in the Philippines are among the most prominent events in the country. They are a time when people choose their leaders through the democratic election process. During the electoral campaign, candidates try several strategies to win the election. However, the victors who emerge are those who are familiar and have mass-based support. In this paper, we investigate the perception of netizens in the Philippines during the last two election seasons, 2019 and 2022, based on Twitter data. Thematic analyses were conducted to determine the themes related to winning candidates and to link these themes to their election success. The study found that, according to these themes, the top candidates (Cynthia Villar in 2019 and Robinhood Padilla in 2022) mainly received criticism of and opposition to their campaigns. Moreover, findings suggest that candidates with more negative or positive publicity emerge as winners. 
Long-time politicians and organized groups endorsed most winners during the 2019 and 2022 elections.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115123599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analogy based Approach for Solving Target Sense Verification","authors":"Georgios Zervakis, Emmanuel Vincent, Miguel Couceiro, Marc Schoenauer, Esteban Marquer","doi":"10.1145/3582768.3582794","DOIUrl":"https://doi.org/10.1145/3582768.3582794","url":null,"abstract":"Contextualized language models have emerged as a de facto standard in natural language processing due to the vast amount of knowledge they acquire during pretraining. Nonetheless, their ability to solve tasks that require reasoning over this knowledge is limited. Certain tasks can be improved by analogical reasoning over concepts, e.g., understanding the underlying relations in “Man is to Woman as King is to Queen”. In this work, we propose a way to formulate target sense verification as an analogy detection task, by transforming the input data into quadruples. We present AB4TSV (Analogy and BERT for TSV), a model that uses BERT to represent the objects in these quadruples combined with a convolutional neural network to decide whether they constitute valid analogies. We test our system on the WiC-TSV evaluation benchmark, and show that it can outperform existing approaches. Our empirical study shows the importance of the input encoding for BERT. 
This dependence gets alleviated by integrating the axiomatic properties of analogies during training, while preserving performance and improving interpretability.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126485022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"E-VAN : Enhanced Variational AutoEncoder Network for Mitigating Gender Bias in Static Word Embeddings","authors":"Swati Tyagi, Jiaheng Xie, Rick Andrews","doi":"10.1145/3582768.3582804","DOIUrl":"https://doi.org/10.1145/3582768.3582804","url":null,"abstract":"Recent research has shown that pre-trained context-independent word embeddings display biases such as racial bias, gender bias, etc. Using a novel, tunable algorithm, this study attempts to mitigate the hidden gender bias in static embeddings. In order to train the model, an enhanced variational autoencoder (E-VAN) is used to learn the latent space of the embedding. Then the latent distributions are used while adaptively resampling and re-weighting the rare/under-represented data. While the word embeddings retain semantic information, E-VAN effectively mitigates unwanted biased gendered associations. Our method E-VAN outperforms previous state-of-the-art methods in both quantitative and human evaluation.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131376663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Division: An Effective Method for Chinese Text Classification","authors":"Gao Mo","doi":"10.1145/3582768.3582783","DOIUrl":"https://doi.org/10.1145/3582768.3582783","url":null,"abstract":"As a fundamental part of natural language processing, text classification is the backbone of tasks and applications such as machine translation and classification. Among the text classification tasks of all languages, the one for Chinese appears to be one of the most challenging due to the complex structures and expressions inherent in Chinese. Researchers generally require a significant amount of data for model training and tuning, but most of the time that amount of data is not available. Given the circumstances, we propose an effective data enhancement technique to lower the demand for data. The central principle is as follows: Randomize the word vectors and tokens obtained from tokenizing the text based on a certain density level (e.g., every group contains five words), then use the randomized results as data input. This process generates a considerable number of data variations, easing the demand for data. 
In our experiments, we tested the method on multiple Chinese natural language processing datasets and observed improvements in model performance across all the datasets used, supporting the validity of the method.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123013788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building term hierarchies using graph-based clustering","authors":"Mark Hloch, Markus Van Meegen, M. Kubek, H. Unger","doi":"10.1145/3582768.3582807","DOIUrl":"https://doi.org/10.1145/3582768.3582807","url":null,"abstract":"Classical tasks of a librarian, such as screening and categorizing new documents based on their content, are increasingly replaced by search engines or through the use of cataloging software. A first overview of a corpus's topical orientation can be achieved by combining graph-based search engines and clustering methods. Existing classical clustering methods, however, often require an a priori specification of the desired number of clusters to be output and do not consider term relationships in graphs, which is deficient from a practical point of view. Therefore, fully unsupervised graph-based clustering approaches at the term level offer new possibilities that mitigate these shortcomings. Within this work, a set of novel graph-based clustering algorithms has been developed. The hierarchical clustering algorithm (HCA) forms term hierarchies by iteratively isolating nodes of a given co-occurrence graph based on the evaluation of the edge weight between the nodes. Based on the relationships of terms inherent in the co-occurrence graph, a new graph is built agglomeratively, forming individual clusters of related terms. 
The feasibility of the outlined methods for text analysis is shown.","PeriodicalId":315721,"journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122205727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}