Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

NLP And IR Applications For Financial Reporting And Non-Financial Disclosure. Framework Implementation And Roadmap For Feasible Integration With The Accounting Process
A. Faccia, P. Petratos
DOI: https://doi.org/10.1145/3582768.3582796
Abstract: Corporations produce financial and non-financial reports containing structured and unstructured data. In general, all organisations report information of some kind. Natural Language Processing (NLP) and Information Retrieval (IR) are fields that have developed since approximately the 1950s and have yielded important applications, especially in the last three decades. Nevertheless, applications in accounting and finance have not developed accordingly, and a comprehensive framework is missing from the existing literature. This paper examines how NLP and IR can facilitate both financial and non-financial reporting and disclosure. The paper provides a brief literature review of NLP/IR applications in accounting and finance. It informs and expands the discussion of NLP/IR applications in academic research, professional organisations (e.g., IFRS), and industry. It explores some innovative applications of NLP/IR to unstructured data and their use in reporting, disclosure, and FinTech applications. The main contribution is the definition of a complete framework that consistently analyses the possible NLP/IR applications in the accounting process. We find that there can be many more applications of NLP/IR in accounting and finance, and we suggest future directions for research.
Citations: 0
DIFFSTRACT: distinguishing the content of texts
Yanakorn Ruamsuk, A. Mingkhwan, H. Unger
DOI: https://doi.org/10.1145/3582768.3582787
Abstract: Nowadays, generating summaries of texts automatically is almost a standard task. In contrast, identifying the differences between the statements of two publications is still a problem. For the most part, this requires a human to read and evaluate at least excerpts of the relevant passages. Text differentiation with appropriate tools is therefore becoming an increasingly interesting and important task for coping effectively with the daily flood of information on the WWW. For years, co-occurrence graphs have been a proven means of deriving statements of various kinds from texts, and so-called text-representing centroids (TRCs) have often been an effective tool for identifying, comparing, and categorizing texts or sections. The present article examines how differences between co-occurrence graphs can be computed and put to use. First, co-occurrence graphs are built from a larger corpus and from various individual texts or text groups. Subsequently, the calculated difference graphs can be used to create summaries that precisely characterize the differences between texts. Experimental results show that this new method works well.
Citations: 0
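The difference-graph idea from the abstract can be sketched in a few lines: build a co-occurrence graph per text as weighted word-pair edges, then keep the edges of one graph that are absent from (or weaker in) the other. The window size, weighting, and example texts below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def cooccurrence_graph(tokens, window=2):
    """Count co-occurring word pairs within a sliding window."""
    graph = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[frozenset((tokens[i], tokens[j]))] += 1
    return graph

def difference_graph(g_a, g_b):
    """Keep edges of g_a that are absent from (or weaker in) g_b."""
    return Counter({edge: w - g_b.get(edge, 0)
                    for edge, w in g_a.items() if w > g_b.get(edge, 0)})

text_a = "solar power plants convert sunlight into electricity".split()
text_b = "coal power plants convert fuel into electricity".split()
diff = difference_graph(cooccurrence_graph(text_a), cooccurrence_graph(text_b))
# Edges involving "solar"/"sunlight" survive; structure shared by both
# texts (e.g. "power"-"plants") cancels out.
```

The surviving edges characterize what the first text says that the second does not, which is exactly the material a difference summary would be built from.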
Graph Based Pattern Classification for NLU and Slot Filling: Approach and Analysis
J. Eggert, Johane Takeuchi
DOI: https://doi.org/10.1145/3582768.3582770
Abstract: In Natural Language Understanding, semantic parsing refers to the task of extracting meaningful words from text, with slot filling being a special case. At the same time, syntactic parsing deals with the identification of syntactic structure in the text. In this paper, we analyze to which extent syntactic structure can be used for semantic parsing. For this purpose, we represent the syntactic structure of sentences from an annotated database as graphs, incrementally store a large number of graph prototypes in a knowledge base, and approach the slot filling problem as a graph matching problem. We analyze how this approach scales with the number of prototypes and show that it provides a lightweight framework that can be efficiently used to support semantic parsing tasks.
Citations: 1
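Slot filling as graph matching can be illustrated with a toy prototype store: each prototype is a set of dependency-style triples in which slot positions are variables, and matching a sentence graph against a prototype binds those variables. The triple representation, relation names, and intents below are hypothetical, chosen only to make the matching mechanics concrete.

```python
# Each prototype is a set of (head, relation, dependent) triples;
# slot positions are marked with variables like "?dest".
PROTOTYPES = {
    "book_flight": {("book", "dobj", "flight"), ("flight", "prep_to", "?dest")},
    "play_music": {("play", "dobj", "?song")},
}

def match(prototype, sentence_graph):
    """Try to bind every prototype triple to a sentence triple.

    Returns the slot bindings on success, or None if any triple
    fails to match.
    """
    bindings = {}
    for head, rel, dep in prototype:
        for h, r, d in sentence_graph:
            if h == head and r == rel and (d == dep or dep.startswith("?")):
                if dep.startswith("?"):
                    bindings[dep] = d
                break
        else:
            return None  # this prototype triple has no counterpart
    return bindings

sent = {("book", "dobj", "flight"), ("flight", "prep_to", "Tokyo")}
for intent, proto in PROTOTYPES.items():
    slots = match(proto, sent)
    if slots is not None:
        print(intent, slots)
```

A real system would of course need inexact matching and a scalable index over thousands of prototypes, which is precisely the scaling question the paper analyzes.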
A Stylometric Dataset for Bengali Poems
M. K. B. Shuhan, Rupasree Dey, Sourav Saha, Md Shafa Ul Anjum, T. S. Zaman
DOI: https://doi.org/10.1145/3582768.3582788
Abstract: Poetry is a form of literature that conveys feelings using different styles, aesthetics, and rhythms. The Bengali language has an enriched collection of poems, and every poet has an individual style of expressing their thoughts and emotions. However, stylometric research in this branch of the Bengali language is still in its early stage of development. In this paper, we present a stylometric dataset of 6,070 poems by 137 poets, stored in textual format. To the best of our knowledge, this is the first stylometric dataset for Bengali poems, and it will add an extra dimension to the expanding research arena of the Bengali language. To explore the usability of this dataset, we developed poem genre classifiers using deep learning. In addition to the classification results, we present a performance analysis of the classifiers, which include a GRU and a CNN. Of the two, the GRU performed better, achieving an F1-score of 91.48. The dataset will be made publicly available at https://github.com/shuhanmirza/Bengali-Poem-Dataset after this article is published.
Citations: 0
A corpus of drafts of NLP papers from non-native English speakers
Haotong Wang, Liyan Wang, Lepage Yves
DOI: https://doi.org/10.1145/3582768.3582797
Abstract: We created an English parallel corpus of 3,005 sentence pairs, each containing a well-polished text from the ACL Anthology Reference Corpus (ACL-ARC) [1] and corresponding restated drafts collected from 26 non-native writers. The purpose of this paper is to explore the writing features of drafts from non-native English speakers, so as to benefit research on academic writing aid systems. We present a feature analysis of the corpus based on handcrafted features. To assess utility, we formulate a draft identification task to automatically recognize drafts from ground-truth texts based on hybrid features. We show that combining deep semantic features with the optimal handcrafted features improves identification accuracy on the collected data, up to 84.57%.
Citations: 0
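Handcrafted stylistic features of the kind used for such a draft-identification task can be sketched as simple per-text statistics. The three features below (average sentence length, type-token ratio, average word length) are common stylometric choices and are assumptions for illustration; the paper's actual feature set is not reproduced here.

```python
import re

def handcrafted_features(text):
    """A few simple stylistic features of the kind that can help
    separate rough drafts from polished text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Drafts often use shorter, choppier sentences.
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Lexical diversity: unique words over total words.
        "type_token_ratio": len(set(words)) / max(len(words), 1),
        # Polished academic prose tends toward longer words.
        "avg_word_len": sum(map(len, words)) / max(len(words), 1),
    }

draft = "We done the experiment. It work good. We think result is nice."
polished = "We conducted the experiment, which produced encouraging results."
print(handcrafted_features(draft))
print(handcrafted_features(polished))
```

Feature vectors like these would then be concatenated with deep semantic embeddings to form the hybrid representation the abstract describes.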
Changing Topics for Changing Times: Thematic and Temporal-Based Analysis of the Philippine Senatorial and Midterm Elections
Lamar Clarence Cruz, Jessica Nicole Dela Cruz, Shane Francis Maglangit, Mico Magtira, Joseph Marvin Imperial, Ramon Rodriguez
DOI: https://doi.org/10.1145/3582768.3582781
Abstract: Elections in the Philippines are among the most prominent events in the country: a time when people choose their leaders through the democratic election process. During the electoral campaign, candidates try several strategies to win. However, the victors that emerge are those who are familiar to voters and have mass-based support. In this paper, we investigate the perceptions of netizens in the Philippines over the last two election seasons, 2019 and 2022, based on Twitter data. Thematic analyses were conducted to determine the themes related to winning candidates and to examine how these themes may have affected their electoral success. The study found that the top candidates (Cynthia Villar in 2019 and Robinhood Padilla in 2022) mainly received criticism and opposition towards their campaigns within these themes. Moreover, the findings suggest that candidates with more publicity, whether negative or positive, emerge as winners. Long-time politicians and organized groups endorsed most winners during the 2019 and 2022 elections.
Citations: 0
An Analogy based Approach for Solving Target Sense Verification
Georgios Zervakis, Emmanuel Vincent, Miguel Couceiro, Marc Schoenauer, Esteban Marquer
DOI: https://doi.org/10.1145/3582768.3582794
Abstract: Contextualized language models have emerged as a de facto standard in natural language processing due to the vast amount of knowledge they acquire during pretraining. Nonetheless, their ability to solve tasks that require reasoning over this knowledge is limited. Certain tasks can be improved by analogical reasoning over concepts, e.g., understanding the underlying relations in "Man is to Woman as King is to Queen". In this work, we propose a way to formulate target sense verification as an analogy detection task by transforming the input data into quadruples. We present AB4TSV (Analogy and BERT for TSV), a model that uses BERT to represent the objects in these quadruples, combined with a convolutional neural network that decides whether they constitute valid analogies. We test our system on the WiC-TSV evaluation benchmark and show that it can outperform existing approaches. Our empirical study shows the importance of the input encoding for BERT; this dependence is alleviated by integrating the axiomatic properties of analogies during training, while preserving performance and improving interpretability.
Citations: 3
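The "axiomatic properties of analogies" mentioned in the abstract can be made concrete: an analogy a:b :: c:d is standardly taken to remain valid under symmetry (c:d :: a:b) and exchange of the means (a:c :: b:d). A sketch of how such equivalent quadruple forms could be enumerated, e.g. for training-time augmentation, follows; the function name and the exact set of forms used by AB4TSV are assumptions.

```python
def analogy_forms(a, b, c, d):
    """Equivalent forms of the analogy a:b :: c:d under the usual axioms:
    symmetry (swap the two pairs) and exchange of the means (a:c :: b:d),
    plus the combination of both."""
    return [
        (a, b, c, d),  # original
        (c, d, a, b),  # symmetry
        (a, c, b, d),  # exchange of the means
        (b, d, a, c),  # both applied
    ]

for quad in analogy_forms("man", "woman", "king", "queen"):
    print("%s : %s :: %s : %s" % quad)
```

Training on all equivalent forms of each positive quadruple is one natural way to push a model toward respecting these axioms regardless of input encoding.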
E-VAN: Enhanced Variational AutoEncoder Network for Mitigating Gender Bias in Static Word Embeddings
Swati Tyagi, Jiaheng Xie, Rick Andrews
DOI: https://doi.org/10.1145/3582768.3582804
Abstract: Recent research has shown that pre-trained context-independent word embeddings display biases such as racial bias, gender bias, etc. Using a novel, tunable algorithm, this study attempts to mitigate the hidden gender bias in static embeddings. To train the model, an enhanced variational autoencoder network (E-VAN) is used to learn the latent space of the embedding. The latent distributions are then used while adaptively resampling and re-weighting the rare/under-represented data. While the word embeddings retain semantic information, E-VAN effectively mitigates unwanted biased gendered associations. Our method outperforms previous state-of-the-art methods in both quantitative and human evaluation.
Citations: 0
Random Division: An Effective Method for Chinese Text Classification
Gao Mo
DOI: https://doi.org/10.1145/3582768.3582783
Abstract: As a fundamental part of natural language processing, text classification is the backbone of tasks and applications such as machine translation and classification. Among the text classification tasks of all languages, Chinese appears to be one of the most challenging, owing to the language's complex structures and expressions. Researchers generally require a significant amount of data for model training and tuning, yet that amount of data is often unavailable. Given these circumstances, we propose an effective data enhancement technique to lower the demand for data. The central principle is as follows: randomize the word vectors and tokens obtained from tokenizing the text according to a certain density level (e.g., every group contains five words), then use the randomized results as data input. This process generates a considerable number of data variations, easing the demand for data. In our experiments, we tested this approach on multiple Chinese natural language processing datasets and observed improvements in model performance across all datasets used, supporting the validity of the method.
Citations: 0
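One reading of the described augmentation, split the token sequence into fixed-size groups ("density level" of five) and randomize, can be sketched as follows. Shuffling the group order, the number of variants, and the seed are all assumptions for illustration; the paper does not specify these details here.

```python
import random

def random_division(tokens, group_size=5, n_variants=3, seed=0):
    """Split the token sequence into fixed-size groups and shuffle the
    group order to generate augmented training samples (one possible
    reading of the Random Division method)."""
    groups = [tokens[i:i + group_size]
              for i in range(0, len(tokens), group_size)]
    rng = random.Random(seed)  # seeded for reproducible augmentation
    variants = []
    for _ in range(n_variants):
        shuffled = groups[:]
        rng.shuffle(shuffled)
        variants.append([tok for grp in shuffled for tok in grp])
    return variants

# Character-level tokens, as is common for Chinese text.
tokens = list("这是一个用于文本分类的简单增强示例")
for variant in random_division(tokens, group_size=5):
    print("".join(variant))
```

Each variant preserves local word order inside a group while varying the global arrangement, which multiplies the effective training data without new labels.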
Building term hierarchies using graph-based clustering
Mark Hloch, Markus Van Meegen, M. Kubek, H. Unger
DOI: https://doi.org/10.1145/3582768.3582807
Abstract: Classical tasks of a librarian, such as screening and categorizing new documents based on their content, are increasingly replaced by search engines or through the use of cataloguing software. A first overview of a corpus's topical orientation can be achieved by combining graph-based search engines and clustering methods. Existing classical clustering methods, however, often require an a priori specification of the desired number of clusters and do not consider term relationships in graphs, which is deficient from a practical point of view. Fully unsupervised graph-based clustering approaches at the term level therefore offer new possibilities that mitigate these shortcomings. Within this work, a set of novel graph-based clustering algorithms has been developed. The hierarchical clustering algorithm (HCA) forms term hierarchies by iteratively isolating nodes of a given co-occurrence graph based on an evaluation of the edge weights between the nodes. Based on the relationships inherent in the co-occurrence graph, a new graph is built agglomeratively, forming individual clusters of related terms. The feasibility of the outlined methods for text analysis is shown.
Citations: 0
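The idea of iteratively isolating nodes by edge weight can be sketched as divisive clustering over a weighted co-occurrence graph: raise a weight threshold step by step, drop the edges that fall below it, and record the connected components at each level. This threshold-based scheme and the toy edge list are illustrative assumptions, not the HCA algorithm itself.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, via iterative DFS."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            m = stack.pop()
            if m not in comp:
                comp.add(m)
                stack.extend(adj[m] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def term_hierarchy(weighted_edges):
    """Divisive clustering: drop the weakest edges one threshold at a
    time and record the resulting components, moving from one broad
    cluster down toward isolated terms."""
    nodes = {n for e in weighted_edges for n in e[:2]}
    levels = []
    for threshold in sorted({w for _, _, w in weighted_edges}):
        kept = [(a, b) for a, b, w in weighted_edges if w >= threshold]
        levels.append(components(nodes, kept))
    return levels

edges = [("nlp", "parsing", 3), ("nlp", "retrieval", 3), ("parsing", "syntax", 1)]
for level in term_hierarchy(edges):
    print(level)
```

Reading the levels top-down gives a term hierarchy: weakly connected terms split off first, while strongly co-occurring terms stay clustered the longest.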