{"title":"Toward Privacy-preserving Text Embedding Similarity with Homomorphic Encryption","authors":"Donggyu Kim, Garam Lee, Sungwoo Oh","doi":"10.18653/v1/2022.finnlp-1.4","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.4","url":null,"abstract":"Text embedding is an essential component for building efficient natural language applications based on text similarity, such as search engines and chatbots. Certain industries, like finance and healthcare, demand strict privacy-preserving conditions under which users’ data must not be exposed to any potentially malicious party, including service providers. From a privacy standpoint, text embeddings seem impossible to interpret, but there is still a privacy risk that they can be recovered to the original texts through inversion attacks. To satisfy such privacy requirements, in this paper we study Homomorphic Encryption (HE) based text similarity inference. To validate our method, we perform extensive experiments on two vital text similarity tasks. Through text embedding inversion tests, we show that the benchmark datasets are vulnerable to inversion attacks and that another privacy-preserving approach, dχ-privacy (a relaxed version of Local Differential Privacy), fails to prevent them. Our approach preserves model performance, whereas the baseline degrades by up to 10% in scores at the minimum security level.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130460450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UOA at the FinNLP-2022 ERAI Task: Leveraging the Class Label Description for Financial Opinion Mining","authors":"Jinan Zou, Hai Cao, Yanxi Liu, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi","doi":"10.18653/v1/2022.finnlp-1.15","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.15","url":null,"abstract":"Evaluating the Rationales of Amateur Investors (ERAI) is a task about mining expert-like viewpoints from social media. This paper summarizes our solutions to the ERAI shared task, which is co-located with the FinNLP workshop at EMNLP 2022. There are two sub-tasks in ERAI. Sub-task 1 is a pairwise comparison task, for which we propose a BERT-based pre-trained model that projects opinion pairs into a common space for classification. Sub-task 2 is an unsupervised learning task that ranks the opinions’ maximal potential profit (MPP) and maximal loss (ML), for which our model leverages a regression method and a multi-layer perceptron to rank the MPP and ML values. The proposed approaches achieve competitive accuracies of 54.02% on ML Accuracy and 51.72% on MPP Accuracy for the pairwise task, as well as 12.35% and -9.39% on the unsupervised regression ranking task for MPP and ML, respectively.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130911402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stock Price Volatility Prediction: A Case Study with AutoML","authors":"Hilal Pataci, Yunyao Li, Yannis Katsis, Yada Zhu, Lucian Popa","doi":"10.18653/v1/2022.finnlp-1.6","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.6","url":null,"abstract":"Accurate prediction of the stock price volatility, the rate at which the price of a stock increases or decreases over a particular period, is an important problem in finance. Inaccurate prediction of stock price volatility might lead to investment risk and financial loss, while accurate prediction might generate significant returns for investors. Several studies have investigated stock price volatility prediction as a regression task, applying Natural Language Processing (NLP) techniques to the transcripts of earnings calls (quarterly conference calls held by public companies). Existing studies use the entire transcript, which degrades performance because of noise from irrelevant information that might not have a significant impact on stock price volatility. To overcome these limitations, we treat stock price volatility prediction as a classification task, explore several denoising approaches, ranging from general-purpose to finance-specific techniques, to remove the noise, and leverage AutoML systems that enable automatic exploration of a wide variety of models. Our preliminary findings indicate that domain-specific denoising approaches provide better results than general-purpose approaches; moreover, AutoML systems provide promising results.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133686012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jetsons at the FinNLP-2022 ERAI Task: BERT-Chinese for mining high MPP posts","authors":"Alolika Gon, Sihan Zha, Sai Krishna Rallabandi, Parag Dakle, Preethi Raghavan","doi":"10.18653/v1/2022.finnlp-1.19","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.19","url":null,"abstract":"In this paper, we discuss the various approaches of the Jetsons team for the “Pairwise Comparison” sub-task of the ERAI shared task, which compares financial opinions for profitability and loss. Our BERT-Chinese model considers a pair of opinions and predicts the one with the higher maximum potential profit (MPP) with 62.07% accuracy. We analyze the performance of our approaches on both the MPP and maximal loss (ML) problems and dive deeply into why BERT-Chinese outperforms other models.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130246000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LDPP at the FinNLP-2022 ERAI Task: Determinantal Point Processes and Variational Auto-encoders for Identifying High-Quality Opinions from a pool of Social Media Posts","authors":"Paul Trust, R. Minghim","doi":"10.18653/v1/2022.finnlp-1.18","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.18","url":null,"abstract":"Social media and online forums have made it easier for people to share their views and opinions on various topics in society. In this paper, we focus on posts discussing investment-related topics. When it comes to investment, people can now easily share their opinions about online traded items and also provide rationales to support their arguments on social media. However, there are millions of posts to read, some of which may come from amateur investors or be completely unrelated. Identifying the most important posts that could lead to higher maximal potential profit (MPP) and lower maximal loss for investment is not a trivial task. In this paper, we propose to use determinantal point processes and variational autoencoders to identify high-quality posts from the given rationales. Experimental results suggest that our method mines better-quality posts than random selection and that latent variable modeling improves the quality of selected posts.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134141720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIPI at the FinNLP-2022 ERAI Task: Ensembling Sentence Transformers for Assessing Maximum Possible Profit and Loss from Online Financial Posts","authors":"Sohom Ghosh, S. Naskar","doi":"10.18653/v1/2022.finnlp-1.13","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.13","url":null,"abstract":"Using insights from social media for making investment decisions has become mainstream. However, in the current era of information explosion, it is essential to mine high-quality social media posts. The FinNLP-2022 ERAI task deals with assessing Maximum Possible Profit (MPP) and Maximum Loss (ML) from social media posts relating to finance. In this paper, we present our team LIPI’s approach. We ensembled a range of Sentence Transformers to quantify these posts. Unlike other teams with varying performances across different metrics, our system performs consistently well. Our code is available here https://github.com/sohomghosh/LIPI_ERAI_ FinNLP_EMNLP- 2022/","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overview of the FinNLP-2022 ERAI Task: Evaluating the Rationales of Amateur Investors","authors":"Chung-Chi Chen, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen","doi":"10.18653/v1/2022.finnlp-1.11","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.11","url":null,"abstract":"This paper provides an overview of the shared task, Evaluating the Rationales of Amateur Investors (ERAI), in FinNLP-2022 at EMNLP-2022. This shared task aims to sort out investment opinions that would lead to higher profit from social platforms. We obtained 19 registered teams; 9 teams submitted their results for final evaluation, and 8 teams submitted papers to share their methods. The discussed directions are various: prompting, fine-tuning, translation system comparison, and tailor-made neural network architectures. We provide details of the task settings, data statistics, participants’ results, and fine-grained analysis.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125352754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TweetFinSent: A Dataset of Stock Sentiments on Twitter","authors":"Yulong Pei, A. Mbakwe, Akshat Gupta, Salwa Alamir, Hanxuan Lin, Xiaomo Liu, Sameena Shah","doi":"10.18653/v1/2022.finnlp-1.5","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.5","url":null,"abstract":"Stock sentiment has strong correlations with the stock market, but the traditional sentiment analysis task classifies sentiment according to whether the expressed feelings and emotions are good or bad. This definition of sentiment is not an accurate indicator of public opinion about specific stocks. To bridge this gap, we introduce a new task of stock sentiment analysis and present a new dataset for this task named TweetFinSent. In TweetFinSent, tweets are annotated based on whether one gained or expected to gain a positive or negative return from a stock. We conducted experiments on TweetFinSent with several sentiment analysis models, ranging from lexicon-based to transformer-based. Experimental results show that the TweetFinSent dataset constitutes a challenging problem and that there is ample room for improvement on the stock sentiment analysis task. TweetFinSent is available at https://github.com/jpmcair/tweetfinsent.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"466 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115814471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No Stock is an Island: Learning Internal and Relational Attributes of Stocks with Contrastive Learning","authors":"Shicheng Li, Wei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto","doi":"10.18653/v1/2022.finnlp-1.20","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.20","url":null,"abstract":"Previous work has demonstrated the viability of applying deep learning techniques in the financial area. Recently, the task of stock embedding learning, which aims to represent the characteristics of stocks with distributed vectors that can be used in various financial analysis scenarios, has been drawing attention from the research community. Existing approaches for learning stock embeddings either require expert knowledge or mainly focus on the textual information corresponding to individual temporal movements. In this paper, we propose to model stock properties as the combination of internal attributes and relational attributes, which considers both the time-invariant properties of individual stocks and their movement patterns in relation to the market. To learn the two types of attributes from financial news and transaction data, we design several training objectives based on contrastive learning that extract and separate the long-term and temporary information in the data, which helps counter the inherent randomness of the stock market. Experiments and further analyses on portfolio optimization reveal the effectiveness of our method in extracting comprehensive stock information from various data sources.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128906665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Can a Teacher Make Learning From Sparse Data Softer? Application to Business Relation Extraction","authors":"Hadjer Khaldi, F. Benamara, Camille Pradel, Nathalie Aussenac-Gilles","doi":"10.18653/v1/2022.finnlp-1.23","DOIUrl":"https://doi.org/10.18653/v1/2022.finnlp-1.23","url":null,"abstract":"Business Relation Extraction between market entities is a challenging information extraction task that suffers from data imbalance due to the over-representation of negative relations (also known as No-relation or Others) compared to positive relations that correspond to the taxonomy of relations of interest. This paper proposes a novel solution to tackle this problem, relying on binary soft-label supervision generated by an approach based on knowledge distillation. When evaluated on a business relation extraction dataset, the results suggest that the proposed approach improves the overall performance, beating state-of-the-art solutions for data imbalance. In particular, it improves the extraction of under-represented relations as well as the detection of false negatives.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124403073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}