Parag Dakle, Shrikumar Patil, Sai Krishna Rallabandi, Chaitra V. Hegde, Preethi Raghavan
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)
https://doi.org/10.18653/v1/2022.finnlp-1.34
Using Transformer-based Models for Taxonomy Enrichment and Sentence Classification
In this paper, we present a system for FinSim4-ESG, a shared task of the FinNLP workshop at IJCAI-2022. The system addresses two problems: taxonomy enrichment for Environment, Social and Governance (ESG) issues in the financial domain, and classification of sentences as sustainable or unsustainable. For taxonomy enrichment, we first created a derived dataset by using a sentence-BERT-based paraphrase detector (Reimers and Gurevych, 2019) on the train set to create positive and negative term-concept pairs. We then fine-tune the sentence-BERT-based paraphrase detector on this derived dataset and use it as the encoder, with a Logistic Regression classifier as the decoder, resulting in a test accuracy of 0.6 and an average rank of 1.97. For the sentence classification task, the best-performing classifier (accuracy: 0.92) consists of a pre-trained RoBERTa model (Liu et al., 2019a) as the encoder and a Feed Forward Neural Network classifier as the decoder.
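The two taxonomy enrichment metrics reported above, accuracy and average rank, can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that each term receives a ranked list of candidate concepts, that accuracy counts terms whose top-ranked concept is the gold one, and that rank is the 1-indexed position of the gold concept in that list, with a fallback of list length plus one when the gold concept is absent. The function name `accuracy_and_mean_rank`, the fallback rule, and all terms and concept labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def accuracy_and_mean_rank(
    predictions: Dict[str, List[str]],
    gold: Dict[str, str],
) -> Tuple[float, float]:
    """Return (accuracy, mean rank) over all terms in `gold`.

    `predictions` maps each term to concepts ordered from most to
    least likely; `gold` maps each term to its correct concept.
    """
    hits = 0
    rank_sum = 0
    for term, concept in gold.items():
        ranked = predictions[term]
        # Accuracy: the top-ranked concept must match the gold concept.
        if ranked and ranked[0] == concept:
            hits += 1
        # Rank: 1-indexed position of the gold concept. If it is missing,
        # fall back to len(ranked) + 1 (an assumption of this sketch).
        if concept in ranked:
            rank_sum += ranked.index(concept) + 1
        else:
            rank_sum += len(ranked) + 1
    n = len(gold)
    return hits / n, rank_sum / n


# Hypothetical terms and concept labels, for illustration only.
predictions = {
    "carbon footprint": ["Emissions", "Energy", "Waste"],
    "board diversity": ["Human rights", "Governance", "Community"],
    "water recycling": ["Energy", "Emissions", "Waste"],
    "bribery": ["Community", "Energy", "Emissions"],
}
gold = {
    "carbon footprint": "Emissions",   # rank 1
    "board diversity": "Governance",   # rank 2
    "water recycling": "Waste",        # rank 3
    "bribery": "Governance",           # absent, so rank 4
}

accuracy, mean_rank = accuracy_and_mean_rank(predictions, gold)
print(f"Accuracy: {accuracy:.2f}, Avg. Rank: {mean_rank:.2f}")
```

Under this reading, a lower average rank is better: a value close to 1 means the gold concept usually appears at or near the top of the ranked list even when the top prediction is wrong.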