Hadjer Khaldi, F. Benamara, Camille Pradel, Nathalie Aussenac-Gilles
{"title":"How Can a Teacher Make Learning From Sparse Data Softer? Application to Business Relation Extraction","authors":"Hadjer Khaldi, F. Benamara, Camille Pradel, Nathalie Aussenac-Gilles","doi":"10.18653/v1/2022.finnlp-1.23","DOIUrl":null,"url":null,"abstract":"Business Relation Extraction between market entities is a challenging information extraction task that suffers from data imbalance due to the over-representation of negative relations (also known as No-relation or Others) compared to positive relations that corresponds to the taxonomy of relations of interest. This paper proposes a novel solution to tackle this problem, relying on binary soft labels supervision generated by an approach based on knowledge distillation. When evaluated on a business relation extraction dataset, the results suggest that the proposed approach improves the overall performance, beating state-of-the art solutions for data imbalance. In particular, it improves the extraction of under-represented relations as well as the detection of false negatives.","PeriodicalId":331851,"journal":{"name":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.finnlp-1.23","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Business Relation Extraction between market entities is a challenging information extraction task that suffers from data imbalance due to the over-representation of negative relations (also known as No-relation or Others) compared to positive relations that corresponds to the taxonomy of relations of interest. This paper proposes a novel solution to tackle this problem, relying on binary soft labels supervision generated by an approach based on knowledge distillation. When evaluated on a business relation extraction dataset, the results suggest that the proposed approach improves the overall performance, beating state-of-the art solutions for data imbalance. In particular, it improves the extraction of under-represented relations as well as the detection of false negatives.