Understanding Relations using Concepts and Semantics

Proceedings of the 3rd International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets. Pub Date: 2017-05-14. DOI: 10.1145/3077240.3077250

Jouyon Park, Hyunsouk Cho, Seung-won Hwang


Citations: 4

Abstract

The Financial Entity Identification and Information Integration (FEIII) task addresses the question of understanding relationships among financial entities and their roles, using three sentences extracted from each financial contract that contain the target word. The FEIII task poses two challenges: 1) data sparseness: the training set is small (9% of the size of the test data), and 2) context sparseness: the context is limited to three sentences. Existing statistical approaches, such as Bayes and TF-IDF, cannot evaluate the importance of words that were not observed in the training data, which leaves them vulnerable to both challenges. We overcome each challenge by considering 1) the concepts of words drawn from knowledge bases (Probase) in addition to the words themselves (conceptual feature) and 2) word semantics from distributed representations such as word2vec (semantic feature). We empirically evaluate the proposed classification model on four-class classification (highly relevant, relevant, neutral, and irrelevant) and show that it improves F1-score by 18% compared to the statistical baselines.
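The idea behind the conceptual and semantic features can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_CONCEPTS` is a hypothetical stand-in for a Probase lookup, `TOY_VECTORS` is a hypothetical stand-in for word2vec vectors, and all function names are invented for illustration. The sketch only shows why a word that never appears in the training data can still contribute evidence through a shared concept or a nearby embedding.

```python
from collections import Counter
from math import sqrt

# Hypothetical stand-in for a Probase lookup (word -> concepts); not real Probase data.
TOY_CONCEPTS = {
    "trustee": ["role"],
    "custodian": ["role"],
    "guarantor": ["role"],
}

# Hypothetical stand-in for word2vec vectors; real vectors have far more dimensions.
TOY_VECTORS = {
    "trustee": [0.9, 0.1],
    "custodian": [0.8, 0.2],
    "weather": [0.0, 1.0],
}


def word_features(tokens):
    """Plain bag-of-words features, as a purely statistical baseline would use."""
    return Counter(f"w={t}" for t in tokens)


def concept_features(tokens):
    """Conceptual features: replace each word by the concepts it belongs to."""
    return Counter(f"c={c}" for t in tokens for c in TOY_CONCEPTS.get(t, []))


def mean_vector(tokens, dim=2):
    """Semantic feature: average the embeddings of the tokens that have one."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# "custodian" is unseen in training, so word-level features share nothing.
train_tokens = ["trustee"]
test_tokens = ["custodian"]

shared_words = set(word_features(train_tokens)) & set(word_features(test_tokens))
shared_concepts = set(concept_features(train_tokens)) & set(concept_features(test_tokens))
similarity = cosine(mean_vector(train_tokens), mean_vector(test_tokens))
```

In this toy setup the two sentences share no word-level feature, yet they share the concept feature `c=role` and their averaged embeddings are close. In a real pipeline these features would be concatenated and passed to a classifier; which classifier the paper uses is not stated in the abstract.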
