Understanding Relations using Concepts and Semantics

Jouyon Park, Hyunsouk Cho, Seung-won Hwang
Published in: Proceedings of the 3rd International Workshop on Data Science for Macro-Modeling with Financial and Economic Datasets
Publication date: 2017-05-14
DOI: 10.1145/3077240.3077250 (https://doi.org/10.1145/3077240.3077250)
Citations: 4

Abstract

The Financial Entity Identification and Information Integration (FEIII) task addresses the problem of understanding relationships among financial entities and their roles, using three sentences extracted from each financial contract that contain the target word. The FEIII task poses two challenges: 1) data sparseness: a small training set (9% of the test data) and 2) context sparseness: limited context (three sentences). Existing statistical approaches, such as Bayes and TF-IDF, cannot evaluate the importance of words unobserved in the training data, which makes them vulnerable to both challenges. We address each challenge by considering 1) the concepts of words from a knowledge base (Probase) in addition to the words themselves (conceptual feature) and 2) word semantics from distributed representations such as word2vec (semantic feature). We empirically evaluate the proposed classification model on four-class classification (highly relevant, relevant, neutral, and irrelevant) and show that it improves the F1-score by 18% over the statistical baselines.
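
The two feature types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPTS` table is a hypothetical stand-in for a Probase lookup, the `EMBEDDINGS` table holds made-up 3-dimensional vectors standing in for word2vec, and all function names are assumptions chosen for illustration. The point is only to show why a word unseen in training ("lender") can still be related to a seen word ("bank") through shared concepts and nearby vectors.

```python
import math

# Hypothetical stand-in for a Probase-style word -> concepts lookup.
CONCEPTS = {
    "bank": ["financial institution", "company"],
    "trustee": ["role", "fiduciary"],
    "lender": ["role", "financial institution"],
}

# Hypothetical stand-in for word2vec vectors (made-up values, 3 dimensions).
EMBEDDINGS = {
    "bank": [0.9, 0.1, 0.0],
    "trustee": [0.2, 0.8, 0.1],
    "lender": [0.8, 0.3, 0.0],
}
DIM = 3


def conceptual_features(tokens):
    """Count each word and, in addition, each concept the word maps to."""
    counts = {}
    for tok in tokens:
        key = "word=" + tok
        counts[key] = counts.get(key, 0) + 1
        for concept in CONCEPTS.get(tok, []):
            ckey = "concept=" + concept
            counts[ckey] = counts.get(ckey, 0) + 1
    return counts


def semantic_feature(tokens):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# "bank" appears in training, "lender" only at test time.
seen = conceptual_features(["bank"])
unseen = conceptual_features(["lender"])
shared_keys = set(seen) & set(unseen)  # overlap comes from concepts only
similarity = cosine(semantic_feature(["bank"]), semantic_feature(["lender"]))
```

In this toy setup the two sentences share no word feature, yet they overlap on a concept feature and have a high cosine similarity, which is the kind of signal a purely word-count-based baseline would miss. In practice the feature dictionaries and averaged vectors would be fed to a classifier over the four relevance classes; the choice of classifier is left open here.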