Data Classification by Reducing Bias of Domain-Oriented Knowledge Based on Data Jackets

Masahiro Senda, Daiji Iwasa, Teruaki Hayashi, Y. Ohsawa
Published in: 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN)
Publication date: 2019-03-07
DOI: 10.1109/SPIN.2019.8711715 (https://doi.org/10.1109/SPIN.2019.8711715)
Citations: 1

Abstract

In recent years, the worldwide trend toward big data and AI has made cross-disciplinary data exchange and collaboration a social demand. However, data users do not always have sufficient knowledge about the data, which prevents them from exchanging and utilizing it. Because people have different background knowledge, the meaning of a word depends on its context even when the same word is used. It is therefore necessary to bridge the gap between the expertise of data owners and the requests of data users. To address this contextual gap, we propose a classification system, learned from semantic knowledge, that helps data users discover the categories related to a dataset. We use Data Jackets as summaries of data, together with the Wikipedia knowledge base and word2vec, to reduce the influence of domain-oriented knowledge. In our experiment, the proposed method achieved a higher accuracy rate on the classification tasks, and its classifications were similar to human recognition.
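
The abstract does not spell out the classification procedure, so the following is only an illustrative sketch of the general idea of assigning a short data summary to a category by comparing word embeddings. It is not the authors' method. The toy three-dimensional vectors stand in for a trained word2vec model, the category names and the function names `embed`, `cosine`, and `classify` are hypothetical, and the Wikipedia knowledge base is not modeled at all.

```python
import math

# Toy 3-dimensional word vectors standing in for a trained word2vec model.
# The values are made up for illustration only.
TOY_VECTORS = {
    "rainfall": [0.9, 0.1, 0.0],
    "temperature": [0.8, 0.2, 0.1],
    "weather": [0.9, 0.2, 0.0],
    "stock": [0.1, 0.9, 0.1],
    "price": [0.0, 0.8, 0.2],
    "finance": [0.1, 0.9, 0.0],
    "patient": [0.1, 0.1, 0.9],
    "hospital": [0.0, 0.2, 0.8],
    "health": [0.1, 0.1, 0.9],
}


def embed(text):
    """Average the vectors of the known words in text; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(summary, categories):
    """Return the category whose embedding is most similar to the summary's."""
    vec = embed(summary)
    if vec is None:
        return None
    scores = {}
    for category in categories:
        category_vec = embed(category)
        if category_vec is not None:
            scores[category] = cosine(vec, category_vec)
    return max(scores, key=scores.get) if scores else None


# Hypothetical category labels and a hypothetical data summary.
CATEGORIES = ["weather", "finance", "health"]
label = classify("daily rainfall and temperature records", CATEGORIES)
print(label)
```

In this sketch a summary is matched to a category purely by embedding similarity, so words outside the vocabulary are ignored and a summary with no known words yields no label. A real system would load pretrained vectors and use actual Data Jacket text instead of the toy inputs here.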