Unstructured data extraction in distributed NoSQL

Richard K. Lomotey, R. Deters
Published in: 2013 7th IEEE International Conference on Digital Ecosystems and Technologies (DEST), 2013-07-24
DOI: https://doi.org/10.1109/DEST.2013.6611347
Citations: 1

Abstract

While “Big data” has made voluminous data easily accessible, it also brings challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on the Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances the initially deployed version by introducing a re-usable dictionary and association rules to improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches by taking the most frequently searched terms into account.
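The abstract does not say how TouchR derives its association rules, so the following is only a generic sketch of the idea: given terms already extracted from a set of documents, keep the pairwise rules "A implies B" whose support and confidence pass a threshold. The function name `association_rules`, the thresholds, and the sample documents are all hypothetical and are not taken from the paper.

```python
from itertools import permutations


def association_rules(documents, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples for term pairs.

    Hypothetical helper for illustration only; not the TouchR implementation.
    `documents` is a list of term lists, one list per document.
    """
    term_sets = [set(doc) for doc in documents]
    n = len(term_sets)
    if n == 0:
        return []
    vocabulary = sorted(set().union(*term_sets))
    rules = []
    for a, b in permutations(vocabulary, 2):
        # Number of documents containing the antecedent, and both terms.
        count_a = sum(1 for terms in term_sets if a in terms)
        count_ab = sum(1 for terms in term_sets if a in terms and b in terms)
        support = count_ab / n
        # count_a is at least 1 because every vocabulary term occurs somewhere.
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Made-up extracted terms, one list per document (assumption for the example).
docs = [
    ["nosql", "couchdb", "json"],
    ["nosql", "couchdb"],
    ["nosql", "hmm"],
    ["couchdb", "json"],
]
rules = association_rules(docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In a sketch like this, a rule such as "json implies couchdb" could be used to suggest related terms alongside an extracted one; whether TouchR uses rules this way is an assumption, not something the abstract states.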