Unstructured data extraction in distributed NoSQL
Richard K. Lomotey, R. Deters
2013 7th IEEE International Conference on Digital Ecosystems and Technologies (DEST), 2013-07-24
DOI: 10.1109/DEST.2013.6611347 (https://doi.org/10.1109/DEST.2013.6611347)
Citation count: 1
Abstract
While “Big data” has made voluminous data easily accessible, it also brings challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances that initial version by introducing a reusable dictionary and association rules to improve the quality of the extracted terms. The tool in its present stage also adapts better to user searches, based on the most frequently searched terms.
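
The abstract does not say how TouchR derives its association rules, so the following is only a generic illustration of the idea, not the authors' method. It computes the two standard measures, support and confidence, for pairwise rules over terms already extracted from documents. The function name `pairwise_rules`, the thresholds, and the sample terms are all hypothetical.

```python
from itertools import permutations


def pairwise_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples.

    docs is a list of term lists, one per document (hypothetical input
    shape; the paper does not specify one).
    """
    n = len(docs)
    term_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*term_sets)) if term_sets else []
    # Number of documents in which each term occurs.
    count = {t: sum(t in s for s in term_sets) for t in vocab}

    rules = []
    for a, b in permutations(vocab, 2):
        both = sum(a in s and b in s for s in term_sets)
        support = both / n            # share of documents containing a and b
        confidence = both / count[a]  # share of a's documents that contain b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Made-up terms, standing in for the output of a term extractor.
docs = [
    ["nosql", "couchdb", "json"],
    ["nosql", "couchdb"],
    ["nosql", "mongodb"],
    ["json", "couchdb"],
]
for rule in pairwise_rules(docs, min_support=0.5, min_confidence=0.7):
    print(rule)
```

A rule such as "json implies couchdb" could then be used to suggest related terms for a search, which is one plausible reading of how rules might improve the quality of extracted terms.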