{"title":"SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce","authors":"Hyeon Gyu Kim","doi":"10.14257/IJDTA.2017.10.6.05","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.6.05","url":null,"abstract":"Substantial research has addressed that frequent I/O required for scalability and fault-tolerance sacrifices efficiency of MapReduce. Regarding this, our previous work discussed a method to reduce I/O cost when processing OLAP queries with MapReduce. The method can be implemented simply by providing an SQL-to-MapReduce translator on top of the MapReduce framework and needs not modify the underlying framework. In this paper, we present techniques to translate SQL queries into corresponding MapReduce programs which support the method discussed in our previous work for I/O cost reduction.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"19 1","pages":"61-70"},"PeriodicalIF":0.0,"publicationDate":"2017-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84396797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capital Markets Prediction: Multi-Faceted Sentiment Analysis using Supervised Machine Learning","authors":"Kushatha Kelebeng, H. Hlomani","doi":"10.14257/IJDTA.2017.10.6.07","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.6.07","url":null,"abstract":"Over the years the stock market has proved to be very difficult to predict due to its unpredictable activities. Data mining techniques such as clustering, decision trees, genetic algorithms and artificial neural networks have been used in order to predict the stock market. Although there has been a significant amount of research done in this area, there are still many issues that have not been explored yet. The impact of fundamental analysis in the prediction of the stock market has been ignored though it can play a vital role in the prediction of the stock market. In this research, the problem of how a social data sentiment correlates to stock price is studied. A stock price prediction model was built using social data sentiments to predict the stock market. Sentiments analysis principles were applied to machine learning techniques in order to find the correlation between the stock market and public sentiments. This study particularly intended to assess the predictability of prices on the Botswana Stock Exchange through the application of Facebook sentiments classification. Three classification models were created that depicted news polarity as happy, calm, alert and vital. Results show that Naïve Bayes and Support vector machine performed well in both types of testing as compared to Random Forest. Naïve Bayes gave good results in terms of error margins with an accuracy of 83.3% making it the best classifier for our data set. When plotting the time series plot of sentiment scores and comparing it to the actual stock price graph, a conclusion can be reached that sentiments and stock prices are related and thus stock prices can be predicted using sentiments.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"1 1","pages":"87-102"},"PeriodicalIF":0.0,"publicationDate":"2017-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75605592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Topology-concerned Spatial Vector Data Model for Column-oriented Databases","authors":"Kun Zheng, M. Kwan, Falin Fang, Junjun Yin, D. Gu, Yanli Fu","doi":"10.14257/IJDTA.2017.10.5.04","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.5.04","url":null,"abstract":"In today’s “Big Data” era, the volume of spatial data grows rapidly. Addressing the challenges in efficient spatial Big Data storage and management becomes urgent. However, conventional row-based spatial databases have many limitations, such as low data I/O efficiency, low data retrieval performance, poor scalability, and high maintenance costs. These conventional spatial databases are no longer suitable for today’s spatial Big Data. On the other hand, column-oriented databases have several superior features, such as high reliability, scalability and fault tolerance. More importantly, they have better I/O efficiency for query processing. This paper presents a topology-concerned spatial vector data model for column-oriented databases and designed the physical storage model, which is a unified model for storing and managing information of geometry, attribute and topology of spatial objects. For the storage characteristics of column-oriented databases, the model designed a new Rowkey encoding schema with the Z-order filling curve approach. This encoding schema of Rowkey considering spatial proximity optimizes the organizational structure of spatial data models. It means nearby spatial objects are also closer to each other in the physical storage, which can further improve the efficiency of spatial data storage and enable spatial query capability in column-oriented databases. Three experiments were conducted including data storing, range query and K-NN query to analyze the efficiency and spatial query capability of the data model. The results of the experiments show that the data model has good scalability and efficiency on the vector data storage and spatial query. It is suitable for large-scale spatial vector data storage and management in column-oriented databases.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"66 1","pages":"33-46"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75993915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Mobile Database Model of Ad Hoc Network Based on Multi-parameter Weighted Clustering","authors":"Tao Zhan, Lei Wang","doi":"10.14257/IJDTA.2017.10.5.02","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.5.02","url":null,"abstract":"","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"8 1","pages":"11-22"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86483987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Different Dimensionality Reduction Methods with Naïve Bayes Classifier for Mapping Customer Requirements to Product Configurations","authors":"Yao Jiao, Yu Yang","doi":"10.14257/ijdta.2017.10.5.05","DOIUrl":"https://doi.org/10.14257/ijdta.2017.10.5.05","url":null,"abstract":"Mapping customer requirements to product configurations are difficult due to the uncertainty and ambiguity of customers’ expression. The Naïve Bayes Classifier (NBC) is suitable to quantify the expression of customers, and to map their requirements to configurations with good performance. However, the prerequisite of manually independent of product attributes for NBC require preprocess. Dimensionality reduction methods are effective for simplifying the data complexity while separating the correlations between data. Against this background, this paper conducts a comparative study of 7 dimensionality reduction methods as preprocess procedure for integrating with NBC to map customer requirements to product configurations. Two realistic design cases are illustrated for the comparison, and the outcomes are measured by the accuracy and F-measure. The results of this study imply several findings that the loss of information has great impact on all methods, and linear methods are more sensitive to the loss of information, and several nonlinear methods are more capable in handling the loss of information than other methods, and local linear methods are suggested compared with global nonlinear methods.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"50 1","pages":"47-58"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75009311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic QoS evaluation for Web Services Using Data Envelopment Analysis on Real-time Status","authors":"Luda Wang, Peng Zhang","doi":"10.14257/ijdta.2017.10.5.06","DOIUrl":"https://doi.org/10.14257/ijdta.2017.10.5.06","url":null,"abstract":"Service run-status monitoring can provide real-time status for service QoS evaluation as service properties. In this work, dynamic QoS evaluation for Web services is based on DEA. Proposed methods could be used to analyze real-time status of Web services. DEA-based service performance evaluation is implemented by a multi-objective model, and DEA-based service QoS evaluation is implemented by a multi-objective model with critical real-time status performance. Both models are effective depending on particular argument and validation. Dynamic QoS evaluation for Web services based on DEA could provide performance and QoS information to service composition.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"36 1","pages":"59-68"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90483701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Internet of Things in Cloud Environment: Services and Challenges","authors":"Karuna Lochab, D. Yadav, Mayank Singh, A. Sharmab","doi":"10.14257/IJDTA.2017.10.5.03","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.5.03","url":null,"abstract":"In this paper evolution of IoT along with its intimacy with cloud is described. Application of IoT in various domains and the services are major research areas and can be enhanced further for more effectiveness and efficiency. Limitations associated with this technique are also addressed and appropriate solutions suggested. The future lies in the ‘Internet of Everything’. We have proposed a model which can be used in a wide collection of domains such as by Indian army, military, air-force, navy etc. The main focus of the proposed work is to evolve a combined approach (IoT + Cloud) into a single technology IoTC which will have the benefits of both the technologies and may overcome the shortcomings. It is beneficial in terms of space and time complexity, security, precision and accuracy of result. IoTC can further be applied to services of critically important services.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"53 1","pages":"23-32"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80618499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Public Cloud Storage for the Seismic Big Data Based on Amazon EC2 Cluster and Hadoop","authors":"Jie Xiong, Song Zhang","doi":"10.14257/ijdta.2017.10.5.01","DOIUrl":"https://doi.org/10.14257/ijdta.2017.10.5.01","url":null,"abstract":"The seismic data expanded rapidly in recent years, whose size could be up to hundreds TBs, as modern seismic acquisition technologies were employed. How to store and access the seismic big data efficiently is an emergency problem for the oil industry and scientific research. A public cloud storage scheme for the seismic big data is proposed based on the Amazon EC2 and Hadoop. The IO performance evaluation results show that the proposed public cloud storage scheme has advantages of high IO performance and good scalability. It is suitable for the seismic big data storage and access.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"57 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2017-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79812769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grid-based k-Nearest Neighbor Queries over Moving Object Trajectories with MapReduce","authors":"Ying Xia, Ruidi Wang, Xu Zhang, Hae-Young Bae","doi":"10.14257/IJDTA.2017.10.4.01","DOIUrl":"https://doi.org/10.14257/IJDTA.2017.10.4.01","url":null,"abstract":"k-Nearest Neighbor Trajectory (k-NNT) Query is a basic and important spatial query operation widely used in many fields, such as intelligent transportation and urban planning. However, with the rapid increase of trajectory data volume, traditional k-NNT query algorithms for centralized environment are not effective and scalable enough, because the computational complexity increases dramatically when the spatial continuity of trajectories is considered. To address this problem, we propose a distributed grid index for trajectory data which partitions the trajectory into grids under MapReduce framework. Furthermore, a parallel query approach MR-GB-KNNT is proposed based on the proposed grid index to improve the efficiency and scalability of the k-NNT query. The experiment demonstrates that MR-GB-KNNT could perform well in cloud computing environment and improve the querying performance of the k-NNT.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"194 1","pages":"1-12"},"PeriodicalIF":0.0,"publicationDate":"2017-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78081889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Current Situation and Application of Graph Data Mining Technology","authors":"Meng Zhang, Pingping Wei, Suzhi Zhang, Jiaxing Xu","doi":"10.14257/ijdta.2017.10.3.01","DOIUrl":"https://doi.org/10.14257/ijdta.2017.10.3.01","url":null,"abstract":"As an important data structure, graph can be used to describe the complex relationship among stuffs. With the setting up of social network, web network and other network in figure data, data mining technology has gradually become a hot research. Traditional data mining technology has been applied to the field of graph data mining constantly. Consequently the development of the graph data mining technology has been accelerated. This paper demonstrates the definition of graph data, and the current graph data mining algorithms which include graph classification, graph clustering, query graph, graph matching, graph of frequent subgraph mining, and graphic database development status. At last, what challenges graph mining technology confronts is illustrated in this paper.","PeriodicalId":13926,"journal":{"name":"International journal of database theory and application","volume":"32 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2017-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91270408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}