{"title":"Towards the automation of XML data warehouse logical design","authors":"Zoubir Ouaret, Omar Boussaïd, R. Chalal","doi":"10.1109/ICDIM.2014.6991422","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991422","url":null,"abstract":"Over the past few years, XML has rapidly become a widely accepted data format for information interchange and the representation of semi-structured data. As a result, huge amounts of interesting data are generated and distributed on the Web, prompting the need for new, automated approaches to store, organize and analyze such data. Data warehouse systems provide an efficient tool and an integrated repository with a high level of consolidation and a multi-dimensional view of the data. Several studies address the issue of designing a conceptual and logical schema for XML data warehouses. Nevertheless, these approaches focus only on identifying multidimensional concepts in a semi-automatic fashion. In this paper we propose an automatic approach for designing the logical schema of a data mart, starting from the XML schema describing the XML sources, using UML and the QVT transformation language. Our focus is not only on automating XML data warehouse design, but also on providing a simplification process and a set of rules that apply successive transformations to create the star schema, the predominant logical multidimensional schema. Finally, we discuss implementation issues and present a graphical tool that helps designers and engineers without technical expertise in multi-dimensional data modeling.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123346263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining parameters for efficient retrieval in index structures for hybrid data spaces","authors":"Carsten Kropf","doi":"10.1109/ICDIM.2014.6991402","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991402","url":null,"abstract":"Different kinds of access methods supporting Boolean retrieval in hybrid data spaces exist. We inspect a class of these index structures that uses a categorization of keywords into low- and high-frequency terms. This access method uses a basic R*-Tree augmented with bitlists for the representation of a set of terms. Two limits govern these access methods in realistic environments: the length of the bitlist (B Length) and the limit separating the sets of low- and high-frequency terms (H Limit). This paper presents a theoretical analysis of the setup of H Limit as well as an empirical analysis of the bitlist length for two different corpora in a typical database environment. The final target of this paper is the determination of these free parameters to provide efficient retrieval of data in realistic application domains.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125312078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Universal database access layer to facilitate query","authors":"M. Amin, R. Rahman","doi":"10.1109/ICDIM.2014.6991401","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991401","url":null,"abstract":"The Universal Database Access Layer (UDAL) is a JDBC-based database access layer through which any SQL-capable database can be accessed. Interaction with UDAL is simple, since no knowledge of database-specific SQL is needed to support multiple databases in a product. This is because UDAL takes XML (Extensible Markup Language) and JSON (JavaScript Object Notation) as input and returns a corresponding response. XML and JSON are both standard, popular and simple message formats, and UDAL currently supports the four most commonly used DML operations: Insert, Update, Select and Delete. These DML operations were performed on large volumes of data to measure performance, which can be increased by deploying UDAL on the cloud. If UDAL is deployed as a Web Service, anyone will be able to access it, eliminating tight integration with a particular product.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126248617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient algorithm for density-balanced partitioning in distributed pagerank","authors":"S. Sangamuang, P. Boonma, J. Natwichai","doi":"10.1109/ICDIM.2014.6991418","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991418","url":null,"abstract":"Google's PageRank is the most notable approach to web search ranking. In general, web pages are represented by a web-link graph: a web page is represented by a node, and a link between two pages is represented by an edge. In particular, it is not efficient to compute the PageRank of a large web-link graph on a single computer. Distributed systems, such as P2P systems, are viable choices to address this limitation. In P2P-based PageRank, each computational peer contains a partial web-link graph, i.e., a sub-graph of the global web-link graph, and its PageRank is computed locally. The convergence time of a PageRank calculation is affected by the web-link graph density, i.e., the ratio of the number of edges to the number of nodes: if a web-link graph has high density, it will take a longer time to converge. As the execution time of P2P-based web ranking is determined by the execution time of the slowest peer computing its local ranking, density-balanced partitioning of the local web-link graphs is highly desirable. This paper addresses the density-balanced partitioning problem and proposes an efficient algorithm for it. The experimental results show that the proposed algorithm can effectively partition a graph into density-balanced sub-graphs at an acceptable cost.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126372615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Association rules: Normalizing the lift","authors":"Desmond Lobo","doi":"10.1109/ICDIM.2014.6991393","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991393","url":null,"abstract":"Association rule mining is a popular data mining technique for discovering relations between variables in large amounts of data. Support, confidence and lift are three of the most common measures for evaluating the usefulness of these rules. A concern with the lift measure is that it can only compare items within a single transaction set. The main contribution of this paper is to develop a formula for normalizing the lift, as this allows valid comparisons between distinct transaction sets. Traffic accident data was used to validate the revised formula for lift, and the analysis yielded very strong results.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121106077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel feature selection by clustering coefficients of variations","authors":"S. Fong, Justin Liang, R. Wong, M. Ghanavati","doi":"10.1109/ICDIM.2014.6991429","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991429","url":null,"abstract":"One of the challenges in inferring a classification model with good prediction accuracy is to select the relevant features that contribute maximum predictive power. Many feature selection techniques have been proposed and studied in the past, but none so far has been shown to be the best. In this paper, a novel and efficient feature selection method called Clustering Coefficients of Variation (CCV) is proposed. CCV is based on a very simple variance-based principle that finds an optimal balance between generalization and overfitting. Through a computer simulation experiment, 44 datasets with substantially large numbers of features are tested with CCV in comparison to four popular feature selection techniques. Results show that CCV outperformed them in both averaged performance and speed. Given its simplicity of design, it is anticipated that CCV will be a useful pre-processing alternative for classification, especially for datasets characterized by many features.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133202363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing national e-Government interoperability frameworks: A case of Thailand","authors":"Sasithorn Suchaiya, Somnuk Keretho","doi":"10.1109/ICDIM.2014.6991416","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991416","url":null,"abstract":"Many countries have actively engaged in developing interoperability for electronic data and transaction exchange among government agencies in order to provide better joined-up public services to their citizens. National-level policy frameworks, often called Electronic Government Interoperability Frameworks (e-GIF), have been established in many of these countries. However, most of these e-GIF frameworks have not adopted the holistic concept of Enterprise Architecture (EA), with a few exceptions such as Thailand, the U.S.A. and Canada. This paper proposes a comparative analysis methodology with the aim of identifying further improvements to EA-based interoperability frameworks, so that they better drive the effective development of smart and connected e-government services. As a case study, the Thailand e-Government Interoperability Framework is methodically compared with and analyzed against the U.S. Federal Enterprise Architecture Framework.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123888146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Project resource allocation optimization using search based software engineering — A framework","authors":"Nazia Bibi, A. Ahsan, Zeeshan Anwar","doi":"10.1109/ICDIM.2014.6991431","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991431","url":null,"abstract":"Human Resource Management is an important area of project management. The concept of human resource allocation is not new, and it can also be applied to resource allocation in software projects. Software projects are more critical than projects in other disciplines because the success of a software project depends on its human resources. In software projects, the Project Manager (PM) allocates and levels resources using resource leveling techniques that are already implemented in various project management software packages. However, resource leveling is a resource-smoothing technique, not an optimization technique, and it does not ensure optimized resource allocation. Furthermore, project duration and cost may increase after resource leveling. Therefore, resource leveling is not always a reliable method for resource allocation optimization. An exact solution to the resource optimization problem cannot be determined because resource optimization is an NP-hard problem. Search Based Software Engineering (SBSE) has been used in various studies to solve resource optimization problems. However, existing SBSE implementations for resource allocation optimization do not consider many objectives. Resource allocation optimization is a multi-objective optimization problem, and many important factors, such as activity criticality, resource skills, activity precedence, and the skills required to perform activities, must be addressed. Our research fills this gap and uses multiple objectives for resource allocation optimization: (1) increasing resource utilization, (2) decreasing project duration, and (3) decreasing project cost. A framework and mathematical model for the implementation of resource allocation optimization are also proposed.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127747727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial access method for urban geospatial database management: An efficient approach of 3D vector data clustering technique","authors":"S. Azri, U. Ujang, A. Abdul-Rahman, F. Anton, D. Mioc","doi":"10.1109/ICDIM.2014.6991400","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991400","url":null,"abstract":"In the last few years, 3D urban data and its associated information have increased rapidly due to the growth of urban areas and the urbanization phenomenon. These datasets are then maintained and managed in 3D spatial database systems. However, performance deterioration is likely to occur due to the massiveness of 3D datasets. As a solution, a 3D spatial index structure is used to boost the performance of data retrieval. In commercial databases, the most commonly and widely used index structure for 3D spatial data is the 3D R-Tree, owing to its simplicity and its promise in handling spatial data. However, the 3D R-Tree produces serious overlap among nodes. The overlap factor is important for an efficient 3D R-Tree, as replicated data entries across different nodes should be avoided. Thus, an efficient and reliable method is required to reduce the overlapping nodes in the 3D R-Tree. In this paper, we propose a 3D geospatial data clustering technique to be used in the construction of the 3D R-Tree, which consequently reduces the overlap among nodes. The proposed method is tested on a 3D urban dataset for the application of urban infill development. Using several cases of data-updating operations, such as building infill, building demolition and building modification, the proposed method shows that the percentage of overlap coverage among nodes is reduced compared with other existing approaches.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126917891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial neural network-based time series analysis forecasting for the amount of solid waste in Bangkok","authors":"M. Sodanil, Paiboon Chatthong","doi":"10.1109/ICDIM.2014.6991427","DOIUrl":"https://doi.org/10.1109/ICDIM.2014.6991427","url":null,"abstract":"Solid waste is a municipal environmental problem that is difficult to manage. Thus, a solid waste forecasting model is essential for effective management and planning. This paper aims to develop a time series forecasting model for the amount of solid waste generated in Bangkok using artificial neural networks, and to offer a suitable model for solid waste forecasting. The time series data were collected as monthly accounts of solid waste generated between October 2002 and July 2013. The data were then cleaned and converted in order to be analyzed accurately. The forecasting model was developed using the predictive analytics tool RapidMiner. The artificial neural network model was trained with the backpropagation algorithm. The results showed that a 3-35-1 network structure yields the best performance, with a prediction accuracy of 0.870 and an MSE of 0.2333.","PeriodicalId":407225,"journal":{"name":"Ninth International Conference on Digital Information Management (ICDIM 2014)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123556103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}