{"title":"Boundness of a Neural Network Weights Using the Notion of a Limit of a Sequence","authors":"Hazem Migdady","doi":"10.5121/ijdkp.2014.4301","DOIUrl":"https://doi.org/10.5121/ijdkp.2014.4301","url":null,"abstract":"A feed forward neural network with a backpropagation learning algorithm is considered a black box learning classifier since there is no certain interpretation or anticipation of the behavior of a neural network's weights. The weights of a neural network are considered as the learning tool of the classifier, and the learning task is performed by the repeated modification of those weights. This modification is performed using the delta rule which is mainly used in the gradient descent technique. In this article a proof is provided that helps to understand and explain the behavior of the weights in a feed forward neural network with backpropagation learning algorithm. Also, it illustrates why a feed forward neural network is not always guaranteed to converge to a global minimum. Moreover, the proof shows that the weights in the neural network are upper bounded (i.e. they do not approach infinity).","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Rule Extraction Algorithms","authors":"T. Gopikrishna","doi":"10.5121/IJDKP.2014.4302","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4302","url":null,"abstract":"For the data mining domain, the lack of explanation facilities seems to be a serious drawback for techniques based on Artificial Neural Networks, or, for that matter, any technique producing opaque models. In particular, the ability to generate even limited explanations is absolutely crucial for user acceptance of such systems. Since the purpose of most data mining systems is to support decision making, the need for explanation facilities in these systems is apparent. The task for the data miner is thus to identify the complex but general relationships that are likely to carry over to production data and the explanation facility makes this easier. Also in focus is the quality of the extracted rules; i.e. how well the required explanation is performed. In this research some important rule extraction algorithms are discussed and their algorithmic complexity is identified; i.e. how efficient the underlying rule extraction algorithm is.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122174122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining High Utility Itemsets in Data Streams Based on the Weighted Sliding Window Model","authors":"P. S. Tsai","doi":"10.5121/IJDKP.2014.4202","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4202","url":null,"abstract":"Most of researches on mining high utility itemsets focus on the static transaction database, where all transactions are treated with the same importance and the database can be scanned more than once. With the emergence of new applications, data stream mining has become a significant research topic. In the data stream environment, online data stream mining algorithms are restricted to make only one pass over the data. However, present methods for mining high utility itemsets still cannot meet the requirement. In this paper, we propose a single pass algorithm for high utility itemset mining based on the weighted sliding window model. The developed algorithm takes advantage of reusing stored information to efficiently discover all the high utility itemsets in data streams.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128779925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate Time Series Classification Using Shapelets","authors":"M. Arathi, A. Govardhan","doi":"10.5121/IJDKP.2014.4204","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4204","url":null,"abstract":"Time series data are sequences of values measured over time. One of the most recent approaches to classification of time series data is to find shapelets within a data set. Time series shapelets are time series subsequences which represent a class. In order to compare two time series sequences, existing work uses Euclidean distance measure. The problem with Euclidean distance is that it requires data to be standardized if scales differ. In this paper, we perform classification of time series data using time series shapelets and used Mahalanobis distance measure. The Mahalanobis distance is a descriptive statistic that provides a relative measure of a data point's distance (residual) from a common point. The Mahalanobis distance is used to identify and gauge similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant. We show that Mahalanobis distance results in more accuracy than Euclidean distance measure.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132635983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Quality Features in Iberian Ham by Applying Data Mining on Data From MRI and Computer Vision Techniques","authors":"D. Caballero, A. Caro, T. Pérez-Palacios, P. G. Rodríguez, Ramón Palacios","doi":"10.5121/IJDKP.2014.4201","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4201","url":null,"abstract":"","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117035142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing the Labelling Technique of Suffix Tree Clustering Algorithm","authors":"R. Mahalakshmi, L. Praba","doi":"10.5121/IJDKP.2014.4104","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4104","url":null,"abstract":"","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124595662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommending Tags for New Resources in Social Bookmarking Systems","authors":"Shweta Yagnik, Priyank Thakkar, K. Kotecha","doi":"10.5121/IJDKP.2014.4102","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4102","url":null,"abstract":"Social bookmarking system is a web-based resource sharing system that allows users to upload, share and organize their resources i.e. bookmarks and publications. The system has shifted the paradigm of bookmarking from an individual activity limited to desktop to a collective activity on the web. It also facilitates user to annotate his resource with free form tags that leads to large communities of users to collaboratively create accessible repositories of web resources. Tagging process has its own challenges like ambiguity, redundancy or misspelled tags and sometimes user tends to avoid it as he has to describe tag at his own. The resultant tag space is noisy or very sparse and dilutes the purpose of tagging. The effective solution is Tag Recommendation System that automatically suggests appropriate set of tags to user while annotating resource. In this paper, we propose a framework that does not depend on tagging history of the resource or user and thereby capable of suggesting tags to the resources which are being submitted to the system first time. We model tag recommendation task as multi-label text classification problem and use Naive Bayes classifier as the base learner of the multilabel classifier. We experiment with Boolean, bag-of-words and term frequency-inverse document frequency (TFIDF) representation of the resources and fit appropriate distribution to the data based on the representation used. Impact of feature selection on the effectiveness of the tag recommendation is also studied. Effectiveness of the proposed framework is evaluated through precision, recall and f-measure metrics.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115999228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Collective-Experience Engine for Experience-Transfer Amongst Web Users","authors":"J. K. Hall, Y. Kiyoki","doi":"10.5121/IJDKP.2014.4101","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4101","url":null,"abstract":"This paper describes the Collective Experience Engine (CEE), a system for indexing Experiential-Knowledge about Web knowledge-sources (websites), and performing relative-experience calculations between participants of the CEE. The CEE provides an in-browser interface to query the collective experience of others participating in the CEE. This interface accepts a list of URLs, to which the CEE adds additional information based on the Queryee's previously indexed Experiential-Knowledge. The core of the CEE is its Experiential-Context Conversation (ECConversation) functionality, whereby a collection of a person’s Web Experiential-Knowledge can be utilized to allow a real-world conversation-like exchange of information to take place, including adjusting information-flow based on the Queryee's experiential background and knowledge, and providing additional experientially-related knowledge integrated into the answer from multiple selected 'experience donors'. A relative-experience calculation ensures that information is retrieved only from relative-experts, to ensure sufficient additional information exists, but that such information isn't too advanced for the Queryee to process. This paper gives an overview of the CEE, and the underlying algorithms and data structures, and describes a system architecture and implementation details.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128532009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dormancy Prediction Model in a Prepaid Predominant Mobile Market : A Customer Value Management Approach","authors":"Adeolu O. Dairo, T. Akinwumi","doi":"10.5121/IJDKP.2014.4103","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4103","url":null,"abstract":"Previous studies have predicted customer churn in the mobile industry especially the postpaid customer segment of the market. However, only few studies have been published on the prepaid segment that could be used and operationalised within the marketing team that are responsible for the management of incident of prepaid churn. This is the first identifiable literature where customer dormancy is predicted along the customer value segmentation. In this article, we use a popular data mining technique to predict when a customer will go dormant or stop performing revenue generating events in a prepaid predominant market. Our study is unique as we considered ~1,451 attributes derived from CDR and SIM registration database (previous studies only considered maximum of ~1,381 potential variables). We built 3 different models for Very High, High and Low value segments. We applied our models on the prepaid base and the output was later compared with the actual dormant customers. Very High segment has the highest accuracy and lift while Low segment has the least at the same threshold. We show that once the problem of prepaid churn is well defined, it can be predicted. We recommend a value segmentation dormancy prediction with decision tree for prepaid segment with a certain threshold. Our study shows that this approach can be easily adopted and operationalised by the campaign management team responsible for the management of prepaid churn in a mobile industry.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125886098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancement Techniques for Data Warehouse Staging Area","authors":"Mahmoud El-Wessimy, Hoda M. O. Mokhtar, O. Hegazy","doi":"10.5121/IJDKP.2013.3601","DOIUrl":"https://doi.org/10.5121/IJDKP.2013.3601","url":null,"abstract":"Poor performance can turn a successful data warehousing project into a failure. Consequently, several attempts have been made by various researchers to deal with the problem of scheduling the Extract-Transform-Load (ETL) process. In this paper we therefore present several approaches in the context of enhancing the data warehousing Extract, Transform and loading stages. We focus on enhancing the performance of extract and transform phases by proposing two algorithms that reduce the time needed in each phase through employing the hidden semantic information in the data. Using the semantic information, a large volume of useless data can be pruned in early design stage. We also focus on the problem of scheduling the execution of the ETL activities, with the goal of minimizing ETL execution time. We explore and invest in this area by choosing three scheduling techniques for ETL. Finally, we experimentally show their behavior in terms of execution time in the sales domain to understand the impact of implementing any of them and choosing the one leading to maximum performance enhancement.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128438090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}