{"title":"FINDING FREQUENT SUBPATHS IN A GRAPH","authors":"S. Guha, Klong Luang","doi":"10.5121/IJDKP.2014.4503","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4503","url":null,"abstract":"The problem considered is that of finding frequent subpaths of a database of paths in a fixed undirected graph. This problem arises in applications such as predicting congestion in network and vehicular traffic. An algorithm, called AFS, based on the classic frequent itemset mining algorithm Apriori is developed, but with significantly improved efficiency over Apriori from exponential in transaction size to quadratic through exploiting the underlying graph structure. This efficiency makes AFS feasible for practical input path sizes. It is also proved that a natural generalization of the frequent subpaths problem is not amenable to any solution quicker than Apriori.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130829197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Students Performance Prediction System Using Multi Agent Data Mining Technique","authors":"Abdullah Al-Malaise, A. Malibari, Mona Alkhozae","doi":"10.5121/IJDKP.2014.4501","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4501","url":null,"abstract":"A high prediction accuracy of the students’ performance is more helpful to identify the low performance students at the beginning of the learning process. Data mining is used to attain this objective. Data mining techniques are used to discover models or patterns of data, and they are very helpful in decision-making. Boosting is one of the most popular techniques for constructing ensembles of classifiers to improve the classification accuracy. Adaptive Boosting (AdaBoost) is a generation of boosting algorithm. It is used for binary classification and is not directly applicable to multiclass classification. The SAMME boosting technique extends AdaBoost to multiclass classification without reducing it to a set of sub-binary classifications. In this paper, a students’ performance prediction system using Multi Agent Data Mining is proposed to predict the performance of the students based on their data with high prediction accuracy and to provide help to the low students by optimization rules. The proposed system has been implemented and evaluated by investigating the prediction accuracy of the Adaboost.M1 and LogitBoost ensemble classifier methods and of the C4.5 single classifier method. 
The results show that using SAMME Boosting technique improves the prediction accuracy and outperformed C4.5 single classifier and LogitBoost.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133386682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Mining in Education : A Review on the Knowledge Discovery Perspective","authors":"P. Guleria, M. Sood","doi":"10.5121/IJDKP.2014.4504","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4504","url":null,"abstract":"Knowledge Discovery in Databases is the process of finding knowledge in massive amounts of data where data mining is the core of this process. Data mining can be used to mine understandable meaningful patterns from large databases and these patterns may then be converted into knowledge. Data mining is the process of extracting the information and patterns derived by the KDD process which helps in crucial decision-making. Data mining works with data warehouse and the whole process is divided into action plan to be performed on data: Selection, transformation, mining and results interpretation. In this paper, we have reviewed Knowledge Discovery perspective in Data Mining and consolidated different areas of data mining, its techniques and methods in it.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122581401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiments on Hypothesis \"Fuzzy K-Means is Better than K-Means for Clustering\"","authors":"Srinivas Sivarathri, A. Govardhan","doi":"10.5121/IJDKP.2014.4502","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4502","url":null,"abstract":"Clustering is one of the data mining techniques that have been around to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many utilities in real time applications in the fields of marketing, biology, libraries, insurance, city-planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms came into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve highest similarity between objects of same cluster and lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms such as the K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm while the Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments are made on diabetes dataset obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-means in terms of quality or accuracy of clusters. 
Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122364443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Evaluation of Factors Influencing Safety Performance : A Case in an Industrial Gas Manufacturing Company (Ghana)","authors":"Evelyn Enchill, K. Mireku","doi":"10.5121/IJDKP.2014.4505","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4505","url":null,"abstract":"Safety has become a very important element in firms and organisations especially in Ghana. The impact of safety factors on a firm’s 3E’s (Employee, Environment and Equipment) can improve or deteriorate firm’s public image. This paper identified the key safety indicators and also provided a set of core factors that contribute meaningfully to promoting safety performance in an Industrial Gas producer in Ghana using the Analytic Hierarchy Process. Organisational, Human, Technical and Environmental factors were identified as the safety indicators in relation to the study area. The studies revealed that organisational factor is the most important factor or criterion that could facilitate a better safety performance of the Industrial Gas Company. In addition, employees were identified as the best safety alternative, whilst environment and equipment followed sequentially.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121070043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Algorithm for Privacy Preserving Data Mining Using Hybrid Transformation","authors":"H. Jalla, Girija P.N","doi":"10.5121/IJDKP.2014.4404","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4404","url":null,"abstract":"","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114790394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fuzzy Logic Based Sentiment Classification","authors":"Sheeba J.I., K. Vivekanandan","doi":"10.5121/IJDKP.2014.4403","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4403","url":null,"abstract":"Sentiment classification aims to detect information such as opinions and explicit or implicit feelings expressed in text. Most existing approaches are able to detect either explicit expressions or implicit expressions of sentiments in the text separately. The proposed framework detects both Implicit and Explicit expressions available in the meeting transcripts. It will classify the Positive, Negative, Neutral words and also identify the topic of the particular meeting transcripts by using fuzzy logic. This paper aims to add some additional features for improving the classification method. The quality of the sentiment classification is improved using the proposed fuzzy logic framework. This fuzzy logic includes features like Fuzzy rules and the Fuzzy C-means algorithm. The quality of the output is evaluated using parameters such as precision, recall, f-measure. Here the Fuzzy C-means Clustering technique is measured in terms of Purity and Entropy. 
The data set was validated using 10-fold cross validation method and observed 95% confidence interval between the accuracy values. Finally, the proposed fuzzy logic method produced more than 85% accurate results and error rate is very less compared to existing sentiment classification techniques.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121969043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Data Mining Tools for Selected Scripts of Stock Market","authors":"Mahajan K.S, K. R.V","doi":"10.5121/IJDKP.2014.4405","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4405","url":null,"abstract":"One of the most important problems in modern finance is finding efficient ways to summarize and visualize the stock market data to give individuals or institutions useful information about the market behavior for investment decisions. Therefore, investment can be considered as one of the fundamental pillars of national economy. So, at the present time many investors look to find criterion to compare stocks together and selecting the best and also investors choose strategies that maximize the earning value of the investment process. Therefore the enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem domain using different methodologies. Therefore research in data mining has gained a high attraction due to the importance of its applications and the increasing generation information. So, Data mining tools such as association rule, rule induction method and Apriori algorithm techniques are used to find association between different scripts of stock market, and also much of the research and development has taken place regarding the reasons for fluctuating Indian stock exchange. But nowadays two important factors, gold prices and US Dollar prices, are more dominant on the Indian Stock Market, and statistical correlation is used to find out the correlation between gold prices, dollar prices and the BSE index; this helps the activities of stock operators, brokers, investors and jobbers. They are based on the forecasting the fluctuation of index share prices, gold prices, dollar prices and transactions of customers. 
Hence researcher has considered these problems as a topic for research.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"167 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120979832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stratification of Clinical Survey Data by Using Contingency Tables","authors":"S. Arslanturk, Mohammad-Reza Siadat, T. Ogunyemi, B. Givens, A. Diokno","doi":"10.5121/IJDKP.2014.4401","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4401","url":null,"abstract":"Data stratification is the process of partitioning the data into distinct and non-overlapping groups since the study population consists of subpopulations that are of particular interest. In clinical data, once the data is stratified into sub populations based on a significant stratifying factor, different risk factors can be determined from each subpopulation. In this paper, the Fisher’s Exact Test is used to determine the significant stratifying factors. The experiments are conducted on a simulated study and the Medical, Epidemiological and Social Aspects of Aging (MESA) data constructed for prediction of urinary incontinence. Results show that, smoking is the most significant stratifying factor of MESA data, showing that the smokers and non-smokers indicates different risk factors towards urinary incontinence and should be treated differently.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133046526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Additive Gaussian Noise Based Data Perturbation in Multi-Level Trust Privacy Preserving Data Mining","authors":"Kalaivani R, Chidambaram S","doi":"10.5121/IJDKP.2014.4303","DOIUrl":"https://doi.org/10.5121/IJDKP.2014.4303","url":null,"abstract":"Data perturbation is one of the most popular models used in privacy preserving data mining. It is specially convenient for applications where the data owners need to export/publish the privacy-sensitive data. This work proposes that an Additive Perturbation based Privacy Preserving Data Mining (PPDM) to deal with the problem of increasing accurate models about all data without knowing exact details of individual values. To Preserve Privacy, the approach establishes Random Perturbation to individual values before data are published. In Proposed system the PPDM approach introduces Multilevel Trust (MLT) on data miners. Here different perturbed copies of the similar data are available to the data miner at different trust levels and may mingle these copies to jointly gather extra information about original data and release the data is called diversity attack. To prevent this attack MLT-PPDM approach is used along with the addition of random Gaussian noise and the noise is properly correlated to the original data, so the data miners cannot get diversity gain in their combined reconstruction.","PeriodicalId":131153,"journal":{"name":"International Journal of Data Mining & Knowledge Management Process","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116857772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}