{"title":"Meaningless to meaningful Web log data for generation of Web pre-caching decision rules using Rough Set","authors":"Sarina Sulaiman, Siti Mariyam Hj. Shamsuddin, Nor Bahiah Hj. Ahmad, A. Abraham","doi":"10.1109/DMO.2012.6329804","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329804","url":null,"abstract":"Web caching and pre-fetching are vital technologies that can increase the speed of Web loading processes. Since speed and memory are crucial aspects in enhancing the performance of mobile applications and websites, a better technique for Web loading process should be investigated. The weaknesses of the conventional Web caching policy include meaningless information and uncertainty of knowledge representation in Web logs data from the proxy cache to mobile-client. The organisation and learning task of the knowledge-processing for Web logs data require explicit representation to deal with uncertainties. This is due to the exponential growth of rules for finding a suitable knowledge representation from the proxy cache to the mobile-client. Consequently, Rough Set is chosen in this research to generate Web pre-caching decision rules to ensure the meaningless Web log data can be changed to meaningful information.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117210132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Differential Evolution Algorithm for the University course timetabling problem","authors":"Khalid Shaker, S. Abdullah, A. Hatem","doi":"10.1109/DMO.2012.6329805","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329805","url":null,"abstract":"The University course timetabling problem is known as a NP-hard problem. It is a complex problem wherein the problem size can become huge due to limited resources (e.g. amount of rooms, their capacities and number availability of lecturers) and the requirements for these resources. The university course timetabling problem involves assigning a given number of events to a limited number of timeslots and rooms under a given set of constraints; the objective is to satisfy the hard constraints and minimize the violation of soft constraints. In this paper, a Differential Evolution (DE) algorithm is proposed. DE algorithm relies on the mutation operation to reduce the convergence time while reducing the penalty cost of solution. The proposed algorithm is tested over eleven benchmark datasets (representing one large, five medium and five small problems). Experimental results show that our approach is able to generate competitive results when compared with previous available approaches. Possible extensions upon this simple approach are also discussed.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128569590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic detections in Arabic Dark websites using improved Vector Space Model","authors":"H. Alghamdi, Ali Selamat","doi":"10.1109/DMO.2012.6329790","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329790","url":null,"abstract":"Terrorist group's forums remain a threat for all web users. It stills need to be inspired with algorithms to detect the informative contents. In this paper, we investigate most discussed topics on Arabic Dark Web forums. Arabic Textual contents extracted from selected Arabic Dark Web forums. Vector Space Model (VSM) used as text representation with two different term weighing schemas, Term Frequency (TF) and Term Frequency - Inverse Document Frequency (TF-IDF). Pre-processing phase plays a significant role in processing extracted terms. That consists of filtering, tokenization and stemming. Stemming step is based on proposed stemmer without a root dictionary. Using one of the well-know clustering algorithm k-means to cluster of the terms. The experimental results were presented and showed the most shared terms between the selected forums.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129654528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WebSum: Enhanced SumBasic algorithm for Web site summarization","authors":"Jason Yong-Jin Tee, Lay-Ki Soon, Choo-Yee Ting","doi":"10.1109/DMO.2012.6329812","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329812","url":null,"abstract":"Due to the rapid increase of information in the World Wide Web, there exists an explosion of information on the Web that may overwhelm the common Web user. The Web user may find it quicker or more efficient to browse the Web by reading summaries of Web sites. This paper proposes WebSum to compress Web site content into a summary. WebSum is an enhancement of the SumBasic algorithm, that was mainly used for multi-document summarization. In the case of Web sites, we find that several Web characteristics such as title and keywords can be used to extract sentences that may represent the overall topic of the Web site. Initial results show that WebSum is able to reveal sentences relate to the concept of the Web site. WebSum is then evaluated against the original algorithm of SumBasic.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117125541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovering frequent serial episodes in symbolic sequences for rainfall dataset","authors":"A. Ahmed, A. Bakar, A. Hamdan, Sharifah Mastura Syed Abdullah, O. Jaafar","doi":"10.1109/DMO.2012.6329809","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329809","url":null,"abstract":"Serial episode is a type of temporal frequent pattern in time series. Many different algorithms have been proposed to discover different types of episodes for different applications. In this paper we propose an algorithm for discovering frequent episodes from processed rain fall data. The algorithm is based on three main steps. (1) The rainfall data is first represented in symbolic representation (2) Then numbers of events are detected by applying sliding window for segmentation and CBR for classification. (3)Finally the processed rain fall data is passed through mining phase. Frequent algorithm is used to discover frequent episodes with fixed width. The experiment shows that many frequent episodes with different structure in different years are extracted.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127237033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-parent insertion crossover for vehicle routing problem with time windows","authors":"E. T. Yassen, M. Ayob, M. Nazri, Nasser R. Sabar","doi":"10.1109/DMO.2012.6329806","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329806","url":null,"abstract":"Multi parent crossover has been successfully applied to solve many combinatorial optimization problems such as unconstrained binary quadratic programming problem (UBQP). This because using more than two parents has increased the intensification process by exploiting the information shared by multi parents. However not all type of crossovers are suitable to solve vehicle routing problem (VRP). Therefore, this work introduces a multi parent insertion crossover in solving vehicle routing problem with time windows (VRPTW) by enhancing two parent insertion crossovers. This crossover exchange information among three parents instead of two. Result tested on Solomon VRPTW benchmarks demonstrate that multi parent crossover outperformed two parent crossover on same instances. This prove the effectiveness of having more parents for crossover that can be help the search to find better quality solution.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121610349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Direct Ensemble Classifier for Imbalanced Multiclass Learning","authors":"M. Sainin, R. Alfred","doi":"10.1109/DMO.2012.6329799","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329799","url":null,"abstract":"Researchers have shown that although traditional direct classifier algorithm can be easily applied to multiclass classification, the performance of a single classifier is decreased with the existence of imbalance data in multiclass classification tasks. Thus, ensemble of classifiers has emerged as one of the hot topics in multiclass classification tasks for imbalance problem for data mining and machine learning domain. Ensemble learning is an effective technique that has increasingly been adopted to combine multiple learning algorithms to improve overall prediction accuracies and may outperform any single sophisticated classifiers. In this paper, an ensemble learner called a Direct Ensemble Classifier for Imbalanced Multiclass Learning (DECIML) that combines simple nearest neighbour and Naive Bayes algorithms is proposed. A combiner method called OR-tree is used to combine the decisions obtained from the ensemble classifiers. The DECIML framework has been tested with several benchmark dataset and shows promising results.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123274469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving flexible manufacturing system distributed scheduling problem subject to maintenance using harmony search algorithm","authors":"M. Khalid, U. K. Yusof, Maziani Sabudin","doi":"10.1109/DMO.2012.6329801","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329801","url":null,"abstract":"Flexible manufacturing system is one of the industrial branches that highly competitive and rapidly expand. Globalization of the industrial system has encouraged the development of distributed manufacturing, including flexible manufacturing system. As such, the complexity of the problem faced in this new environment promotes current researcher to develop various approaches in optimizing the production scheduling. Approaches such as petri net, ant colony, genetic algorithm, intelligent agents, particle swarm optimization, and tabu search are used to apprehend optimization issues. In reality, maintenance is one of the core parts which is important to the manufacturing scheduling as it will affect greatly toward the manufacturing scheduling when the machine breakdown happen. Unfortunately, most approaches disregard the preventive maintenance in the production scheduling problem. In this paper, a harmony search algorithm is introduced to address the problem which includes maintenance. The problem description is successfully represented and the algorithm performance is studied with several parameter tunings.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133544214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A feature selection model for binary classification of imbalanced data based on preference for target instances","authors":"D. Tan, S. Liew, T. Tan, W. Yeoh","doi":"10.1109/DMO.2012.6329795","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329795","url":null,"abstract":"Telemarketers of online job advertising firms face significant challenges understanding the advertising demands of small-sized enterprises. The effective use of data mining approach can offer e-recruitment companies an improved understanding of customers' patterns and greater insights of purchasing trends. However, prior studies on classifier built by data mining approach provided limited insights into the customer targeting problem of job advertising companies. In this paper we develop a single feature evaluator and propose an approach to select a desired feature subset by setting a threshold. The proposed feature evaluator demonstrates its stability and outstanding performance through empirical experiments in which real-world customer data of an e-recruitment firm are used. Practically, the findings together with the model may help telemarketers to better understand their customers. Theoretically, this paper extends existing research on feature selection for binary classification of imbalanced data.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116666051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K-means clustering pre-analysis for fault diagnosis in an aluminium smelting process","authors":"NA Abd Majid, B. Young, M. Taylor, John J. J. Chen","doi":"10.1109/DMO.2012.6329796","DOIUrl":"https://doi.org/10.1109/DMO.2012.6329796","url":null,"abstract":"Developing a fault detection and diagnosis system of complex processes usually involve large volumes of highly correlated data. In the complex aluminium smelting process, there are difficulties in isolating historical data into different classes of faults for developing a fault diagnostic model. This paper presents a new application of using a data mining tool, k-means clustering in order to determine precisely how data corresponds to different classes of faults in the aluminium smelting process. The results of applying the clustering technique on real data sets show that the boundary of each class of faults can be identified. This means the faulty data can be isolated accurately to enable for the development of a fault diagnostic model that can diagnose faults effectively.","PeriodicalId":330241,"journal":{"name":"2012 4th Conference on Data Mining and Optimization (DMO)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127253539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}