{"title":"Applying Semantic Suffix Net to suffix tree clustering","authors":"Jongkol Janruang, S. Guha","doi":"10.1109/DMO.2011.5976519","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976519","url":null,"abstract":"In this paper we consider the problem of clustering snippets returned from search engines. We propose a technique to invoke semantic similarity in the clustering process. Our technique improves on the well-known STC method, which is a highly efficient heuristic for clustering web search results. However, a weakness of STC is that it cannot cluster semantic similar documents. To solve this problem, we propose a new data structure to represent suffixes of a single string, called a Semantic Suffix Net (SSN). A generalized semantic suffix net is created to represent suffixes of a set of strings by using a new operator to partially combine nets. A key feature of this new operator is to find a joint point by using semantic similarity and string matching; net pairs combination then begins at that joint point. This logic causes the number of nodes and branches of a generalized semantic suffix net to decrease. The operator then uses the line of suffix links as a boundary to separate the net. A generalized semantic suffix net is then incorporated into the STC algorithm so that it can cluster semantically similar snippets. Experimental results show that the proposed algorithm improves upon conventional STC.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129446694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPCA-ARDA for solving course timetabling problems","authors":"A. Abuhamdah, M. Ayob","doi":"10.1109/DMO.2011.5976523","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976523","url":null,"abstract":"This work presents a hybridization between Multi-Neighborhood Particle Collision Algorithm (MPCA) and Adaptive Randomized Descent Algorithm (ARDA) acceptance criterion to solve university course timetabling problems. The aim of this work is to produce an effective algorithm for assigning a set of courses, lecturers and students to a specific number of rooms and timeslots, subject to a set of constraints. The structure of the MPCA-ARDA resembles a Hybrid Particle Collision Algorithm (HPCA) structure. The basic difference is that MPCA-ARDA hybridize MPCA and ARDA acceptance criterion, whilst HPCA, hybridize MPCA and great deluge acceptance criterion. In other words, MPCA-ARDA employ adaptive acceptance criterion, whilst HPCA, employ deterministic acceptance criterion. Therefore, MPCA-ARDA has better capability of escaping from local optima compared to HPCA and MPCA. MPCA-ARDA attempts to enhance the trial solution by exploring different neighborhood structures to overcome the limitation in HPCA and MPCA. Results tested on Socha benchmark datasets show that, MPCA-ARDA is able to produce significantly good quality solutions within a reasonable time and outperformed some other approaches in some instances.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"83 5 Pt 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128659841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft skills recommendation systems for IT jobs: A Bayesian network approach","authors":"Azuraini Abu Bakar, Choo-Yee Ting","doi":"10.1109/DMO.2011.5976509","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976509","url":null,"abstract":"Today, soft skills are crucial factors to the success of a project. For a certain set of jobs, soft skills are often considered more crucial than the hard skills or technical skills, in order to perform the job effectively. However, it is not a trivial task to identify the appropriate soft skills for each job. In this light, this study proposed a solution to assist employers when preparing advertisement via identification of suitable soft skills together with its relevancy to that particular job title. Bayesian network is employed to solve this problem because it is suitable for reasoning and decision making under uncertainty. The proposed Bayesian Network is trained using a dataset collected via extracting information from advertisements and also through interview sessions with a few identified experts.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132000019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Web caching using Adaptive Regression Trees, Splines, Random Forests and Tree Net","authors":"Sarina Sulaiman, Siti Mariyam Hj. Shamsuddin, A. Abraham","doi":"10.1109/DMO.2011.5976513","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976513","url":null,"abstract":"Web caching is a technology for improving network traffic on the internet. It is a temporary storage of Web objects (such as HTML documents) for later retrieval. There are three significant advantages to Web caching; reduced bandwidth consumption, reduced server load, and reduced latency. These rewards have made the Web less expensive with better performance. The aim of this research is to introduce advanced machine learning approaches for Web caching to decide either to cache or not to the cache server, which could be modelled as a classification problem. The challenges include identifying attributes ranking and significant improvements in the classification accuracy. Four methods are employed in this research; Classification and Regression Trees (CART), Multivariate Adaptive Regression Splines (MARS), Random Forest (RF) and TreeNet (TN) are used for classification on Web caching. The experimental results reveal that CART performed extremely well in classifying Web objects from the existing log data and an excellent attribute to consider for an accomplishment of Web cache performance enhancement.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133040425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimisation model of selective cutting for Timber Harvest Planning in Peninsular Malaysia","authors":"Munaisyah Abdullah, S. Abdullah, A. Hamdan, R. Ismail","doi":"10.1109/DMO.2011.5976536","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976536","url":null,"abstract":"Timber Harvest Planning (THP) model is used to determine which forest areas to be harvested in different time periods with objective to maximize profit subject to harvesting regulations. Various THP models have been developed in the Western countries based on optimisation approach to generate an optimal or feasible harvest plan. However similar studies have gained less attention in Tropical countries. Thus, this study proposes an optimisation model of THP that reflects selective cutting in Peninsular Malaysia. The model was tested on seven blocks that consists a total of 636 trees with different size and species. We found that, optimisation approach generates selectively timber harvest plan with higher volume and less damage.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114933326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework of rough reducts optimization based on PSO/ACO hybridized algorithms","authors":"Lustiana Pratiwi, Y. Choo, A. Muda","doi":"10.1109/DMO.2011.5976520","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976520","url":null,"abstract":"Rough reducts has contributed significantly in numerous researches of feature selection analysis. It has been proven as a reliable reduction technique in identifying the importance of attributes set in an information system. The key factor for the success of reducts calculation in finding minimal reduct with minimal cardinality of attributes is an NP-Hard problem. This paper has proposed an improved PSO/ACO optimization framework to enhance rough reduct performance by reducing the computational complexities. The proposed framework consists of a three-stage optimization process, i.e. global optimization with PSO, local optimization with ACO and vaccination process on discernibility matrix.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131872168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid evaluation metric for optimizing classifier","authors":"M. Hossin, M. Sulaiman, A. Mustapha, N. Mustapha, R. Rahmat","doi":"10.1109/DMO.2011.5976522","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976522","url":null,"abstract":"The accuracy metric has been widely used for discriminating and selecting an optimal solution in constructing an optimized classifier. However, the use of accuracy metric leads the searching process to the sub-optimal solutions due to its limited capability of discriminating values. In this study, we propose a hybrid evaluation metric, which combines the accuracy metric with the precision and recall metrics. We call this new performance metric as Optimized Accuracy with Recall-Precision (OARP). This paper demonstrates that the OARP metric is more discriminating than the accuracy metric using two counter-examples. To verify this advantage, we conduct an empirical verification using a statistical discriminative analysis to prove that the OARP is statistically more discriminating than the accuracy metric. We also empirically demonstrate that a naive stochastic classification algorithm trained with the OARP metric is able to obtain better predictive results than the one trained with the conventional accuracy metric. The experiments have proved that the OARP metric is a better evaluator and optimizer in the constructing of optimized classifier.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131966476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High order fuzzy time series for exchange rates forecasting","authors":"L. Abdullah, I. Taib","doi":"10.1109/DMO.2011.5976496","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976496","url":null,"abstract":"Fuzzy time series model has been employed by many researchers in various forecasting activities such as university enrolment, temperature, direct tax collection and the most popular stock price forecasting. However exchange rate forecasting especially using high order fuzzy time series has been given less attention despite its huge contribution in business transactions. The paper aims to test the forecasting of US dollar (USD) against Malaysian Ringgit (MYR) exchange rates using high order fuzzy time series and check its accuracy. Twenty five data set of the exchange rates USD against MYR was tested to the seven-step of high fuzzy time series. The results show that higher order fuzzy time series yield very small errors thereby the model does produce a good forecasting tool for the exchange rates.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126147929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing network intrusion detection association rules using Chi-Squared pruning technique","authors":"Ammar Fikrat Namik, Z. Othman","doi":"10.1109/DMO.2011.5976515","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976515","url":null,"abstract":"Increasing number of computer networks now a day has increased the effort of putting networks in secure with various attack risk. Intrusion Detection System (IDS) is a popular tool to secure network. Applying data mining has increased the quality of intrusion detection neither as anomaly detection or misused detection from large scale network traffic transaction. Association rules is a popular technique to produce a quality misused detection. However, the weaknesses of association rules is the fact that it often produced with thousands rules which reduce the performance of IDS. This paper aims to show applying post-mining to reduce the number of rules and remaining the most quality rules to produce quality signature. The experiment conducted using two data set collected from KDD Cup 99. Each data set is partitioned into 4 data sets based on type of attacks (PROB, UR2, R2L and DOS). Each partition is mining using Apriori Algorithm, which later performing post-mining using Chi-Squared (χ2) computation techniques. The quality of rules is measured based on Chi-Square value, which calculated according the support, confidence and lift of each association rule. The experiment results shows applying post-mining has reduced the rules up to 98% and remaining the quality rules.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115359149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy projective clustering in high dimension data using decrement size of data","authors":"S. Mehdi Seyednejad, hamidreza musavi, S. Mohaddese Seyednejad, Tooraj Darabi","doi":"10.1109/DMO.2011.5976521","DOIUrl":"https://doi.org/10.1109/DMO.2011.5976521","url":null,"abstract":"Today, data clustering problems became an important challenge in Data Mining domain. A kind of clustering is projective clustering. Since a lot of researches has done in this article but each of previous algorithms had some defects that we will be indicate in this paper. We propose a new algorithm based on fuzzy sets and at first using this approach detect and eliminate unimportant properties for all clusters. Then we remove outliers, finally we use weighted fuzzy c-mean algorithm according to offered formula for fuzzy calculations. Experimental results show that our approach has more performance and accuracy than similar algorithms.","PeriodicalId":436393,"journal":{"name":"2011 3rd Conference on Data Mining and Optimization (DMO)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123429118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}