{"title":"Cox's Proportional Hazards Model with Lp Penalty for Biomarker Identification and Survival Prediction","authors":"Zhenqiu Liu","doi":"10.1109/ICMLA.2007.96","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.96","url":null,"abstract":"Advances in high throughput technology provide massive high dimensional data. It is very important and challenging to study the association of genes with various clinical outcomes. Due to large variability in time to certain clinical event among patients, studying possibly censored survival data can be more informative than classification. We proposed the Cox's proportional hazards model with Lp penalty method for simultaneous feature (gene) selection and survival prediction. Lp penalty shrinks coefficients and produces some coefficients that are exactly zero. It has been shown that Lp (p < 1) regularization performs better than L1 in the regression and classification framework (Knight & Fu 2000, Liu et al. 2007). Experimental results with different data demonstrate that the proposed procedures can be used for identifying important genes (features) that are related to time to death due to cancer and for building parsimonious model for predicting the survival of future patients.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130149288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning bayesian networks consistent with the optimal branching","authors":"Alexandra M. Carvalho, Arlindo L. Oliveira","doi":"10.1109/ICMLA.2007.74","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.74","url":null,"abstract":"We introduce a polynomial-time algorithm to learn Bayesian networks whose structure is restricted to nodes with in-degree at most k and to edges consistent with the optimal branching, that we call consistent k-graphs (CkG). The optimal branching is used as an heuristic for a primary causality order between network variables, which is subsequently refined, according to a certain score, into an optimal CkG Bayesian network. This approach augments the search space exponentially, in the number of nodes, relatively to trees, yet keeping a polynomial-time bound. The proposed algorithm can be applied to scores that decompose over the network structure, such as the well known LL, MDL, AIC, BIC, K2, BD, BDe, BDeu and MIT scores. We tested the proposed algorithm in a classification task. We show that the induced classifier always score better than or the same as the Naive Bayes and Tree Augmented Naive Bayes classifiers. Experiments on the UCI repository show that, in many cases, the improved scores translate into increased classification accuracy.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132038575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient label propagation for interactive image segmentation","authors":"Fei Wang, Xin Wang, Ta-Hsin Li","doi":"10.1109/ICMLA.2007.54","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.54","url":null,"abstract":"A novel algorithm for interactive multilabel image/video segmentation is proposed in this paper. Given a small number of pixels with user-defined (or pre-defined) labels, our method can automatically propagate those labels to the remaining unlabeled pixels through an iterative procedure. Theoretical analysis of the convergence property of this algorithm is developed along with the corresponding connections with energy minimization of the hidden Markov random field models. To make the algorithm more efficient, we also derive a multi-level way for propagating the labels. Finally the segmentation results on natural images are presented to show the effectiveness of our method.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127809257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rule refinement with extended data expression","authors":"Jung Min Kong, Dong-Hun Seo, W. Lee","doi":"10.1109/ICMLA.2007.75","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.75","url":null,"abstract":"The rule refinement problem has been known to be one of the most difficult and complex problems. This paper presents a systematic rule refinement method that deals with the old rule directly with the new data, for the first time. To be able to do the rule refinement, the data are represented in the extended data expression, where an event has its weight of importance. To show how this can be done systematically, a decision tree classifier is used for the rule refinement. The weights of the events of the former rule are adjusted according to the depth of the tree merged with the collected new data set to form the new rule. Experiment shows that this approach, with properly designing the weight assignment procedure, is promising to enhance the performance of the inference engine by generating a rule with higher accuracy than the one from new data set only.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121419145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uncertainty optimization for robust dynamic optical flow estimation","authors":"Volker Willert, Marc Toussaint, J. Eggert, E. Körner","doi":"10.1109/ICMLA.2007.15","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.15","url":null,"abstract":"We develop an optical flow estimation framework that focuses on motion estimation over time formulated in a dynamic Bayesian network. It realizes a spatiotemporal integration of motion information using a dynamic and robust prior that incorporates spatial and temporal coherence constraints on the flow field. The main contribution is the embedding of these particular assumptions on optical flow evolution into the Bayesian propagation approach that leads to a computationally feasible two-filter inference method and is applicable for on and offline parameter optimization. We analyse the possibility to optimize imposed Student's t-distributed model uncertainties, which are the camera noise and the transition noise. Experiments with synthetic sequences illustrate how the probabilistic framework improves the optical flow estimation because it allows for noisy data, motion ambiguities and motion discontinuities.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120944861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Extraction from Microarray Expression Data by Integration of Semantic Knowledge","authors":"Young-Rae Cho, Xian Xu, W. Hwang, A. Zhang","doi":"10.1109/ICMLA.2007.10","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.10","url":null,"abstract":"Microarray techniques give biologists first peek into the molecular states of living tissues. Previous studies have proven that it is feasible to build sample classifiers using the gene expressional profiles. To build an effective sample classifier, dimension reduction process is necessary since classic pattern recognition algorithms do not work well in high dimensional space. In this paper, we present a novel feature extraction algorithm based on the concept of virtual genes by integrating microarray expression data sets with domain knowledge embedded in gene ontology (GO) annotations. We define semantic similarity to measure the functional associations between two genes using the annotation on each GO term. We then identify the groups of genes, called virtual genes, that potentially interact with each other for a biological function. The correlation in gene expression levels of virtual genes can be used to build a sample classifier. For a colon cancer data set, the integration of microarray expression data with GO annotations significantly improves the accuracy of sample classification by more than 10%.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123006619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Clustering-Based Approach to Predict Outcome in Cancer Patients","authors":"Kai Xing, D. Henson, Dechang Chen, Li Sheng","doi":"10.1109/ICMLA.2007.20","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.20","url":null,"abstract":"The TNM (tumor, lymph node, metastasis) is a widely used staging system for predicting the outcome of cancer patients. However, the TNM is not accurate in prediction, partially due to the fact of deficient staging within and between stages. Based on the availability of large cancer patient datasets, there is a need to expand the TNM. In this paper, we present a general clustering-based approach to accomplish this task of expansion. This approach admits multiple factors. One major advantage of the approach is that patients within each generated group are homogeneous in terms of survival, so that a more accurate prediction of outcome of patients can be made. A demonstration of use of the proposed method is given for breast cancer patients.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121783695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Support Vector Machine Classification of Probability Models and Peptide Features for Improved Peptide Identification from Shotgun Proteomics","authors":"C. H. Yamamoto, Maria Cristina Ferreira de Oliveira, M. L. Fujimoto, S. O. Rezende","doi":"10.1109/ICMLA.2007.17","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.17","url":null,"abstract":"Mass spectrometry (MS)-based proteomics is a powerful and popular high-throughput process for characterizing the global protein content of a sample. In shotgun proteomics, typically proteins are digested into fragments (peptides) prior to mass analysis, and the presence of a protein is inferred from the identification of its constituent peptides. Thus, accurate proteome characterization is dependent upon the accuracy of this peptide identification step. Database search routines generate predicted spectra for all peptides derived from the known genome information, and thus, identify a peptide by 'matching' an experimental to a predicted spectrum. However, due to many problems, such as incomplete fragmentation, this process results in a large number of false positives. We present a new scoring algorithm that integrates probabilistic database scoring metrics (from the MSPolygraph program) with physico-chemical properties in a support vector machine (SVM). We demonstrate that this peptide identification classifier SVM (PICS) score is not only more accurate than the single best database scoring metric, but is also significantly more accurate than models derived using a linear discriminant analysis, decision tree, or artificial neural network.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"81 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123177713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using genetic programming for the induction of oblique decision trees","authors":"A. Shali, M. Kangavari, B. Bina","doi":"10.1109/ICMLA.2007.66","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.66","url":null,"abstract":"In this paper, we present a genetically induced oblique decision tree algorithm. In traditional decision tree, each internal node has a testing criterion involving a single attribute. Oblique decision tree allows testing criterion to consist of more than one attribute. Here we use genetic programming to evolve and find an optimal testing criterion in each internal node for the set of samples at that node. This testing criterion is the characteristic function of a relation over existing attributes. We present the algorithm for construction of the oblique decision tree. We also compare the results of our proposed oblique decision tree with the one of C4.5 algorithm.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"626 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120971689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Neural Network Based Model for Predicting the Performance of a Two Stroke Spark Ignition Engine","authors":"M. M. Wani, M. Wani","doi":"10.1109/ICMLA.2007.107","DOIUrl":"https://doi.org/10.1109/ICMLA.2007.107","url":null,"abstract":"This paper describes a hybrid neural network based model for predicting the performance of a single cylinder two stroke cycle spark ignition engine. The engine was run in the carburetor mode and engine mapping was done by collecting the engine performance data in terms of power and brake specific fuel consumption for various combinations of speed, load and air-fuel ratio. This data was used for predicting the engine performance. The work first presents a model that is based on conventional thermodynamic and gas dynamic relations. The performance of the model is improved by integrating a conventional model with a distributed and synergistic neural network. The resulting hybrid model follows closely the expected results in predicting the performance of a two stroke cycle spark ignition engine. The analysis shows that the hybrid model has learnt the input output data relation very well and is capable to predict the output in the decided domain.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126891795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}