Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining: latest articles, page 3

Discriminative Feature Selection for Uncertain Graph Classification.

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2013-01-01. DOI: 10.1137/1.9781611972832.10

Xiangnan Kong, Philip S Yu, Xue Wang, Ann B Ragin

Abstract: Mining discriminative features for graph data has attracted much attention in recent years due to its important role in constructing graph classifiers, generating graph indices, etc. Most measures of interestingness for discriminative subgraph features are defined on certain graphs, where the structure of each graph object is certain and the binary edges within each graph represent the "presence" of linkages among the nodes. In many real-world applications, however, the linkage structure of the graphs is inherently uncertain. Therefore, existing measures of interestingness based upon certain graphs are unable to capture the structural uncertainty in these applications effectively. In this paper, we study the problem of discriminative subgraph feature selection from uncertain graphs. This problem is challenging and differs from conventional subgraph mining problems because both the structure of the graph objects and the discrimination score of each subgraph feature are uncertain. To address these challenges, we propose a novel discriminative subgraph feature selection method, Dug, which can find discriminative subgraph features in uncertain graphs based upon different statistical measures, including expectation, median, mode and φ-probability. We first compute the probability distribution of the discrimination scores for each subgraph feature based on dynamic programming. Then a branch-and-bound algorithm is proposed to search for discriminative subgraphs efficiently. Extensive experiments on various neuroimaging applications (Alzheimer's disease, ADHD and HIV) have been performed to analyze the gain in performance from taking structural uncertainties into account when identifying discriminative subgraph features for graph classification.

Volume 2013, pages 82-93.

Citations: 48
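The abstract above says that the distribution of discrimination scores is computed by dynamic programming, then summarised by statistics such as expectation and mode. The sketch below is not the authors' Dug implementation. It assumes, purely for illustration, that each uncertain graph contains a given subgraph feature independently with a known probability, and it computes the distribution of how many graphs contain the feature. The function names and the independence assumption are hypothetical and do not come from the paper.

```python
def count_distribution(probs):
    """Return dist where dist[k] = P(exactly k graphs contain the feature).

    Assumption (not from the paper): graph i contains the feature
    independently with probability probs[i].
    """
    dist = [1.0]  # with zero graphs processed, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this graph does not contain the feature
            nxt[k + 1] += mass * p       # this graph contains the feature
        dist = nxt
    return dist


def expected_count(dist):
    """Expectation of the count under the distribution."""
    return sum(k * mass for k, mass in enumerate(dist))


def most_likely_count(dist):
    """Mode of the count (smallest index on ties)."""
    return max(range(len(dist)), key=dist.__getitem__)


if __name__ == "__main__":
    d = count_distribution([0.5, 0.5])
    print(d, expected_count(d), most_likely_count(d))
```

Any score that depends only on such counts could be evaluated over this distribution. The table grows by one entry per graph, so the cost is quadratic in the number of graphs rather than exponential in the number of possible worlds.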

Sparse Representation for Prediction of HIV-1 Protease Drug Resistance.

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2013-01-01. DOI: 10.1137/1.9781611972832.38

Xiaxia Yu, Irene T Weber, Robert W Harrison

Abstract: HIV rapidly evolves drug resistance in response to antiviral drugs used in AIDS therapy. Estimating the specific resistance of a given strain of HIV to individual drugs from sequence data has important benefits for both the therapy of individual patients and the development of novel drugs. We have developed an accurate classification method based on sparse representation theory, and demonstrate that this method is highly effective with HIV-1 protease. The protease structure is represented using our newly proposed encoding method based on Delaunay triangulation, and combined with the mutated amino acid sequences of known drug-resistant strains to train a machine-learning algorithm for both classification and regression of drug-resistant mutations. An overall cross-validated classification accuracy of 97% is obtained when trained on a publicly available database of approximately 1.5 × 10^4 known sequences (Stanford HIV database, http://hivdb.stanford.edu/cgi-bin/GenoPhenoDS.cgi). Resistance to four FDA-approved drugs is computed, and comparisons with other algorithms demonstrate that our method shows significant improvements in classification accuracy.

Volume 2013, pages 342-349.

Citations: 16
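The abstract above describes classification based on sparse representation. As a rough illustration of that general idea, the sketch below encodes a test vector as a sparse combination of training vectors and assigns the class whose training vectors reconstruct it with the smallest residual. It is a generic sparse-representation classifier solved with ISTA (iterative soft-thresholding), which is a swapped-in solver chosen for brevity. It does not reproduce the authors' Delaunay-triangulation encoding, their solver, or their data. All names, the toy vectors and the penalty `lam` are assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(atoms, y, lam=0.01, iterations=500):
    """Approximately minimise 0.5*||y - sum_j x[j]*atoms[j]||^2 + lam*||x||_1 with ISTA."""
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant,
    # so 1/lipschitz is a safe (if conservative) step size.
    lipschitz = sum(v * v for atom in atoms for v in atom)
    step = 1.0 / lipschitz
    x = [0.0] * len(atoms)
    for _ in range(iterations):
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)))
                 for i in range(len(y))]
        residual = [r - t for r, t in zip(recon, y)]
        grad = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        x = [soft_threshold(xj - step * g, lam * step) for xj, g in zip(x, grad)]
    return x


def classify(atoms, labels, y, lam=0.01):
    """Return the label whose atoms give the smallest reconstruction residual."""
    x = sparse_code(atoms, y, lam)
    best_label, best_err = None, math.inf
    for label in sorted(set(labels)):
        recon = [0.0] * len(y)
        for xj, atom, lab in zip(x, atoms, labels):
            if lab == label:
                for i, v in enumerate(atom):
                    recon[i] += xj * v
        err = math.sqrt(sum((t - r) ** 2 for t, r in zip(y, recon)))
        if err < best_err:
            best_label, best_err = label, err
    return best_label


if __name__ == "__main__":
    # Toy data only: two 2-D training vectors, one per hypothetical class.
    atoms = [[1.0, 0.0], [0.0, 1.0]]
    labels = ["susceptible", "resistant"]
    print(classify(atoms, labels, [1.0, 0.0]))
```

In practice the training vectors would be encoded sequences or structures, and a dedicated L1 solver would replace the plain ISTA loop.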

Sampling Strategies to Evaluate the Performance of Unknown Predictors.

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2012-01-01. DOI: 10.1137/1.9781611972825.43

Hamed Valizadegan, Saeed Amizadeh, Milos Hauskrecht

Citations: 2

Revenue Generation in Hospital Foundations: Neural Network versus Regression Model Recommendations

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2011-01-25. DOI: 10.19030/IJMIS.V15I1.1596

M. Malliaris, M. Pappas

Citations: 4

Generalized and Heuristic-Free Feature Construction for Improved Accuracy.

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2010-01-01. DOI: 10.1137/1.9781611972801.55

Wei Fan, Erheng Zhong, Jing Peng, Olivier Verscheure, Kun Zhang, Jiangtao Ren, Rong Yan, Qiang Yang

Abstract: State-of-the-art learning algorithms accept data in feature vector format as input. Examples belonging to different classes may not always be easy to separate in the original feature space. One may ask: can transforming existing features into a new space reveal significant discriminative information that is not obvious in the original space? First, since there can be an infinite number of ways to extend features, it is impractical to enumerate them and then perform feature selection. Second, evaluating discriminative power on the complete dataset is not always optimal, because features that are highly discriminative on a subset of examples may not be significant when evaluated on the entire dataset. Third, feature construction ought to be automated and general, so that it requires no domain knowledge and its accuracy gains hold across a large number of classification algorithms. In this paper, we propose a framework to address these problems through the following steps: (1) divide and conquer to avoid exhaustive enumeration; (2) local feature construction and evaluation within subspaces of examples where local error is still high and the features constructed thus far still do not predict well; (3) weighting-rule-based search that is free of domain knowledge and has a provable performance guarantee. Empirical studies indicate that significant improvement (as much as 9% in accuracy and 28% in AUC) is achieved using the newly constructed features over a variety of inductive learners evaluated against a number of balanced, skewed and high-dimensional datasets. Software and datasets are available from the authors.

Volume 2010, pages 629-640.

Citations: 28
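The question raised in the abstract above, whether a transformed feature can expose class structure that the original features hide, can be seen on a toy XOR-style dataset. The sketch below is only an illustration of that motivating question, not the authors' framework: the dataset, the product feature and the `stump_accuracy` helper are all assumptions made for the example.

```python
def stump_accuracy(values, labels):
    """Best accuracy of a one-feature threshold rule, trying both polarities."""
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        # Predict class 1 when value >= t; the flipped rule scores n - correct.
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best


# XOR-style toy data: the class is 1 exactly when the two signs differ.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [0, 1, 1, 0]

x1 = [p[0] for p in points]
x2 = [p[1] for p in points]
product = [a * b for a, b in points]  # constructed feature (illustrative choice)

acc_x1 = stump_accuracy(x1, labels)
acc_x2 = stump_accuracy(x2, labels)
acc_product = stump_accuracy(product, labels)
```

Here neither original feature beats chance with a single threshold, while the constructed product feature separates the classes completely. The framework in the abstract searches for such features automatically instead of relying on a hand-picked transformation.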

Anomaly Detection Using the Dempster-Shafer Method

Proceedings of the ... SIAM International Conference on Data Mining. Pub Date: 2008-03-11. DOI: 10.2139/SSRN.2831339

Qi Chen, U. Aickelin

Citations: 34