Feature selection for graph kernels

Mehmet Tan, Faruk Polat, R. Alhajj
Published in: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publication date: 2010-12-01
DOI: 10.1109/BIBM.2010.5706643 (https://doi.org/10.1109/BIBM.2010.5706643)
Citation count: 1

Abstract

Graph classification is important in many scientific applications, particularly in bioinformatics and cheminformatics. There is an increasing need to classify small molecules, represented as graphs, in order to predict properties such as activity, toxicity, or mutagenicity. Kernel methods that use subtrees as the feature set have been shown to perform well in classifying small molecules. It is also well known that feature selection can improve classifier performance. However, most graph kernels are not selective about which subtrees to include in the feature set; instead, they use all subtrees that satisfy a certain property. We argue that not all of these features are needed for effective classification. In this paper, we investigate the effect of selecting a subset of the subtrees as features for graph kernels: we try to identify and keep the useful features and eliminate all remaining subtrees. We propose a masking procedure, which boils down to feature selection, for classifying graphs. Experiments on several molecule classification datasets demonstrate the applicability and effectiveness of the proposed feature selection process.
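
The abstract does not spell out how the masking procedure works, so the following is only a minimal sketch of the general idea, under stated assumptions: each graph is assumed to be already summarized as a map from subtree patterns to occurrence counts, the kernel is assumed to be a plain dot product of those counts, and the mask is simply the set of subtree patterns that are kept. The function name `masked_kernel` and the pattern strings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def masked_kernel(features_a, features_b, mask):
    """Dot product of two subtree-count maps, restricted to patterns in `mask`.

    Assumption: `features_a` and `features_b` map a subtree pattern (here a
    string) to the number of times it occurs in the graph. Patterns outside
    `mask` are ignored, which is what "eliminating" a feature amounts to here.
    """
    return sum(
        count * features_b.get(pattern, 0)
        for pattern, count in features_a.items()
        if pattern in mask
    )


# Hypothetical subtree patterns and counts for two small molecule graphs.
g1 = Counter({"C(C,O)": 2, "C(N)": 1, "O(C)": 3})
g2 = Counter({"C(C,O)": 1, "O(C)": 2, "N(C)": 4})

# Using every pattern reproduces the unmasked kernel value.
all_patterns = set(g1) | set(g2)
full_value = masked_kernel(g1, g2, all_patterns)

# Keeping only one pattern shrinks the feature set and changes the kernel value.
selected_value = masked_kernel(g1, g2, {"C(C,O)"})
```

In this sketch, choosing which patterns go into `mask` is the feature selection step; how the paper scores or searches over candidate masks is not stated in the abstract and is left open here.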