Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features

IF 1.8 | Zone 4, Computer Science | Q2, MATHEMATICS, INTERDISCIPLINARY APPLICATIONS
An, Baiguo, Feng, Guozhong, Guo, Jianhua
DOI: 10.1007/s00357-021-09399-0 (https://doi.org/10.1007/s00357-021-09399-0)
Journal: Journal of Classification
Publication date: 2021-09-11
Citations: 0

Abstract

Interaction Identification and Clique Screening for Classification with Ultra-high Dimensional Discrete Features

Interactions have greatly influenced recent scientific discoveries, but the identification of interactions is challenging in ultra-high dimensions. In this study, we propose an interaction identification method for classification with ultra-high dimensional discrete features. We utilize clique sets to capture interactions among features, where features in a common clique have interactions that can be used for classification. The number of features related to the interaction is the size of the clique. Hence, our method can consider interactions caused by more than two feature variables. We propose a Kullback-Leibler divergence-based approach to correctly identify the clique sets with a probability that tends to 1 as the sample size tends to infinity. A clique screening method is then proposed to filter out clique sets that are useless for classification, and the strong sure screening property can be guaranteed. Finally, a clique naïve Bayes classifier is proposed for classification. Numerical studies demonstrate that our proposed approach performs very well.
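
The abstract does not give the details of the estimators, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes the cliques are already known, scores each clique by a class-weighted Kullback-Leibler divergence between its class-conditional and marginal joint distributions (a hypothetical stand-in for the paper's criterion), and classifies by treating each clique as one joint categorical variable that is independent of the other cliques given the class. The names `clique_kl_score` and `CliqueNaiveBayes`, the Laplace smoothing parameter `alpha`, and the toy data are all illustrative.

```python
import math
from collections import Counter


def project(row, clique):
    """Joint value of the features in `clique` for one sample."""
    return tuple(row[j] for j in clique)


def clique_kl_score(X, y, clique):
    """Assumed score: sum_k P(y=k) * KL(P(x_C | y=k) || P(x_C)), from empirical counts."""
    n = len(X)
    marginal = Counter(project(r, clique) for r in X)
    score = 0.0
    for k, n_k in Counter(y).items():
        cond = Counter(project(r, clique) for r, lab in zip(X, y) if lab == k)
        kl = sum((c / n_k) * math.log((c / n_k) / (marginal[v] / n))
                 for v, c in cond.items())
        score += (n_k / n) * kl
    return score


class CliqueNaiveBayes:
    """Naive Bayes over cliques: features inside a clique are modelled jointly."""

    def __init__(self, cliques, alpha=1.0):
        self.cliques = [tuple(c) for c in cliques]
        self.alpha = alpha  # Laplace smoothing (an assumption of this sketch)

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        self.tables, self.n_states = [], []
        for clique in self.cliques:
            table = {k: Counter() for k in self.class_counts}
            for row, lab in zip(X, y):
                table[lab][project(row, clique)] += 1
            self.tables.append(table)
            # Number of distinct joint configurations seen in training.
            self.n_states.append(len({v for t in table.values() for v in t}))
        return self

    def predict_one(self, row):
        best, best_lp = None, -math.inf
        for k, n_k in self.class_counts.items():
            lp = math.log(n_k / self.n)  # log prior
            for clique, table, s in zip(self.cliques, self.tables, self.n_states):
                count = table[k][project(row, clique)]
                # "+ 1" reserves one smoothed cell for unseen configurations.
                lp += math.log((count + self.alpha) / (n_k + self.alpha * (s + 1)))
            if lp > best_lp:
                best, best_lp = k, lp
        return best

    def predict(self, X):
        return [self.predict_one(r) for r in X]


# Toy data: the label is the XOR of features 0 and 1; feature 2 is noise.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in X]

# Keep only cliques whose score exceeds a (hypothetical) threshold.
candidates = [(0,), (1,), (2,), (0, 1)]
kept = [c for c in candidates if clique_kl_score(X, y, c) > 0.1]
model = CliqueNaiveBayes(kept).fit(X, y)
predictions = model.predict(X)
```

On this XOR-style toy data, neither feature 0 nor feature 1 scores above zero on its own, while the clique (0, 1) does, which is the kind of interaction a purely marginal screen would miss.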

Source journal
Journal of Classification (Mathematics: Mathematics, Interdisciplinary Applications)
CiteScore: 3.60
Self-citation rate: 5.00%
Articles published: 16
Review time: >12 weeks
About the journal: To publish original and valuable papers in the field of classification, numerical taxonomy, multidimensional scaling and other ordination techniques, clustering, tree structures and other network models (with somewhat less emphasis on principal components analysis, factor analysis, and discriminant analysis), as well as associated models and algorithms for fitting them. Articles will support advances in methodology while demonstrating compelling substantive applications. Comprehensive review articles are also acceptable. Contributions will represent disciplines such as statistics, psychology, biology, information retrieval, anthropology, archeology, astronomy, business, chemistry, computer science, economics, engineering, geography, geology, linguistics, marketing, mathematics, medicine, political science, psychiatry, sociology, and soil science.