Implementing Vertices Principal Component Analysis under Various Weighting Schemes for Interval Valued Observations with Applications to Data Mining

Md Anwarul Islam Bhuiyan, S. Jahan, Mohammad Babul Hasan
Dhaka University Journal of Science. Published 2024-03-25. DOI: 10.3329/dujs.v72i1.71184 (https://doi.org/10.3329/dujs.v72i1.71184)

Abstract

Data mining is the process of extracting useful information from a larger collection of raw data by searching large data sets for irregularities, trends, and correlations in order to forecast outcomes. Although many techniques have been developed in recent years for mining conventional data, considerable scope remains for work on interval-valued data (IVD). Working with IVD has been shown to be important for identifying an entity precisely or for representing incomplete knowledge about real-life situations. Unlike classical data, where each object is represented by a point, in IVD each object is represented by a region in R^p. This paper explores an extension of Principal Component Analysis (PCA) for interval-valued data known as the Vertices Principal Components method. The method additionally incorporates the relative contributions of the vertices under different choices of weighting scheme. A new approach to classifying supervised IVD, based on the K-Nearest Neighbor (KNN) technique, is proposed. The proposed approach is implemented on several benchmark data sets. Numerical results suggest which weighting scheme is appropriate for each data set in order to achieve a better recognition rate. Dhaka Univ. J. Sci. 72(1): 46-55, 2024 (January)
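The vertices idea described in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which each interval observation in R^p is expanded into the 2^p corners of its hyperrectangle, a weighted PCA is run on the stacked corners, and each observation's interval on a principal axis is the min and max of its projected corners. The function names, the per-vertex `weights` argument, and the uniform default are illustrative assumptions; the abstract does not specify the paper's actual weighting schemes or any standardization step, so none is reproduced here.

```python
import itertools
import numpy as np

def vertices_matrix(lower, upper):
    """Expand n interval observations in R^p into their n * 2**p corner points."""
    n, p = lower.shape
    # Each corner picks, per feature, either the lower (0) or the upper (1) bound.
    corners = np.array(list(itertools.product([0, 1], repeat=p)))  # (2**p, p)
    verts = np.where(corners[None, :, :] == 1,
                     upper[:, None, :], lower[:, None, :])         # (n, 2**p, p)
    # Rows are grouped by observation: the first 2**p rows belong to observation 0.
    return verts.reshape(n * 2 ** p, p)

def vertices_pca(lower, upper, weights=None, n_components=2):
    """Weighted PCA on hyperrectangle vertices (hypothetical helper, not the paper's code)."""
    n, p = lower.shape
    verts = vertices_matrix(lower, upper)
    if weights is None:
        # Assumption: uniform vertex weights when no scheme is supplied.
        weights = np.ones(len(verts))
    w = weights / weights.sum()
    mean = w @ verts                                # weighted centroid of all vertices
    centered = verts - mean
    cov = (centered * w[:, None]).T @ centered      # weighted covariance, (p, p)
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    axes = eigvecs[:, order]                        # leading principal axes, (p, k)
    scores = (centered @ axes).reshape(n, 2 ** p, n_components)
    # Interval principal component: range of each observation's projected vertices.
    return scores.min(axis=1), scores.max(axis=1), axes

# Three interval observations with two features each (made-up illustrative values).
lower = np.array([[0.0, 1.0], [2.0, 0.5], [4.0, 3.0]])
upper = np.array([[1.0, 1.5], [2.5, 1.5], [5.0, 4.0]])
pc_lower, pc_upper, axes = vertices_pca(lower, upper, n_components=2)
```

A different weighting scheme would be passed as a length n * 2^p array via `weights`. The resulting interval scores could then feed a nearest-neighbor classifier using some distance between intervals, though the specific distance used in the paper is not stated in the abstract.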