Md Anwarul Islam Bhuiyan, S. Jahan, Mohammad Babul Hasan
DOI: https://doi.org/10.3329/dujs.v72i1.71184
Implementing Vertices Principal Component Analysis under Various Weighting Schemes for Interval Valued Observations with Applications to Data Mining
Data mining is the practice of extracting useful information from a larger collection of raw data. It is the process of looking for irregularities, trends, and correlations in large data sets in order to forecast outcomes. Although a number of techniques have been developed in recent years to mine conventional data, there is still considerable scope for work on interval-valued data (IVD). Working with IVD has been shown to be important for identifying an entity of interest precisely or for representing incomplete knowledge about real-life situations. Unlike classical data, where each object is represented by a point, in IVD each object is represented by a region in R^p. This paper explores an extension of Principal Component Analysis (PCA) for interval-valued data known as the Vertices Principal Components method. It additionally incorporates the relative contributions of the vertices under different choices of weighting scheme. A new approach to classifying supervised IVD, based on the K-Nearest Neighbor (KNN) technique, is proposed. The proposed approach is implemented on several benchmark data sets. Numerical results suggest which weighting scheme is appropriate for each data set in order to achieve a better recognition rate.
Dhaka Univ. J. Sci. 72(1): 46-55, 2024 (January)
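The vertices idea summarized in the abstract can be illustrated with a minimal sketch. It assumes each interval-valued object is an axis-aligned hyperrectangle in R^p given by lower and upper bounds, expands it into its 2^p vertices, and runs a weighted PCA on the stacked vertices. The function names and the `object_weights` parameter are hypothetical and only stand in for a weighting scheme; the abstract does not specify the schemes used in the paper, and the KNN-based classifier is not reproduced here.

```python
from itertools import product

import numpy as np


def vertices_matrix(lower, upper):
    """Expand n interval observations in R^p into their n * 2^p vertices."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n, p = lower.shape
    # Each row of `choices` picks the lower (0) or upper (1) bound per variable.
    choices = np.array(list(product([0, 1], repeat=p)))  # shape (2^p, p)
    verts = np.where(choices[None, :, :] == 1,
                     upper[:, None, :], lower[:, None, :])  # shape (n, 2^p, p)
    # owner[k] is the index of the object that vertex k belongs to.
    owner = np.repeat(np.arange(n), 2 ** p)
    return verts.reshape(n * 2 ** p, p), owner


def weighted_vertices_pca(lower, upper, object_weights=None, n_components=2):
    """Weighted PCA on the vertices; returns interval-valued component scores.

    `object_weights` is a hypothetical per-object weight shared by all of an
    object's vertices (uniform if omitted). It is an assumption of this sketch,
    not one of the paper's weighting schemes.
    """
    verts, owner = vertices_matrix(lower, upper)
    n = int(owner.max()) + 1
    if object_weights is None:
        object_weights = np.ones(n)
    w = np.asarray(object_weights, dtype=float)[owner]
    w = w / w.sum()  # normalize vertex weights to sum to 1

    mean = w @ verts
    centered = verts - mean
    cov = (centered * w[:, None]).T @ centered  # weighted covariance matrix

    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = centered @ eigvecs[:, order]

    # The interval score of an object spans its vertices' projections.
    pc_lower = np.array([scores[owner == i].min(axis=0) for i in range(n)])
    pc_upper = np.array([scores[owner == i].max(axis=0) for i in range(n)])
    return pc_lower, pc_upper, eigvals[order]


if __name__ == "__main__":
    lo = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
    hi = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
    pc_lo, pc_hi, variances = weighted_vertices_pca(lo, hi)
    print(pc_lo)
    print(pc_hi)
    print(variances)
```

Because the number of vertices grows as 2^p, this direct expansion is only practical for a modest number of variables.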