A. Guzmán-Ponce, J. Raymundo Marcial-Romero, R.M. Valdovinos-Rosas, J.S. Sánchez-Garreta
Title: Weighted Complete Graphs for Condensing Data
Journal: Electronic Notes in Theoretical Computer Science, volume 354, pages 45-60
Publication date: 2020-12-01
DOI: 10.1016/j.entcs.2020.10.005
URL: https://www.sciencedirect.com/science/article/pii/S1571066120300815
In many real-world problems (industrial applications, chemistry models, and social network analysis, among others), a solution can be obtained by restating the problem in terms of vertices and edges, that is, by using graph theory. Data Science applications typically process large volumes of data; in some cases the data size exceeds the resources available for processing it, which makes traditional methods prohibitive. Developing graph-based solutions for condensing data can therefore be a good strategy for handling big datasets. In this paper we present two graph-based methods for condensing data. Both proposals consider a weighted complete graph and obtain from the whole dataset either an induced subgraph or a minimum spanning tree. We conducted experiments to validate our proposals, using 24 real benchmark datasets to train the 1NN, C4.5, and SVM classifiers. The results show that our methods condensed the datasets without reducing classifier performance, as measured by the geometric mean and the Wilcoxon test.
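The two graph structures named in the abstract can be illustrated with a minimal sketch. It assumes that each sample is a vertex and that edge weights are Euclidean distances between samples; the abstract does not state the weight function or the rule used to select the condensed set, so the helper names `complete_graph_weights` and `minimum_spanning_tree` are hypothetical and this is not the authors' condensing procedure. The sketch only builds a weighted complete graph and extracts a minimum spanning tree with Prim's algorithm.

```python
import math


def complete_graph_weights(points):
    """Edge weights of the complete graph as a symmetric distance matrix.

    Assumption: Euclidean distance is used as the weight; the abstract
    does not specify the weight function.
    """
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def minimum_spanning_tree(weights):
    """Prim's algorithm on a dense weight matrix.

    Returns the tree as a list of (u, v, weight) edges, in the order
    the vertices are attached to the tree.
    """
    n = len(weights)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0          # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, weights[parent[u]][u]))
        # Relax the connecting-edge costs of the remaining vertices.
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


# Toy data: three nearby samples and one distant sample.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
mst = minimum_spanning_tree(complete_graph_weights(points))
```

A complete graph on n samples has n(n-1)/2 edges, so the dense matrix above needs quadratic memory; for large datasets of the kind the abstract mentions, a practical implementation would likely need a more economical representation.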
About the journal:
ENTCS is a venue for the rapid electronic publication of conference proceedings, lecture notes, monographs, and other similar material for which quick publication and availability in electronic media are appropriate. Organizers of conferences whose proceedings appear in ENTCS, and authors of other material appearing as a volume in the series, are allowed to make hard copies of the relevant volume for limited distribution. For example, conference proceedings may be distributed to participants at the meeting, and lecture notes can be distributed to those taking a course based on the material in the volume.