Group Labeling Methodology Using Distance-based Data Grouping Algorithms

Research initiative, treatment action : RITA Pub Date : 2020-01-15 DOI:10.22456/2175-2745.91414

Francisco das Chagas Imperes Filho, V. Machado, R. Veras, K. Aires, Aline Montenegro Leal Silva


Citations: 3

Abstract


Group Labeling Methodology Using Distance-based Data Grouping Algorithms

Clustering algorithms are often used to form groups based on the similarity of their members. In this context, understanding a group is just as important as knowing its composition. Identifying, or labeling, groups can assist with their interpretation and, consequently, guide decision-making by taking into account the features of each group. Interpreting groups is beneficial when it is necessary to know what makes an element part of a given group, what the main features of a group are, and what the differences and similarities among groups are. This work describes a method for finding relevant features and generating labels for the elements of each group, uniquely identifying them. In this way, our approach addresses the problem of finding relevant definitions that can identify groups. The proposed method transforms the standard output of an unsupervised distance-based clustering algorithm into a Pertinence Degree (GP), where each element of the database receives a GP for each formed group. The elements, together with their GPs, are used to formulate ranges of values for their attributes. Such ranges can identify the groups uniquely. The labels produced by this approach averaged 94.83% correct answers across the analyzed databases, allowing a natural interpretation of the generated definitions.
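The pipeline described in the abstract (distances to pertinence degrees, degrees to attribute ranges, ranges to labels) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the GP formula, so the sketch assumes a normalized inverse-distance weighting; the `threshold` parameter, the function names, and the toy data are all hypothetical.

```python
from math import dist


def pertinence_degrees(points, centroids):
    """Turn distances to centroids into per-group degrees that sum to 1.

    ASSUMPTION: normalized inverse distance. The abstract does not
    specify how the Pertinence Degree (GP) is actually computed.
    """
    degrees = []
    for p in points:
        d = [dist(p, c) for c in centroids]
        if 0.0 in d:
            # A point lying exactly on a centroid belongs fully to it.
            weights = [1.0 if x == 0.0 else 0.0 for x in d]
        else:
            weights = [1.0 / x for x in d]
        total = sum(weights)
        degrees.append([w / total for w in weights])
    return degrees


def attribute_ranges(points, degrees, threshold=0.6):
    """Per group, the (min, max) of each attribute over the elements
    whose degree for that group is at least `threshold` (hypothetical)."""
    n_groups = len(degrees[0])
    n_attrs = len(points[0])
    ranges = {}
    for g in range(n_groups):
        members = [p for p, row in zip(points, degrees) if row[g] >= threshold]
        if members:
            ranges[g] = [
                (min(m[a] for m in members), max(m[a] for m in members))
                for a in range(n_attrs)
            ]
    return ranges


def labels(ranges):
    """Keep, per group, only the attribute ranges that overlap no other
    group's range for the same attribute, so each label is unique."""
    result = {}
    for g, group_ranges in ranges.items():
        kept = {}
        for a, (lo, hi) in enumerate(group_ranges):
            overlaps = any(
                lo <= other[a][1] and other[a][0] <= hi
                for h, other in ranges.items()
                if h != g
            )
            if not overlaps:
                kept[a] = (lo, hi)
        result[g] = kept
    return result


# Toy data: attribute 0 separates the two groups, attribute 1 does not.
points = [(1.0, 5.0), (1.2, 5.5), (0.8, 4.8), (5.0, 5.2), (5.3, 4.9), (4.9, 5.1)]
centroids = [(1.0, 5.1), (5.07, 5.07)]  # e.g. output of a k-means run

degrees = pertinence_degrees(points, centroids)
result = labels(attribute_ranges(points, degrees))
print(result)
```

On this toy data only attribute 0 survives as a label for either group, because the ranges of attribute 1 overlap. A real implementation would also need a policy for groups left with no non-overlapping attribute, which the abstract does not address.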


Source journal

Research initiative, treatment action : RITA

Self-citation rate

0.00%

Number of publications