Methods for Determining Similarity of Categorical Ordered Data

IF 0.2 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
N. Kondruk
DOI: 10.15588/1607-3274-2023-2-4 (https://doi.org/10.15588/1607-3274-2023-2-4)
Publication date: 2023-06-29
Publication type: Journal Article
Citations: 0

Abstract

Context. Developing effective distance metrics and similarity measures for categorical features is an important task in data analysis, machine learning, and decision theory, since a significant portion of object properties is described by non-numerical values. The relationship between categorical values is often more complex than a simple test for equality or inequality. Such attributes can be similar to varying degrees, and an effective model has to take this similarity into account when calculating distance or similarity measures.

Objective. The aim of the study is to improve the efficiency of solving practical data analysis problems by developing mathematical tools for determining the similarity of objects based on categorical ordered features.

Method. The study proposes a distance based on the weighted Manhattan distance, together with a similarity measure, for determining the similarity of objects described by categorical ordinal features (i.e. features for which a linear order with scales of preference, reflecting the problem domain, can be specified on the set of attribute values). It is proven that the distance formula satisfies the axioms of non-negativity, symmetry, triangle inequality, and upper bound, and is therefore a distance metric in the space of ranked categorical features. It is also proven that the similarity measure presented in the study satisfies the axioms of boundedness, symmetry, and maximum and minimum similarity, and is described by a decreasing function.

Results. The developed approach has been implemented in an applied problem of determining the degree of similarity between objects described by ordered categorical features.

Conclusions. In this study, mathematical tools were developed to determine the similarity between structured data described by categorical attributes that can be ordered by a specific priority in the form of a ranking system with preferences. Their properties were analyzed. Experimental studies showed that the data-processing logic is convenient and intuitive to follow when solving practical problems. The proposed approach can open the way to new, meaningful research in data analysis. Prospects for further research lie in the experimental use of the proposed tools in practical tasks and in studying their effectiveness.
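The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of what a weighted Manhattan distance over ordinal categorical features might look like, not the author's exact definition. It assumes that each feature's values are mapped to equally spaced ranks, that each per-feature difference is divided by the largest possible rank gap, and that the weights are non-negative and sum to 1, which keeps the distance in [0, 1]. The similarity is taken as one minus the distance, one simple decreasing function of it. The names `ordinal_distance`, `ordinal_similarity`, `scales`, and `weights` are hypothetical.

```python
from typing import Optional, Sequence


def ordinal_distance(
    a: Sequence[str],
    b: Sequence[str],
    scales: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted Manhattan distance between two objects with ordinal features.

    scales[i] lists the values of feature i from lowest to highest rank.
    Assumption: ranks are equally spaced; the paper's preference scales
    may assign different spacing.
    """
    n = len(scales)
    if len(a) != n or len(b) != n:
        raise ValueError("objects must have one value per feature")
    if weights is None:
        weights = [1.0 / n] * n  # equal weights that sum to 1
    if len(weights) != n:
        raise ValueError("weights must have one entry per feature")

    total = 0.0
    for x, y, scale, w in zip(a, b, scales, weights):
        span = len(scale) - 1  # largest possible rank gap for this feature
        if span < 1:
            raise ValueError("each scale needs at least two ordered values")
        # Normalised rank gap in [0, 1], scaled by the feature weight.
        total += w * abs(scale.index(x) - scale.index(y)) / span
    return total


def ordinal_similarity(
    a: Sequence[str],
    b: Sequence[str],
    scales: Sequence[Sequence[str]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Similarity as a decreasing function of the distance (assumed: 1 - d)."""
    return 1.0 - ordinal_distance(a, b, scales, weights)


if __name__ == "__main__":
    # Hypothetical example data: two ordinal features with 3 and 4 levels.
    scales = [["low", "medium", "high"], ["poor", "fair", "good", "excellent"]]
    a = ("low", "good")
    b = ("high", "fair")
    print(round(ordinal_distance(a, b, scales), 3))    # 0.667
    print(round(ordinal_similarity(a, b, scales), 3))  # 0.333
```

Under these assumptions, identical objects have distance 0 and similarity 1, while objects at opposite ends of every scale have distance 1 and similarity 0, which matches the boundedness and maximum/minimum similarity properties the abstract lists. Non-uniform preference scales could be modelled by replacing the rank index with a domain-specific numeric score per value.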
Source journal
Radio Electronics Computer Science Control (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Self-citation rate: 20.00%
Articles published: 66
Review time: 12 weeks