识别相似数据序列的基于距离测量的信息技术

A. Baturinets
{"title":"识别相似数据序列的基于距离测量的信息技术","authors":"A. Baturinets","doi":"10.33108/visnyk_tntu2022.01.128","DOIUrl":null,"url":null,"abstract":"The aim of the work is to develop and implement a technology for identifying similar series, and to test on series of data represented by hydrological samples. The subject of the study is the methods and approaches for identifying similar series. The object of the study is the process of identifying similar series, which are represented by certain indicators. The task is to propose and implement distance measures, where one of them takes into consideration the similarity between the values of the series and their relationship, and another is based on a weighted Euclidean distance taking into account the need to actualize the values that are the most important under certain conditions of the task; to implement a technology to find similar series represented by certain indicators values; to obtain a more resilient solution, to implement a procedure for determining a set of similar series based on the results obtained for each individual distance; the results should be analyzed and the conclusions have to be drawn dealing with practical application of the technology. The following methods were used: statistical analysis methods, methods for calculating distances, and similarity between data series. The following results were obtained: the technology for similar data series detection has been implemented; two distance measures were proposed and described as a part of the technology implemented; a procedure for determining a set of similar rows was implemented that was based on the obtained distances calculation. The scientific novelty of the research under discussion involves: Euclidean weighted distance was described and applied taking into account the actuality of data series values; a new measure of distance has been described and applied that allows both the degree of similarity between the values of the series and their correlation to be taken into account, as well as a technique has been developed for determining similar series from a set of selected distance measures. The practical importance of the developed and implemented technology consists in the following possibilities application to data series of different applied fields: conducting an assessment and identifying some similar series, in particular as an intermediate step in the analysis; in addition, the proposed distance measures improve the quality of identifying similar data series. In our further research, we plan to investigate the possibilities of lengthening the data series and filling in the gaps with values from other series defined as similar ones.","PeriodicalId":21595,"journal":{"name":"Scientific journal of the Ternopil national technical university","volume":"58 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distance measures-based information technology for identifying similar data series\",\"authors\":\"A. Baturinets\",\"doi\":\"10.33108/visnyk_tntu2022.01.128\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The aim of the work is to develop and implement a technology for identifying similar series, and to test on series of data represented by hydrological samples. The subject of the study is the methods and approaches for identifying similar series. The object of the study is the process of identifying similar series, which are represented by certain indicators. The task is to propose and implement distance measures, where one of them takes into consideration the similarity between the values of the series and their relationship, and another is based on a weighted Euclidean distance taking into account the need to actualize the values that are the most important under certain conditions of the task; to implement a technology to find similar series represented by certain indicators values; to obtain a more resilient solution, to implement a procedure for determining a set of similar series based on the results obtained for each individual distance; the results should be analyzed and the conclusions have to be drawn dealing with practical application of the technology. The following methods were used: statistical analysis methods, methods for calculating distances, and similarity between data series. The following results were obtained: the technology for similar data series detection has been implemented; two distance measures were proposed and described as a part of the technology implemented; a procedure for determining a set of similar rows was implemented that was based on the obtained distances calculation. The scientific novelty of the research under discussion involves: Euclidean weighted distance was described and applied taking into account the actuality of data series values; a new measure of distance has been described and applied that allows both the degree of similarity between the values of the series and their correlation to be taken into account, as well as a technique has been developed for determining similar series from a set of selected distance measures. The practical importance of the developed and implemented technology consists in the following possibilities application to data series of different applied fields: conducting an assessment and identifying some similar series, in particular as an intermediate step in the analysis; in addition, the proposed distance measures improve the quality of identifying similar data series. In our further research, we plan to investigate the possibilities of lengthening the data series and filling in the gaps with values from other series defined as similar ones.\",\"PeriodicalId\":21595,\"journal\":{\"name\":\"Scientific journal of the Ternopil national technical university\",\"volume\":\"58 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Scientific journal of the Ternopil national technical university\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.33108/visnyk_tntu2022.01.128\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific journal of the Ternopil national technical university","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33108/visnyk_tntu2022.01.128","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

这项工作的目的是开发和实施一种识别相似序列的技术,并对水文样本所代表的一系列数据进行测试。本研究的主题是识别相似序列的方法和途径。研究对象是识别相似序列的过程,相似序列用一定的指标来表示。任务是提出和实现距离度量,其中一个考虑了序列值之间的相似性及其关系,另一个基于加权欧几里得距离,考虑了在任务的某些条件下实现最重要值的需要;实现一种寻找由某些指标值表示的相似序列的技术;为了获得更具弹性的解,实施一种基于每个单独距离获得的结果确定一组相似序列的程序;应该对结果进行分析,并得出结论,以处理该技术的实际应用。使用了以下方法:统计分析方法、距离计算方法和数据序列之间的相似度。研究结果表明:实现了相似数据序列检测技术;提出并描述了两种距离措施,作为实施技术的一部分;基于获得的距离计算实现了确定一组相似行的过程。所讨论的研究的科学新颖性包括:考虑到数据序列值的现实性,描述和应用欧几里得加权距离;已经描述和应用了一种新的距离度量,它允许考虑到序列值之间的相似程度及其相关性,并且已经开发了一种技术,用于从一组选定的距离度量中确定相似的序列。所开发和实施的技术的实际重要性在于下列可能应用于不同应用领域的数据系列:进行评估和确定一些类似的系列,特别是作为分析的中间步骤;此外,所提出的距离度量提高了识别相似数据序列的质量。在我们进一步的研究中,我们计划研究延长数据序列的可能性,并用定义为相似序列的其他序列的值填补空白。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Distance measures-based information technology for identifying similar data series
The aim of the work is to develop and implement a technology for identifying similar series, and to test on series of data represented by hydrological samples. The subject of the study is the methods and approaches for identifying similar series. The object of the study is the process of identifying similar series, which are represented by certain indicators. The task is to propose and implement distance measures, where one of them takes into consideration the similarity between the values of the series and their relationship, and another is based on a weighted Euclidean distance taking into account the need to actualize the values that are the most important under certain conditions of the task; to implement a technology to find similar series represented by certain indicators values; to obtain a more resilient solution, to implement a procedure for determining a set of similar series based on the results obtained for each individual distance; the results should be analyzed and the conclusions have to be drawn dealing with practical application of the technology. The following methods were used: statistical analysis methods, methods for calculating distances, and similarity between data series. The following results were obtained: the technology for similar data series detection has been implemented; two distance measures were proposed and described as a part of the technology implemented; a procedure for determining a set of similar rows was implemented that was based on the obtained distances calculation. The scientific novelty of the research under discussion involves: Euclidean weighted distance was described and applied taking into account the actuality of data series values; a new measure of distance has been described and applied that allows both the degree of similarity between the values of the series and their correlation to be taken into account, as well as a technique has been developed for determining similar series from a set of selected distance measures. The practical importance of the developed and implemented technology consists in the following possibilities application to data series of different applied fields: conducting an assessment and identifying some similar series, in particular as an intermediate step in the analysis; in addition, the proposed distance measures improve the quality of identifying similar data series. In our further research, we plan to investigate the possibilities of lengthening the data series and filling in the gaps with values from other series defined as similar ones.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信