Entropy-extreme concept of data gaps filling in a small-sized collection

Impact factor: 5.0; Zone 3 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)
Viacheslav Kovtun, Krzysztof Grochla, Mohammed Al-Maitah, Saad Aldosary, Oleksii Kozachko
DOI: 10.1016/j.eij.2025.100621
Journal: Egyptian Informatics Journal, volume 29, Article 100621
Published: 2025-02-10
Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S1110866525000143
Citations: 0

Abstract

The article investigates the process of filling data gaps in a small-sized collection that generalizes information about periodic measurements of the input and output parameters of a target object. To fill the gaps, a concept is proposed that generates a committee of entropy-optimal trajectories by sampling the probability density functions of the parameters of a stochastic parameterized model trained on relevant data. The concept is generalized to three cases: gaps in the output data, gaps in the input data, and gaps in both data spaces. Gaps in the output data are filled using entropy-extreme estimation of the probability density functions of the model parameters and of the measurement errors. Missing values in the input data are interpreted as the results of transforming a sequence of independent stochastic vectors fed into a model structurally identical to the one formalized for filling gaps in the output data. The proposed concept thus inherits the benefits of both parametric estimation, which uses a trained model of the target process, and non-parametric estimation of the undefined characteristics that distort the data. The concept was tested on the task of filling gaps in a collection of 35 tuples holding measurement results for three attributes. It was assumed that the imperfection of the measurement procedure caused variability in the obtained data at the level of 15% of their absolute value. Less than 20% of the data in the collection was used to train the corresponding entropy-extreme model. The relative error of the filled missing data was 0.21.
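The committee idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the entropy-extreme estimation step: the linear model, the uniform parameter density, the parameter range, the relative-error formula, and every function name below are assumptions made for illustration only, since the abstract specifies none of them. The sketch samples many trajectories from a toy stochastic model with 15% measurement variability and fills each missing output with the committee mean.

```python
import random
import statistics


def sample_trajectory(xs, rng, a_range=(1.8, 2.2), noise_frac=0.15):
    """Draw one trajectory from a toy model y = a * x * (1 + e).

    Assumption: a uniform density over a_range stands in for the
    entropy-optimal parameter density, which is not reproduced here.
    """
    a = rng.uniform(*a_range)
    return [a * x * (1.0 + rng.uniform(-noise_frac, noise_frac)) for x in xs]


def fill_gaps(xs, ys, n_committee=500, seed=0):
    """Replace None entries in ys with the committee mean at that position."""
    rng = random.Random(seed)
    committee = [sample_trajectory(xs, rng) for _ in range(n_committee)]
    filled = []
    for i, y in enumerate(ys):
        if y is None:
            filled.append(statistics.fmean(traj[i] for traj in committee))
        else:
            filled.append(y)
    return filled


def relative_error(true_vals, est_vals):
    """Euclidean norm of the error divided by the norm of the true values.

    Assumption: the abstract does not define its relative error metric.
    """
    num = sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) ** 0.5
    den = sum(t ** 2 for t in true_vals) ** 0.5
    return num / den


# Hypothetical data: two of five output measurements are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, None, 5.9, None, 10.2]
filled = fill_gaps(xs, ys)
```

Observed values pass through unchanged, and only the `None` positions receive committee estimates. In the method the abstract describes, the sampling densities would come from the trained entropy-extreme model rather than from a fixed uniform range.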
Source journal
Egyptian Informatics Journal
Subject area: Decision Sciences, Management Science and Operations Research
CiteScore: 11.10
Self-citation rate: 1.90%
Articles published: 59
Review time: 110 days
About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The journal provides a forum for state-of-the-art research and development in the fields of computing, including computer science, information technology, information systems, operations research, and decision support. Submissions of innovative, previously unpublished work in the subjects covered by the journal are encouraged, whether from academic, research, or commercial sources.