DOI: 10.1016/j.csbj.2024.08.017
Journal: Computational and Structural Biotechnology Journal (impact factor 4.4; JCR Q2, Biochemistry & Molecular Biology)
Publication date: 2024-08-17
Publication type: Journal Article
Source: https://www.sciencedirect.com/science/article/pii/S2001037024002770
Citations: 0
Entropy-extreme model for predicting the development of cyber epidemics at early stages
The approaches used in biomedicine to analyze epidemics take into account features such as exponential growth in the early stages, slowdown in dynamics upon saturation, time delays in spread, segmented spread, evolutionary adaptations of the pathogen, and preventive measures based on universal communication protocols. All these characteristics are also present in modern cyber epidemics. Adapting effective biomedical approaches to epidemic analysis to the study of cyber epidemics is therefore a promising research task. The article addresses the problem of predicting the development of cyber epidemics at early stages, when the available data are scarce, incomplete, and distorted. This situation makes it impossible to use artificial intelligence models for prediction. The authors therefore propose an entropy-extreme model, defined within the machine learning paradigm, to address this problem. The model is based on estimating the probability distributions of its controllable parameters from input data, taking into account the variability characteristic of the latter. The entropy-extreme instance, identified from a set of such distributions, indicates the most uncertain (most negative) trajectory of the investigated process. Numerical methods are used to analyze the set of process trajectories generated from the estimated probability distributions of the controllable parameters and the variability characteristic. The result of the analysis includes characteristic predictive trajectories such as the average and median trajectories from the set, as well as the trajectory corresponding to the standard deviation area of the parameters' values.
Experiments with real data on the infection of Windows devices by various categories of malware showed that the proposed model outperforms the classical competitor (the least squares method) in predicting the development of cyber epidemics near the extremum of the time series that represents how the process unfolds over time. Moreover, the proposed model can be applied without any prior hypotheses regarding the probabilistic properties of the available data.
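The trajectory-analysis step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the entropy-maximization step that estimates the parameter distributions is not reproduced here. As stated assumptions, the sketch uses a hypothetical exponential growth curve y(t) = a * exp(b * t), draws the parameters a and b from uniform distributions over hypothetical intervals (the uniform distribution being the maximum-entropy choice on a bounded interval when nothing else is constrained), omits measurement noise, and uses made-up observations. All function names are illustrative.

```python
import math
import random
import statistics


def fit_least_squares(ts, ys):
    """Baseline: fit y = a * exp(b * t) by ordinary least squares on log(y)."""
    logs = [math.log(y) for y in ys]
    t_mean = statistics.fmean(ts)
    l_mean = statistics.fmean(logs)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b


def trajectory(a, b, horizon):
    """Hypothetical growth curve evaluated at t = 0, 1, ..., horizon - 1."""
    return [a * math.exp(b * t) for t in range(horizon)]


def sample_parameters(a_range, b_range, n_samples, seed=0):
    """Assumption: parameters are uniform on the given intervals."""
    rng = random.Random(seed)
    return [(rng.uniform(*a_range), rng.uniform(*b_range))
            for _ in range(n_samples)]


def summarize(params, horizon):
    """Mean and median trajectories of the ensemble, plus the two
    trajectories at parameter mean minus and plus one standard deviation."""
    ensemble = [trajectory(a, b, horizon) for a, b in params]
    columns = list(zip(*ensemble))  # one tuple of values per time step
    mean_traj = [statistics.fmean(c) for c in columns]
    median_traj = [statistics.median(c) for c in columns]
    a_vals = [a for a, _ in params]
    b_vals = [b for _, b in params]
    a_mu, a_sd = statistics.fmean(a_vals), statistics.pstdev(a_vals)
    b_mu, b_sd = statistics.fmean(b_vals), statistics.pstdev(b_vals)
    lower = trajectory(a_mu - a_sd, b_mu - b_sd, horizon)
    upper = trajectory(a_mu + a_sd, b_mu + b_sd, horizon)
    return mean_traj, median_traj, lower, upper


# Made-up early observations (not data from the article).
ts = [0, 1, 2, 3, 4]
ys = [3.0, 4.0, 7.0, 9.0, 15.0]

a_hat, b_hat = fit_least_squares(ts, ys)
# Hypothetical parameter intervals: +/- 50 % around the baseline estimates.
params = sample_parameters((0.5 * a_hat, 1.5 * a_hat),
                           (0.5 * b_hat, 1.5 * b_hat), n_samples=1000)
mean_traj, median_traj, lower, upper = summarize(params, horizon=10)
baseline = trajectory(a_hat, b_hat, horizon=10)
```

The ensemble summaries (`mean_traj`, `median_traj`, and the `lower`/`upper` pair) can then be set against the single least-squares curve `baseline`. Whether this ordering of results resembles the article's findings depends entirely on how the parameter distributions are estimated, which this sketch leaves out.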
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology