Big data analytics最新文献

筛选
英文 中文
Cyberpsychology: A Longitudinal Analysis of Cyber Adversarial Tactics and Techniques 网络心理学:网络对抗战术和技术的纵向分析
Big data analytics Pub Date : 2023-08-11 DOI: 10.3390/analytics2030035
Marshall S. Rich
{"title":"Cyberpsychology: A Longitudinal Analysis of Cyber Adversarial Tactics and Techniques","authors":"Marshall S. Rich","doi":"10.3390/analytics2030035","DOIUrl":"https://doi.org/10.3390/analytics2030035","url":null,"abstract":"The rapid proliferation of cyberthreats necessitates a robust understanding of their evolution and associated tactics, as found in this study. A longitudinal analysis of these threats was conducted, utilizing a six-year data set obtained from a deception network, which emphasized its significance in the study’s primary aim: the exhaustive exploration of the tactics and strategies utilized by cybercriminals and how these tactics and techniques evolved in sophistication and target specificity over time. Different cyberattack instances were dissected and interpreted, with the patterns behind target selection shown. The focus was on unveiling patterns behind target selection and highlighting recurring techniques and emerging trends. The study’s methodological design incorporated data preprocessing, exploratory data analysis, clustering and anomaly detection, temporal analysis, and cross-referencing. The validation process underscored the reliability and robustness of the findings, providing evidence of increasingly sophisticated, targeted cyberattacks. The work identified three distinct network traffic behavior clusters and temporal attack patterns. A validated scoring mechanism provided a benchmark for network anomalies, applicable for predictive analysis and facilitating comparative study of network behaviors. This benchmarking aids organizations in proactively identifying and responding to potential threats. The study significantly contributed to the cybersecurity discourse, offering insights that could guide the development of more effective defense strategies. The need for further investigation into the nature of detected anomalies was acknowledged, advocating for continuous research and proactive defense strategies in the face of the constantly evolving landscape of cyberthreats.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87463876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Prediction of Stroke Disease with Demographic and Behavioural Data Using Random Forest Algorithm 基于随机森林算法的人口统计学和行为学数据预测中风疾病
Big data analytics Pub Date : 2023-08-02 DOI: 10.3390/analytics2030034
O. Shobayo, Oluwafemi Zachariah, M. Odusami, Bayode Ogunleye
{"title":"Prediction of Stroke Disease with Demographic and Behavioural Data Using Random Forest Algorithm","authors":"O. Shobayo, Oluwafemi Zachariah, M. Odusami, Bayode Ogunleye","doi":"10.3390/analytics2030034","DOIUrl":"https://doi.org/10.3390/analytics2030034","url":null,"abstract":"Stroke is a major cause of death worldwide, resulting from a blockage in the flow of blood to different parts of the brain. Many studies have proposed a stroke disease prediction model using medical features applied to deep learning (DL) algorithms to reduce its occurrence. However, these studies pay less attention to the predictors (both demographic and behavioural). Our study considers interpretability, robustness, and generalisation as key themes for deploying algorithms in the medical domain. Based on this background, we propose the use of random forest for stroke incidence prediction. Results from our experiment showed that random forest (RF) outperformed decision tree (DT) and logistic regression (LR) with a macro F1 score of 94%. Our findings indicated age and body mass index (BMI) as the most significant predictors of stroke disease incidence.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73794205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Identification of Patterns in the Stock Market through Unsupervised Algorithms 通过无监督算法识别股票市场的模式
Big data analytics Pub Date : 2023-07-27 DOI: 10.3390/analytics2030033
Adrian Barradas, R. Cantón-Croda, D. Gibaja-Romero
{"title":"Identification of Patterns in the Stock Market through Unsupervised Algorithms","authors":"Adrian Barradas, R. Cantón-Croda, D. Gibaja-Romero","doi":"10.3390/analytics2030033","DOIUrl":"https://doi.org/10.3390/analytics2030033","url":null,"abstract":"Making predictions in the stock market is a challenging task. At the same time, several studies have focused on forecasting the future behavior of the market and classifying financial assets. A different approach is to classify correlated data to discover patterns and atypical behaviors in them. In this study, we propose applying unsupervised algorithms to process, model, and cluster related data from two different data sources, i.e., Google News and Yahoo Finance, to identify conditions in the stock market that might help to support the investment decision-making process. We applied principal component analysis (PCA) and a k-means clustering approach to group data according to their principal characteristics. We identified four conditions in the stock market, one comprising the least amount of data, characterized by high volatility. The main results show that, regularly, the stock market tends to have a steady performance. However, atypical conditions are conducive to higher volatility.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80952271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Streamflow Estimation through Coupling of Hieararchical Clustering Analysis and Regression Analysis—A Case Study in Euphrates-Tigris Basin 基于层次聚类分析和回归分析的河流流量估算——以幼发拉底河流域为例
Big data analytics Pub Date : 2023-07-13 DOI: 10.3390/analytics2030032
Goksel Ezgi Guzey, Bihrat Onoz
{"title":"Streamflow Estimation through Coupling of Hieararchical Clustering Analysis and Regression Analysis—A Case Study in Euphrates-Tigris Basin","authors":"Goksel Ezgi Guzey, Bihrat Onoz","doi":"10.3390/analytics2030032","DOIUrl":"https://doi.org/10.3390/analytics2030032","url":null,"abstract":"In this study, the resilience of designed water systems in the face of limited streamflow gauging stations and escalating global warming impacts were investigated. By performing a regression analysis, simulated meteorological data with observed streamflow from 1971 to 2020 across 33 stream gauging stations in the Euphrates-Tigris Basin were correlated. Utilizing the Ordinary Least Squares regression method, streamflow for 2020–2100 using simulated meteorological data under RCP 4.5 and RCP 8.5 scenarios in CORDEX-EURO and CORDEX-MENA domains were also predicted. Streamflow variability was calculated based on meteorological variables and station morphological characteristics, particularly evapotranspiration. Hierarchical clustering analysis identified two clusters among the stream gauging stations, and for each cluster, two streamflow equations were derived. The regression analysis achieved robust streamflow predictions using six representative climate variables, with adj. R2 values of 0.7–0.85 across all models, primarily influenced by evapotranspiration. The use of a global model led to a 10% decrease in prediction capabilities for all CORDEX models based on R2 performance. This study emphasizes the importance of region homogeneity in estimating streamflow, encompassing both geographical and hydro-meteorological characteristics.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"81 1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80978870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Hierarchical Model-Based Deep Reinforcement Learning for Single-Asset Trading 基于层次模型的单资产交易深度强化学习
Big data analytics Pub Date : 2023-07-11 DOI: 10.3390/analytics2030031
Adrian Millea
{"title":"Hierarchical Model-Based Deep Reinforcement Learning for Single-Asset Trading","authors":"Adrian Millea","doi":"10.3390/analytics2030031","DOIUrl":"https://doi.org/10.3390/analytics2030031","url":null,"abstract":"We present a hierarchical reinforcement learning (RL) architecture that employs various low-level agents to act in the trading environment, i.e., the market. The highest-level agent selects from among a group of specialized agents, and then the selected agent decides when to sell or buy a single asset for a period of time. This period can be variable according to a termination function. We hypothesized that, due to different market regimes, more than one single agent is needed when trying to learn from such heterogeneous data, and instead, multiple agents will perform better, with each one specializing in a subset of the data. We use k-meansclustering to partition the data and train each agent with a different cluster. Partitioning the input data also helps model-based RL (MBRL), where models can be heterogeneous. We also add two simple decision-making models to the set of low-level agents, diversifying the pool of available agents, and thus increasing overall behavioral flexibility. We perform multiple experiments showing the strengths of a hierarchical approach and test various prediction models at both levels. We also use a risk-based reward at the high level, which transforms the overall problem into a risk-return optimization. This type of reward shows a significant reduction in risk while minimally reducing profits. Overall, the hierarchical approach shows significant promise, especially when the pool of low-level agents is highly diverse. The usefulness of such a system is clear, especially for human-devised strategies, which could be incorporated in a sound manner into larger, powerful automatic systems.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86190977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
occams: A Text Summarization Package occams:文本摘要包
Big data analytics Pub Date : 2023-06-30 DOI: 10.3390/analytics2030030
Clinton T. White, Neil P. Molino, Julia S. Yang, John M. Conroy
{"title":"occams: A Text Summarization Package","authors":"Clinton T. White, Neil P. Molino, Julia S. Yang, John M. Conroy","doi":"10.3390/analytics2030030","DOIUrl":"https://doi.org/10.3390/analytics2030030","url":null,"abstract":"Extractive text summarization selects asmall subset of sentences from a document, which gives good “coverage” of a document. When given a set of term weights indicating the importance of the terms, the concept of coverage may be formalized into a combinatorial optimization problem known as the budgeted maximum coverage problem. Extractive methods in this class are known to beamong the best of classic extractive summarization systems. This paper gives a synopsis of thesoftware package occams, which is a multilingual extractive single and multi-document summarization package based on an algorithm giving an optimal approximation to the budgeted maximum coverage problem. The occams package is written in Python and provides an easy-to-use modular interface, allowing it to work in conjunction with popular Python NLP packages, such as nltk, stanza or spacy.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79443044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Bayesian Mixture Copula Estimation and Selection with Applications 贝叶斯混合Copula估计与选择及其应用
Big data analytics Pub Date : 2023-06-15 DOI: 10.3390/analytics2020029
Yujian Liu, Dejun Xie, Siyi Yu
{"title":"Bayesian Mixture Copula Estimation and Selection with Applications","authors":"Yujian Liu, Dejun Xie, Siyi Yu","doi":"10.3390/analytics2020029","DOIUrl":"https://doi.org/10.3390/analytics2020029","url":null,"abstract":"Mixture copulas are popular and essential tools for studying complex dependencies among variables. However, selecting the correct mixture models often involves repeated testing and estimations using criteria such as AIC, which could require effort and time. In this paper, we propose a method that would enable us to select and estimate the correct mixture copulas simultaneously. This is accomplished by first overfitting the model and then conducting the Bayesian estimations. We verify the correctness of our approach by numerical simulations. Finally, the real data analysis is performed by studying the dependencies among three major financial markets.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"129 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77407683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Preliminary Perspectives on Information Passing in the Intelligence Community 情报界信息传递的初步展望
Big data analytics Pub Date : 2023-06-15 DOI: 10.3390/analytics2020028
Jeremy E. Block, Ilana Bookner, S. Chu, R. J. Crouser, Donald R. Honeycutt, Rebecca M. Jonas, Abhishek Kulkarni, Yancy Vance M. Paredes, E. Ragan
{"title":"Preliminary Perspectives on Information Passing in the Intelligence Community","authors":"Jeremy E. Block, Ilana Bookner, S. Chu, R. J. Crouser, Donald R. Honeycutt, Rebecca M. Jonas, Abhishek Kulkarni, Yancy Vance M. Paredes, E. Ragan","doi":"10.3390/analytics2020028","DOIUrl":"https://doi.org/10.3390/analytics2020028","url":null,"abstract":"Analyst sensemaking research typically focuses on individual or small groups conducting intelligence tasks. This has helped understand information retrieval tasks and how people communicate information. As a part of the grand challenge of the Summer Conference on Applied Data Science (SCADS) to build a system that can generate tailored daily reports (TLDR) for intelligence analysts, we conducted a qualitative interview study with analysts to increase understanding of information passing in the intelligence community. While our results are preliminary, we expect that this work will contribute to a better understanding of the information ecosystem of the intelligence community, how institutional dynamics affect information passing, and what implications this has for a TLDR system. This work describes our involvement in and work completed during SCADS. Although preliminary, we identify that information passing is both a formal and informal process and often follows professional networks due especially to the small population and specialization of work. We call attention to the need for future analysis of information ecosystems to better support tailored information retrieval features.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81962561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Spatiotemporal Data Mining Problems and Methods 时空数据挖掘问题与方法
Big data analytics Pub Date : 2023-06-14 DOI: 10.3390/analytics2020027
Eleftheria Koutsaki, George Vardakis, N. Papadakis
{"title":"Spatiotemporal Data Mining Problems and Methods","authors":"Eleftheria Koutsaki, George Vardakis, N. Papadakis","doi":"10.3390/analytics2020027","DOIUrl":"https://doi.org/10.3390/analytics2020027","url":null,"abstract":"Many scientific fields show great interest in the extraction and processing of spatiotemporal data, such as medicine with an emphasis on epidemiology and neurology, geology, social sciences, meteorology, and a great interest is also observed in the study of transport. Spatiotemporal data differ significantly from spatial data, since spatiotemporal data refer to measurements, which take into account both the place and the time in which they are received, with their respective characteristics, while spatial data refer to and describe information related only to place. The innovation brought about by spatiotemporal data mining has caused a revolution in many scientific fields, and this is because through it we can now provide solutions and answers to complex problems, as well as provide useful and valuable predictions, through predictive learning. However, combining time and place in data mining presents significant challenges and difficulties that must be overcome. Spatiotemporal data mining and analysis is a relatively new approach to data mining which has been studied more systematically in the last decade. The purpose of this article is to provide a good introduction to spatiotemporal data, and through this detailed description, we attempt to introduce descriptive logic and gain a complete knowledge of these data. We aim to introduce a new way of describing them, aiming for future studies, by combining the expressions that arise by type of data, using descriptive logic, with new expressions, that can be derived, to describe future states of objects and environments with great precision, providing accurate predictions. In order to highlight the value of spatiotemporal data, we proceed to give a brief description of ST data in the introduction. We describe the relevant work carried out to date, the types of spatiotemporal (ST) data, their properties and the transformations that can be made between them, attempting, to a small extent, to introduce constraints and rules using descriptive logic, introducing descriptive logic into spatiotemporal data by type, when initially presenting the ST data. The data snapshots by species and similarities between the cases are then described. We describe methods, introducing clustering, dynamic ST clusters, predictive learning, pattern mining frequency, and pattern emergence, and problems such as anomaly detection, identifying time points of changes in the behavior of the observed object, and development of relationships between them. We describe the application of ST data in various fields today, as well as the future work. We finally conclude with our conclusions, with the representation and study of spatiotemporal data can, in combination with other properties which accompany all natural phenomena, through their appropriate processing, lead to safe conclusions regarding the study of problems, and also with great precision in the extraction of predictions by accurately determining future states of an environmen","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91169182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Novel Zero-Truncated Katz Distribution by the Lagrange Expansion of the Second Kind with Associated Inferences 用带关联推论的第二类拉格朗日展开的一种新的零截尾Katz分布
Big data analytics Pub Date : 2023-06-01 DOI: 10.3390/analytics2020026
D. S. Shibu, C. Chesneau, M. Monisha, R. Maya, M. Irshad
{"title":"A Novel Zero-Truncated Katz Distribution by the Lagrange Expansion of the Second Kind with Associated Inferences","authors":"D. S. Shibu, C. Chesneau, M. Monisha, R. Maya, M. Irshad","doi":"10.3390/analytics2020026","DOIUrl":"https://doi.org/10.3390/analytics2020026","url":null,"abstract":"In this article, the Lagrange expansion of the second kind is used to generate a novel zero-truncated Katz distribution; we refer to it as the Lagrangian zero-truncated Katz distribution (LZTKD). Notably, the zero-truncated Katz distribution is a special case of this distribution. Along with the closed form expression of all its statistical characteristics, the LZTKD is proven to provide an adequate model for both underdispersed and overdispersed zero-truncated count datasets. Specifically, we show that the associated hazard rate function has increasing, decreasing, bathtub, or upside-down bathtub shapes. Moreover, we demonstrate that the LZTKD belongs to the Lagrangian distribution of the first kind. Then, applications of the LZTKD in statistical scenarios are explored. The unknown parameters are estimated using the well-reputed method of the maximum likelihood. In addition, the generalized likelihood ratio test procedure is applied to test the significance of the additional parameter. In order to evaluate the performance of the maximum likelihood estimates, simulation studies are also conducted. The use of real-life datasets further highlights the relevance and applicability of the proposed model.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89251864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信