Big data analytics: Latest Articles

Generalized Unit Half-Logistic Geometric Distribution: Properties and Regression with Applications to Insurance
Big data analytics Pub Date: 2023-05-16 DOI: 10.3390/analytics2020025
Suleman Nasiru, C. Chesneau, A. Abubakari, I. Angbing
Abstract: The use of distributions to model and quantify risk is essential in risk assessment and management. In this study, the generalized unit half-logistic geometric (GUHLG) distribution is developed to model bounded insurance data on the unit interval. The corresponding probability density function plots indicate that the related distribution can handle data that exhibit left-skewed, right-skewed, symmetric, reversed-J, and bathtub shapes. The hazard rate function also suggests that the distribution can be applied to analyze data with bathtubs, N-shapes, and increasing failure rates. Subsequently, the inferential aspects of the proposed model are investigated. In particular, Monte Carlo simulation exercises are carried out to examine the performance of the estimation method by using an algorithm to generate random observations from the quantile function. The results of the simulation suggest that the considered estimation method is efficient. The univariate application of the distribution and the multivariate application of the associated regression using risk survey data reveal that the model provides a better fit than the other existing distributions and regression models. Under the multivariate application, we estimate the parameters of the regression model using both maximum likelihood and Bayesian estimations. The estimates of the parameters for the two methods are very close. Diagnostic plots of the Bayesian method using the trace, ergodic, and autocorrelation plots reveal that the chains converge to a stationary distribution.
Citations: 3
Clustering Matrix Variate Longitudinal Count Data
Big data analytics Pub Date: 2023-05-05 DOI: 10.3390/analytics2020024
Sanjeena Subedi
Abstract: Matrix variate longitudinal discrete data can arise in transcriptomics studies when the data are collected for N genes at r conditions over t time points, and thus, each observation Y_n for n = 1, …, N can be written as an r × t matrix. When dealing with such data, the number of parameters in the model can be greatly reduced by considering the matrix variate structure. The components of the covariance matrix then also provide a meaningful interpretation. In this work, a mixture of matrix variate Poisson-log normal distributions is introduced for clustering longitudinal read counts from RNA-seq studies. To account for the longitudinal nature of the data, a modified Cholesky-decomposition is utilized for a component of the covariance structure. Furthermore, a parsimonious family of models is developed by imposing constraints on elements of these decompositions. The models are applied to both real and simulated data, and it is demonstrated that the proposed approach can recover the underlying cluster structure.
Citations: 1
Wavelet Support Vector Censored Regression
Big data analytics Pub Date: 2023-05-04 DOI: 10.3390/analytics2020023
M. Maia, J. S. Pimentel, R. Ospina, Anderson Ara
Abstract: Learning methods in survival analysis have the ability to handle censored observations. The Cox model is a predictive prevalent statistical technique for survival analysis, but its use rests on the strong assumption of hazard proportionality, which can be challenging to verify, particularly when working with non-linearity and high-dimensional data. Therefore, it may be necessary to consider a more flexible and generalizable approach, such as support vector machines. This paper aims to propose a new method, namely wavelet support vector censored regression, and compare the Cox model with traditional support vector regression and traditional support vector regression for censored data models, survival models based on support vector machines. In addition, to evaluate the effectiveness of different kernel functions in the support vector censored regression approach to survival data, we conducted a series of simulations with varying number of observations and ratios of censored data. Based on the simulation results, we found that the wavelet support vector censored regression outperformed the other methods in terms of the C-index. The evaluation was performed on simulations, survival benchmarking datasets and in a biomedical real application.
Citations: 0
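The abstract above ranks methods by the C-index. As a rough illustration of what that metric measures, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' evaluation code: the function name, the convention that a higher risk score means an earlier expected event, and the choice to skip tied observation times are all assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ordered correctly.

    times: observed times; events: 1 if the event was observed, 0 if censored;
    risk_scores: hypothetical model output, higher means earlier expected event.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: tied times are skipped; a pair is also not
        # comparable when the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # the third subject is censored
print(concordance_index(times, events, [0.9, 0.7, 0.5, 0.1]))
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and censoring reduces the number of pairs that can be compared at all.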
Building Neural Machine Translation Systems for Multilingual Participatory Spaces
Big data analytics Pub Date: 2023-05-01 DOI: 10.3390/analytics2020022
P. Lohar, G. Xie, Daniel Gallagher, Andy Way
Abstract: This work presents the development of the translation component in a multistage, multilevel, multimode, multilingual and dynamic deliberative (M4D2) system, built to facilitate automated moderation and translation in the languages of five European countries: Italy, Ireland, Germany, France and Poland. Two main topics were to be addressed in the deliberation process: (i) the environment and climate change; and (ii) the economy and inequality. In this work, we describe the development of neural machine translation (NMT) models for these domains for six European languages: Italian, English (included as the second official language of Ireland), Irish, German, French and Polish. As a result, we generate 30 NMT models, initially baseline systems built using freely available online data, which are then adapted to the domains of interest in the project by (i) filtering the corpora, (ii) tuning the systems with automatically extracted in-domain development datasets and (iii) using corpus concatenation techniques to expand the amount of data available. We compare our results produced by the domain-adapted systems with those produced by Google Translate, and demonstrate that fast, high-quality systems can be produced that facilitate multilingual deliberation in a secure environment.
Citations: 0
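The abstract above reports 30 NMT models for six languages. That count is consistent with one model per ordered source-target pair (6 × 5 = 30), although the abstract does not state the pairing explicitly, so the sketch below is an inference rather than the authors' setup. The function name is hypothetical.

```python
from itertools import permutations

# The six languages named in the abstract.
LANGUAGES = ["Italian", "English", "Irish", "German", "French", "Polish"]


def translation_directions(languages):
    """Enumerate ordered (source, target) pairs, assuming one model per direction."""
    return list(permutations(languages, 2))


directions = translation_directions(LANGUAGES)
print(len(directions))
for source, target in directions[:3]:
    print(f"{source} -> {target}")
```

Counting unordered pairs instead would give 15, so the reported figure suggests that each direction is served by its own model.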
Investigating Online Art Search through Quantitative Behavioral Data and Machine Learning Techniques
Big data analytics Pub Date: 2023-04-26 DOI: 10.3390/analytics2020021
Minas Pergantis, Alexandros Kouretsis, Andreas Giannakoulopoulos
Abstract: Studying searcher behavior has been a cornerstone of search engine research for decades, since it can lead to a better understanding of user needs and allow for an improved user experience. Going beyond descriptive data analysis and statistics, studies have been utilizing the capabilities of Machine Learning to further investigate how users behave during general purpose searching. But the thematic content of a search greatly affects many aspects of user behavior, which often deviates from general purpose search behavior. Thus, in this study, emphasis is placed specifically on the fields of Art and Cultural Heritage. Insights derived from behavioral data can help Culture and Art institutions streamline their online presence and allow them to better understand their user base. Existing research in this field often focuses on lab studies and explicit user feedback, but this study takes advantage of real usage quantitative data and its analysis through machine learning. Using data collected by real world usage of the Art Boulevard proprietary search engine for content related to Art and Culture and through the means of Machine Learning-powered tools and methodologies, this article investigates the peculiarities of Art-related online searches. Through clustering, various archetypes of Art search sessions were identified, thus providing insight on the variety of ways in which users interacted with the search engine. Additionally, using extreme Gradient boosting, the metrics that were more likely to predict the success of a search session were documented, underlining the importance of various aspects of user activity for search success. Finally, through applying topic modeling on the textual information of user-clicked results, the thematic elements that dominated user interest were investigated, providing an overview of prevalent themes in the fields of Art and Culture. It was established that preferred results revolved mostly around traditional visual Art themes, while academic and historical topics also had a strong presence.
Citations: 0
The AI Learns to Lie to Please You: Preventing Biased Feedback Loops in Machine-Assisted Intelligence Analysis
Big data analytics Pub Date: 2023-04-18 DOI: 10.3390/analytics2020020
J. Stray
Abstract: Researchers are starting to design AI-powered systems to automatically select and summarize the reports most relevant to each analyst, which raises the issue of bias in the information presented. This article focuses on the selection of relevant reports without an explicit query, a task known as recommendation. Drawing on previous work documenting the existence of human-machine feedback loops in recommender systems, this article reviews potential biases and mitigations in the context of intelligence analysis. Such loops can arise when behavioral “engagement” signals such as clicks or user ratings are used to infer the value of displayed information. Even worse, there can be feedback loops in the collection of intelligence information because users may also be responsible for tasking collection. Avoiding misalignment feedback loops requires an alternate, ongoing, non-engagement signal of information quality. Existing evaluation scales for intelligence product quality and rigor, such as the IC Rating Scale, could provide ground-truth feedback. This sparse data can be used in two ways: for human supervision of average performance and to build models that predict human survey ratings for use at recommendation time. Both techniques are widely used today by social media platforms. Open problems include the design of an ideal human evaluation method, the cost of skilled human labor, and the sparsity of the resulting data.
Citations: 2
Data Stream Analytics
Big data analytics Pub Date: 2023-04-14 DOI: 10.3390/analytics2020019
J. Aguilar-Ruiz, A. Bifet, João Gama
Abstract: The human brain works in such a complex way that we have not yet managed to decipher its functional mysteries [...]
Citations: 0
Development of a Dynamically Adaptable Routing System for Data Analytics Insights in Logistic Services
Big data analytics Pub Date: 2023-04-13 DOI: 10.3390/analytics2020018
Vasileios Tsoukas, Eleni Boumpa, Vasileios Chioktour, M. Kalafati, G. Spathoulas, Athanasios Kakarountas
Abstract: This work proposes an effective solution to the Vehicle Routing Problem, taking into account all phases of the delivery process. When compared to real-world data, the findings are encouraging and demonstrate the value of Machine Learning algorithms incorporated into the process. Several algorithms were combined along with a modified Hopfield network to deliver the optimal solution to a multiobjective issue on a platform capable of monitoring the various phases of the process. Additionally, a system providing viable insights and analytics in regard to the orders was developed. The results reveal a maximum distance saving of 25% and a maximum overall delivery time saving of 14%.
Citations: 0
Metric Ensembles Aid in Explainability: A Case Study with Wikipedia Data
Big data analytics Pub Date: 2023-04-07 DOI: 10.3390/analytics2020017
Grant Forbes, R. J. Crouser
Abstract: In recent years, as machine learning models have become larger and more complex, it has become both more difficult and more important to be able to explain and interpret the results of those models, both to prevent model errors and to inspire confidence for end users of the model. As such, there has been a significant and growing interest in explainability in recent years as a highly desirable trait for a model to have. Similarly, there has been much recent attention on ensemble methods, which aim to aggregate results from multiple (often simple) models or metrics in order to outperform models that optimize for only a single metric. We argue that this latter issue can actually assist with the former: a model that optimizes for several metrics has some base level of explainability baked into the model, and this explainability can be leveraged not only for user confidence but to fine-tune the weights between the metrics themselves in an intuitive way. We demonstrate a case study of such a benefit, in which we obtain clear, explainable results based on an aggregate of five simple metrics of relevance, using Wikipedia data as a proxy for some large text-based recommendation problem. We demonstrate that not only can these metrics’ simplicity and multiplicity be leveraged for explainability, but in fact, that very explainability can lead to an intuitive fine-tuning process that improves the model itself.
Citations: 1
Readability Indices Do Not Say It All on a Text Readability
Big data analytics Pub Date: 2023-03-30 DOI: 10.3390/analytics2020016
E. Matricciani
Abstract: We propose a universal readability index, G_U, applicable to any alphabetical language and related to cognitive psychology, the theory of communication, phonics and linguistics. This index also considers readers’ short-term-memory processing capacity, here modeled by the word interval I_P, namely, the number of words between two interpunctions. Any current readability formula does not consider I_P, but scatterplots of I_P versus a readability index show that texts with the same readability index can have very different I_P, ranging from 4 to 9, practically Miller’s range, which refers to 95% of readers. It is unlikely that I_P has no impact on reading difficulty. The examples shown are taken from Italian and English Literatures, and from the translations of The New Testament in Latin and in contemporary languages. We also propose an extremely compact formula, relating the capacity of human short-term memory to the difficulty of reading a text. It should synthetically model human reading difficulty, a kind of “footprint” of humans. However, further experimental and multidisciplinary work is necessary to confirm our conjecture about the dependence of a readability index on a reader’s short-term-memory capacity.
Citations: 4
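The abstract above defines the word interval as the number of words between two interpunctions. A minimal sketch of how such a quantity could be measured on a text follows. The set of characters treated as interpunctions, the whitespace-based word splitting, and both function names are assumptions made for this example; the paper's exact counting rules are not given in the abstract.

```python
import re

# Assumption: these characters count as interpunctions.
PUNCTUATION = r"[.,;:!?]"


def word_intervals(text):
    """Return the number of words in each punctuation-delimited segment."""
    segments = re.split(PUNCTUATION, text)
    # Drop empty segments, e.g. the one after the final full stop.
    return [len(seg.split()) for seg in segments if seg.split()]


def mean_word_interval(text):
    """Average words per interval, a rough stand-in for the word interval."""
    intervals = word_intervals(text)
    if not intervals:
        raise ValueError("text contains no words")
    return sum(intervals) / len(intervals)


sample = "In the beginning was the Word, and the Word was with God, and the Word was God."
print(word_intervals(sample))
print(mean_word_interval(sample))
```

For the sample sentence the intervals fall inside the 4 to 9 range that the abstract associates with Miller's range.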