R. Laher, F. Masci, L. Rebull, S. Schurr, Wendy Burt, A. Laity, M. Swain, D. Shupe, S. Groom, B. Rusholme, M. Kong, J. Good, V. Gorjian, R. Akeson, B. Fulton, D. Ciardi, S. Carey
{"title":"Upgraded Thoth: Software for Data Visualization and Statistics","authors":"R. Laher, F. Masci, L. Rebull, S. Schurr, Wendy Burt, A. Laity, M. Swain, D. Shupe, S. Groom, B. Rusholme, M. Kong, J. Good, V. Gorjian, R. Akeson, B. Fulton, D. Ciardi, S. Carey","doi":"10.3390/analytics2010015","DOIUrl":"https://doi.org/10.3390/analytics2010015","url":null,"abstract":"Thoth is a free desktop/laptop software application with a friendly graphical user interface that facilitates routine data-visualization and statistical-calculation tasks for astronomy and astrophysical research (and other fields where numbers are visualized). This software has been upgraded with many significant improvements and new capabilities. The major upgrades consist of: (1) six new graph types, including 3D stacked-bar charts and 3D surface plots, made by the Orson 3D Charts library; (2) new saving and loading of graph settings; (3) a new batch-mode or command-line operation; (4) new graph-data annotation functions; (5) new options for data-file importation; and (6) a new built-in FITS-image viewer. There is now the requirement that Thoth be run under Java 1.8 or higher. Many other miscellaneous minor upgrades and bug fixes have also been made to Thoth. The newly implemented plotting options generally make possible graph construction and reuse with relative ease, without resorting to writing computer code. The illustrative astronomy case study of this paper demonstrates one of the many ways the software can be utilized. These new software features and refinements help make astronomers more efficient in their work of elucidating data.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81525652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Mixture Models for Doubly Inflated Count Data","authors":"Monika Arora, N. Chaganty","doi":"10.3390/analytics2010014","DOIUrl":"https://doi.org/10.3390/analytics2010014","url":null,"abstract":"In health and social science and other fields where count data analysis is important, zero-inflated models have been employed when the frequency of zero count is high (inflated). Due to multiple reasons, there are scenarios in which an additional count value of k > 0 occurs with high frequency. The zero- and k-inflated Poisson distribution model (ZkIP) is more appropriate for such situations. The ZkIP model is a mixture distribution with three components: degenerate distributions at 0 and k count and a Poisson distribution. In this article, we propose an alternative and computationally fast expectation–maximization (EM) algorithm to obtain the parameter estimates for grouped zero and k-inflated count data. The asymptotic standard errors are derived using the complete data approach. We compare the zero- and k-inflated Poisson model with its zero-inflated and non-inflated counterparts. The best model is selected based on commonly used criteria. The theoretical results are supplemented with the analysis of two real-life datasets from health sciences.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86155839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Voronoi-Based Semantically Balanced Dummy Generation Framework for Location Privacy","authors":"Aditya Tadakaluru, Xiao Qin","doi":"10.3390/analytics2010013","DOIUrl":"https://doi.org/10.3390/analytics2010013","url":null,"abstract":"Location-based services (LBS) require users to provide their current location for service delivery and customization. Location privacy protection addresses concerns associated with the potential mishandling of location information submitted to the LBS provider. Location accuracy has a direct impact on the quality of service (QoS), where higher location accuracy results in better QoS. In general, the main goal of any location privacy technique is to achieve maximum QoS while providing minimum or no location information if possible, and using dummy locations is one such location privacy technique. In this paper, we introduced a temporal constraint attack whereby an adversary can exploit the temporal constraints associated with the semantic category of locations to eliminate dummy locations and identify the true location. We demonstrated how an adversary can devise a temporal constraint attack to breach the location privacy of a residential location. We addressed this major limitation of the current dummy approaches with a novel Voronoi-based semantically balanced framework (VSBDG) capable of generating dummy locations that can withstand a temporal constraint attack. Built based on real-world geospatial datasets, the VSBDG framework leverages spatial relationships and operations. Our results show a high physical dispersion cosine similarity of 0.988 between the semantic categories even with larger location set sizes. This indicates a strong and scalable semantic balance for each semantic category within the VSBDG’s output location set. The VSBDG algorithm is capable of producing location sets with high average minimum dispersion distance values of 5861.894 m for residential locations and 6258.046 m for POI locations. The findings demonstrate that the locations within each semantic category are scattered farther apart, entailing optimized location privacy.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83719603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey of Distances between the Most Popular Distributions","authors":"M. Kelbert","doi":"10.3390/analytics2010012","DOIUrl":"https://doi.org/10.3390/analytics2010012","url":null,"abstract":"We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, some estimates of the total variation distances in the cases of multivariate Gaussian distributions, Poisson distributions, binomial distributions, between a binomial and a Poisson distribution, and also in the case of negative binomial distributions are given. Next, the estimations of Lévy–Prohorov distance in terms of Wasserstein metrics are discussed, and Fréchet, Wasserstein and Hellinger distances for multivariate Gaussian distributions are evaluated. Some novel context-sensitive distances are introduced and a number of bounds mimicking the classical results from the information theory are proved.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89064747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart Multimedia Information Retrieval","authors":"Stefan Wagenpfeil","doi":"10.3390/analytics2010011","DOIUrl":"https://doi.org/10.3390/analytics2010011","url":null,"abstract":"The area of multimedia information retrieval (MMIR) faces two major challenges: the enormously growing number of multimedia objects (i.e., images, videos, audio, and text files), and the fast increasing level of detail of these objects (e.g., the number of pixels in images). Both challenges lead to a high demand of scalability, semantic representations, and explainability of MMIR processes. Smart MMIR solves these challenges by employing graph codes as an indexing structure, attaching semantic annotations for explainability, and employing application profiling for scaling, which results in human-understandable, expressive, and interoperable MMIR. The mathematical foundation, the modeling, implementation detail, and experimental results are shown in this paper, which confirm that Smart MMIR improves MMIR in the area of efficiency, effectiveness, and human understandability.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"154 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77800653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The SP Theory of Intelligence, and Its Realisation in the SP Computer Model, as a Foundation for the Development of Artificial General Intelligence","authors":"J. Wolff","doi":"10.3390/analytics2010010","DOIUrl":"https://doi.org/10.3390/analytics2010010","url":null,"abstract":"The theme of this paper is that the SP Theory of Intelligence (SPTI), and its realisation in the SP Computer Model, is a promising foundation for the development of artificial intelligence at the level of people or higher, also known as ‘artificial general intelligence’ (AGI). The SPTI, and alternatives to the SPTI, are considered and compared as potential foundations for the development of AGI. The alternatives include ‘Gato’ from DeepMind, ‘DALL·E 2’ from OpenAI, ‘Soar’ from Allen Newell, John Laird, and others, and ACT-R from John Anderson, Christian Lebiere, and others. A key principle in the SPTI and its development is the importance of information compression in human learning, perception, and cognition. Since there are many uncertainties between where we are now and, far into the future, anything that might qualify as an AGI, a multi-pronged attack on the problem is needed. The SPTI qualifies as the basis for one of those prongs. Although it will take time to achieve AGI, there is potential along the road for many useful benefits and applications of the research.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"145 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76772234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Skyline Computation with LSD Trees","authors":"D. Köppl","doi":"10.3390/analytics2010009","DOIUrl":"https://doi.org/10.3390/analytics2010009","url":null,"abstract":"Given a set of high-dimensional feature vectors S⊂Rn, the skyline or Pareto problem is to report the subset of vectors in S that are not dominated by any vector of S. Vectors closer to the origin are preferred: we say a vector x is dominated by another distinct vector y if x is equally or further away from the origin than y with respect to all its dimensions. The dynamic skyline problem allows us to shift the origin, which changes the answer set. This problem is crucial for dynamic recommender systems where users can shift the parameters and thus shift the origin. For each origin shift, a recomputation of the answer set from scratch is time intensive. To tackle this problem, we propose a parallel algorithm for dynamic skyline computation that uses multiple local split decision (LSD) trees concurrently. The geometric nature of the LSD trees allows us to reuse previous results. Experiments show that our proposed algorithm works well if the dimension is small in relation to the number of tuples to process.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72958217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Munshi Md. Shafwat Yazdan, Shah Saki, Raaghul Kumar
{"title":"Untangling Energy Consumption Dynamics with Renewable Energy Using Recurrent Neural Network","authors":"Munshi Md. Shafwat Yazdan, Shah Saki, Raaghul Kumar","doi":"10.3390/analytics2010008","DOIUrl":"https://doi.org/10.3390/analytics2010008","url":null,"abstract":"The environmental issues we are currently facing require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation’s pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has been performed to explore the dynamics of power consumption with the change in renewable energy production on a country-wide scale. In contrast, a number of deep learning algorithms have demonstrated acceptable performance while handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series for eleven years of data using a recurrent neural network (RNN). The dynamics of the interaction between the total annual power consumption and renewable energy production were investigated through extensive exploratory data analysis (EDA) and a feature engineering framework. The performance of the model was found to be satisfactory through the comparison of the predicted data with the observed data, the visualization of the distribution of the errors and root mean squared error (RMSE), and the R2 values of 0.084 and 0.82. Higher performance was achieved by increasing the number of epochs and hyperparameter tuning. The proposed framework has the potential to be used and transferred to investigate the trend of renewable energy production and power consumption and predict future scenarios for different communities. The incorporation of a cloud-based platform into the proposed pipeline to perform predictive studies from data acquisition to outcome generation may lead to real-time forecasting.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84623505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tristan Lim, Tao Pan, C. Ong, Shuaiwei Chen, Jie Jun Jeremy Chia
{"title":"Theory-Guided Analytics Process: Using Theories to Underpin an Analytics Process for New Banking Product Development Using Segmentation-Based Marketing Analytics Leveraging on Marketing Intelligence","authors":"Tristan Lim, Tao Pan, C. Ong, Shuaiwei Chen, Jie Jun Jeremy Chia","doi":"10.3390/analytics2010007","DOIUrl":"https://doi.org/10.3390/analytics2010007","url":null,"abstract":"Retail banking is undergoing considerable product competitiveness and disruptions. New product development is necessary to tackle such challenges and reinvigorate product lines. This study presents an instrumental real-life banking case study, where marketing analytics was utilized to drive a product differentiation strategy. In particular, the study applied unsupervised machine learning techniques of link analysis, latent class analysis, and association analysis to undertake behavioral-based market segmentation, in view of attaining a profitable competitive advantage. To underpin the product development process with well grounded theoretical framing, this study asked the research question: “How may we establish a theory-driven approach for an analytics-driven process?” Findings of this study include a theoretical conceptual framework that underpinned the end-to-end segmentation-driven new product development process, backed by the empirical literature. The study hopes to provide: (i) for managerial practitioners, the use of case-based reasoning for practice-oriented new product development design, planning, and diagnosis efforts, and (ii) for researchers, the potentiality to test of the validity and robustness of an analytical-driven NPD process. The study also hopes to drive a wider research interest that studies the theory-driven approach for analytics-driven processes.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90448763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAFFN_YOLOv5: Multi-Scale Attention Feature Fusion Network on the YOLOv5 Model for the Health Detection of Coral-Reefs Using a Built-In Benchmark Dataset","authors":"Sivamani Kalyana Sundara Rajan, Nedumaran Damodaran","doi":"10.3390/analytics2010006","DOIUrl":"https://doi.org/10.3390/analytics2010006","url":null,"abstract":"Coral-reefs are a significant species in marine life, which are affected by multiple diseases due to the stress and variation in heat under the impact of the ocean. The autonomous monitoring and detection of coral health are crucial for researchers to protect it at an early stage. The detection of coral diseases is a difficult task due to the inadequate coral-reef datasets. Therefore, we have developed a coral-reef benchmark dataset and proposed a Multi-scale Attention Feature Fusion Network (MAFFN) as a neck part of the YOLOv5’s network, called “MAFFN_YOLOv5”. The MAFFN_YOLOv5 model outperforms the state-of-the-art object detectors, such as YOLOv5, YOLOX, and YOLOR, by improving the detection accuracy to 8.64%, 3.78%, and 18.05%, respectively, based on the mean average precision (mAP@.5), and 7.8%, 3.72%, and 17.87%, respectively, based on the mAP@.5:.95. Consequently, we have tested a hardware-based deep neural network for the detection of coral-reef health.","PeriodicalId":93078,"journal":{"name":"Big data analytics","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135251678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}