{"title":"Sentiment Analysis using Dictionary-Based Lexicon Approach: Analysis on the Opinion of Indian Community for the Topic of Cryptocurrency","authors":"Sankalp Loomba, Madhavi Dave, Harshal Arolkar, Sachin Sharma","doi":"10.1007/s40745-023-00496-y","DOIUrl":"10.1007/s40745-023-00496-y","url":null,"abstract":"<div><p>Due to the ever-increasing computing power and easy availability, social-networking platforms like Facebook, Twitter, etc. have become a popular medium to express one’s views instantly, be it about political situations, commercial products, or social occurrences. Twitter is a powerful source of information, whose data can be utilized to investigate the opinions of users through a process called Opinion Mining or Sentiment Analysis. Using the principles of Natural Language Processing and data science, this paper presents a comparative evaluation of multiple lexicon-based sentiment analysis algorithms to extract public opinion from tweets. The study explores the nuances of sentiment analysis using data science methodology, assessing how various lexicon-based algorithms may successfully identify and classify sentiments expressed in tweets from the Indian community about cryptocurrency.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2019 - 2034"},"PeriodicalIF":0.0,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00496-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135218031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Discrete Mixture of Moment Exponential Using Lagrangian Probability Model: Properties and Applications in Count Data with Excess Zeros","authors":"Mohanan Monisha, Damodaran Santhamani Shibu","doi":"10.1007/s40745-023-00498-w","DOIUrl":"10.1007/s40745-023-00498-w","url":null,"abstract":"<div><p>In this paper, we introduce a new distribution for modeling count datasets with some unique characteristics, obtained by mixing the generalized Poisson distribution and the moment exponential distribution based on the framework of the Lagrangian probability distribution, so-called generalized Poisson moment exponential distribution (GPMED). It is shown that the Poisson-moment exponential and Poisson-Ailamujia distributions are special cases of the GPMED. Some important mathematical properties of the GPMED, including median, mode and non-central moment are also discussed through this paper. It is shown that the moment of the GPMED do not exist in some situations and have increasing, decreasing, and upside-down bathtub shaped hazard rates. The maximum likelihood method has been discussed for estimating its parameters. The likelihood ratio test is used to assess the effectiveness of the additional parameter included in the GPMED. The behaviour of these estimators is assessed using simulation study based on the inverse tranformation method. A zero-inflated version of the GPMED is also defined for the situation with an excessive number of zeros in the datasets. Applications of the GPMED and zero-inflated GPMED in various fields are presented and compared with some other existing distributions. In general, the GPMED or its zero-inflated version performs better than the other models, especially for the cases where the data are highly skewed or excessive number of zeros.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2035 - 2057"},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135858861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Wind Turbine Fault Diagnosis Method Realized by Vibration Monitoring","authors":"Xiuhua Jiang","doi":"10.1007/s40745-023-00497-x","DOIUrl":"10.1007/s40745-023-00497-x","url":null,"abstract":"<div><p>Wind energy is one of the fast evolving renewable energy sources that has seen widespread application. Therefore, research on its carrier, the wind turbine, is growing, and the majority of them concentrate on the diagnosis of wind turbine faults. In this paper, the vibration signals collected in the time domain by vibration monitoring were analyzed, and the fault characteristic parameters were identified. These parameters were then inputted into a genetic algorithm back-propagation neural network (GA-BPNN) for wind turbine fault diagnosis. It was found that the presence of defects in the wind turbine depended on the effective value, peak value, and kurtosis of the vibration signal. The overall recognition accuracy of the GA-BPNN was 94.89%, which was much higher than that of the support vector machine (88.7%) and random forest (88.35%). Therefore, it is feasible and highly accurate to extract fault characteristic parameters through vibration monitoring and input them into a GA-BPNN for wind turbine fault diagnosis.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 2","pages":"749 - 758"},"PeriodicalIF":0.0,"publicationDate":"2023-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136295454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Various Job Opportunities in IT Companies Using Enhanced Integrated Gated Recurrent Unit (EIGRU)","authors":"R. Santhosh Kumar, N. Prakash","doi":"10.1007/s40745-023-00495-z","DOIUrl":"10.1007/s40745-023-00495-z","url":null,"abstract":"<div><p>The fresh engineering graduates are looking only for the popular jobs where the competition is high and the number of job openings is minimal, but they fail to look for the other job openings. The major problem is that the graduates fail to look at the number of requirements needed for a job role in the present and future. So there is a need for a prediction model that provides the number of job opportunities in a job role in the future. Many research studies have been carried out to predict the placement status of students, but they have not predicted the number of job opportunities in a job role. Many existing prediction models focus on improving prediction accuracy but fail to consider the handling of data fluctuations. When there is a data fluctuation, the predicted value deviates from the actual value. This paper presents a hybrid time-series prediction model called the enhanced integrated gated recurrent unit (EIGRU) Model to predict the number of job opportunities in a job role based on the company, salary, and experience. The proposed EIGRU model tries to minimize the divergence in the predicted value. The proposed time series prediction model is achieving a prediction accuracy of 98%. Based on the experimental evaluation of the Job dataset, the proposed model’s mean absolute percentage error and mean absolute error values are lower than the baseline models. As a result, the graduates will know about the number of job opportunities in their job role and make an effective decision.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"2001 - 2018"},"PeriodicalIF":0.0,"publicationDate":"2023-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135197829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Diabetic Retinopathy Diagnosis with ResNet-50-Based Transfer Learning: A Promising Approach","authors":"S. Karthika, M. Durgadevi, T. Yamuna Rani","doi":"10.1007/s40745-023-00494-0","DOIUrl":"10.1007/s40745-023-00494-0","url":null,"abstract":"<div><p>Diabetic retinopathy is considered the leading cause of blindness in the population. High blood sugar levels can damage the tiny blood vessels in the retina at any time, leading to retinal detachment and sometimes glaucoma blindness. Treatment involves maintaining the current visual quality of the patient, as the disease is irreversible. Early diagnosis and timely treatment are crucial to minimizing the risk of vision loss. However, existing DR recognition strategies face numerous challenges, such as limited training datasets, high training loss, high-dimensional features, and high misclassification rates, which can significantly affect classification accuracies. In this paper, we propose a ResNet-50-based transfer learning method for classifying DR, which leverages the knowledge and expertise gained from training on a large dataset such as ImageNet. Our method involves preprocessing and segmenting the input images, which are then fed into ResNet-50 for extracting optimal features. We freeze a few layers of the pre-trained ResNet-50 and add Global Average Pooling to generate feature maps. The reduced feature maps are then classified to categorize the type of diabetic retinopathy. We evaluated the proposed method on 40 Real-time fundus images gathered from ICF Hospital together with the APTOS-2019 dataset and used various metrics to evaluate its performance. The experimentation results revealed that the proposed method achieved an accuracy of 99.82%, a sensitivity of 99%, a specificity of 96%, and an AUC score of 0.99 compared to existing DR recognition techniques. Overall, our ResNet-50-based transfer learning method presents a promising approach for DR classification and addresses the existing challenges of DR recognition strategies. It has the potential to aid in early DR diagnosis, leading to timely treatment and improved visual outcomes for patients.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135014170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiral Fractal Compression in Transform Domains for Underwater Communication","authors":"A. Selim, Taha E. Taha, Adel S. El-Fishawy, O. Zahran, M. M. Hadhoud, M. I. Dessouky, Fathi E. Abd El-Samie, Noha El-Hag","doi":"10.1007/s40745-023-00466-4","DOIUrl":"10.1007/s40745-023-00466-4","url":null,"abstract":"<div><p>This paper presents a simplified fractal image compression algorithm, which is implemented on a block-by-block basis. This algorithm achieves a Compression Ratio (CR) of up to 10 with a Peak Signal-to-Noise Ratio (PSNR) as high as 35 dB. Hence, it is very appropriate for the new applications of underwater communication. The idea of the proposed algorithm is based on the segmentation of the image, first, into blocks to setup reference blocks. The image is then decomposed again into block ranges, and a search process is carried out to find the reference blocks with the best match. The transmitted or stored values, after compression, are the reference block values and the indices of the reference block that achieves the best match. If there is no match, the average value of the block range is transmitted or stored instead. The effect of the spiral architecture instead of square block decomposition is studied. A comparison between different algorithms, including the conventional square search, the proposed simplified fractal compression algorithm and the standard JPEG compression algorithm, is introduced. We applied the types of fractal compression on a video sequence. In addition, the effect of using the fractal image compression algorithms in transform domain is investigated. The image is transferred firstly to a transform domain. The Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) are used. After transformation takes place, the fractal algorithm is applied. A comparison between three fractal algorithms, namely conventional square, spiral, and simplified fractal compression, is presented. The comparison is repeated in the two cases of transformation. The DWT is used also in this paper to increase the CR of the block domain pool. We decompose the block domain by wavelet decomposition to two levels. This process gives a CR for block domain transmission as high as 16. The advantage of the proposed implementation is the simplicity of computation. We found that with the spiral architecture in fractal compression, the video sequence visual quality is better than those produced with conventional square fractal compression and the proposed simplified algorithm at the same CR, but with longer time consumed. We found also that all types of fractal compression give better quality than that of the standard JPEG. In addition, the decoded images, in case of using the wavelet transform, are the best. On the other hand, in case of using DCT, the decoded images have poor quality.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"1003 - 1030"},"PeriodicalIF":0.0,"publicationDate":"2023-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135307314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IPH2O: Island Parallel-Harris Hawks Optimizer-Based CLSTM for Stock Price Movement Prediction","authors":"Linda Joel, S. Parthasarathy, P. Venkatesan, S. Nandhini","doi":"10.1007/s40745-023-00489-x","DOIUrl":"10.1007/s40745-023-00489-x","url":null,"abstract":"<div><p>Stock price movement forecasting is the process of predicting the future price of a financial and company stock from chaotic data. In recent years, many financial institutions and academics have shown interest in stock market forecasting. The accurate and successful predictions of the future price of stock yield a substantial profit. However, the current approaches are a major challenge due to the dynamic, chaotic, high-noise, non-linear, highly complex, and nonparametric characteristics of stock data. Furthermore, it is not sufficient to consider only the target firms' information because the stock prices of the target firms may be influenced by their related firms. Significant profits can be made by correct forecasting of stock prices, while poor forecasts can cause huge problems. Thus, we propose a novel Island Parallel-Harris Hawks Optimizer (IP-HHO)-optimized Convolutional Long Short Term Memory (ConvLSTM) with an autocorrelation model to predict stock price movement. Then, using the IP-HHO algorithm, the hyperparameters of ConvLSTM are optimized to minimize the Mean Absolute Percentage Error (MAPE). Four different types of financial time series datasets are utilized to validate the performance of the evaluation measures such as root mean square error, MAPE, Index of Agreement, accuracy, and F1 score. The results show that the IP-HHO-optimized ConvLSTM model outperforms others by improving the prediction rate accuracy and effectively minimizing the MAPE rate by 19.62%.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"1959 - 1974"},"PeriodicalIF":0.0,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00489-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42572163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Analysis of Generalized Inverted Exponential Distribution Based on Generalized Progressive Hybrid Censoring Competing Risks Data","authors":"Amal S. Hassan, Rana M. Mousa, Mahmoud H. Abu-Moussa","doi":"10.1007/s40745-023-00488-y","DOIUrl":"10.1007/s40745-023-00488-y","url":null,"abstract":"<div><p>In this study, a competing risk model was developed under a generalized progressive hybrid censoring scheme using a generalized inverted exponential distribution. The latent causes of failure were presumed to be independent. Estimating the unknown parameters is performed using maximum likelihood (ML) and Bayesian methods. Using the Markov chain Monte Carlo technique, Bayesian estimators were obtained under gamma priors with various loss functions. ML estimate was used to create confidence intervals (CIs). In addition, we present two bootstrap CIs for the unknown parameters. Further, credible CIs and the highest posterior density intervals were constructed based on the conditional posterior distribution. Monte Carlo simulation is used to examine the performance of different estimates. Applications to real data were used to check the estimates and compare the proposed model with alternative distributions.\u0000</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1225 - 1264"},"PeriodicalIF":0.0,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48748310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Analysis of Group for Epidemiology Architectural Approach","authors":"Dephney Mathebula","doi":"10.1007/s40745-023-00493-1","DOIUrl":"10.1007/s40745-023-00493-1","url":null,"abstract":"<div><p>Epidemiology, the aspect of research focusing on disease modelling is date intensive. Research epidemiologists in different research groups played a key role in developing different data driven model for COVID-19 and monkeypox. The requirement of accessing highly accurate data useful for disease modelling is beneficial but not without having challenges. Currently, the task of data acquisition is executed by select individuals in different research groups. This approach experiences the drawbacks associated with getting permission to access the desired data and inflexibility to change data acquisition goals due to dynamic epidemiological research objectives. The presented research addresses these challenges and proposes the design and use of dynamic intelligent crawlers for acquiring epidemiological data related to a given goal. In addition, the research aims to quantify how the use of computing entities enhances the process of data acquisition in epidemiological related studies. This is done by formulating and investigating the metrics of the data acquisition efficiency and the data analytics efficiency. The use of human assisted crawlers in the global information networks is found to enhance data acquisition efficiency (DAqE) and data analytics efficiency (DAnE). The use of human assisted crawlers in a hybrid configuration outperforms the case where manual research group member efforts are expended enhancing the DAqE and DAnE by up to 35% and 99% on average, respectively.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"979 - 1001"},"PeriodicalIF":0.0,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00493-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44922195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of ( P[Y<X] ) for Dependence of Stress–Strength Models with Weibull Marginals","authors":"Dipak D. Patil, U. V. Naik-Nimbalkar, M. M. Kale","doi":"10.1007/s40745-023-00487-z","DOIUrl":"10.1007/s40745-023-00487-z","url":null,"abstract":"<div><p>The stress–strength model is a basic tool used in evaluating the reliability <span>( R = P(Y < X))</span>. We consider an expression for <i>R</i> where the random variables X and Y denote strength and stress, respectively. The system fails only if the stress exceeds the strength. We aim to study the effect of the dependency between X and Y on <i>R</i>. We assume that X and Y follow Weibull distributions and their dependency is modeled by a copula with the dependency parameter <span>( theta )</span>. We compute <i>R</i> for Farlie–Gumbel–Morgenstern (FGM), Ali–Mikhail–Haq (AMH), Gumbel’s bivariate exponential copulas, and for Gumbel–Hougaard (GH) copula using a Monte-Carlo integration technique. We plot the graph of <i>R</i> versus <span>(theta )</span> to study the effect of dependency on <i>R</i>. We estimate <i>R</i> by plugging in the estimates of the marginal parameters and of <span>( theta )</span> in its expression. The estimates of the marginal parameters are based on the marginal likelihood. The estimates of <span>(theta )</span> are obtained from two different methods; one is based on the conditional likelihood and the other is based on the method of moments using Blomqvist’s beta. Asymptotic distribution of both the estimators of <i>R</i> is obtained. Finally, analysis of real data set is also performed for illustrative purposes.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1303 - 1340"},"PeriodicalIF":0.0,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48578734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}