{"title":"The dynamics of price discovery between the U.S. and Chinese soybean market: A wavelet approach to understanding the effects of Sino-US trade conflict and COVID-19 pandemic","authors":"Xiang Gao , Apicha Insuwan , Ziran Li , Shuairu Tian","doi":"10.1016/j.dsm.2023.10.004","DOIUrl":"10.1016/j.dsm.2023.10.004","url":null,"abstract":"<div><p>During geopolitical crises, the price stability of agricultural commodities is critical for national security. Understanding the dynamics of pricing power between the U.S. and China and how it varies over time can help smaller nations navigate unpredictable moments. This study uses a unified framework and wavelet approach to examine soybean price discovery in the U.S. and China from the standpoints of price interdependence and information flows. We begin by illustrating the integrated link between the soybean futures markets in the U.S. and China, which includes multiple structural breaks. The pricing difference between the two nations acts as the primary information spillover route for their integrated relationship. Furthermore, we show that the direction and degree of information spillover change dramatically in proportion to the strength of the U.S.–Chinese soybean interaction. Finally, we find that China’s recent retaliatory tax on the U.S. soybeans gave the Chinese market a more powerful position in soybean futures price discovery. After the first-stage trade deal was reached, and during the epidemic phase of the coronavirus pandemic, the pricing power of the U.S. 
soybean market showed no signs of full recovery.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 35-46"},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000474/pdfft?md5=d7204e6a2907f70af43aad7d2dd59bf2&pid=1-s2.0-S2666764923000474-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136010008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using machine learning to identify primary features in choosing electric vehicles based on income levels","authors":"Mingjun Ma, Eugene Pinsky","doi":"10.1016/j.dsm.2023.10.001","DOIUrl":"10.1016/j.dsm.2023.10.001","url":null,"abstract":"<div><p>Electric vehicles are becoming a popular choice among vehicle buyers. People are generally impressed with electric vehicles’ zero emissions and smooth drive, while unstable battery duration keeps some buyers away. This study tries to identify the primary factors that affect the likelihood of owning an electric vehicle at different income levels. We divide the dataset into three subgroups by household income: $50,000 to $150,000 (low-medium income), $150,000 to $250,000 (medium-high income), and $250,000 or above (high income). We considered several machine learning classifiers, and naive Bayes performed relatively better than the other algorithms in terms of overall accuracy and <em>F</em><sub>1</sub> scores. Based on the probability analysis, we found that one-way commuting distance is the most important factor for all three income levels.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000449/pdfft?md5=54e2341f8925187b9b44e58073977c1c&pid=1-s2.0-S2666764923000449-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135763031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid distributed feature selection using particle swarm optimization-mutual information","authors":"Khumukcham Robindro, Sanasam Surjalata Devi, Urikhimbam Boby Clinton, Linthoingambi Takhellambam, Yambem Ranjan Singh, Nazrul Hoque","doi":"10.1016/j.dsm.2023.10.003","DOIUrl":"10.1016/j.dsm.2023.10.003","url":null,"abstract":"<div><p>Feature selection (FS) is a data preprocessing step in machine learning (ML) that selects a subset of relevant and informative features from a large feature pool. FS helps ML models improve their predictive accuracy at lower computational costs. Moreover, FS can handle the model overfitting problem on a high-dimensional dataset. A major problem with the filter and wrapper FS methods is that they consume a significant amount of time during FS on high-dimensional datasets. The proposed “HDFS(PSO-MI): hybrid distributed feature selection using particle swarm optimization-mutual information (PSO-MI)” is a PSO-based hybrid method that can overcome the problem mentioned above. This method hybridizes the filter and wrapper techniques in a distributed manner. A new combiner is also introduced to merge the effective features selected from multiple data distributions. The effectiveness of the proposed HDFS(PSO-MI) method is evaluated using five ML classifiers, i.e., logistic regression (LR), k-NN, support vector machine (SVM), decision tree (DT), and random forest (RF), on various datasets in terms of accuracy and Matthews correlation coefficient (MCC). From the experimental analysis, we observed that the HDFS(PSO-MI) method yielded more than 98%, 95%, 92%, 90%, and 85% accuracy for the unbalanced, kidney disease, emotions, wafer manufacturing, and breast cancer datasets, respectively. 
Our method shows promising results compared to other methods, such as mutual information, gain ratio, Spearman correlation, analysis of variance (ANOVA), Pearson correlation, and an ensemble feature selection with ranking method (EFSRank).</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 64-73"},"PeriodicalIF":0.0,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000462/pdfft?md5=712938edf51c71c99b1a5d68d7ef20da&pid=1-s2.0-S2666764923000462-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135762720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audiovisual speech recognition based on a deep convolutional neural network","authors":"Shashidhar Rudregowda , Sudarshan Patilkulkarni , Vinayakumar Ravi , Gururaj H.L. , Moez Krichen","doi":"10.1016/j.dsm.2023.10.002","DOIUrl":"10.1016/j.dsm.2023.10.002","url":null,"abstract":"<div><p>Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using the mel-frequency cepstral coefficient, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was classified using a long short-term memory recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with accuracies of 93.67% and 91.53%, respectively, using testing data from two hundred epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. 
After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 25-34"},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000450/pdfft?md5=597d60fcaaa84868fbbf5a954573c7c1&pid=1-s2.0-S2666764923000450-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135605527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting stock closing prices with an application to airline company data","authors":"Xu Xu , Yixiang Zhang , Clare Anne McGrory , Jinran Wu , You-Gan Wang","doi":"10.1016/j.dsm.2023.09.005","DOIUrl":"10.1016/j.dsm.2023.09.005","url":null,"abstract":"<div><p>Forecasting stock market movements is a challenging task from the practitioners’ point of view. We explore how model selection via the least absolute shrinkage and selection operator (LASSO) approach can be better used to forecast stock closing prices using real-world datasets of daily stock closing prices of three major international airlines. Combining the LASSO method with multiple external data sources in our model leads to a robust and efficient method to predict stock behavior. We also compare our approach with ridge, tree, and support vector machine regressions, as well as neural network approaches to model the data. We include lags of each external variable and response variable in the model, resulting in a total of 870 predictor variables. The empirical results indicate that the LASSO-fitted model is the most effective when compared to other approaches we consider. The results show that the closing price of an airline stock is affected by its closing price for the previous days and those of other types of airlines and is significantly correlated with the Shanghai Composite Index for the previous day and 3 days prior. 
Other influencing factors include the positive impact of the Shanghai Composite Index daily share volume, the negative impact of loan interest rates, the amount of highway passenger and railway freight turnover, etc.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 4","pages":"Pages 239-246"},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000437/pdfft?md5=b882e5b9557ed7e229d1a7c9d7d79989&pid=1-s2.0-S2666764923000437-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134977673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic relationship between volume and volatility in the Chinese stock market: evidence from the MS-VAR model","authors":"Feipeng Zhang , Yilin Zhang , Yixiong Xu , Yan Chen","doi":"10.1016/j.dsm.2023.09.003","DOIUrl":"10.1016/j.dsm.2023.09.003","url":null,"abstract":"<div><p>Since market uncertainty, or volatility, serves as a crucial gauge for assessing the traits of market fluctuations, the link between stock market volume and price continues to be a focal point of interest in finance. This study examines the dynamic, nonlinear correlations between Chinese stock volatility, trading volume, and return using a hybrid approach that combines the Markov switching regime with the vector autoregressive model (MS-VAR). The empirical findings are as follows. (1) The Chinese stock market can be divided into three regimes: steady downward, steady upward, and high volatility. The three states have similar frequencies of occurrence, and their corresponding stable probabilities are not high, indicating that the Chinese stock market is unstable. (2) Asymmetric dynamic relationships exist between market volatility, investment return, and trading volume. For different regimes, while the effect of trading volume on volatility and return appears to be insignificant, the impacts of volatility and return on trading volume are considerably strong. (3) A regime-dependent, contemporaneous correlation between volatility and return is observed, which also reflects the behavior of the Chinese stock market “chasing up and down”. 
However, a positive contemporaneous correlation always exists between volatility and trading volumes in different regimes, indicating that uncertainty in the Chinese stock market is closely related to information inflow.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 17-24"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000413/pdfft?md5=59508e63b1ebdc760b29360b3e38fd1b&pid=1-s2.0-S2666764923000413-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135409058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating customer churn in banking: A machine learning approach and visualization app for data science and management","authors":"Pahul Preet Singh , Fahim Islam Anik , Rahul Senapati , Arnav Sinha , Nazmus Sakib , Eklas Hossain","doi":"10.1016/j.dsm.2023.09.002","DOIUrl":"10.1016/j.dsm.2023.09.002","url":null,"abstract":"<div><p>Customer attrition in the banking industry occurs when consumers quit using the goods and services offered by the bank for some time and, after that, end their connection with the bank. Therefore, customer retention is essential in today’s extremely competitive banking market. Additionally, having a solid customer base helps attract new consumers by fostering confidence and referrals from the current clientele. These factors make reducing client attrition a crucial step that banks must pursue. In our research, we aim to examine bank data and forecast which users will most likely discontinue using the bank’s services and cease to be paying customers. We use various machine learning algorithms to analyze the data and present a comparative analysis across different evaluation metrics. In addition, we developed a Data Visualization RShiny app for data science and management regarding customer churn analysis. 
Analyzing this data will help the bank identify the trend and then try to retain customers on the verge of attrition.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 7-16"},"PeriodicalIF":0.0,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000401/pdfft?md5=cfc2f4530901aaf2ea8c8c1c0289f259&pid=1-s2.0-S2666764923000401-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134993822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the antecedents of patients’ missed appointments: The perspective of attribution theory","authors":"Guorui Fan , Zhaohua Deng , Lai C. Liu","doi":"10.1016/j.dsm.2023.09.004","DOIUrl":"10.1016/j.dsm.2023.09.004","url":null,"abstract":"<div><p>The occurrence of missed appointments from online outpatient bookings significantly hinders the operational efficiency of outpatient services. This study aimed to investigate various factors influencing patients’ missed appointments from online outpatient bookings. Drawing on attribution theory, an empirical analysis was conducted using 382,004 authentic online outpatient appointments. The empirical findings revealed that appointment lead-time, appointment time, weekday appointments, online doctor rating, appointment doctor’s expertise, patient distance, and previous outpatient visit experience significantly influenced patients’ missed-appointment behavior from online outpatient bookings. Importantly, previous outpatient experience positively moderated the relationship between the appointment doctor’s expertise and patients’ missed-appointment behavior. This study provides insights into the factors influencing patients’ missed-appointment behavior from online outpatient bookings. 
It further offers a theoretical foundation for medical institutions in China to mitigate the likelihood and adverse effects of patients’ missed-appointment behavior from online outpatient bookings.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 4","pages":"Pages 247-255"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000425/pdfft?md5=71ebf712a9bb6a9bf75d7915cb4d0602&pid=1-s2.0-S2666764923000425-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134918275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-stage ensemble approach using online knowledge distillation for forecasting carbon emissions in the electric power industry","authors":"Ruibin Lin, Xing Lv, Huanling Hu, Liwen Ling, Zehui Yu, Dabin Zhang","doi":"10.1016/j.dsm.2023.09.001","DOIUrl":"10.1016/j.dsm.2023.09.001","url":null,"abstract":"<div><p>The electric power industry is the key to achieving the goals of carbon peak and neutrality. Accurate forecasting of carbon emissions in the electric power industry can aid in the prompt adjustment of power generation policies and the early achievement of carbon reduction targets. This study proposes a new approach that combines the decomposition-ensemble paradigm with knowledge distillation to forecast daily carbon emissions. First, seasonal and trend decomposition using locally weighted scatterplot smoothing (STL) is used to decompose the data into three subcomponents. Second, two heterogeneous deep neural network models are jointly trained to predict each subcomponent based on online knowledge distillation. During training, the two models learn and provide feedback to each other. The first model-ensemble stage is performed by synthesizing the predictions for each subcomponent of the two models. Finally, the second model-ensemble stage is performed. The predictions for each subcomponent are integrated using linear addition to obtain the final results. In addition, to avoid leakage of test data caused by decomposing the entire time series, a recursive forecasting strategy is applied. Multistep predictions are obtained by forecasting 7, 15, and 30 days in the future. 
Experimental results using metaheuristic algorithms to optimize hyperparameters show that the proposed method evaluated on the daily carbon emissions dataset has better forecasting performance than all baselines.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 4","pages":"Pages 227-238"},"PeriodicalIF":0.0,"publicationDate":"2023-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000395/pdfft?md5=f20a2e0ce3f1de499a3c6ddbd9113351&pid=1-s2.0-S2666764923000395-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74621919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explainable prediction of loan default based on machine learning models","authors":"Xu Zhu , Qingyong Chu , Xinchang Song , Ping Hu , Lu Peng","doi":"10.1016/j.dsm.2023.04.003","DOIUrl":"https://doi.org/10.1016/j.dsm.2023.04.003","url":null,"abstract":"<div><p>Owing to the convenience of online loans, an increasing number of people are borrowing money on online platforms. With the emergence of machine learning technology, predicting loan defaults has become a popular topic. However, machine learning models have a black-box problem that cannot be disregarded. To make the prediction model rules more understandable and thereby increase the user’s faith in the model, an explanatory model must be used. Logistic regression, decision tree, XGBoost, and LightGBM models are employed to predict loan default. The prediction results show that LightGBM and XGBoost outperform logistic regression and decision tree models in terms of predictive ability. The area under the curve for LightGBM is 0.7213. The accuracies of LightGBM and XGBoost exceed 0.8. The precisions of LightGBM and XGBoost exceed 0.55. Simultaneously, we employed the local interpretable model-agnostic explanations approach to undertake an explainable analysis of the prediction findings. 
The results show that factors such as the loan term, loan grade, credit rating, and loan amount affect the predicted outcomes.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 3","pages":"Pages 123-133"},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49765243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}