{"title":"Anomalous node detection in attributed social networks using dual variational autoencoder with generative adversarial networks","authors":"Wasim Khan , Shafiqul Abidin , Mohammad Arif , Mohammad Ishrat , Mohd Haleem , Anwar Ahamed Shaikh , Nafees Akhtar Farooqui , Syed Mohd Faisal","doi":"10.1016/j.dsm.2023.10.005","DOIUrl":"10.1016/j.dsm.2023.10.005","url":null,"abstract":"<div><p>Many types of real-world information systems, including social media and e-commerce platforms, can be modelled by means of attribute-rich, connected networks. The goal of anomaly detection in artificial intelligence is to identify instances that deviate significantly from the main distribution of data or that differ from known cases. Anomalous nodes in node-attributed networks can be identified with greater precision if both graph and node attributes are taken into account. Almost all of the studies in this area focus on supervised techniques for spotting outliers. While supervised algorithms for anomaly detection work well in theory, they cannot be applied to real-world applications owing to a lack of labelled data. Considering the possible data distribution, our model employs a dual variational autoencoder (VAE), while a generative adversarial network (GAN) assures the model is robust to adversarial training. The dual VAEs are used in another capacity: as a fake-node generator. Adversarial training is used to ensure that our latent codes have a Gaussian or uniform distribution. To provide a fair presentation of the graph, the discriminator instructs the generator to generate latent variables with distributions that are more consistent with the actual distribution of the data. Once the model has been trained, the discriminator is used for anomaly detection via reconstruction loss, because it has been trained to distinguish between the normal and artificial distributions of data. 
First, using a dual VAE, our model simultaneously captures cross-modality interactions between topological structure and node characteristics and overcomes the problem of unlabeled anomalies, allowing us to better understand the network sparsity and nonlinearity. Second, the proposed model considers the regularization of the latent codes while solving the issue of unregularized embedding techniques that can quickly lead to unsatisfactory representation. Finally, we use the discriminator reconstruction loss for anomaly detection as the discriminator is well-trained to separate the normal and generated data distributions because reconstruction-based loss does not include the adversarial component. Experiments conducted on attributed networks demonstrate the effectiveness of the proposed model and show that it greatly surpasses the previous methods. The area under the curve scores of our proposed model for the BlogCatalog, Flickr, and Enron datasets are 0.83680, 0.82020, and 0.71180, respectively, proving the effectiveness of the proposed model. The result of the proposed model on the Enron dataset is slightly worse than the other models; we attribute this to the dataset's low dimensionality as the most probable explanation.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 2","pages":"Pages 89-98"},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000486/pdfft?md5=e26fa7989cfa05fc83b6e2a56b647889&pid=1-s2.0-S2666764923000486-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual manufacturing in Industry 4.0: A review","authors":"Mohsen Soori , Behrooz Arezoo , Roza Dastres","doi":"10.1016/j.dsm.2023.10.006","DOIUrl":"10.1016/j.dsm.2023.10.006","url":null,"abstract":"<div><p>Virtual manufacturing is one of the key components of Industry 4.0, the fourth industrial revolution, in improving manufacturing processes. Virtual manufacturing enables manufacturers to optimize their production processes using real-time data from sensors and other connected devices in Industry 4.0. Web-based virtual manufacturing platforms are a critical component of Industry 4.0, enabling manufacturers to design, test, and optimize their processes collaboratively and efficiently. In Industry 4.0, radio frequency identification (RFID) technology is used to provide real-time visibility and control of the supply chain as well as to enable the automation of various manufacturing processes. Big data analytics can be used in conjunction with virtual manufacturing to provide valuable insights and optimize production processes in Industry 4.0. Artificial intelligence (AI) and virtual manufacturing have the potential to enhance the effectiveness, consistency, and adaptability of manufacturing processes, resulting in faster production cycles, better-quality products, and lower prices. Recent developments in the application of virtual manufacturing systems to digital manufacturing platforms from different perspectives, such as the Internet of things, big data analytics, additive manufacturing, autonomous robots, cybersecurity, and RFID technology in Industry 4.0, are discussed in this study to analyze and develop the part manufacturing process in Industry 4.0. The limitations and advantages of virtual manufacturing systems in Industry 4.0 are discussed, and future research projects are also proposed. 
Thus, productivity in the part manufacturing process can be enhanced by reviewing and analyzing the applications of virtual manufacturing in Industry 4.0.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 47-63"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000498/pdfft?md5=52edcee468e2181649498edc52be3bd1&pid=1-s2.0-S2666764923000498-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135220924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The dynamics of price discovery between the U.S. and Chinese soybean market: A wavelet approach to understanding the effects of Sino-US trade conflict and COVID-19 pandemic","authors":"Xiang Gao , Apicha Insuwan , Ziran Li , Shuairu Tian","doi":"10.1016/j.dsm.2023.10.004","DOIUrl":"10.1016/j.dsm.2023.10.004","url":null,"abstract":"<div><p>During geopolitical crises, the price stability of agricultural commodities is critical for national security. Understanding the dynamics of pricing power between the U.S. and China and how it varies over time can help smaller nations navigate unpredictable moments. This study uses a unified framework and wavelet approach to examine soybean price discovery in the U.S. and China from the standpoints of price interdependence and information flows. We begin by illustrating the integrated link between the soybean futures markets in the U.S. and China, which includes multiple structural breaks. The pricing difference between the two nations acts as the primary information spillover route for their integrated relationship. Furthermore, we show that the direction and degree of information spillover change dramatically in proportion to the strength of the U.S.–Chinese soybean interaction. Finally, we find that China’s recent retaliatory tax on U.S. soybeans gave the Chinese market a more powerful position in soybean futures price discovery. After the first-stage trade deal was reached, and during the epidemic phase of the coronavirus pandemic, the pricing power of the U.S. 
soybean market showed no signs of full recovery.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 35-46"},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000474/pdfft?md5=d7204e6a2907f70af43aad7d2dd59bf2&pid=1-s2.0-S2666764923000474-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136010008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using machine learning to identify primary features in choosing electric vehicles based on income levels","authors":"Mingjun Ma, Eugene Pinsky","doi":"10.1016/j.dsm.2023.10.001","DOIUrl":"10.1016/j.dsm.2023.10.001","url":null,"abstract":"<div><p>Electric vehicles are becoming a popular choice among vehicle buyers. People are generally impressed with electric vehicles’ zero emissions and smooth drive, while unstable battery duration keeps people away. This study tries to identify the primary factors that affect the likelihood of owning an electric vehicle at different income levels. We divide the dataset into three subgroups by household income: $50,000 to $150,000 (low-medium income), $150,000 to $250,000 (medium-high income), and $250,000 or above (high income). We considered several machine learning classifiers, and naive Bayes performed relatively better than the other algorithms in terms of overall accuracy and <em>F</em><sub>1</sub> scores. Based on the probability analysis, we found that one-way commuting distance is the most important factor for all three income levels.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 1-6"},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000449/pdfft?md5=54e2341f8925187b9b44e58073977c1c&pid=1-s2.0-S2666764923000449-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135763031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid distributed feature selection using particle swarm optimization-mutual information","authors":"Khumukcham Robindro, Sanasam Surjalata Devi, Urikhimbam Boby Clinton, Linthoingambi Takhellambam, Yambem Ranjan Singh, Nazrul Hoque","doi":"10.1016/j.dsm.2023.10.003","DOIUrl":"10.1016/j.dsm.2023.10.003","url":null,"abstract":"<div><p>Feature selection (FS) is a data preprocessing step in machine learning (ML) that selects a subset of relevant and informative features from a large feature pool. FS helps ML models improve their predictive accuracy at lower computational costs. Moreover, FS can handle the model overfitting problem on a high-dimensional dataset. A major problem with the filter and wrapper FS methods is that they consume a significant amount of time during FS on high-dimensional datasets. The proposed “HDFS(PSO-MI): hybrid distributed feature selection using particle swarm optimization-mutual information (PSO-MI)” is a PSO-based hybrid method that can overcome the problem mentioned above. This method hybridizes the filter and wrapper techniques in a distributed manner. A new combiner is also introduced to merge the effective features selected from multiple data distributions. The effectiveness of the proposed HDFS(PSO-MI) method is evaluated using five ML classifiers, i.e., logistic regression (LR), k-NN, support vector machine (SVM), decision tree (DT), and random forest (RF), on various datasets in terms of accuracy and the Matthews correlation coefficient (MCC). From the experimental analysis, we observed that the HDFS(PSO-MI) method yielded more than 98%, 95%, 92%, 90%, and 85% accuracy for the unbalanced, kidney disease, emotions, wafer manufacturing, and breast cancer datasets, respectively. 
Our method shows promising results compared to other methods, such as mutual information, gain ratio, Spearman correlation, analysis of variance (ANOVA), Pearson correlation, and an ensemble feature selection with ranking method (EFSRank).</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 64-73"},"PeriodicalIF":0.0,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000462/pdfft?md5=712938edf51c71c99b1a5d68d7ef20da&pid=1-s2.0-S2666764923000462-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135762720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audiovisual speech recognition based on a deep convolutional neural network","authors":"Shashidhar Rudregowda , Sudarshan Patilkulkarni , Vinayakumar Ravi , Gururaj H.L. , Moez Krichen","doi":"10.1016/j.dsm.2023.10.002","DOIUrl":"10.1016/j.dsm.2023.10.002","url":null,"abstract":"<div><p>Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and divided the study into three main parts: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using the mel-frequency cepstral coefficient, and classification was performed using a one-dimensional convolutional neural network. Visual features were extracted using Dlib, and visual speech was then classified using a long short-term memory type of recurrent neural network. Finally, integration was performed using a deep convolutional network. Indian English audio speech was recognized with training and testing accuracies of 93.67% and 91.53%, respectively, using 200 epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. 
After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 25-34"},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000450/pdfft?md5=597d60fcaaa84868fbbf5a954573c7c1&pid=1-s2.0-S2666764923000450-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135605527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting stock closing prices with an application to airline company data","authors":"Xu Xu , Yixiang Zhang , Clare Anne McGrory , Jinran Wu , You-Gan Wang","doi":"10.1016/j.dsm.2023.09.005","DOIUrl":"10.1016/j.dsm.2023.09.005","url":null,"abstract":"<div><p>Forecasting stock market movements is a challenging task from the practitioners’ point of view. We explore how model selection via the least absolute shrinkage and selection operator (LASSO) approach can be better used to forecast stock closing prices using real-world datasets of daily stock closing prices of three major international airlines. Combining the LASSO method with multiple external data sources in our model leads to a robust and efficient method to predict stock behavior. We also compare our approach with ridge, tree, and support vector machine regressions, as well as neural network approaches to model the data. We include lags of each external variable and response variable in the model, resulting in a total of 870 predictor variables. The empirical results indicate that the LASSO-fitted model is the most effective when compared to other approaches we consider. The results show that the closing price of an airline stock is affected by its closing price for the previous days and those of other types of airlines and is significantly correlated with the Shanghai Composite Index for the previous day and 3 days prior. 
Other influencing factors include the positive impact of the Shanghai Composite Index daily share volume, the negative impact of loan interest rates, the amount of highway passenger and railway freight turnover, etc.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 4","pages":"Pages 239-246"},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000437/pdfft?md5=b882e5b9557ed7e229d1a7c9d7d79989&pid=1-s2.0-S2666764923000437-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134977673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic relationship between volume and volatility in the Chinese stock market: evidence from the MS-VAR model","authors":"Feipeng Zhang , Yilin Zhang , Yixiong Xu , Yan Chen","doi":"10.1016/j.dsm.2023.09.003","DOIUrl":"10.1016/j.dsm.2023.09.003","url":null,"abstract":"<div><p>Since market uncertainty, or volatility, serves as a crucial gauge for assessing the traits of market fluctuations, the link between stock market volume and price continues to be a focal point of interest in finance. This study examines the dynamic, nonlinear correlations between Chinese stock volatility, trading volume, and return using a hybrid approach that combines the Markov switching regime with the vector autoregressive model (MS-VAR). The empirical findings are as follows. (1) The Chinese stock market can be divided into three regimes: steady downward, steady upward, and high volatility. The three regimes have similar frequencies of occurrence, and their corresponding stable probabilities are not high, indicating that the Chinese stock market is unstable. (2) Asymmetric dynamic relationships exist between market volatility, investment return, and trading volume. For different regimes, while the effect of trading volume on volatility and return appears to be insignificant, the impacts of volatility and return on trading volume are considerably strong. (3) A regime-dependent, contemporaneous correlation between volatility and return is observed, which also reflects the behavior of the Chinese stock market “chasing up and down”. 
However, a positive contemporaneous correlation always exists between volatility and trading volumes in different regimes, indicating that uncertainty in the Chinese stock market is closely related to information inflow.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 17-24"},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000413/pdfft?md5=59508e63b1ebdc760b29360b3e38fd1b&pid=1-s2.0-S2666764923000413-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135409058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating customer churn in banking: A machine learning approach and visualization app for data science and management","authors":"Pahul Preet Singh , Fahim Islam Anik , Rahul Senapati , Arnav Sinha , Nazmus Sakib , Eklas Hossain","doi":"10.1016/j.dsm.2023.09.002","DOIUrl":"10.1016/j.dsm.2023.09.002","url":null,"abstract":"<div><p>Customer attrition in the banking industry occurs when consumers quit using the goods and services offered by the bank for some time and, after that, end their connection with the bank. Therefore, customer retention is essential in today’s extremely competitive banking market. Additionally, having a solid customer base helps attract new consumers by fostering confidence and referrals from the current clientele. These factors make reducing client attrition a crucial step that banks must pursue. In our research, we aim to examine bank data and forecast which users will most likely discontinue using the bank’s services and cease to be paying customers. We use various machine learning algorithms to analyze the data and present a comparative analysis across different evaluation metrics. In addition, we developed a Data Visualization RShiny app for data science and management regarding customer churn analysis. 
Analyzing this data will help the bank identify the trend and then try to retain customers on the verge of attrition.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"7 1","pages":"Pages 7-16"},"PeriodicalIF":0.0,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000401/pdfft?md5=cfc2f4530901aaf2ea8c8c1c0289f259&pid=1-s2.0-S2666764923000401-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134993822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the antecedents of patients’ missed appointments: The perspective of attribution theory","authors":"Guorui Fan , Zhaohua Deng , Lai C. Liu","doi":"10.1016/j.dsm.2023.09.004","DOIUrl":"10.1016/j.dsm.2023.09.004","url":null,"abstract":"<div><p>The occurrence of missed appointments from online outpatient bookings significantly hinders the operational efficiency of outpatient services. This study aimed to investigate various factors influencing patients’ missed appointments from online outpatient bookings. Drawing on attribution theory, an empirical analysis was conducted using 382,004 authentic online outpatient appointments. The empirical findings revealed that appointment lead-time, appointment time, weekday appointments, online doctor rating, appointment doctor’s expertise, patient distance, and previous outpatient visit experience significantly influenced patients’ missed-appointment behavior from online outpatient bookings. Importantly, previous outpatient experience positively moderated the relationship between the appointment doctor’s expertise and patients’ missed-appointment behavior. This study provides insights into the factors influencing patients’ missed-appointment behavior from online outpatient bookings. 
It further offers a theoretical foundation for medical institutions in China to mitigate the likelihood and adverse effects of patients’ missed-appointment behavior from online outpatient bookings.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":"6 4","pages":"Pages 247-255"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000425/pdfft?md5=71ebf712a9bb6a9bf75d7915cb4d0602&pid=1-s2.0-S2666764923000425-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134918275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}