{"title":"Hybrid scientific article recommendation system with COOT optimization","authors":"R. Sivasankari, J. Dhilipan","doi":"10.1016/j.dsm.2023.11.002","DOIUrl":"10.1016/j.dsm.2023.11.002","url":null,"abstract":"<div><p>Today, recommendation systems are everywhere, making a variety of activities considerably more manageable. These systems help users by personalizing their suggestions to their interests and needs. They can propose various goods, including music, courses, articles, agricultural products, fertilizers, books, movies, and foods. In the case of research articles, recommendation algorithms play an essential role in minimizing the time required for researchers to find relevant articles. Despite multiple challenges, these systems must solve serious issues such as the cold start problem, article privacy, and changing user interests. This research addresses these issues through the use of two techniques: hybrid recommendation systems and COOT optimization. To generate article recommendations, a hybrid recommendation system integrates features from content-based and graph-based recommendation systems. COOT optimization is used to optimize the results, inspired by the movement of water birds. The proposed method combines a graph-based recommendation system with COOT optimization to increase accuracy and reduce result inaccuracies. 
When compared to the baseline approaches described, the model provided in this study improves precision by 2.3%, recall by 1.6%, and Mean Reciprocal Rank (MRR) by 5.7%.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000516/pdfft?md5=a8f578b0b252fb9a7cc519cb31df8416&pid=1-s2.0-S2666764923000516-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135614110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using multi-criteria decision-making and machine learning for football player selection and performance prediction: A systematic review","authors":"Abdessatar Ati , Patrick Bouchet , Roukaya Ben Jeddou","doi":"10.1016/j.dsm.2023.11.001","DOIUrl":"10.1016/j.dsm.2023.11.001","url":null,"abstract":"<div><p>Evaluating and selecting players to suit football clubs and decision-makers (coaches, managers, technical, and medical staff) is a difficult process from a managerial-financial and sporting perspective. Football is a highly competitive sport where sponsors and fans are attracted by success. The most successful players, based on their characteristics (criteria and sub-criteria), can influence the outcome of a football game at any given time. Consequently, the D-day of selection should employ a more appropriate approach to human resource management. To effectively address this issue, a detailed study and analysis of the available literature are needed to assist practitioners and professionals in making decisions about football player selection and hiring. Peer-reviewed journals were selected for collecting published papers between 2018 and 2023. A total of 66 relevant articles (journal articles, conference articles, book sections, and review articles) were selected for evaluation and analysis. 
The purpose of the study is to present a systematic literature review (SLR) on how to solve this problem and organize the published research papers that answer our four research questions.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000504/pdfft?md5=4dfe252f3db079e14d75e256bf48da67&pid=1-s2.0-S2666764923000504-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135614979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anomalous node detection in attributed social networks using dual variational autoencoder with generative adversarial networks","authors":"Wasim Khan , Shafiqul Abidin , Mohammad Arif , Mohammad Ishrat , Mohd Haleem , Anwar Ahamed Shaikh , Nafees Akhtar Farooqui , Syed Mohd Faisal","doi":"10.1016/j.dsm.2023.10.005","DOIUrl":"10.1016/j.dsm.2023.10.005","url":null,"abstract":"<div><p>Many types of real-world information systems, including social media and e-commerce platforms, can be modelled by means of attribute-rich, connected networks. The goal of anomaly detection in artificial intelligence is to identify illustrations that deviate significantly from the main distribution of data or that differ from known cases. Anomalous nodes in node-attributed networks can be identified with greater precision if both graph and node attributes are taken into account. Almost all of the studies in this area focus on supervised techniques for spotting outliers. While supervised algorithms for anomaly detection work well in theory, they cannot be applied to real-world applications owing to a lack of labelled data. Considering the possible data distribution, our model employs a dual variational autoencoder (VAE), while a generative adversarial network (GAN) assures the model is robust to adversarial training. The dual VAEs are used in another capacity: as a fake-node generator. Adversarial training is used to ensure that our latent codes have a Gaussian or uniform distribution. To provide a fair presentation of the graph, the discriminator instructs the generator to generate latent variables with distributions that are more consistent with the actual distribution of the data. Once the model has been learned, the discriminator is used for anomaly detection via reconstruction loss, as it has been trained to distinguish between the normal and artificial distributions of data. 
First, using a dual VAE, our model simultaneously captures cross-modality interactions between topological structure and node characteristics and overcomes the problem of unlabeled anomalies, allowing us to better understand the network sparsity and nonlinearity. Second, the proposed model considers the regularization of the latent codes while solving the issue of unregularized embedding techniques that can quickly lead to unsatisfactory representation. Finally, we use the discriminator reconstruction loss for anomaly detection as the discriminator is well-trained to separate the normal and generated data distributions because reconstruction-based loss does not include the adversarial component. Experiments conducted on attributed networks demonstrate the effectiveness of the proposed model and show that it greatly surpasses the previous methods. The area under the curve scores of our proposed model for the BlogCatalog, Flickr, and Enron datasets are 0.83680, 0.82020, and 0.71180, respectively, proving the effectiveness of the proposed model. The result of the proposed model on the Enron dataset is slightly worse than the other models; we attribute this to the dataset's low dimensionality as the most probable explanation.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000486/pdfft?md5=e26fa7989cfa05fc83b6e2a56b647889&pid=1-s2.0-S2666764923000486-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual manufacturing in Industry 4.0: A review","authors":"Mohsen Soori , Behrooz Arezoo , Roza Dastres","doi":"10.1016/j.dsm.2023.10.006","DOIUrl":"10.1016/j.dsm.2023.10.006","url":null,"abstract":"<div><p>Virtual manufacturing is one of the key components of Industry 4.0, the fourth industrial revolution, in improving manufacturing processes. Virtual manufacturing enables manufacturers to optimize their production processes using real-time data from sensors and other connected devices in Industry 4.0. Web-based virtual manufacturing platforms are a critical component of Industry 4.0, enabling manufacturers to design, test, and optimize their processes collaboratively and efficiently. In Industry 4.0, radio frequency identification (RFID) technology is used to provide real-time visibility and control of the supply chain as well as to enable the automation of various manufacturing processes. Big data analytics can be used in conjunction with virtual manufacturing to provide valuable insights and optimize production processes in Industry 4.0. Artificial intelligence (AI) and virtual manufacturing have the potential to enhance the effectiveness, consistency, and adaptability of manufacturing processes, resulting in faster production cycles, better-quality products, and lower prices. Recent developments in the application of virtual manufacturing systems to digital manufacturing platforms from different perspectives, such as the Internet of things, big data analytics, additive manufacturing, autonomous robots, cybersecurity, and RFID technology in Industry 4.0, are discussed in this study to analyze and develop the part manufacturing process in Industry 4.0. The limitations and advantages of virtual manufacturing systems in Industry 4.0 are discussed, and future research projects are also proposed. 
Thus, productivity in the part manufacturing process can be enhanced by reviewing and analyzing the applications of virtual manufacturing in Industry 4.0.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000498/pdfft?md5=52edcee468e2181649498edc52be3bd1&pid=1-s2.0-S2666764923000498-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135220924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The dynamics of price discovery between the U.S. and Chinese soybean market: A wavelet approach to understanding the effects of Sino-US trade conflict and COVID-19 pandemic","authors":"Xiang Gao , Apicha Insuwan , Ziran Li , Shuairu Tian","doi":"10.1016/j.dsm.2023.10.004","DOIUrl":"10.1016/j.dsm.2023.10.004","url":null,"abstract":"<div><p>During geopolitical crises, the price stability of agricultural commodities is critical for national security. Understanding the dynamics of pricing power between the U.S. and China and how it varies over time can help smaller nations navigate unpredictable moments. This study uses a unified framework and wavelet approach to examine soybean price discovery in the U.S. and China from the standpoints of price interdependence and information flows. We begin by illustrating the integrated link between the soybean futures markets in the U.S. and China, which includes multiple structural breaks. The pricing difference between the two nations acts as the primary information spillover route for their integrated relationship. Furthermore, we show that the direction and degree of information spillover change dramatically in proportion to the strength of the U.S.–Chinese soybean interaction. Finally, we find that China’s recent retaliatory tax on the U.S. soybeans gave the Chinese market a more powerful position in soybean futures price discovery. After the first-stage trade deal was reached, and during the epidemic phase of the coronavirus pandemic, the pricing power of the U.S. 
soybean market showed no signs of full recovery.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000474/pdfft?md5=d7204e6a2907f70af43aad7d2dd59bf2&pid=1-s2.0-S2666764923000474-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136010008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using machine learning to identify primary features in choosing electric vehicles based on income levels","authors":"Mingjun Ma, Eugene Pinsky","doi":"10.1016/j.dsm.2023.10.001","DOIUrl":"10.1016/j.dsm.2023.10.001","url":null,"abstract":"<div><p>An electric vehicle is becoming one of the popular choices when choosing a vehicle. People are generally impressed with electric vehicles’ zero-emission and smooth drives, while unstable battery duration keeps people away. This study tries to identify the primary factors that affect the likelihood of owning an electric vehicle based on different income levels. We divide the dataset into three subgroups by household income from $50,000 to $150,000 or low-medium income level, $150,000 to $250,000 or medium-high income level, and $250,000 or above, the high-income level. We considered several machine learning classifiers, and naive Bayes gave us a relatively higher accuracy than other algorithms in terms of overall accuracy and <em>F</em><sub>1</sub> scores. Based on the probability analysis, we found that for each of these groups, one-way commuting distance is the most important for all three income levels.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000449/pdfft?md5=54e2341f8925187b9b44e58073977c1c&pid=1-s2.0-S2666764923000449-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135763031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid distributed feature selection using particle swarm optimization-mutual information","authors":"Khumukcham Robindro, Sanasam Surjalata Devi, Urikhimbam Boby Clinton, Linthoingambi Takhellambam, Yambem Ranjan Singh, Nazrul Hoque","doi":"10.1016/j.dsm.2023.10.003","DOIUrl":"10.1016/j.dsm.2023.10.003","url":null,"abstract":"<div><p>Feature selection (FS) is a data preprocessing step in machine learning (ML) that selects a subset of relevant and informative features from a large feature pool. FS helps ML models improve their predictive accuracy at lower computational costs. Moreover, FS can handle the model overfitting problem on a high-dimensional dataset. A major problem with the filter and wrapper FS methods is that they consume a significant amount of time during FS on high-dimensional datasets. The proposed “HDFS(PSO-MI): hybrid distributed feature selection using particle swarm optimization-mutual information (PSO-MI)” is a PSO-based hybrid method that can overcome the problem mentioned above. This method hybridizes the filter and wrapper techniques in a distributed manner. A new combiner is also introduced to merge the effective features selected from multiple data distributions. The effectiveness of the proposed HDFS(PSO-MI) method is evaluated using five ML classifiers, i.e., logistic regression (LR), k-NN, support vector machine (SVM), decision tree (DT), and random forest (RF), on various datasets in terms of accuracy and Matthews correlation coefficient (MCC). From the experimental analysis, we observed that the HDFS(PSO-MI) method yielded more than 98%, 95%, 92%, 90%, and 85% accuracy for the unbalanced, kidney disease, emotions, wafer manufacturing, and breast cancer datasets, respectively. 
Our method shows promising results compared to other methods, such as mutual information, gain ratio, Spearman correlation, analysis of variance (ANOVA), Pearson correlation, and an ensemble feature selection with ranking method (EFSRank).</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000462/pdfft?md5=712938edf51c71c99b1a5d68d7ef20da&pid=1-s2.0-S2666764923000462-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135762720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Audiovisual speech recognition based on a deep convolutional neural network","authors":"Shashidhar Rudregowda , Sudarshan Patilkulkarni , Vinayakumar Ravi , Gururaj H.L. , Moez Krichen","doi":"10.1016/j.dsm.2023.10.002","DOIUrl":"10.1016/j.dsm.2023.10.002","url":null,"abstract":"<div><p>Audiovisual speech recognition is an emerging research topic. Lipreading is the recognition of what someone is saying using visual information, primarily lip movements. In this study, we created a custom dataset for Indian English linguistics and categorized it into three main categories: (1) audio recognition, (2) visual feature extraction, and (3) combined audio and visual recognition. Audio features were extracted using mel-frequency cepstral coefficients, and classification was performed using a one-dimensional convolutional neural network. Visual feature extraction uses Dlib and then classifies visual speech using a long short-term memory type of recurrent neural network. Finally, integration was performed using a deep convolutional network. The audio speech of Indian English was successfully recognized with accuracies of 93.67% and 91.53%, respectively, using testing data from 200 epochs. The training accuracy for visual speech recognition using the Indian English dataset was 77.48% and the test accuracy was 76.19% using 60 epochs. 
After integration, the accuracies of audiovisual speech recognition using the Indian English dataset for training and testing were 94.67% and 91.75%, respectively.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000450/pdfft?md5=597d60fcaaa84868fbbf5a954573c7c1&pid=1-s2.0-S2666764923000450-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135605527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting stock closing prices with an application to airline company data","authors":"Xu Xu , Yixiang Zhang , Clare Anne McGrory , Jinran Wu , You-Gan Wang","doi":"10.1016/j.dsm.2023.09.005","DOIUrl":"10.1016/j.dsm.2023.09.005","url":null,"abstract":"<div><p>Forecasting stock market movements is a challenging task from the practitioners’ point of view. We explore how model selection via the least absolute shrinkage and selection operator (LASSO) approach can be better used to forecast stock closing prices using real-world datasets of daily stock closing prices of three major international airlines. Combining the LASSO method with multiple external data sources in our model leads to a robust and efficient method to predict stock behavior. We also compare our approach with ridge, tree, and support vector machine regressions, as well as neural network approaches to model the data. We include lags of each external variable and response variable in the model, resulting in a total of 870 predictor variables. The empirical results indicate that the LASSO-fitted model is the most effective when compared to other approaches we consider. The results show that the closing price of an airline stock is affected by its closing price for the previous days and those of other types of airlines and is significantly correlated with the Shanghai Composite Index for the previous day and 3 days prior. 
Other influencing factors include the positive impact of the Shanghai Composite Index daily share volume, the negative impact of loan interest rates, the amount of highway passenger and railway freight turnover, etc.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000437/pdfft?md5=b882e5b9557ed7e229d1a7c9d7d79989&pid=1-s2.0-S2666764923000437-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134977673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic relationship between volume and volatility in the Chinese stock market: evidence from the MS-VAR model","authors":"Feipeng Zhang , Yilin Zhang , Yixiong Xu , Yan Chen","doi":"10.1016/j.dsm.2023.09.003","DOIUrl":"10.1016/j.dsm.2023.09.003","url":null,"abstract":"<div><p>Since market uncertainty, or volatility, serves as a crucial gauge for assessing the traits of market fluctuations, the link between stock market volume and price continues to be a focal point of interest in finance. This study examines the dynamic, nonlinear correlations between Chinese stock volatility, trading volume, and return using a hybrid approach that combines the Markov switching regime with the vector autoregressive model (MS-VAR). The empirical findings are as follows. (1) The Chinese stock market can be divided into three regional systems: steady downward, steady upward, and high volatility. The three states have similar frequencies of occurrence, and their corresponding stable probabilities are not high, indicating that the Chinese stock market is unstable. (2) Asymmetric dynamic relationships exist between market volatility, investment return, and trading volume. For different regimes, while the effect of trading volume on volatility and return appears to be insignificant, the impacts of volatility and return on trading volume are considerably strong. (3) A regime-dependent, contemporaneous correlation between volatility and return is observed, which also reflects the behavior of the Chinese stock market “chasing up and down”. 
However, a positive contemporaneous correlation always exists between volatility and trading volumes in different regimes, indicating that uncertainty in the Chinese stock market is closely related to information inflow.</p></div>","PeriodicalId":100353,"journal":{"name":"Data Science and Management","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666764923000413/pdfft?md5=59508e63b1ebdc760b29360b3e38fd1b&pid=1-s2.0-S2666764923000413-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135409058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}