Nehad M. Ibrahim, D. Gabr, Atta Rahman, Dhiaa Musleh, Dania Alkhulaifi, Mariam Alkharraa
{"title":"Transfer Learning Approach to Seed Taxonomy: A Wild Plant Case Study","authors":"Nehad M. Ibrahim, D. Gabr, Atta Rahman, Dhiaa Musleh, Dania Alkhulaifi, Mariam Alkharraa","doi":"10.3390/bdcc7030128","DOIUrl":"https://doi.org/10.3390/bdcc7030128","url":null,"abstract":"Plant taxonomy is the scientific study of the classification and naming of various plant species. It is a branch of biology that aims to categorize and organize the diverse variety of plant life on earth. Traditionally, plant taxonomy has been performed using morphological and anatomical characteristics, such as leaf shape, flower structure, and seed and fruit characters. Artificial intelligence (AI), machine learning, and especially deep learning can also play an instrumental role in plant taxonomy by automating the process of categorizing plant species based on the available features. This study investigated transfer learning techniques to analyze images of plants and extract features that can be used to cluster the species hierarchically using the k-means clustering algorithm. Several pretrained deep learning models were employed and evaluated. In this regard, two separate datasets were used in the study comprising of seed images of wild plants collected from Egypt. Extensive experiments using the transfer learning method (DenseNet201) demonstrated that the proposed methods achieved superior accuracy compared to traditional methods with the highest accuracy of 93% and F1-score and area under the curve (AUC) of 95%, respectively. That is considerable in contrast to the state-of-the-art approaches in the literature.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48693249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dhiaa Musleh, Ibrahim Alkhwaja, Ali Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Faisal Alfawaz, N. Min-Allah, M. M. Abdulqader
{"title":"Arabic Sentiment Analysis of YouTube Comments: NLP-Based Machine Learning Approaches for Content Evaluation","authors":"Dhiaa Musleh, Ibrahim Alkhwaja, Ali Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Faisal Alfawaz, N. Min-Allah, M. M. Abdulqader","doi":"10.3390/bdcc7030127","DOIUrl":"https://doi.org/10.3390/bdcc7030127","url":null,"abstract":"YouTube is a popular video-sharing platform that offers a diverse range of content. Assessing the quality of a video without watching it poses a significant challenge, especially considering the recent removal of the dislike count feature on YouTube. Although comments have the potential to provide insights into video content quality, navigating through the comments section can be time-consuming and overwhelming work for both content creators and viewers. This paper proposes an NLP-based model to classify Arabic comments as positive or negative. It was trained on a novel dataset of 4212 labeled comments, with a Kappa score of 0.818. The model uses six classifiers: SVM, Naïve Bayes, Logistic Regression, KNN, Decision Tree, and Random Forest. It achieved 94.62% accuracy and an MCC score of 91.46% with NB. Precision, Recall, and F1-measure for NB were 94.64%, 94.64%, and 94.62%, respectively. The Decision Tree had a suboptimal performance with 84.10% accuracy and an MCC score of 69.64% without TF-IDF. This study provides valuable insights for content creators to improve their content and audience engagement by analyzing viewers’ sentiments toward the videos. Furthermore, it bridges a literature gap by offering a comprehensive approach to Arabic sentiment analysis, which is currently limited in the field.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45851972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Industrial Insights on Digital Twins in Manufacturing: Application Landscape, Current Practices, and Future Needs","authors":"R. D. D’Amico, S. Addepalli, J. Erkoyuncu","doi":"10.3390/bdcc7030126","DOIUrl":"https://doi.org/10.3390/bdcc7030126","url":null,"abstract":"The digital twin (DT) research field is experiencing rapid expansion; yet, the research on industrial practices in this area remains poorly understood. This paper aims to address this knowledge gap by sharing feedback and future requirements from the manufacturing industry. The methodology employed in this study involves an examination of a survey that received 99 responses and interviews with 14 experts from 10 prominent UK organisations, most of which are involved in the defence industry in the UK. The survey and interviews explored topics such as DT design, return on investment, drivers, inhibitors, and future directions for DT development in manufacturing. This study’s findings indicate that DTs should possess characteristics such as adaptability, scalability, interoperability, and the ability to support assets throughout their entire life cycle. On average, completed DT projects reach the breakeven point in less than two years. The primary motivators behind DT development were identified to be autonomy, customer satisfaction, safety, awareness, optimisation, and sustainability. Meanwhile, the main obstacles include a lack of expertise, funding, and interoperability. This study concludes that the federation of twins and a paradigm shift in industrial thinking are essential components for the future of DT development.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49253465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
O. Horani, A. Khatibi, A. Al-Soud, J. Tham, A. Al-Adwan
{"title":"Determining the Factors Influencing Business Analytics Adoption at Organizational Level: A Systematic Literature Review","authors":"O. Horani, A. Khatibi, A. Al-Soud, J. Tham, A. Al-Adwan","doi":"10.3390/bdcc7030125","DOIUrl":"https://doi.org/10.3390/bdcc7030125","url":null,"abstract":"The adoption of business analytics (BA) has become increasingly important for organizations seeking to gain a competitive edge in today’s data-driven business landscape. Hence, understanding the key factors influencing the adoption of BA at the organizational level is decisive for the successful implementation of these technologies. This paper presents a systematic literature review that utilizes the PRISMA technique to investigate the organizational, technological, and environmental factors that affect the adoption of BA. By conducting a thorough examination of pertinent research, this review consolidates the current understanding and pinpoints essential elements that shape the process of adoption. Out of a total of 614 articles published between 2012 and 2022, 29 final articles were carefully chosen. The findings highlight the significance of organizational factors, technological factors, and environmental factors in shaping the adoption of the BA process. By consolidating and analyzing the current body of research, this paper offers valuable insights for organizations aiming to adopt BA successfully and maximize their benefits at the organizational level. The synthesized findings also contribute to the existing literature and provide a foundation for future research in this field.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49649712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Big Data Processing Framework for the Online Roadshow","authors":"Kang-Ren Leow, M. Leow, Lee-Yeng Ong","doi":"10.3390/bdcc7030123","DOIUrl":"https://doi.org/10.3390/bdcc7030123","url":null,"abstract":"The Online Roadshow, a new type of web application, is a digital marketing approach that aims to maximize contactless business engagement. It leverages web computing to conduct interactive game sessions via the internet. As a result, massive amounts of personal data are generated during the engagement process between the audience and the Online Roadshow (e.g., gameplay data and clickstream information). The high volume of data collected is valuable for more effective market segmentation in strategic business planning through data-driven processes such as web personalization and trend evaluation. However, the data storage and processing techniques used in conventional data analytic approaches are typically overloaded in such a computing environment. Hence, this paper proposed a new big data processing framework to improve the processing, handling, and storing of these large amounts of data. The proposed framework aims to provide a better dual-mode solution for processing the generated data for the Online Roadshow engagement process in both historical and real-time scenarios. Multiple functional modules, such as the Application Controller, the Message Broker, the Data Processing Module, and the Data Storage Module, were reformulated to provide a more efficient solution that matches the new needs of the Online Roadshow data analytics procedures. Some tests were conducted to compare the performance of the proposed frameworks against existing similar frameworks and verify the performance of the proposed framework in fulfilling the data processing requirements of the Online Roadshow. The experimental results evidenced multiple advantages of the proposed framework for Online Roadshow compared to similar existing big data processing frameworks.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49162579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katherine Abramski, Salvatore Citraro, L. Lombardi, Giulio Rossetti, Massimo Stella
{"title":"Cognitive Network Science Reveals Bias in GPT-3, GPT-3.5 Turbo, and GPT-4 Mirroring Math Anxiety in High-School Students","authors":"Katherine Abramski, Salvatore Citraro, L. Lombardi, Giulio Rossetti, Massimo Stella","doi":"10.3390/bdcc7030124","DOIUrl":"https://doi.org/10.3390/bdcc7030124","url":null,"abstract":"Large Language Models (LLMs) are becoming increasingly integrated into our lives. Hence, it is important to understand the biases present in their outputs in order to avoid perpetuating harmful stereotypes, which originate in our own flawed ways of thinking. This challenge requires developing new benchmarks and methods for quantifying affective and semantic bias, keeping in mind that LLMs act as psycho-social mirrors that reflect the views and tendencies that are prevalent in society. One such tendency that has harmful negative effects is the global phenomenon of anxiety toward math and STEM subjects. In this study, we introduce a novel application of network science and cognitive psychology to understand biases towards math and STEM fields in LLMs from ChatGPT, such as GPT-3, GPT-3.5, and GPT-4. Specifically, we use behavioral forma mentis networks (BFMNs) to understand how these LLMs frame math and STEM disciplines in relation to other concepts. We use data obtained by probing the three LLMs in a language generation task that has previously been applied to humans. Our findings indicate that LLMs have negative perceptions of math and STEM fields, associating math with negative concepts in 6 cases out of 10. We observe significant differences across OpenAI’s models: newer versions (i.e., GPT-4) produce 5× semantically richer, more emotionally polarized perceptions with fewer negative associations compared to older versions and N=159 high-school students. These findings suggest that advances in the architecture of LLMs may lead to increasingly less biased models that could even perhaps someday aid in reducing harmful stereotypes in society rather than perpetuating them.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42743736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Value of Web Data Scraping: An Application to TripAdvisor","authors":"Gianluca Barbera, Luiz Araujo, Silvia Fernandes","doi":"10.3390/bdcc7030121","DOIUrl":"https://doi.org/10.3390/bdcc7030121","url":null,"abstract":"Social Media Analytics (SMA) is more and more relevant in today’s market dynamics. However, it is necessary to use it wisely, either in promoting any kind of product/brand, or interacting with customers. This requires its effective understanding and monitoring. One way is through web data scraping (WDS) tools that allow to select sites and platforms to compare them in their performances. They can optimize extraction of big data published on social media. Due to current challenges, a sector that can particularly take advantage of this source is tourism (and its related sectors). This year has the hope of tourism’s revival after a pandemic whose impacts are still affecting several activities. Many traders and entrepreneurs have already used these versatile tools. However, do they really know their potential? The present study highlights the use of WDS to collect data from TripAdvisor’s social pages. Besides comparing competitors’ performance, companies also gain new knowledge of unnoticed preferences/habits. This contributes to more interesting innovations and results for them and for their customers. The approach used here is based on a project for smart tourism consultancy, from the identification of a gap in our region, to aid tourism organizations to enhance their digital presence and business model. Many things can be detected in this big source of unstructured data very quickly and easily without programming. Moreover, exploring code, either to refine the web scraper or connect it with other platforms/apps, can be an object of future research to leverage consumer behavior prediction for more advanced interactions.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136295906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wael H. Gomaa, Abdelrahman E. Nagib, Mostafa M. Saeed, Abdulmohsen Algarni, Emad Nabil
{"title":"Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network","authors":"Wael H. Gomaa, Abdelrahman E. Nagib, Mostafa M. Saeed, Abdulmohsen Algarni, Emad Nabil","doi":"10.3390/bdcc7030122","DOIUrl":"https://doi.org/10.3390/bdcc7030122","url":null,"abstract":"Automated scoring systems have been revolutionized by natural language processing, enabling the evaluation of students’ diverse answers across various academic disciplines. However, this presents a challenge as students’ responses may vary significantly in terms of length, structure, and content. To tackle this challenge, this research introduces a novel automated model for short answer grading. The proposed model uses pretrained “transformer” models, specifically T5, in conjunction with a BI-LSTM architecture which is effective in processing sequential data by considering the past and future context. This research evaluated several preprocessing techniques and different hyperparameters to identify the most efficient architecture. Experiments were conducted using a standard benchmark dataset named the North Texas Dataset. This research achieved a state-of-the-art correlation value of 92.5 percent. The proposed model’s accuracy has significant implications for education as it has the potential to save educators considerable time and effort, while providing a reliable and fair evaluation for students, ultimately leading to improved learning outcomes.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136296077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi, M. Ferrag
{"title":"Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level","authors":"Karima Khettabi, Zineddine Kouahla, Brahim Farou, Hamid Seridi, M. Ferrag","doi":"10.3390/bdcc7020119","DOIUrl":"https://doi.org/10.3390/bdcc7020119","url":null,"abstract":"Internet of Things (IoT) systems include many smart devices that continuously generate massive spatio-temporal data, which can be difficult to process. These continuous data streams need to be stored smartly so that query searches are efficient. In this work, we propose an efficient method, in the fog-cloud computing architecture, to index continuous and heterogeneous data streams in metric space. This method divides the fog layer into three levels: clustering, clusters processing and indexing. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group the data from each stream into homogeneous clusters at the clustering fog level. Each cluster in the first data stream is stored in the clusters processing fog level and indexed directly in the indexing fog level in a Binary tree with Hyperplane (BH tree). The indexing of clusters in the subsequent data stream is determined by the coefficient of variation (CV) value of the union of the new cluster with the existing clusters in the cluster processing fog layer. An analysis and comparison of our experimental results with other results in the literature demonstrated the effectiveness of the CV method in reducing energy consumption during BH tree construction, as well as reducing the search time and energy consumption during a k Nearest Neighbor (kNN) parallel query search.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46197027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLO-v5 Variant Selection Algorithm Coupled with Representative Augmentations for Modelling Production-Based Variance in Automated Lightweight Pallet Racking Inspection","authors":"Muhammad Hussain","doi":"10.3390/bdcc7020120","DOIUrl":"https://doi.org/10.3390/bdcc7020120","url":null,"abstract":"The aim of this research is to develop an automated pallet inspection architecture with two key objectives: high performance with respect to defect classification and computational efficacy, i.e., lightweight footprint. As automated pallet racking via machine vision is a developing field, the procurement of racking datasets can be a difficult task. Therefore, the first contribution of this study was the proposal of several tailored augmentations that were generated based on modelling production floor conditions/variances within warehouses. Secondly, the variant selection algorithm was proposed, starting with extreme-end analysis and providing a protocol for selecting the optimal architecture with respect to accuracy and computational efficiency. The proposed YOLO-v5n architecture generated the highest MAP@0.5 of 96.8% compared to previous works in the racking domain, with a computational footprint in terms of the number of parameters at its lowest, i.e., 1.9 M compared to YOLO-v5x at 86.7 M.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43877373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}