{"title":"A Comparison of Bias Mitigation Techniques for Educational Classification Tasks Using Supervised Machine Learning","authors":"Tarid Wongvorachan, Okan Bulut, Joyce Xinle Liu, Elisabetta Mazzullo","doi":"10.3390/info15060326","DOIUrl":"https://doi.org/10.3390/info15060326","url":null,"abstract":"Machine learning (ML) has become integral in educational decision-making through technologies such as learning analytics and educational data mining. However, the adoption of machine learning-driven tools without scrutiny risks perpetuating biases. Despite ongoing efforts to tackle fairness issues, their application to educational datasets remains limited. To address the mentioned gap in the literature, this research evaluates the effectiveness of four bias mitigation techniques in an educational dataset aiming at predicting students’ dropout rate. The overarching research question is: “How effective are the techniques of reweighting, resampling, and Reject Option-based Classification (ROC) pivoting in mitigating the predictive bias associated with high school dropout rates in the HSLS:09 dataset?\" The effectiveness of these techniques was assessed based on performance metrics including false positive rate (FPR), accuracy, and F1 score. The study focused on the biological sex of students as the protected attribute. The reweighting technique was found to be ineffective, showing results identical to the baseline condition. Both uniform and preferential resampling techniques significantly reduced predictive bias, especially in the FPR metric but at the cost of reduced accuracy and F1 scores. The ROC pivot technique marginally reduced predictive bias while maintaining the original performance of the classifier, emerging as the optimal method for the HSLS:09 dataset. This research extends the understanding of bias mitigation in educational contexts, demonstrating practical applications of various techniques and providing insights for educators and policymakers. By focusing on an educational dataset, it contributes novel insights beyond the commonly studied datasets, highlighting the importance of context-specific approaches in bias mitigation.","PeriodicalId":510156,"journal":{"name":"Information","volume":"9 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141266878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-06-02 | DOI: 10.3390/info15060325
S. Qadhi, Ahmed Alduais, Youmen Chaaban, M. Khraisheh
{"title":"Generative AI, Research Ethics, and Higher Education Research: Insights from a Scientometric Analysis","authors":"S. Qadhi, Ahmed Alduais, Youmen Chaaban, M. Khraisheh","doi":"10.3390/info15060325","DOIUrl":"https://doi.org/10.3390/info15060325","url":null,"abstract":"In the digital age, the intersection of artificial intelligence (AI) and higher education (HE) poses novel ethical considerations, necessitating a comprehensive exploration of this multifaceted relationship. This study aims to quantify and characterize the current research trends and critically assess the discourse on ethical AI applications within HE. Employing a mixed-methods design, we integrated quantitative data from the Web of Science, Scopus, and the Lens databases with qualitative insights from selected studies to perform scientometric and content analyses, yielding a nuanced landscape of AI utilization in HE. Our results identified vital research areas through citation bursts, keyword co-occurrence, and thematic clusters. We provided a conceptual model for ethical AI integration in HE, encapsulating dichotomous perspectives on AI’s role in education. Three thematic clusters were identified: ethical frameworks and policy development, academic integrity and content creation, and student interaction with AI. The study concludes that, while AI offers substantial benefits for educational advancement, it also brings challenges that necessitate vigilant governance to uphold academic integrity and ethical standards. The implications extend to policymakers, educators, and AI developers, highlighting the need for ethical guidelines, AI literacy, and human-centered AI tools.","PeriodicalId":510156,"journal":{"name":"Information","volume":"31 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141273367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060301
M. R. Ezilarasan, Man-Fai Leung
{"title":"An Efficient EEG Signal Analysis for Emotion Recognition Using FPGA","authors":"M. R. Ezilarasan, Man-Fai Leung","doi":"10.3390/info15060301","DOIUrl":"https://doi.org/10.3390/info15060301","url":null,"abstract":"Electroencephalography (EEG), electromyography (EMG), galvanic skin response (GSR), and electrocardiogram (ECG) are among the techniques developed for collecting psychophysiological data from humans. This study presents a feature extraction technique for identifying emotions in EEG-based data from the human brain. Independent component analysis (ICA) was employed to eliminate artifacts from the raw brain signals before applying signal extraction to a convolutional neural network (CNN) for emotion identification. These features were then learned by the proposed CNN-LSTM (long short-term memory) algorithm, which includes a ResNet-152 classifier. The CNN-LSTM with ResNet-152 algorithm was used for the accurate detection and analysis of human emotional data. The SEED V dataset was employed for data collection in this study, and the implementation was carried out using an Altera DE2 FPGA development board, demonstrating improved performance in terms of FPGA speed and area optimization.","PeriodicalId":510156,"journal":{"name":"Information","volume":"12 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141100561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060302
Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang, Xiong Luo
{"title":"The Impact of Input Types on Smart Contract Vulnerability Detection Performance Based on Deep Learning: A Preliminary Study","authors":"Izdehar M. Aldyaflah, Wenbing Zhao, Shunkun Yang, Xiong Luo","doi":"10.3390/info15060302","DOIUrl":"https://doi.org/10.3390/info15060302","url":null,"abstract":"Stemming vulnerabilities out of a smart contract prior to its deployment is essential to ensure the security of decentralized applications. As such, numerous tools and machine-learning-based methods have been proposed to help detect vulnerabilities in smart contracts. Furthermore, various ways of encoding the smart contracts for analysis have also been proposed. However, the impact of these input methods has not been systematically studied, which is the primary goal of this paper. In this preliminary study, we experimented with four common types of input, including Word2Vec, FastText, Bag-of-Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). To focus on the comparison of these input types, we used the same deep-learning model, i.e., convolutional neural networks, in all experiments. Using a public dataset, we compared the vulnerability detection performance of the four input types both in the binary classification scenarios and the multiclass classification scenario. Our findings show that TF-IDF is the best overall input type among the four. TF-IDF has excellent detection performance in all scenarios: (1) it has the best F1 score and accuracy in binary classifications for all vulnerability types except for the delegate vulnerability where TF-IDF comes in a close second, and (2) it comes in a very close second behind BoW (within 0.8%) in the multiclass classification.","PeriodicalId":510156,"journal":{"name":"Information","volume":"7 40","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060306
Marco Del-Coco, Marco Leo, P. Carcagnì
{"title":"Machine Learning for Smart Irrigation in Agriculture: How Far along Are We?","authors":"Marco Del-Coco, Marco Leo, P. Carcagnì","doi":"10.3390/info15060306","DOIUrl":"https://doi.org/10.3390/info15060306","url":null,"abstract":"The management of water resources is becoming increasingly important in several contexts, including agriculture. Recently, innovative agricultural practices, advanced sensors, and Internet of Things (IoT) devices have made it possible to improve the efficiency of water use. However, it is the application of control strategies based on advanced machine learning techniques that enables the adoption of smart irrigation scheduling and the immediate economic, social, and environmental benefits. This challenging research area has attracted the attention of many researchers worldwide, who have proposed several technological and methodological solutions. Unfortunately, the results of these scientific efforts have not yet been categorized in a thematic survey, making it difficult to understand how far we are from optimal water management based on machine learning. This paper fills this gap by focusing on smart irrigation systems with an emphasis on machine learning. More specifically, the generic structure of a smart agriculture system is presented, and existing machine learning strategies and available datasets are discussed. Furthermore, several open issues are identified, especially in the processing of long-term data, also due to the lack of corresponding annotated datasets. Finally, some interesting future research directions to be pursued in order to build scalable, domain-independent approaches are proposed.","PeriodicalId":510156,"journal":{"name":"Information","volume":"11 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141102543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060300
Leandro Stöckli, Luca Joho, Felix Lehner, Thomas Hanne
{"title":"The Personification of ChatGPT (GPT-4)—Understanding Its Personality and Adaptability","authors":"Leandro Stöckli, Luca Joho, Felix Lehner, Thomas Hanne","doi":"10.3390/info15060300","DOIUrl":"https://doi.org/10.3390/info15060300","url":null,"abstract":"Thanks to the publication of ChatGPT, Artificial Intelligence is now basically accessible and usable to all internet users. The technology behind it can be used in many chatbots, whereby the chatbots should be trained for the respective area of application. Depending on the application, the chatbot should react differently and thus, for example, also take on and embody personality traits to be able to help and answer people better and more personally. This raises the question of whether ChatGPT-4 is able to embody personality traits. Our study investigated whether ChatGPT-4’s personality can be analyzed using personality tests for humans. To test possible approaches to measuring the personality traits of ChatGPT-4, experiments were conducted with two of the most well-known personality tests: the Big Five and Myers–Briggs. The experiments also examine whether and how personality can be changed by user input and what influence this has on the results of the personality tests.","PeriodicalId":510156,"journal":{"name":"Information","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060304
Mohammed El-Hajj, Pim Beune
{"title":"Decentralized Zone-Based PKI: A Lightweight Security Framework for IoT Ecosystems","authors":"Mohammed El-Hajj, Pim Beune","doi":"10.3390/info15060304","DOIUrl":"https://doi.org/10.3390/info15060304","url":null,"abstract":"The advent of Internet of Things (IoT) devices has revolutionized our daily routines, fostering interconnectedness and convenience. However, this interconnected network also presents significant security challenges concerning authentication and data integrity. Traditional security measures, such as Public Key Infrastructure (PKI), encounter limitations when applied to resource-constrained IoT devices. This paper proposes a novel decentralized PKI system tailored specifically for IoT environments to address these challenges. Our approach introduces a unique “zone” architecture overseen by zone masters, facilitating efficient certificate management within IoT clusters while reducing the risk of single points of failure. Furthermore, we prioritize the use of lightweight cryptographic techniques, including Elliptic Curve Cryptography (ECC), to optimize performance without compromising security. Through comprehensive evaluation and benchmarking, we demonstrate the effectiveness of our proposed solution in bolstering the security and efficiency of IoT ecosystems. This contribution underlines the critical need for innovative security solutions in IoT deployments and presents a scalable framework to meet the evolving demands of IoT environments.","PeriodicalId":510156,"journal":{"name":"Information","volume":"25 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141102820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060303
Zongfeng Zou, Xiaochen Ji, Yingying Li
{"title":"A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor","authors":"Zongfeng Zou, Xiaochen Ji, Yingying Li","doi":"10.3390/info15060303","DOIUrl":"https://doi.org/10.3390/info15060303","url":null,"abstract":"With the development of the Internet, the oversight of research integrity issues has extended beyond the scientific community to encompass the whole of society. If these issues are not addressed promptly, they can significantly impact the research credibility of both institutions and scholars. This article proposes a text convolutional neural network based on SMOTE to identify short texts of potential public opinion events related to suspected scientific integrity issues from common short texts. The SMOTE comprehensive sampling technique is employed to handle imbalanced datasets. To mitigate the impact of short text length on text representation quality, the Doc2vec embedding model is utilized to represent short text, yielding a one-dimensional dense vector. Additionally, the dimensions of the input layer and convolution kernel of TextCNN are adjusted. Subsequently, a short text event extraction model based on TF-IDF and TextRank is proposed to extract crucial information, for instance, names and research-related institutions, from events and facilitate the identification of potential public opinion events related to suspected scientific integrity issues. Results of experiments have demonstrated that utilizing SMOTE to balance the dataset is able to improve the classification results of TextCNN classifiers. Compared to traditional classifiers, TextCNN exhibits greater robustness in addressing the problems of imbalanced datasets. However, challenges such as low information content, non-standard writing, and polysemy in short texts may impact the accuracy of event extraction. The framework can be further optimized to address these issues in the future.","PeriodicalId":510156,"journal":{"name":"Information","volume":"2 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-24 | DOI: 10.3390/info15060305
Timotej Jagrič, Dušan Fister, Stefan Otto Grbenic, Aljaz Herman
{"title":"Private Firm Valuation Using Multiples: Can Artificial Intelligence Algorithms Learn Better Peer Groups?","authors":"Timotej Jagrič, Dušan Fister, Stefan Otto Grbenic, Aljaz Herman","doi":"10.3390/info15060305","DOIUrl":"https://doi.org/10.3390/info15060305","url":null,"abstract":"Forming optimal peer groups is a crucial step in multiplier valuation. Among others, the traditional regression methodology requires the definition of the optimal set of peer selection criteria and the optimal size of the peer group a priori. Since there exists no universally applicable set of closed and complementary rules on selection criteria due to the complexity and the diverse nature of firms, this research exclusively examines unlisted companies, rendering direct comparisons with existing studies impractical. To address this, we developed a bespoke benchmark model through rigorous regression analysis. Our aim was to juxtapose its outcomes with our unique approach, enriching the understanding of unlisted company transaction dynamics. To stretch the performance of the linear regression method to the maximum, various datasets on selection criteria (full as well as F- and NCA-optimized) were employed. Using a sample of over 20,000 private firm transactions, model performance was evaluated employing multiplier prediction error measures (emphasizing bias and accuracy) as well as prediction superiority directly. Emphasizing five enterprise and equity value multiples, the results allow for the overall conclusion that the self-organizing map algorithm outperforms the traditional linear regression model in both minimizing the valuation error as measured by the multiplier prediction error measures as well as in direct prediction superiority. Consequently, the machine learning methodology offers a promising way to improve peer selection in private firm multiplier valuation.","PeriodicalId":510156,"journal":{"name":"Information","volume":"5 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information | Pub Date: 2024-05-23 | DOI: 10.3390/info15060298
S. N. Nobel, Shirin Sultana, Sondip Poul Singha, S. Chaki, M. J. N. Mahi, Tony Jan, Alistair Barros, Md. Whaiduzzaman
{"title":"Unmasking Banking Fraud: Unleashing the Power of Machine Learning and Explainable AI (XAI) on Imbalanced Data","authors":"S. N. Nobel, Shirin Sultana, Sondip Poul Singha, S. Chaki, M. J. N. Mahi, Tony Jan, Alistair Barros, Md. Whaiduzzaman","doi":"10.3390/info15060298","DOIUrl":"https://doi.org/10.3390/info15060298","url":null,"abstract":"Recognizing fraudulent activity in the banking system is essential due to the significant risks involved. When fraudulent transactions are vastly outnumbered by non-fraudulent ones, dealing with imbalanced datasets can be difficult. This study aims to determine the best model for detecting fraud by comparing four commonly used machine learning algorithms: Support Vector Machine (SVM), XGBoost, Decision Tree, and Logistic Regression. Additionally, we utilized the Synthetic Minority Over-sampling Technique (SMOTE) to address the issue of class imbalance. The XGBoost Classifier proved to be the most successful model for fraud detection, with an accuracy of 99.88%. We utilized SHAP and LIME analyses to provide greater clarity into the decision-making process of the XGBoost model and improve overall comprehension. This research shows that the XGBoost Classifier is highly effective in detecting banking fraud on imbalanced datasets, with an impressive accuracy score. The interpretability of the XGBoost Classifier model was further enhanced by applying SHAP and LIME analysis, which shed light on the significant features that contribute to fraud detection. The insights and findings presented here are valuable contributions to the ongoing efforts aimed at developing effective fraud detection systems for the banking industry.","PeriodicalId":510156,"journal":{"name":"Information","volume":"43 49","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141103823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}