Journal of Computer Science: Latest Articles (Page 6)

Improvement of Moroccan Dialect Sentiment Analysis Using Arabic BERT-Based Models

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.157.167

Ghizlane Bourahouat, Manar Abourezq, N. Daoudi

Abstract: This study addresses sentiment analysis in natural language processing, with a particular focus on Arabic, especially dialectal Arabic, which has been relatively understudied due to inherent challenges. Our approach centers on sentiment analysis in Moroccan Arabic, leveraging BERT models that are pre-trained on Arabic, namely AraBERT, QARIB, ALBERT, AraELECTRA, and CAMeLBERT. These models are combined with machine learning and deep learning algorithms, including SVM and CNN, with additional fine-tuning of the pre-trained model. Furthermore, we examine the impact of data imbalance by evaluating the models on three distinct datasets: an unbalanced set, a balanced set obtained through under-sampling, and a balanced set created by combining the initial dataset with another unbalanced one. The proposed approach reaches 96% accuracy with the QARIB model, even on imbalanced data. The novelty of this research lies in the integration of pre-trained Arabic BERT models for Moroccan sentiment analysis and in the exploration of their combined use with CNN and SVM algorithms. Our findings also reveal that BERT-based models used on their own yield better results than the same models used in conjunction with CNN or SVM, marking a significant advancement in sentiment analysis for Moroccan Arabic. The method's effectiveness is highlighted through a comparative analysis with state-of-the-art approaches, providing insights that contribute to the advancement of sentiment analysis in Arabic dialects.
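The balanced set obtained through under-sampling can be illustrated with a minimal sketch. It assumes plain random under-sampling down to the size of the rarest class; the abstract does not say which under-sampling variant the authors used, and the function name, seed, and toy reviews below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop examples until every class matches the rarest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Size of the smallest class: every class is cut down to this many items.
    target = min(len(items) for items in by_label.values())

    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 7 positive and 3 negative reviews.
texts = [f"review {i}" for i in range(10)]
labels = ["pos"] * 7 + ["neg"] * 3
balanced = undersample(texts, labels)
counts = Counter(label for _, label in balanced)
```

Under-sampling discards majority-class examples, which is why the abstract also evaluates a third set balanced by adding data instead.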

Citations: 0

Optimization of Expert System Based on Interpolation, Forward Chaining, and Certainty Factor for Diagnosing Abdominal Colic

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.191.197

Hari Soetanto, Painem, Muhammad Kamil Suryadewiansyah

Abstract: Abdominal colic is a common condition that affects infants, and it can be difficult to diagnose because it shares many symptoms with other conditions, such as gastric disease and appendicitis. Existing diagnostic methods are limited by the unreliability of physical examinations and medical histories and by the high cost and time-consuming nature of imaging tests. This research proposes an expert system based on interpolation, forward chaining, and certainty factors for diagnosing abdominal colic. The system is implemented as a web application model. The forward chaining method is used to establish rules for the expert system; the rules are based on the symptoms and diseases included in the system's knowledge base. The interpolation method is used to normalize lab results, and the certainty factor method is used to process medical history and physical examinations, which is necessary because these can be imprecise. The expert system was tested on a dataset of 100 cases and accurately diagnosed 96 patients, achieving a 96% accuracy rate. This suggests that the expert system has the potential to provide a more accurate and efficient way to diagnose abdominal colic, which could lead to better patient outcomes.
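The three techniques named in the abstract can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: the interpolation is assumed to be a clamped linear rescaling, the certainty factors are assumed to be non-negative and combined with the classic MYCIN-style rule, and all rule names, symptoms, and numeric values are hypothetical.

```python
def interpolate(value, low, high):
    """Linearly map a lab value onto [0, 1], clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (value - low) / (high - low)
    return max(0.0, min(1.0, scaled))


def combine_cf(cf_a, cf_b):
    """Combine two non-negative certainty factors (MYCIN-style rule)."""
    return cf_a + cf_b * (1.0 - cf_a)


def total_certainty(evidence):
    """Fold expert_cf * user_cf for each observed symptom into one value."""
    total = 0.0
    for expert_cf, user_cf in evidence:
        total = combine_cf(total, expert_cf * user_cf)
    return total


def forward_chain(facts, rules):
    """Fire rules whose premises are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base and patient data.
rules = [
    ({"cramping", "crying"}, "suspect_colic"),
    ({"suspect_colic", "normal_labs"}, "colic"),
]
derived = forward_chain({"cramping", "crying", "normal_labs"}, rules)
lab_score = interpolate(12000, 4000, 20000)            # midpoint of the range
certainty = total_certainty([(0.8, 1.0), (0.6, 0.5)])  # 0.8, then 0.8 + 0.3 * 0.2
```

Each additional piece of supporting evidence moves the combined certainty closer to 1 without exceeding it, which is what makes the rule suitable for imprecise history and examination findings.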

Citations: 0

Predicting Smartphone Addiction in Teenagers: An Integrative Model Incorporating Machine Learning and Big Five Personality Traits

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.181.190

Jacobo Osorio, Marko Figueroa, Lenis Wong

Abstract: Smartphone addiction has emerged as a growing concern in society, particularly among teenagers, due to its potential negative impact on physical, emotional, and social well-being. Excessive smartphone use has consistently been associated with negative outcomes, reflecting a strong dependence on these devices that often harms mental health, including heightened levels of anxiety, distress, stress, and depression. This psychological burden can further result in the neglect of daily activities as individuals become increasingly engrossed in seeking pleasure through their smartphones. The aim of this study is to develop a predictive model utilizing machine learning techniques to identify smartphone addiction based on the "Big Five Personality Traits (BFPT)". The model was developed by following five of the six phases of the "Cross Industry Standard Process for Data Mining (CRISP-DM)" methodology, namely "business understanding," "data understanding," "data preparation," "modeling," and "evaluation." To construct the database, data were collected from a school using the Big Five Inventory (BFI) and the Smartphone Addiction Scale (SAS) questionnaires. Subsequently, four algorithms (DT, RF, XGB, LG) were employed, and the correlation between the personality traits and addiction was examined. The analysis revealed a relationship between the traits of neuroticism and conscientiousness and smartphone addiction. The results demonstrated that the RF algorithm achieved an accuracy of 89.7%, a precision of 87.3%, and the highest AUC value on the ROC curve. These findings highlight the effectiveness of the proposed model in accurately predicting smartphone addiction among adolescents.
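The accuracy and precision figures quoted above are standard confusion-matrix ratios. The sketch below shows how they are computed for a binary "addicted / not addicted" label; the label vectors are hypothetical and are not the study's data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


# Hypothetical labels: 1 = addicted, 0 = not addicted.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
```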

Citations: 0

Flock Optimization Algorithm-Based Deep Learning Model for Diabetic Disease Detection Improvement

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.168.180

Divager Balasubramaniyan, N. Husin, N. Mustapha, N. Sharef, T.N. Mohd Aris

Abstract: Worldwide, 422 million people suffer from diabetic disease, and 1.5 million die yearly. Diabetes remains a threat to people who fail to cure or manage it, and predicting the disease accurately is challenging. Existing systems face data over-fitting issues, convergence problems, non-converging optimization, complex predictions, and difficulties in extracting latent and predominant features. These issues affect system performance and reduce diabetic disease detection accuracy. Hence, the research objective is to create an improved diabetic disease detection system using a Flock Optimization Algorithm-Based Deep Learning Model (FOADLM) feature modeling approach that leverages the PIMA Indian dataset to predict and classify diabetic disease cases. The collected data are processed by a Gaussian filtering approach that eliminates irrelevant information, reducing the overfitting issues. Then the flock optimization algorithm is applied to detect the sequence; this process is used to reduce the convergence and optimization problems. Finally, the recurrent neural approach is applied to classify the normal and abnormal features. The implementation is carried out in MATLAB, and the results are analyzed in terms of accuracy, precision, recall, computational time, reliability, scalability, and error measures such as root mean square error, mean square error, and correlation coefficients. In conclusion, the system evaluation produced 99.23% accuracy in predicting diabetic disease with these metrics.

Citations: 0

The Impact of Information Reliability and Cloud Computing Efficiency on Website Design and E-Commerce Business in Thailand

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.198.206

Charuay Savithi, Arisaphat Suttidee

Abstract: The security and reliability of cloud computing services continue to be major concerns that hinder their widespread adoption. This study explores how information reliability and cloud computing efficiency influence website design and e-commerce business development decisions on cloud computing. The researchers distributed 379 questionnaires to determine the sample size, resulting in a response rate of 46.50% with 186 participants. Various statistical tests, including the t-test, the F-test (ANOVA and MANOVA), multiple correlation analysis, and multiple regression analysis, are used to analyze the collected data. The results show a positive correlation and influence between the reliability of information, specifically in terms of confidentiality, stability, and verifiability, and the decision to design and develop websites. Furthermore, the efficiency of cloud computing, particularly in communication and processing, demonstrates a positive relationship with and impact on website design and development. These findings highlight the need for e-commerce business leaders to understand the importance of information reliability and cloud computing efficiency. Recognizing these factors can enhance their competitive advantage in the e-commerce industry and foster consistent and sustainable growth. The research also highlights the contribution of cloud technology and security to increasing confidence in the development of e-commerce businesses.

Citations: 0

A New Algorithm for Earthquake Prediction Using Machine Learning Methods

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.150.156

N. Jarah, Abbas Hanon Hassin Alasadi, K. M. Hashim

Citations: 0

Machine Learning Oceanographic Data for Prediction of the Potential of Marine Resources

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.129.139

Denny Arbahri, O. Nurhayati, Imam Mudita

Abstract: Marine data and information are very important for human survival and attract investors because of their potential economic value. Such data have been difficult to obtain; this study addresses that by analyzing oceanographic data for 2009-2019 collected from the marine database of the Agency for the Study and Application of Technology (BPPT). The data result from collaborative marine surveys between Indonesian and foreign researchers who sailed in various Indonesian waters. Raw oceanographic data are converted and classified into Conductivity, Temperature, and Depth (CTD) data, the oceanographic parameters identified as mutually correlated predictor variables (X). The CTD data are processed into labeled numeric attributes for input and training. The data were modeled using supervised Machine Learning (ML) with the Decision Tree (DT), Linear Regression (LR), and Random Forest (RF) algorithms, interpreted according to the characteristics of the CTD data. The ML algorithms learn a model of the data and store it. Next, each model is evaluated using accuracy metrics that measure the difference between predicted and actual values to obtain a good prediction model. The prediction results show a salinity level of 34.0 parts per thousand (ppt), meaning that in this area of marine waters salinity affects the solubility of oxygen (O₂) and plays a major role in the sustainability and growth of the fertility of biological resources, supported by a sea surface temperature of 29.2°C. The salinity values obtained using ML techniques and marine resource potential can therefore be assumed to have a strong correlation. The results show that the RF model has the lowest prediction error: Mean Squared Error (MSE) = 0.007, Root Mean Squared Error (RMSE) = 0.082, and Mean Absolute Error (MAE) = 0.007, compared with the DT model (MSE = 0.008, RMSE = 0.088, MAE = 0.012) and the LR model (MSE = 1.008, RMSE = 1.004, MAE = 0.281). The RF and DT models both have a coefficient of determination (R²) of 0.999, indicating models that predict well, compared with R² = 0.914 for the LR model. The correlation between variables shows that the LR model is very linear, with a correlation coefficient (r) of 1.000, compared with r = 0.621 for the DT model and r = 0.379 for the RF model. Therefore the algorithm whose r value is +1 has the best level of accuracy. The use of ML to predict marine resource potential is a relatively new research field, so this research has the potential to contribute data and information as a reference for innovative studies and as investment decision material for investors.
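The error measures reported in the abstract follow standard definitions, sketched below on hypothetical salinity values (not the study's data). RMSE is the square root of MSE, which matches the LR pair reported above (MSE = 1.008, RMSE = 1.004).

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, mae, r2


# Hypothetical salinity readings in ppt and a model's predictions.
actual = [34.0, 33.5, 34.2, 33.8]
predicted = [34.1, 33.4, 34.0, 33.9]
mse, rmse, mae, r2 = regression_metrics(actual, predicted)
```

Lower MSE, RMSE, and MAE and an R² closer to 1 indicate a better fit, which is the basis on which the abstract ranks RF ahead of DT and LR.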

Citations: 0

Periodic Service Behavior Strain Analysis-Based Intrusion Detection in Cloud

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.140.149

S. Priya, R. S. Ponmagal

Abstract: The problem of intrusion detection in cloud environments has been well studied. Adversaries challenge data security in the cloud by launching intrusion attacks against cloud data, and these attacks must be mitigated for the cloud environment to develop. Several techniques for mitigating intrusion attacks exist in the literature. These methods use different features, such as frequency of access, payload details, and protocol mapping. However, they need to improve to achieve the expected performance in detecting intrusion attacks. An efficient Periodic Service Behavior Strain Analysis (PSBSA) is presented to handle this issue. Unlike earlier methods, the PSBSA model analyzes the behavior of users in various time frames: historical, recent, and current spans. The model identifies intrusion attacks under several constraints rather than considering only the current behavior, and it uses the user's behavior at these different times to measure the user's trust for intrusion detection. Accordingly, the model examines how users access services at historical, recent, and current times. The method performs Historical Strain Analysis (HSA), Current Strain Analysis (CSA), and Recent Strain Analysis (RSA). HSA is performed on the historical data, CSA on the current access data, and RSA on the recent access data. The model estimates various legitimacy support values in each analysis to determine the trust of any user, and intrusion detection is performed according to these support values. The proposed PSBSA model achieves higher accuracy in intrusion detection in a cloud environment.

Citations: 0

Data Analytics for Imbalanced Dataset

Journal of Computer Science Pub Date : 2024-02-01 DOI: 10.3844/jcssp.2024.207.217

Madhura Prabha R, Sasikala S

Abstract: The primary issue in real-time big data classification is imbalanced datasets. Although many balancing techniques exist to reduce the imbalance ratio, they are not well suited to big data, which has scalability issues. This study explores different balancing techniques experimentally. We compared the effectiveness of various balancing strategies, including cutting-edge approaches for severely unbalanced data from online repositories. We apply the SMOTE, SMOTE ENN, and SMOTE Tomek balancing algorithms to dermatology, wine quality, and diabetes datasets. After balancing, each dataset is classified with the AdaBoost and random forest algorithms. On the three datasets, the outcomes show that combining a classification algorithm with a balancing technique improves classification performance for imbalanced datasets. Experimental results showed that the SMOTE ENN technique produces higher classification accuracy than the SMOTE and SMOTE Tomek techniques. The findings are also analyzed with respect to other factors such as execution time and scalability. Though SMOTE Tomek produces an accuracy of 1.0 for a few datasets, its execution time is longer than that of SMOTE ENN. SMOTE ENN with random forest classification produces 1.0 accuracy for all three datasets with less execution time. This experimental study is intended to inform the creation of a novel ensemble technique for balancing highly imbalanced data.
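The core SMOTE step behind all three balancing algorithms can be sketched in a few lines: a synthetic minority sample is placed at a random position on the segment between a minority point and one of its nearest minority neighbours. This is a simplified illustration using only the standard library; the ENN and Tomek-link cleaning steps are not shown, and the function name, parameters, and toy points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies. In practice a library such as imbalanced-learn, which provides `SMOTE`, `SMOTEENN`, and `SMOTETomek` classes, would normally be used rather than a hand-written version.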

Citations: 0
