{"title":"Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC","authors":"M. Öztürk","doi":"10.4108/eai.27-5-2022.174084","DOIUrl":"https://doi.org/10.4108/eai.27-5-2022.174084","url":null,"abstract":"Although various machine learning and text mining techniques exist to identify the programming language of complete code files, multi-label code snippet prediction has not been considered by the research community. This work aims at devising a tuner for multi-label programming language prediction of Stack Overflow posts. To that end, a Hyper Source Code Classifier (HyperSCC) is devised along with rule-based automatic labeling by considering the bottlenecks of multi-label classification. The proposed method is evaluated on seven multi-label predictors to conduct an extensive analysis. The method is further compared with three competitive alternatives in terms of one-label programming language prediction. HyperSCC outperformed the other methods in terms of the F1 score. Preprocessing results in a large reduction (50%) of training time when ensemble multi-label predictors are employed. In one-label programming language prediction, Gradient Boosting Machine (gbm) yields the highest accuracy (0.99) in predicting R posts that have many distinctive words determining labels. 
The findings support the hypothesis that multi-label predictors can be strengthened with sophisticated feature selection and labeling approaches.","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"2012 1","pages":"e5"},"PeriodicalIF":1.3,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86421451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GAN Data Augmentation for Improved Automated Atherosclerosis Screening from Coronary CT Angiography","authors":"Amel Laidi, Mohammed Ammar, Mostafa EL HABIB DAHO, S. Mahmoudi","doi":"10.4108/eai.17-5-2022.173981","DOIUrl":"https://doi.org/10.4108/eai.17-5-2022.173981","url":null,"abstract":"INTRODUCTION: Atherosclerosis is a chronic medical condition that can result in coronary artery disease, strokes, or even heart attacks. Early detection can result in timely interventions and save lives. OBJECTIVES: In this work, a fully automatic transfer learning-based model was proposed for atherosclerosis detection in coronary CT angiography (CCTA). The model’s performance was improved by generating training data using a Generative Adversarial Network. METHODS: A first experiment was established on the original dataset with a ResNet network, reaching 95.2% accuracy, 60.8% sensitivity, 99.25% specificity and 90.48% PPV. A Generative Adversarial Network (GAN) was then used to generate a new set of images to balance the dataset, creating more positive images. Experiments were made adding from 100 to 1000 images to the dataset. RESULTS: Adding 1000 images resulted in a small drop in accuracy to 93.2%, but an improvement in overall performance with 89.0% sensitivity, 97.37% specificity and 97.13% PPV. CONCLUSION: This paper is one of the early research projects investigating the efficiency of data augmentation using GANs for atherosclerosis, with results comparable to the state of the art. 
","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"22 1","pages":"e4"},"PeriodicalIF":1.3,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78729354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Data Clustering using Dynamic Crow Search Algorithm","authors":"Rajesh Ranjan, J. Chhabra","doi":"10.4108/eai.17-5-2022.173982","DOIUrl":"https://doi.org/10.4108/eai.17-5-2022.173982","url":null,"abstract":"This work proposes Automatic Clustering using Dynamic Crow Search Algorithm (ACDCSA), which updates its parameters dynamically. Crow Search is a recently proposed algorithm that imitates the behaviour of crows. Clustering is an essential aspect of data analysis whose significance has increased manifold as technological advances have led to enormous volumes of data that need to be analysed in real time. Automatic clustering detects optimal cluster numbers and produces sustainable cluster centroids. ACDCSA uses Cluster Validity using Nearest Neighbour as an internal validity measure that acts as a fitness function to find the optimal cluster centres. The present work is compared with other well-known meta-heuristic search algorithms such as PSO, DE, WOA and GWO for the automatic clustering task over seven benchmark clustering datasets. Inter-cluster distance, intra-cluster distance and the optimal cluster number produced are used to assess the performance of ACDCSA.","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"10 4","pages":"e5"},"PeriodicalIF":1.3,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72410373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting Diabetes Correlated Non-alcoholic Fatty Liver Disease by Exploiting Naïve Bayes Tree","authors":"S. Reddy, Nilambar Sethi, R. Rajender, G. Mahesh","doi":"10.4108/eai.29-4-2022.173975","DOIUrl":"https://doi.org/10.4108/eai.29-4-2022.173975","url":null,"abstract":"INTRODUCTION: In recent years, non-alcoholic fatty liver disease (NAFLD) has been identified as the most vulnerable chronic disease. Fat is accumulated in the liver cells of persons with NAFLD. Diabetes is the most common ailment among people of all ages, so it is critical to recognize and prevent its adverse effects. OBJECTIVES: A relevant dataset with appropriate features was selected. Ensemble algorithms were applied for the prediction task, and finally, the method with the best performance was extracted. METHODS: In addition to the ensemble approaches bagging, Random Forest and AdaBoost, the individual classifiers Naive Bayes (NB) and C4.5 decision tree were considered. These ML techniques were compared with the proposed NB tree algorithm, a combination of C4.5 and Naive Bayes. RESULTS: The following evaluation parameters were computed for each analyzed algorithm: accuracy, detection rate, negative predictive value (NPV), false negative rate (FNR), and false positive rate (FPR). The algorithms are then compared based on these metrics to determine the best algorithm. The NB tree was found to be the best method with 97.55% accuracy, 0.4853 detection rate, 0.9615 NPV, 0.0388 FNR, and 0.0099 FPR. CONCLUSION: The NB tree outperformed the individual Naive Bayes and C4.5 classifiers, and the other techniques studied. The developed algorithm could be applied in NAFLD-related research. 
","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"34 1","pages":"e2"},"PeriodicalIF":1.3,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86123202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Detect Phishing Web Pages Using Lexical and String Complexity Analysis","authors":"D. Patil, T. Pattewar, Shailendra M. Pardeshi, Vipul D. Punjabi, Rajnikant Wagh","doi":"10.4108/eai.20-4-2022.173950","DOIUrl":"https://doi.org/10.4108/eai.20-4-2022.173950","url":null,"abstract":"Phishing is the most common and effective sort of attack employed by cybercriminals to deceive and steal sensitive information from innocent Web users. Researchers have developed major solutions to deal with this problem in recent years, but there are still a number of open challenges due to the ever-changing nature of phishing attacks. To discriminate between benign and phishing URLs, this paper proposes a static method based on lexical and string complexity analysis and distinguishing URL features. The proposed approach has been evaluated using two state-of-the-art online learning classifiers. The confidence weighted learning classifier achieved a phishing URL detection accuracy of 98.35%, an error rate of 1.65%, an FPR of 0.026 and an FNR of 0.005. The adaptive regularization of weights classifier achieved an accuracy of 97.28%, an error rate of 2.72%, an FPR of 0.000 and an FNR of 0.052. The proposed approach shows an improvement in the detection of phishing web pages.","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"87 1","pages":"e1"},"PeriodicalIF":1.3,"publicationDate":"2022-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77225150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Features Reduction on Machine Learning Based Intrusion Detection Systems","authors":"Masooma Fatima, O. Rehman, Ibrahim M. H. Rahman","doi":"10.4108/eetsis.vi.447","DOIUrl":"https://doi.org/10.4108/eetsis.vi.447","url":null,"abstract":"INTRODUCTION: As the use of the internet is increasing rapidly, cyber-attacks on users’ personal data and network resources are on the rise. Due to easily accessible cyber-attack tools, attacks on cyber resources are becoming common, including Distributed Denial-of-Service (DDoS) attacks. Intruders are using enhanced techniques for executing DDoS attacks. OBJECTIVES: Machine Learning (ML) based classification modules integrated with an Intrusion Detection System (IDS) have the potential to detect cyber-attacks. This research aims to study the performance of several machine learning algorithms, namely Naïve Bayes, Decision Tree, Random Forest, and Support Vector Machine, in classifying DDoS attacks from normal traffic. METHODS: The paper focuses on DDoS attack identification, for which a multiclass dataset including Smurf, SIDDoS, HTTP-Flood and UDP-Flood attacks is used. Balanced datasets are used for both training and testing in order to obtain bias-free results. Four experimental scenarios are conducted, each with a different set of reduced features. RESULTS: The result of each experiment is computed individually and the best algorithm among the four is highlighted by means of its accuracy, detection rates and the processing time required to build and test the classifiers. 
CONCLUSION: Across all experiments, the Decision Tree algorithm showed promising overall performance on the metrics investigated.","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"23 1","pages":"e9"},"PeriodicalIF":1.3,"publicationDate":"2022-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86989845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RETRACTED: Encoder-decoder structure based on conditional random field for building extraction in remote sensing images [EAI Endorsed Scal Inf Syst (2022), Online First]","authors":"Yian Xu","doi":"10.4108/eai.8-4-2022.173801","DOIUrl":"https://doi.org/10.4108/eai.8-4-2022.173801","url":null,"abstract":"","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"158 1","pages":"e15"},"PeriodicalIF":1.3,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76879857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RETRACTED: Feature extraction of dance movement based on deep learning and deformable part model [EAI Endorsed Scal Inf Syst (2022), Online First]","authors":"Shuang Gao, Xiaowei Wang","doi":"10.4108/eai.8-4-2022.173790","DOIUrl":"https://doi.org/10.4108/eai.8-4-2022.173790","url":null,"abstract":"","PeriodicalId":43034,"journal":{"name":"EAI Endorsed Transactions on Scalable Information Systems","volume":"30 1","pages":"23"},"PeriodicalIF":1.3,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73079086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}