{"title":"Assessing Feature Selection Techniques for Machine Learning Models using Cardiac Dataset","authors":"Shital Patil, Surendra Bhosale","doi":"10.1109/AIKE55402.2022.00027","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00027","url":null,"abstract":"Cardiac disorders are the leading causes of morbidity and mortality in the world, accounting for a large number of deaths over the last few decades, and have emerged as the most life-threatening disease globally. Machine learning and artificial intelligence have been playing a key role in predicting heart disease. A relevant set of features can be very helpful in predicting the disease accurately. In this study, we propose a comparative analysis of 4 different feature selection methods and evaluate their performance on both the raw (unbalanced) and sampled (balanced) dataset. The publicly available Z-Alizadeh Sani dataset has been used for this study. The feature selection techniques used in this study include Data Analysis, minimum Redundancy maximum Relevance (mRMR), and Recursive Feature Elimination (RFE). These methods are tested with 8 different classification models to get the best accuracy possible. Using both the balanced and unbalanced datasets, the study shows promising results across various performance metrics in accurately predicting heart disease. With the raw data, the proposed method achieves a maximum AUC of 100%, a maximum F1 score of 94%, a maximum sensitivity (SENS) of 98%, and a maximum precision (PREC) of 93%. With the balanced dataset, it achieves a maximum AUC of 100%, an F1 score of 95%, a maximum SENS of 95%, and a maximum PREC of 97%.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123521923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimE4KG: Distributed and Explainable Multi-Modal Semantic Similarity Estimation for Knowledge Graphs","authors":"C. Draschner, Hajira Jabeen, Jens Lehmann","doi":"10.1109/AIKE55402.2022.00007","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00007","url":null,"abstract":"In recent years, more and more exciting sources of data have been modeled as Knowledge Graphs (KGs). This modeling represents both structural relationships and the entity-specific multi-modal data in KGs. In various data analytic pipelines and Machine Learning (ML), the task of semantic similarity estimation plays a significant role. Assigning similarity values to entity pairs is needed in recommendation systems, clustering, classification, entity matching/disambiguation and many others. Efficient and scalable frameworks are needed to handle the quadratic complexity of all-pairs semantic similarity on Big Data KGs. Moreover, heterogeneous KGs demand multi-modal semantic similarity estimation to cover versatile content, such as categorical relations between classes or their attribute literals like strings, timestamps, or numeric data. In this paper we propose the SimE4KG framework as a resource providing generic open source modules that compute semantic similarity estimation in multi-modal KGs. To justify the computational costs of similarity estimation, SimE4KG generates reproducible, reusable and explainable results. The pipeline results are a native semantic RDF-KG, including the experiment results, hyper-parameter setup, and explanation of the results, such as the most influential features. For fast and scalable in-memory execution, we implemented the distributed approach using Apache Spark. The entire development of this framework is integrated into the holistic distributed semantic analytics stack SANSA.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127044600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Program Chairs","authors":"Srimat, Chakradhar, T. Nanya, C. Ravikumar","doi":"10.1109/WORDS.2005.41","DOIUrl":"https://doi.org/10.1109/WORDS.2005.41","url":null,"abstract":"A high-quality review process was carried out by the highly qualified program committee members, and each paper was reviewed by at least three independent reviewers (about four review reports on average). We would like to acknowledge the efforts of the Program Vice Chairs: Shi-Jinn Horng, National United University, Taiwan; Sigi Benkner, University of Vienna, Austria; Michela Taufer, University of Delaware, USA; Joao Pedro Sousa, George Mason University, USA; Rosa M. Badia, Barcelona Supercomputing Center, Spain; Rajeev Raje, Indiana University Purdue University Indianapolis, USA; Horacio Gonzalez-Velez, the Robert Gordon University, UK; Gagan Agrawal, Ohio State University, USA; Sabri Pllana, University of Vienna, Austria; Heng Tao Shen, University of Queensland, Australia; Joanna Kolodziej, University of Bielsko-Biala, Poland; Keqiu Li, Dalian University of Technology, China; Bessam Abdulrazak, University of Sherbrooke, Canada; Eli Katsiri, Birkbeck, University of London, UK; Bo Yang, University of Electronic Science and Technology of China, China; Guojun Wang, Central South University, China. They all greatly assisted us during the conference organization, from setting up the program committees of their respective tracks with reputed experts in their track fields to the reviewing process and paper selection for the conference program. We would like to extend our thanks to the program committee members and to the additional reviewers who contributed their valuable time and expertise to provide professional reviews and very interesting feedback to authors on a tight schedule.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114430256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach for recommending relevant articles in news portal based on Doc2Vec","authors":"Bogdan Walek, Patrik Müller","doi":"10.1109/AIKE55402.2022.00010","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00010","url":null,"abstract":"News portals are among the most popular websites, and their main goal is to bring the latest news to their readers. It is also important to provide relevant content to various types of readers. In this article, we propose an approach for recommending relevant articles on a news portal based on the content of a specific article. The proposed approach is based on Doc2Vec. The main steps of the proposed approach and the training of the Doc2Vec model are described. The article also deals with text similarity problems and the limitations of the Czech language in the context of recommending relevant articles. For experimental verification of our approach, articles were randomly chosen from the selected news portal. For each article, our approach recommends the most relevant similar articles. The recommended articles were then marked as relevant or irrelevant. Finally, the ratio of relevant recommended articles was calculated for each randomly chosen article. The experimental results show the accuracy and relevancy of the proposed approach.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122053040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of a Variety of Fast Machine Learning Model for ECG-based Arrhythmia Classifier","authors":"Gengjia Zhang, Siho Shin, Jaehyo Jung, Meina Li, Y. Kim","doi":"10.1109/AIKE55402.2022.00021","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00021","url":null,"abstract":"Although deep learning has proven its capability in various fields, training and testing deep neural networks on large amounts of data remains time-consuming. Addressing this requires a high-performance GPU and CPU, SSD storage, and a large amount of RAM, which are expensive. We propose a new classifier algorithm based on feature point extraction that can be trained and tested quickly. The performance of the proposed algorithm was verified by classifying heart diseases on the MIT-BIH arrhythmia data set. First, noise was removed by the wavelet transform, and feature points were extracted using root mean square (RMS), crest factor, margin factor, form factor, kurtosis, and pulse factor. Then, performance was compared across various classification algorithms. The two feature extraction methods are compared in terms of the accuracy of each algorithm, the execution time of the model during training, and memory usage. The proposed algorithm can be applied to various health care systems, such as those for heart disease and depression, and is expected to help users access health care at low cost.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128487746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cheat Detection Through Temporal Inference of Constrained Orders for Subsequences","authors":"Jon Rogers, R. S. Aygün, L. Etzkorn","doi":"10.1109/AIKE55402.2022.00014","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00014","url":null,"abstract":"For select domains and datasets, duplicates may be, in part or in whole, instances of cheating. We may specifically observe this for Sony's PlayStation Network (PSN), which services the world's most popular gaming platform. The key to cheat detection in such domains is the ability to perform temporal deduplication. Temporal data is increasingly prevalent and is not well suited to traditional similarity- and distance-based deduplication techniques. We strengthen the well-established Adaptive Sorted Neighborhood Method (ASNM) with an approach for temporal data domains ($\\text{ASNM}+\\text{LCS}$) that applies ASNM, infers attribute metadata, and further detects duplicates through inference of temporal ordering requirements using Longest Common Subsequence (LCS) for records of a shared type. Using LCS, we split each record's temporal sequence into constrained and unconstrained sequences. We flag suspicious (errant) records that do not adhere to the inferred constrained order, and we flag a record as a duplicate if its unconstrained order, of sufficient length, matches that of another record. ASNM and $\\text{ASNM}+\\text{LCS}$ were evaluated against a labeled dataset of 22,794 records from PSN trophy data where duplication may be indicative of cheating. $\\text{ASNM}+\\text{LCS}$ F1 scores outperformed ASNM at every similarity threshold with at least 32% improvement. ASNM's best performance was an F1 of 0.708 at the 0.99 threshold; $\\text{ASNM}+\\text{LCS}$ yielded an F1 of 0.938. The significant performance improvement costs little overhead, as $\\text{ASNM}+\\text{LCS}$ averaged only 3.79% additional runtime.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"89 15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125004197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Graphs for Automated Driving","authors":"Lavdim Halilaj, J. Luettin, C. Henson, Sebastian Monka","doi":"10.1109/AIKE55402.2022.00023","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00023","url":null,"abstract":"Automated Driving (AD) datasets, when used in combination with deep learning techniques, have enabled significant progress on difficult AD tasks such as perception, trajectory prediction and motion planning. These datasets represent the content of driving scenes as captured by various sensors, including cameras, RADAR, and LiDAR, along with 2D/3D annotations of traffic participants. Such datasets, however, often fail to capture and represent the spatial, temporal, functional, and semantic relations between entities in a scene. This lack of knowledge leads to a shallow understanding of the true complexity and dynamics inherent in a driving scene. In this paper, we argue that a knowledge-graph-based representation of driving scenes, which provides a richer structure and semantics, will lead to further improvements in automated driving. Towards this goal, we developed a layered architecture and ontologies for specific automated driving datasets and a fundamental ontology of shared concepts. We also built knowledge graphs (KGs) for three different AD datasets. We analyze the information contained in the AD KGs and outline how the additional semantic information contained in the KGs could improve the performance of different AD tasks. Moreover, example queries are provided to retrieve relevant information that can be exploited for augmenting the AD pipelines. All artefacts needed for reproducibility purposes are provided via a Dropbox folder (shorturl.at/iwyCV); we will go through an internal approval process for making all artefacts publicly available. We removed our internal namespaces of reused ontologies because of confidentiality and to provide self-contained ontologies. As the original datasets are under specific licences, we cannot publish the KGs themselves, but we provide the scripts to generate them.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126306078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Effects of Model Capacity in Modelling Variability between Training and Testing Environments for Automatic Speech Recognition","authors":"Anwar Tantawy, D. O'Shaughnessy","doi":"10.1109/AIKE55402.2022.00016","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00016","url":null,"abstract":"Automatic Speech Recognition (ASR) applications have increased greatly during the last decade due to the emergence of new devices and home automation hardware that benefit greatly from letting users interact hands-free, such as smart watches, earbuds, portable translators and home assistants. ASR implemented for these applications inevitably suffers from performance degradation in real-life scenarios. Most ASR systems expect the working environment to be similar to the training environment, which is often not the case, especially for new applications with limited data availability. This study experimentally shows the effect of variations in the environment on different ASR models and the capacity of different models to improve performance when provided with training data similar to the testing environment. The experiments were conducted using discrepant training and testing datasets with varying levels of discrepancy. These tests can help researchers developing novel applications identify suitable models according to the anticipated variability between the training data and the real-life application.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128964614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Robot Directed Coverage Path Planning in Row-based Environments","authors":"Tingjun Lei, Pradeep Chintam, C. Luo, Shahram Rahimi","doi":"10.1109/AIKE55402.2022.00025","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00025","url":null,"abstract":"Multiple autonomous robots are deployed to fulfill tasks collaboratively in real-world applications with row-based settings, as found in precision agriculture, warehouses, factory inspections, and wind farms. One batch of robots is assigned to explore, search and localize objects in large-scale row-based environments, while the other batch of robots moves directly to the detected targets to retrieve the objects. In this paper, a multi-robot collaborative navigation framework with two different batches of robots is proposed to explore the environment and reach the detected targets, respectively. The first batch of robots acts as detection robots, driven by a proposed informative-based directed coverage path planning (DCPP) through a multi-robot minimum spanning tree algorithm. It refines and optimizes the coverage path based on the information gained from the environment. The second batch of robots reaches the multiple targets guided by a hub-based multi-target routing (HMTR) scheme, which is applicable to row-based environments. The feasibility and effectiveness of the proposed methods are validated by simulation and comparison studies.","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121332331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sustainability of Machine Learning-based Android Malware Detection Using API calls and Permissions","authors":"Woosang Cho, Hojun Lee, Sangchul Han, Young-Sup Hwang, Seong-je Cho","doi":"10.1109/AIKE55402.2022.00009","DOIUrl":"https://doi.org/10.1109/AIKE55402.2022.00009","url":null,"abstract":"As the Android platform and malicious apps continue to evolve, most existing Android malware detection techniques using machine learning are turning out to be unsustainable. In this paper, we propose machine learning-based Android malware detection techniques that use both API calls and permissions as a feature set. These features are complementary and are often used to detect malicious apps. We first analyze whether a ‘yearly dataset-based trained classifier’ (YDataC) is sustainable. The ‘yearly dataset-based trained classifier’ refers to a classifier that learns from 80% of the dataset of a specific year from 2014 to 2021 and is tested with 20% of the datasets of every year between 2014 and 2021. Through experiments, we discovered that the classification rate has dropped significantly since 2019, indicating that something major has changed. Therefore, the ‘yearly dataset-based trained classifier’ is judged to be unsustainable. Next, we present and evaluate two incremental learning methods for gradual training: an incrementally trained Random Forest (RF) and an incrementally trained Neural Network (NN). Evaluation results show that the two incremental learning classifiers have better sustainability than the ‘yearly dataset-based trained classifier’. The incrementally trained RF has better sustainability than the incrementally trained NN in terms of metrics such as the $F_{1}$ score and AUT (Area under Time).","PeriodicalId":441077,"journal":{"name":"2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129018594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}