Title: The need for a systematic machine-learning process: A proposal via a mobile malware classification case study
Author: Gürol Canbek
Venue: 2021 International Conference on Information Security and Cryptology (ISCTURKEY)
Publication date: 2021-12-02
DOI: https://doi.org/10.1109/ISCTURKEY53027.2021.9654378
Citations: 1
Abstract
Machine learning (ML) appears to be a highly promising solution for problems in many domains, including healthcare and cyber security. Researchers and practitioners adopt ML with high expectations of a return on investment in terms of not only money but also effort and time. Those expectations can slide into an “if your only tool is a hammer, then every problem looks like a nail” mindset. In reality, conducting an ML workflow efficiently and correctly is difficult, given both the challenges of ML itself and domain-specific issues. Hence, the interactions and dependencies between ML and the domain should be clearly addressed, and the steps should be planned and conducted according to certain requirements. This study provides insights into achieving these goals through a systematic ML process conducted from beginning to end. The process is designed as a cycle of eight sub-processes that pass through the spaces introduced in the study (file, sample, class, feature, dataset, model, and finally metric spaces). The dataset quality analysis/comparison sub-process is specifically designed as a quality control gateway. The proposed process is explained through a case study of the Android mobile malware classification problem domain, in which practical and research problems, as well as possible solutions, are presented.