Malware Analysis Using Machine Learning Techniques
S. Kinger, B. V. Reddy, Sanket Jadhao, Kaustubh Hambarde, Aamir Hullur
2022 2nd International Conference on Intelligent Technologies (CONIT), published 2022-06-24
DOI: https://doi.org/10.1109/CONIT55038.2022.9848045
Citations: 0
Abstract
The number of malware samples intercepted and analyzed by antivirus providers has increased considerably in recent years. However, much of this software is essentially a repackaged version of malware that has already been identified. Consequently, it has become crucial to assess whether a piece of malware belongs to a known family or exhibits previously identified behavior that requires additional examination. Random forest and decision tree algorithms, as well as hybrid models combining the two, have been employed in past studies. We introduce an additional prediction technique, stochastic gradient descent (SGD), which delivers good results when a dataset has over 100k variables (130k in our case). The use of SGD is therefore one of the distinguishing characteristics of this paper. Our approach has also been tested on both packed and obfuscated malware samples to confirm that it is both reliable and scalable.
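To make the SGD idea concrete, here is a minimal sketch of a linear classifier trained with stochastic gradient descent. It is not the authors' implementation: the abstract does not state the loss function, the feature set, or the library used, so the logistic loss, the synthetic four-feature data, and every function name here (`train_sgd`, `predict`, `make_toy_data`) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(samples, labels, epochs=20, lr=0.1, seed=0):
    """Fit a logistic-regression classifier with plain SGD (one sample per update)."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                weights[j] -= lr * err * x[j]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    # Class 1 when the linear score is non-negative, else class 0.
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias >= 0 else 0


def make_toy_data(n, seed=1):
    # Hypothetical stand-in for extracted malware features:
    # label 1 ("malicious") clusters around +1, label 0 ("benign") around -1.
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        samples.append([rng.gauss(center, 0.5) for _ in range(4)])
        labels.append(y)
    return samples, labels


X, y = make_toy_data(400)
w, b = train_sgd(X[:300], y[:300])  # train on the first 300 samples
held_out = list(zip(X[300:], y[300:]))  # evaluate on the remaining 100
accuracy = sum(predict(w, b, x) == t for x, t in held_out) / len(held_out)
```

Because each update touches a single sample, the memory cost stays constant as the dataset grows, which is the usual reason for preferring SGD on large datasets. In practice a library implementation such as scikit-learn's `SGDClassifier` would likely replace the hand-written loop.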