Learning from 6,000 Projects: Mining Models in the Large
A. Zeller
2010 10th IEEE Working Conference on Source Code Analysis and Manipulation
Published: 2010-09-12
DOI: 10.1109/SCAM.2010.23 (https://doi.org/10.1109/SCAM.2010.23)
Citations: 3
Abstract
Models, abstract and simple descriptions of some artifact, are the backbone of all software engineering activities. While writing models is hard, existing code can serve as a source for abstract descriptions of how software behaves. To infer correct usage, though, code analysis needs usage examples: the more, the better. We have built a lightweight parser that efficiently extracts API usage models from source code; these models can then be used to detect anomalies. Applied to the 200 million lines of code of the Gentoo Linux distribution, the parser would extract more than 15 million API constraints, encoding and abstracting the "wisdom of Linux code".
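
To make the idea of mining API constraints and flagging anomalies concrete, here is a minimal sketch in Python. It is not the parser described in the abstract. It assumes the call sequences have already been extracted per function, and it models a constraint as a simple "call a is later followed by call b" pair. The names `order_pairs`, `find_anomalies`, `min_support`, and `min_confidence`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def order_pairs(calls):
    """Return the set of (a, b) pairs where call a occurs before call b."""
    return {(a, b) for a, b in combinations(calls, 2) if a != b}


def find_anomalies(usages, min_support=2, min_confidence=0.75):
    """Flag functions that call `a` but never follow it with `b`,
    although most other functions that call `a` do.

    `usages` maps a function name to its ordered list of API calls
    (assumed to come from some earlier extraction step).
    """
    pair_sets = {name: order_pairs(calls) for name, calls in usages.items()}
    # Number of functions in which each ordering pair occurs.
    pair_support = Counter(p for pairs in pair_sets.values() for p in pairs)
    # Number of functions in which each call occurs at all.
    call_support = Counter(c for calls in usages.values() for c in set(calls))

    anomalies = []
    for name, calls in usages.items():
        for (a, b), support in pair_support.items():
            confidence = support / call_support[a]
            if (
                a in calls
                and (a, b) not in pair_sets[name]
                and support >= min_support
                and confidence >= min_confidence
            ):
                anomalies.append((name, a, b, confidence))
    return sorted(anomalies)


if __name__ == "__main__":
    # Hypothetical usage examples; real input would come from parsed code.
    usages = {
        "read_config": ["fopen", "fread", "fclose"],
        "load_image": ["fopen", "fread", "fclose"],
        "copy_file": ["fopen", "fread", "fwrite", "fclose"],
        "parse_header": ["fopen", "fread", "fclose"],
        "leaky_reader": ["fopen", "fread"],
    }
    for name, a, b, confidence in find_anomalies(usages):
        print(f"{name}: calls {a} but never {b} afterwards "
              f"(seen in {confidence:.0%} of functions calling {a})")
```

In this toy data set, only `leaky_reader` is reported, because it opens and reads a file without a later `fclose`. The sketch also shows why more usage examples help: with only a handful of functions, the support and confidence thresholds are too coarse to separate real conventions from coincidence.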