{"title":"Automatic Categorization of Software Libraries Using Bytecode","authors":"Javier Escobar-Avila","doi":"10.1109/ICSE.2015.249","DOIUrl":null,"url":null,"abstract":"Automatic software categorization is the task of assigning categories or tags to software libraries in order to summarize their functionality. Correctly assigning these categories is essential to ensure that relevant libraries can be easily retrieved by developers from large repositories. Current categorization approaches rely on the semantics reflected in the source code, or use supervised machine learning techniques, which require a set of labeled software as a training data. These approaches fail when such information is not available. We propose a novel unsupervised approach for the automatic categorization of Java libraries, which uses the bytecode of a library in order to determine its category. We show that the approach is able to successfully categorize libraries from the Apache Foundation Repository.","PeriodicalId":330487,"journal":{"name":"2015 IEEE/ACM 37th IEEE International Conference on Software Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE/ACM 37th IEEE International Conference on Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2015.249","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Automatic software categorization is the task of assigning categories or tags to software libraries in order to summarize their functionality. Correctly assigning these categories is essential to ensure that relevant libraries can be easily retrieved by developers from large repositories. Current categorization approaches rely on the semantics reflected in the source code, or use supervised machine learning techniques, which require a set of labeled software as a training data. These approaches fail when such information is not available. We propose a novel unsupervised approach for the automatic categorization of Java libraries, which uses the bytecode of a library in order to determine its category. We show that the approach is able to successfully categorize libraries from the Apache Foundation Repository.
自动软件分类是为软件库分配类别或标签以总结其功能的任务。正确分配这些类别对于确保开发人员可以从大型存储库中轻松检索相关库至关重要。当前的分类方法依赖于源代码中反映的语义,或者使用监督机器学习技术,这需要一组标记软件作为训练数据。当无法获得这些信息时,这些方法就会失败。我们提出了一种新的无监督的Java库自动分类方法,该方法使用库的字节码来确定其类别。我们展示了该方法能够成功地对Apache Foundation Repository中的库进行分类。