Utilizing Web Search Engines for Program Analysis

2010 IEEE 18th International Conference on Program Comprehension. Pub Date: 2010-06-30. DOI: 10.1109/ICPC.2010.26

D. Ratiu, L. Heinemann


Citations: 1

Abstract

Programming involves representing domain concepts using programming abstractions. In object-oriented programs, concepts and relations of the business domain are represented as classes, attributes and methods. However, the concepts and relations that logically belong together are scattered across different modules, interleaved with technical concepts, and distorted by implementation details. In this paper, we present an automatic method to identify logically related concepts and the relations among them. To achieve this, we systematically transform program identifiers into fragments of natural language sentences and check whether these sentence fragments are meaningful to humans. To perform such checks automatically, we use the World Wide Web as a knowledge base that contains a huge number of meaningful texts, and use the Google web search engine to validate the meaningfulness of these sentences. If the search engine returns a sufficient number of hits, we have discovered a piece of knowledge in the code. By systematically applying this method, we obtain a condensed form of the knowledge embodied in the program, which enables automatic analyses. We present our experience with several use cases: (1) assessing the meaningfulness of identifiers, (2) extracting complex concepts from compound identifiers, (3) extracting a meaningful taxonomy from the class hierarchy, and (4) extracting complex conceptual relations from the code. We report on our observations during the analysis of real-world Java code, discuss the limitations of our approach and sketch extension possibilities.
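The core check described in the abstract (turn an identifier into a sentence fragment, then ask a search engine whether that fragment occurs often enough) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the splitting rules, the threshold value and the `hit_count` callback are all assumptions made for illustration. The abstract only states that a "sufficient number of hits" is required. No real search API is called; the caller supplies a function that returns a hit count for a phrase.

```python
import re
from typing import Callable, List


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words.

    The splitting rules are an assumption; the abstract does not specify them.
    """
    words: List[str] = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs (XML), capitalized or lowercase words (Parser, get), digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def to_phrase(identifier: str) -> str:
    """Turn an identifier into a natural-language sentence fragment."""
    return " ".join(split_identifier(identifier))


def is_meaningful(
    identifier: str,
    hit_count: Callable[[str], int],  # hypothetical: phrase -> number of hits
    threshold: int = 1000,            # hypothetical cut-off for "sufficient"
) -> bool:
    """Return True if the phrase derived from the identifier gets enough hits."""
    return hit_count(to_phrase(identifier)) >= threshold


if __name__ == "__main__":
    # Toy stand-in for a search engine; the counts are invented for the demo.
    toy_counts = {"account balance": 500_000}

    def toy_hit_count(phrase: str) -> int:
        return toy_counts.get(phrase, 0)

    print(to_phrase("getAccountBalance"))
    print(is_meaningful("accountBalance", toy_hit_count))
    print(is_meaningful("balanceAccountXq", toy_hit_count))
```

Passing the hit-count source in as a callback keeps the sketch independent of any particular search engine and makes the threshold easy to tune per use case.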


