Configuring topic models for software engineering tasks in TraceLab

2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE). Publication date: 2013-05-19. DOI: 10.1109/TEFSE.2013.6620164

Bogdan Dit, Annibale Panichella, Evan Moritz, R. Oliveto, M. D. Penta, D. Poshyvanyk, A. D. Lucia


Citations: 17

Abstract



Configuring topic models for software engineering tasks in TraceLab

A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in reality their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could potentially lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce superior results for traceability link recovery and other tasks than reported ad-hoc configurations. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.
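To make the idea of searching for an LDA configuration with a genetic algorithm more concrete, the following is a minimal sketch, not the authors' TraceLab implementation. It assumes a configuration made of a number of topics and two Dirichlet hyper-parameters (named `alpha` and `beta` here), hypothetical search bounds, and a caller-supplied fitness function. In a real setting the fitness would train LDA on the dataset and score the resulting topics; `toy_fitness` below is only a stand-in so that the sketch runs without any external library.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LdaConfig:
    num_topics: int  # number of topics
    alpha: float     # document-topic hyper-parameter (assumed name)
    beta: float      # topic-word hyper-parameter (assumed name)


# Hypothetical search bounds, chosen only for illustration.
TOPIC_RANGE = (2, 100)
PRIOR_RANGE = (0.01, 1.0)


def clamp(value, low, high):
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def random_config(rng: random.Random) -> LdaConfig:
    """Sample one configuration uniformly from the search bounds."""
    return LdaConfig(
        rng.randint(*TOPIC_RANGE),
        rng.uniform(*PRIOR_RANGE),
        rng.uniform(*PRIOR_RANGE),
    )


def crossover(a: LdaConfig, b: LdaConfig, rng: random.Random) -> LdaConfig:
    """Uniform crossover: each parameter is copied from one of the parents."""
    return LdaConfig(
        rng.choice((a.num_topics, b.num_topics)),
        rng.choice((a.alpha, b.alpha)),
        rng.choice((a.beta, b.beta)),
    )


def mutate(c: LdaConfig, rng: random.Random, rate: float = 0.2) -> LdaConfig:
    """Perturb each parameter with probability `rate`, staying in bounds."""
    k, a, b = c.num_topics, c.alpha, c.beta
    if rng.random() < rate:
        k = clamp(k + rng.randint(-5, 5), *TOPIC_RANGE)
    if rng.random() < rate:
        a = clamp(a + rng.gauss(0, 0.1), *PRIOR_RANGE)
    if rng.random() < rate:
        b = clamp(b + rng.gauss(0, 0.1), *PRIOR_RANGE)
    return LdaConfig(k, a, b)


def genetic_search(
    fitness: Callable[[LdaConfig], float],
    population_size: int = 20,
    generations: int = 30,
    elite: int = 2,
    seed: int = 0,
) -> LdaConfig:
    """Evolve configurations; the `elite` best survive each generation unchanged."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # fitter half breeds
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - elite)
        ]
        population = ranked[:elite] + children
    return max(population, key=fitness)


def toy_fitness(c: LdaConfig) -> float:
    """Stand-in objective (NOT a topic-quality measure): peaks at an arbitrary point."""
    return -abs(c.num_topics - 25) / 25 - abs(c.alpha - 0.1) - abs(c.beta - 0.05)


best = genetic_search(toy_fitness)
print(best)
```

Keeping the fitness function as a parameter is the design choice that matters here: swapping `toy_fitness` for a function that fits a topic model and measures the quality of its topics would turn the same loop into a configuration search for a real dataset, and the same structure could in principle be pointed at other IR techniques with different parameters.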

