The role of artefact corpus in LSI-based traceability recovery

G. Bavota, A. D. Lucia, R. Oliveto, Annibale Panichella, F. Ricci, G. Tortora
Published in: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE), 2013-05-01
DOI: 10.1109/TEFSE.2013.6620160 (https://doi.org/10.1109/TEFSE.2013.6620160)
Citations: 16

Abstract

Latent Semantic Indexing (LSI) is an advanced method widely and successfully employed in Information Retrieval (IR). It is an extension of the Vector Space Model (VSM) and outperforms VSM in canonical IR scenarios, where it is applied to very large document repositories. LSI has also been used to semi-automatically generate traceability links between software artefacts. In that scenario, however, LSI does not outperform VSM. This contradictory result is probably due to the different characteristics of software artefact repositories compared to document repositories. In this paper we present a preliminary empirical study that analyzes how the size and the vocabulary of the repository (the number of documents and the number of terms, respectively) affect the retrieval accuracy. Although replications are needed to generalize our findings, the study provides insights that might serve as guidelines for selecting the most suitable method for traceability recovery in a particular application context.
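
To make the relationship between the two methods concrete, the following is a minimal sketch, not the authors' experimental setup. It assumes whitespace tokenization, raw term frequencies (no tf-idf weighting, stemming, or stop-word removal), and a hypothetical four-artefact corpus invented purely for illustration. VSM ranks artefacts by cosine similarity in the full term space; LSI does the same after projecting the documents onto the top k singular directions of the term-by-document matrix.

```python
import numpy as np

def term_document_matrix(docs):
    """Return a raw term-frequency matrix (terms x documents) and its vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for tok in d.lower().split():
            A[index[tok], j] += 1.0
    return A, vocab

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

def vsm_scores(A, source, targets):
    """VSM: compare document columns directly in the term space."""
    return [cosine(A[:, source], A[:, j]) for j in targets]

def lsi_scores(A, source, targets, k):
    """LSI: compare documents in the k-dimensional latent space from a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    D = s[:k, None] * Vt[:k, :]  # latent coordinates, one column per document
    return [cosine(D[:, source], D[:, j]) for j in targets]

# Hypothetical toy corpus: one requirement followed by three code-like artefacts.
docs = [
    "user login with password authentication",
    "login manager check password",
    "credential validator verify password hash",
    "report exporter write pdf file",
]

A, vocab = term_document_matrix(docs)
print("documents:", A.shape[1], "vocabulary size:", A.shape[0])
print("VSM      :", vsm_scores(A, 0, [1, 2, 3]))
print("LSI (k=2):", lsi_scores(A, 0, [1, 2, 3], k=2))
```

In this sketch the two variables the study examines correspond to the two dimensions of the matrix: the number of documents (columns) and the vocabulary size (rows). When k equals the rank of the matrix, the LSI cosines coincide with the VSM cosines, so any difference between the two rankings comes from the dimensionality reduction alone.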