Topic Modeling of NASA Space System Problem Reports: Research in Practice

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 303-314. Published 2016-05-14. DOI: 10.1145/2901739.2901760

L. Layman, A. Nikora, Joshua Meek, T. Menzies

Citations: 22

Abstract

Problem reports at NASA are similar to bug reports: they capture defects found during test and post-launch operational anomalies, and they document the investigation and corrective action for each issue. These artifacts are a rich source of lessons learned for NASA, but they are expensive to analyze because problem reports consist primarily of natural language text. We apply topic modeling to a corpus of NASA problem reports to extract trends in testing and operational failures. We collected 16,669 problem reports from six NASA space flight missions and applied Latent Dirichlet Allocation (LDA) topic modeling to the document corpus. We analyze the most popular topics within and across missions, and how popular topics changed over the lifetime of a mission. We find that hardware material and flight software issues are common during the integration and testing phase, while ground station software and equipment issues are more common during the operations phase. We identify a number of challenges in topic modeling for trend analysis: 1) the process of selecting topic modeling parameters lacks definitive guidance, 2) defining semantically meaningful topic labels requires non-trivial effort and domain expertise, 3) topic models derived from the combined corpus of the six missions were biased toward the larger missions, and 4) topics must be semantically distinct as well as cohesive to be useful. Nonetheless, topic modeling can identify problem themes within missions and across mission lifetimes, providing useful feedback to engineers and project managers.
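The abstract states that LDA was applied to the problem-report corpus, but it does not name the implementation or the parameter settings used. The sketch below illustrates what fitting LDA involves. It uses collapsed Gibbs sampling, which is one standard inference method for LDA and not necessarily the one used in the study.

Several details are hypothetical illustrations, not taken from the paper:
- the function names `lda_gibbs` and `top_words`;
- the hyperparameters `alpha`, `beta`, and `iters`;
- the toy tokens.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper, not the study's code).

    docs: list of documents, each a list of tokens (e.g. preprocessed problem reports).
    k: number of topics; alpha, beta: Dirichlet priors (illustrative defaults).
    Returns (vocab, ndk, nkw): vocabulary, doc-topic counts, topic-word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens in doc d assigned to topic t
    nkw = [[0] * V for _ in range(k)]  # times word v is assigned to topic t
    nk = [0] * k                       # total tokens assigned to topic t
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][w_id[w]] += 1
            nk[t] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                # Full conditional: p(t) ~ (n_dt + alpha) * (n_tv + beta) / (n_t + V*beta)
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the resampled topic.
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    return vocab, ndk, nkw


def top_words(vocab, nkw, n=3):
    """Return the n most frequent words per topic, used when labeling topics."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in nkw
    ]


if __name__ == "__main__":
    # Hypothetical tokenized problem reports (not data from the study).
    reports = [
        ["valve", "leak", "thruster", "valve"],
        ["thruster", "leak", "valve"],
        ["ground", "station", "software", "telemetry"],
        ["software", "telemetry", "ground"],
    ]
    vocab, ndk, nkw = lda_gibbs(reports, k=2)
    print(top_words(vocab, nkw))
```

The analyst still has to choose `k`, `alpha`, and `beta`. This is the parameter-selection problem the abstract lists as its first challenge. Turning each topic's top words into a meaningful label is the second.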
