Evaluation of Tools for Hairy Requirements and Software Engineering Tasks
D. Berry
2017 IEEE 25th International Requirements Engineering Conference Workshops (REW), September 2017
DOI: 10.1109/REW.2017.25
Citations: 51
Abstract
Context and Motivation A hairy requirements or software engineering task involving natural language (NL) documents is one that is not inherently difficult for NL-understanding humans at a small scale but becomes unmanageable at a large scale. A hairy task demands tool assistance. Because humans need help to carry out a hairy task completely, a tool for a hairy task should have as close to 100% recall as possible. A hairy-task tool that falls short of nearly 100% recall may even be useless when applied to the development of a high-dependability system, because to find the missing information, a human has to do the entire task manually anyway. For such a tool to have recall acceptably close to 100%, a human working with the tool on the task must achieve better recall than a human working on the task entirely manually. Problem Traditionally, many hairy requirements and software engineering tools have been evaluated mainly by how high their precision is, possibly leading to incorrect conclusions about how effective they are. Principal Ideas This paper describes using recall, a properly weighted F-measure, and a new measure called summarization to evaluate tools for hairy requirements and software engineering tasks, and it applies some of these measures to several tools reported in the literature. Contribution The finding is that some of these tools are actually better than they were thought to be when they were evaluated using mainly precision or an unweighted F-measure.
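The recall-weighted F-measure that the abstract contrasts with the unweighted one can be illustrated with the standard F-beta formula, in which beta > 1 weights recall more heavily than precision. The sketch below uses the generic textbook definitions only; the paper's specific weighting, and its summarization measure, are defined in the paper itself and are not reproduced here.

```python
def precision(tp, fp):
    # fraction of the tool's reported items that are truly relevant
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # fraction of the truly relevant items that the tool found
    return tp / (tp + fn) if (tp + fn) else 0.0

def f_beta(p, r, beta=2.0):
    # generic F-beta: beta > 1 favors recall, beta < 1 favors precision,
    # beta = 1 gives the unweighted (harmonic-mean) F1
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# hypothetical example: a tool finds 90 of 100 relevant items (high recall)
# at the cost of 60 false positives (modest precision)
p = precision(tp=90, fp=60)              # 0.6
r = recall(tp=90, fn=10)                 # 0.9
print(round(f_beta(p, r, beta=1.0), 3))  # unweighted F1: 0.72
print(round(f_beta(p, r, beta=2.0), 3))  # recall-weighted F2: 0.818
```

Judged by the unweighted F1 (or by precision alone), this hypothetical tool looks mediocre; the recall-weighted F2 scores it noticeably higher, which is the kind of re-evaluation the abstract argues for on hairy tasks where missed items are costly.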