Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it.
K Bretonnel Cohen, Lawrence E Hunter, Martha Palmer
Published in: Trustworthy eternal systems via evolving software, data and knowledge: second international workshop, EternalS 2012, Montpellier, France, August 28, 2012, revised selected papers, vol. 379, pp. 77-90, 2013.
DOI: https://doi.org/10.1007/978-3-642-45260-4_6
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8300901/pdf/nihms-1641159.pdf
Citation count: 3
Abstract
Significant progress has been made in addressing the scientific challenges of biomedical text mining. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can rely requires that fundamental software engineering requirements be addressed. In this paper we characterize the state of biomedical text mining software with respect to software testing and quality assurance. We chose biomedical natural language processing software because it frequently and explicitly claims to offer production-quality services rather than just research prototypes. We examined twenty web sites offering a variety of text mining services. On each web site, we performed the most basic software test known to us and classified the results. Seven of the twenty web sites returned either bad results or the worst class of results in response to this simple test. We conclude that biomedical natural language processing tools require greater attention to software quality. We suggest a linguistically motivated approach to granular evaluation of natural language processing applications, and show how it can be used to detect performance errors in several systems and to predict overall performance on specific equivalence classes of inputs. We also assess the ability of linguistically motivated test suites to provide good software testing, as compared to large corpora of naturally occurring data. We measure code coverage and find that it is considerably higher with even small structured test suites than with large corpora.
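To make the idea of granular evaluation over equivalence classes more concrete, here is a minimal Python sketch. It is not the authors' test suite or any of the systems they examined: the class names, the example sentences, and the `toy_gene_detector` function are all hypothetical, chosen only to show how per-class scores can expose a weakness that a single aggregate score would blur.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical structured test suite: (equivalence class, input text, expected label).
# The classes and sentences are illustrative assumptions, not data from the paper.
TEST_SUITE: List[Tuple[str, str, bool]] = [
    ("plain_symbol", "BRCA1 is mutated.", True),
    ("plain_symbol", "TP53 is expressed.", True),
    ("hyphenated_symbol", "IL-2 is secreted.", True),
    ("hyphenated_symbol", "NF-kB is activated.", True),
    ("no_gene", "The patient recovered.", False),
    ("empty_input", "", False),
]


def toy_gene_detector(text: str) -> bool:
    """Stand-in system under test (hypothetical).

    Flags a sentence if any token is alphanumeric, upper-case, and contains a digit.
    """
    for token in text.replace(".", " ").split():
        if token.isalnum() and token.isupper() and any(c.isdigit() for c in token):
            return True
    return False


def accuracy_by_class(
    system: Callable[[str], bool],
    suite: List[Tuple[str, str, bool]],
) -> Dict[str, float]:
    """Return the fraction of correct outputs for each equivalence class."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for cls, text, expected in suite:
        totals[cls] += 1
        if system(text) == expected:
            correct[cls] += 1
    return {cls: correct[cls] / totals[cls] for cls in totals}


if __name__ == "__main__":
    for cls, acc in accuracy_by_class(toy_gene_detector, TEST_SUITE).items():
        print(f"{cls}: {acc:.2f}")
```

In this toy setup the detector is correct on every class except `hyphenated_symbol`, where it fails both cases because the hyphen makes the token non-alphanumeric. A small suite organized this way localizes the failure to one class of inputs, which is the kind of information an overall accuracy figure on a large corpus does not directly provide.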