Harvesting reliability data from the internet
H. Dussault, P.S. Zarubin, S. Morris, D. Nicholls
2008 Annual Reliability and Maintainability Symposium, 2008-01-28
DOI: 10.1109/RAMS.2008.4925816 (https://doi.org/10.1109/RAMS.2008.4925816)

This paper describes the initial design, development, and testing of a tool that harvests reliability data from multiple internet resources. An evaluation corpus of 1544 URLs is used to assess typical reliability data collection content and challenges, and to provide a basis for evaluating the tool's performance and capability growth. Early results show that three capabilities are important in reliability data collection: handling portable document format (PDF) documents; parsing web pages correctly, including significant punctuation marks and number formatting; and extracting data from tables. The results to date show that reliability data is available on the internet and that automated tools can begin to discover and harvest it. However, much work remains before such tools can reliably discover, extract, cluster, and present valid component reliability data to users.
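
To illustrate why punctuation and number formatting matter when harvesting reliability figures, here is a minimal Python sketch. It is not the tool described in the paper, whose implementation the abstract does not give. The function name `parse_number`, the regular expression, and the sample strings are all hypothetical. The sketch assumes US-style formatting, with commas as thousands separators and a period as the decimal mark, so a European-style value such as "1,5" would be misread.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# an optionally signed number with optional comma thousands separators,
# an optional decimal part, and an optional exponent.
_NUMBER = re.compile(
    r"[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?(?:[eE][-+]?\d+)?"
)


def parse_number(text):
    """Return the first number found in text as a float, or None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    # Strip thousands separators before converting.
    return float(match.group(0).replace(",", ""))


if __name__ == "__main__":
    # Sample strings are invented for illustration only.
    print(parse_number("MTBF: 1,200,000 hours"))
    print(parse_number("failure rate 2.5e-6 per hour"))
    print(parse_number("no data"))
```

A parser that dropped the commas or the exponent sign would change these values by orders of magnitude, which is one way number formatting can be significant for reliability data.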