Harvesting reliability data from the internet

2008 Annual Reliability and Maintainability Symposium. Pub Date: 2008-01-28. DOI: 10.1109/RAMS.2008.4925816

H. Dussault, P.S. Zarubin, S. Morris, D. Nicholls


Citations: 2

Abstract

This paper describes the initial design, development and testing of a tool that harvests reliability data from multiple internet resources. An evaluation corpus of 1544 URLs is used to assess typical reliability data collection content and challenges and to provide a basis for evaluating data harvesting tool performance and capability growth. Early results show that the ability to handle portable document format (PDF) documents, correctly parse web pages, including significant punctuation marks and number formatting, and to extract data from tables are important in reliability data collection. The results to date show that reliability data is available on the internet, and that automated tools can begin to discover and harvest that information. However, there is much work to do to be able to reliably discover, extract, cluster, and present valid component reliability to users.
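To illustrate why number formatting matters when parsing reliability figures from web text, here is a minimal Python sketch. It is not the authors' tool: the function name `extract_hours`, the regular expression, and the choice of hour units are all assumptions made for this example. It only shows how thousands separators and exponent notation can be normalized before a value is stored.

```python
import re

# Hypothetical pattern (an assumption for this sketch): a number with optional
# thousands separators, decimal fraction, and exponent, followed by an hours unit.
_NUMBER_WITH_HOURS = re.compile(
    r"(?<![\d.,])"                   # do not start in the middle of a number
    r"(\d{1,3}(?:,\d{3})+|\d+)"      # integer part, optionally 1,234,567 style
    r"(\.\d+)?"                      # optional decimal fraction
    r"(?:\s*[eE]\s*([+-]?\d+))?"     # optional exponent
    r"\s*(?:hours|hrs|h)\b",         # unit; the word boundary rejects e.g. "Hz"
    re.IGNORECASE,
)


def extract_hours(text: str) -> list[float]:
    """Return every hour-valued quantity found in text as a float."""
    values = []
    for match in _NUMBER_WITH_HOURS.finditer(text):
        integer_part, fraction, exponent = match.groups()
        # Drop thousands separators, then rebuild a literal float() can parse.
        literal = integer_part.replace(",", "") + (fraction or "")
        if exponent is not None:
            literal += "e" + exponent
        values.append(float(literal))
    return values


if __name__ == "__main__":
    print(extract_hours("MTBF: 1,250,000 hours at 25 C"))
    print(extract_hours("rated 1.2e6 hrs"))
```

A parser that dropped the commas or the exponent would report values that are wrong by orders of magnitude, which is the kind of error the abstract points to when it singles out punctuation and number formatting.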

