Harvesting reliability data from the internet

H. Dussault, P.S. Zarubin, S. Morris, D. Nicholls
DOI: 10.1109/RAMS.2008.4925816 (https://doi.org/10.1109/RAMS.2008.4925816)
Published in: 2008 Annual Reliability and Maintainability Symposium
Publication date: 2008-01-28
Citations: 2

Abstract

This paper describes the initial design, development and testing of a tool that harvests reliability data from multiple internet resources. An evaluation corpus of 1544 URLs is used to assess typical reliability data collection content and challenges, and to provide a basis for evaluating the performance and capability growth of the data harvesting tool. Early results show that three capabilities are important in reliability data collection: handling Portable Document Format (PDF) documents, correctly parsing web pages (including significant punctuation marks and number formatting), and extracting data from tables. The results to date show that reliability data is available on the internet, and that automated tools can begin to discover and harvest that information. However, much work remains before such tools can reliably discover, extract, cluster, and present valid component reliability data to users.
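The abstract's point about punctuation and number formatting can be illustrated with a minimal Python sketch. This is not the authors' tool. The choice of MTBF (mean time between failures) as the target quantity, the regular expression, and the function name `extract_mtbf_hours` are all assumptions made for illustration; the abstract does not say which quantities or patterns the harvester uses.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper): the token
# "MTBF", up to 20 non-digit characters, a number, then an hours unit.
# The number is either comma-grouped (1,250,000) or a plain decimal with an
# optional exponent (50000, 1.2e6).
_MTBF_PATTERN = re.compile(
    r"MTBF\D{0,20}?"
    r"(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
    r"\s*(?:hours|hrs|h)\b",
    re.IGNORECASE,
)


def extract_mtbf_hours(text: str) -> list[float]:
    """Return every MTBF value, in hours, that the pattern finds in text."""
    values = []
    for match in _MTBF_PATTERN.finditer(text):
        # Drop thousands separators so "1,250,000" parses as one number
        # instead of being cut off at the first comma.
        values.append(float(match.group(1).replace(",", "")))
    return values


if __name__ == "__main__":
    sample = "Rated MTBF: 1,250,000 hours. Legacy unit MTBF of 50000 hrs."
    print(extract_mtbf_hours(sample))
```

The sketch treats a comma as a thousands separator only. Pages that use a period for grouping (for example `1.250.000`) would need locale-aware handling, which is one reason number formatting is a real parsing concern.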