PaDDMAS: parallel and distributed data mining application suite

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000. Pub Date: 2000-05-01. DOI: 10.1109/IPDPS.2000.846010

O. Rana, D. Walker, Maozhen Li, S. Lynden, M. Ward


Citations: 29

Abstract


PaDDMAS: parallel and distributed data mining application suite

Discovering complex associations, anomalies and patterns in distributed data sets is gaining popularity in a range of scientific, medical and business applications. Various algorithms are employed to perform data analysis within a domain, ranging from statistical to machine learning and AI-based techniques. However, several issues need to be addressed to scale such approaches to large data sets, particularly when they are applied to data distributed across multiple sites. As new analysis techniques are identified, the core tool set must enable easy integration of such analytical components. Similarly, results from an analysis engine must be sharable, to enable storage, visualisation or further analysis of results. We describe the architecture of PaDDMAS, a component-based system for developing distributed data mining applications. PaDDMAS provides a tool set for combining pre-developed or custom components using a dataflow approach, with components performing analysis, data extraction, or data management and translation. Each component is wrapped as a Java/CORBA object, and has an interface defined in XML. Components can be serial or parallel objects, and may be binary or contain a more complex internal structure. We demonstrate a prototype using a neural network analysis algorithm.
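The dataflow composition described in the abstract can be illustrated with a minimal sketch. It is not the PaDDMAS implementation: the real system wraps each component as a Java/CORBA object with an XML-defined interface, whereas this example runs plain Java objects in one process and omits CORBA and XML entirely. All names (`DataflowSketch`, `Component`, `demoPipeline`) are hypothetical, and a simple mean stands in for the neural network analysis used in the prototype.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative and do not come from PaDDMAS.
public class DataflowSketch {

    /** A component consumes a list of values and produces a list of values. */
    interface Component extends Function<List<Double>, List<Double>> {
        String name();
    }

    /** Wraps a plain function as a named component. */
    static Component component(String name, Function<List<Double>, List<Double>> body) {
        return new Component() {
            @Override public String name() { return name; }
            @Override public List<Double> apply(List<Double> input) { return body.apply(input); }
        };
    }

    /** Runs the components in order, feeding each output to the next input. */
    static List<Double> run(List<Component> pipeline, List<Double> source) {
        List<Double> data = source;
        for (Component c : pipeline) {
            data = c.apply(data);
        }
        return data;
    }

    /** Builds an extraction -> translation -> analysis pipeline. */
    static List<Component> demoPipeline() {
        // Data extraction: drop missing values (assumed to be encoded as NaN).
        Component extract = component("extract", in -> {
            List<Double> out = new ArrayList<>();
            for (double v : in) {
                if (!Double.isNaN(v)) out.add(v);
            }
            return out;
        });
        // Data translation: divide by the largest magnitude (no-op if it is zero).
        Component scale = component("scale", in -> {
            double max = 0.0;
            for (double v : in) max = Math.max(max, Math.abs(v));
            List<Double> out = new ArrayList<>();
            for (double v : in) out.add(max == 0.0 ? v : v / max);
            return out;
        });
        // Analysis: a mean, standing in for a real analysis algorithm.
        Component mean = component("mean", in -> {
            List<Double> out = new ArrayList<>();
            if (in.isEmpty()) return out;
            double sum = 0.0;
            for (double v : in) sum += v;
            out.add(sum / in.size());
            return out;
        });
        return Arrays.asList(extract, scale, mean);
    }

    public static void main(String[] args) {
        List<Double> result = run(demoPipeline(), Arrays.asList(2.0, Double.NaN, 4.0));
        System.out.println(result); // prints [0.75]
    }
}
```

Because every stage shares one input and output type, a new analysis component can be added to the list without changing the others, which is the integration property the abstract asks of the core tool set.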


Source journal

Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000

Self-citation rate

0.00%

Publication count