Articulating heterogeneous data streams with the attribute-relation file format

JCR quartile: Q4 (Engineering)
M. Diván, M. Reynoso
DOI: 10.1063/1.5133936 (https://doi.org/10.1063/1.5133936)
Journal article, published 2019
Citations: 1

Abstract

The processing strategy based on measurement metadata is a data stream engine running on Apache Storm that processes measures in real time. In a data stream context, the data are unbounded: they keep arriving. The Attribute-Relation File Format (ARFF) is used by popular software such as Weka for offline analysis in machine learning and data mining; however, an ARFF file has a finite size. The CincamimisConversor library exports data streams organized under a measurement interchange schema to a columnar data organization in real time. Here, the library is extended to support the real-time translation and storage of heterogeneous data streams in the ARFF format. Through the library, it is now possible to collect data from heterogeneous data sources (e.g., Internet-of-Things (IoT) devices) and export them in real time for offline analysis in Weka. This could also foster educational applications that combine IoT, measurement processes with heterogeneous sources, data stream processing strategies, and Weka. A discrete simulation was carried out with promising results: translating 5000 measures required at most 0.2387 ms, while storing them took less than 0.2028 ms on a solid-state disk.
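The tension described above, an unbounded stream on one side and a finite ARFF file on the other, can be illustrated with a minimal Python sketch. It assumes each measure arrives already flattened into a dict mapping attribute names to values, and it bounds each output file by starting a new one after a fixed number of rows. The names `RotatingArffWriter`, `arff_quote`, and `arff_value`, the file-naming scheme, and the row-count rotation policy are all hypothetical illustrations; this is not the CincamimisConversor API and does not model the measurement interchange schema. Only the ARFF structure itself (`@RELATION`, `@ATTRIBUTE`, `@DATA`, and `?` for missing values) follows the format as Weka reads it.

```python
import os
from typing import Dict, List, Optional, TextIO, Tuple

# Hypothetical attribute schema: (attribute name, ARFF type), e.g. ("temperature", "NUMERIC").
Attribute = Tuple[str, str]


def arff_quote(text: str) -> str:
    """Wrap a name or string value in single quotes, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def arff_value(value: object, arff_type: str) -> str:
    """Render one cell. None becomes '?', the ARFF missing-value marker.

    Assumes NUMERIC values are finite numbers; everything else is written as a quoted string.
    """
    if value is None:
        return "?"
    if arff_type.upper() == "NUMERIC":
        return repr(float(value))
    return arff_quote(str(value))


class RotatingArffWriter:
    """Hypothetical writer: appends measures to ARFF files, opening a new file every max_rows rows.

    A stream is unbounded but an ARFF file is finite, so each file gets its own
    header and holds at most max_rows data rows.
    """

    def __init__(self, directory: str, relation: str,
                 attributes: List[Attribute], max_rows: int = 5000) -> None:
        self.directory = directory
        self.relation = relation
        self.attributes = attributes
        self.max_rows = max_rows
        self.part = 0           # index of the next file to open
        self.rows_in_part = 0   # data rows written to the current file
        self.sink: Optional[TextIO] = None
        self.paths: List[str] = []

    def _open_next_part(self) -> None:
        """Open the next file and write its ARFF header."""
        path = os.path.join(self.directory, f"{self.relation}-{self.part:04d}.arff")
        self.part += 1
        self.paths.append(path)
        self.sink = open(path, "w", encoding="utf-8")
        self.sink.write(f"@RELATION {arff_quote(self.relation)}\n\n")
        for name, arff_type in self.attributes:
            self.sink.write(f"@ATTRIBUTE {arff_quote(name)} {arff_type}\n")
        self.sink.write("\n@DATA\n")
        self.rows_in_part = 0

    def write(self, measure: Dict[str, object]) -> None:
        """Translate one measure (attribute name -> value) into an ARFF data row."""
        if self.sink is None or self.rows_in_part >= self.max_rows:
            self.close()
            self._open_next_part()
        row = ",".join(arff_value(measure.get(name), arff_type)
                       for name, arff_type in self.attributes)
        self.sink.write(row + "\n")
        self.rows_in_part += 1

    def close(self) -> None:
        """Close the current file, if any."""
        if self.sink is not None:
            self.sink.close()
            self.sink = None
```

For example, `RotatingArffWriter("out", "measures", [("sensor", "STRING"), ("temperature", "NUMERIC")], max_rows=5000)` would start a new file every 5000 rows, and each file can be opened in Weka on its own because it carries a complete header. Rotating by row count is only one possible policy; rotating by time window is another.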
Source journal
Journal of Electrical and Electronics Engineering
Subject area: Engineering, Electrical and Electronic Engineering
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 0
Review time: 16 weeks
Journal description: Journal of Electrical and Electronics Engineering is an interdisciplinary, application-oriented scientific publication that offers researchers and PhD students the opportunity to disseminate novel and original scientific and research contributions in the field of electrical and electronics engineering. Articles are reviewed by professionals, and papers are selected solely on the quality of their content according to the following criteria: the papers present the authors' research results; the papers and their content have not been submitted or published elsewhere; the papers are written in English; and the papers should include in their reference lists papers recently published in the Journal of Electrical and Electronics Engineering that present similar research results. The topics and instructions for authors of this journal can be found in the appropriate sections.