一种快速分布式的城市大数据C4.5算法

IF 0.9 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wan-Shu Cheng, Pengchi Huang, Jheng-Yu Huang, Ju-Chin Chen, K. W. Lin
{"title":"一种快速分布式的城市大数据C4.5算法","authors":"Wan-Shu Cheng, Pengchi Huang, Jheng-Yu Huang, Ju-Chin Chen, K. W. Lin","doi":"10.3233/ida-220753","DOIUrl":null,"url":null,"abstract":"The amount of information nowadays is rapidly growing. Aside from valuable information, information that is unrelated to a target or is meaningless is also growing. Big data and broader digital technologies are considered the primary components of smart city governance and planning. Big data analysis is considered to define a new era in urban planning, research, and policy. Effective data mining and pattern detection techniques are becoming very important these days. Processing such a large amount of data entails the use of data mining, a technique that clarifies the association between valid information and excludes irrelevant data to implement a practical decision tree. A large amount of data affects processing time and I/O costs during data mining. This study proposes to distribute data among multiple clients and distribute a large amount of data computation equally to improve the resource cost problem of exploration. Following that, the main server consolidates the computation results and generates the survey results. Experiment results show that the proposed algorithm is superior, thus allowing a larger amount of data to be processed while producing high-quality results.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":" ","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A fast and distributed C4.5 algorithm for urban big data\",\"authors\":\"Wan-Shu Cheng, Pengchi Huang, Jheng-Yu Huang, Ju-Chin Chen, K. W. Lin\",\"doi\":\"10.3233/ida-220753\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The amount of information nowadays is rapidly growing. Aside from valuable information, information that is unrelated to a target or is meaningless is also growing. Big data and broader digital technologies are considered the primary components of smart city governance and planning. Big data analysis is considered to define a new era in urban planning, research, and policy. Effective data mining and pattern detection techniques are becoming very important these days. Processing such a large amount of data entails the use of data mining, a technique that clarifies the association between valid information and excludes irrelevant data to implement a practical decision tree. A large amount of data affects processing time and I/O costs during data mining. This study proposes to distribute data among multiple clients and distribute a large amount of data computation equally to improve the resource cost problem of exploration. Following that, the main server consolidates the computation results and generates the survey results. Experiment results show that the proposed algorithm is superior, thus allowing a larger amount of data to be processed while producing high-quality results.\",\"PeriodicalId\":50355,\"journal\":{\"name\":\"Intelligent Data Analysis\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-08-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intelligent Data Analysis\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/ida-220753\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/ida-220753","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

当今的信息量正在迅速增长。除了有价值的信息,与目标无关或毫无意义的信息也在增长。大数据和更广泛的数字技术被认为是智慧城市治理和规划的主要组成部分。大数据分析被认为是城市规划、研究和政策的新时代。有效的数据挖掘和模式检测技术现在变得非常重要。处理如此大量的数据需要使用数据挖掘,这是一种澄清有效信息之间的关联并排除不相关数据以实现实用决策树的技术。在数据挖掘过程中,大量数据会影响处理时间和I/O成本。该研究提出在多个客户端之间分配数据,并平均分配大量数据计算,以改善勘探的资源成本问题。然后,主服务器合并计算结果并生成调查结果。实验结果表明,该算法具有优越性,可以在处理大量数据的同时产生高质量的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A fast and distributed C4.5 algorithm for urban big data
The amount of information nowadays is rapidly growing. Aside from valuable information, information that is unrelated to a target or is meaningless is also growing. Big data and broader digital technologies are considered the primary components of smart city governance and planning. Big data analysis is considered to define a new era in urban planning, research, and policy. Effective data mining and pattern detection techniques are becoming very important these days. Processing such a large amount of data entails the use of data mining, a technique that clarifies the association between valid information and excludes irrelevant data to implement a practical decision tree. A large amount of data affects processing time and I/O costs during data mining. This study proposes to distribute data among multiple clients and distribute a large amount of data computation equally to improve the resource cost problem of exploration. Following that, the main server consolidates the computation results and generates the survey results. Experiment results show that the proposed algorithm is superior, thus allowing a larger amount of data to be processed while producing high-quality results.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Intelligent Data Analysis
Intelligent Data Analysis 工程技术-计算机:人工智能
CiteScore
2.20
自引率
5.90%
发文量
85
审稿时长
3.3 months
期刊介绍: Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信