The role of data transformation in modern analytics: A comprehensive survey

IF 1.7 | Zone 3, Computer Science | JCR Q3: COMPUTER SCIENCE, SOFTWARE ENGINEERING
Sanae Borrohou, Rachida Fissoune, Hassan Badir
Journal of Computer Languages, Volume 84, Article 101329 (published 2025-05-20)
DOI: 10.1016/j.cola.2025.101329
https://www.sciencedirect.com/science/article/pii/S2590118425000152
Citations: 0

Abstract

Data transformation is a fundamental step in modern data analytics, enabling the conversion of raw data into structured, high-quality formats suitable for analysis. This process plays a crucial role in data cleaning, integration, and preprocessing, ensuring consistency across diverse data sources while addressing challenges such as missing values, inconsistencies, and redundancy. By applying techniques such as scaling, normalization, encoding, feature extraction, and aggregation, data transformation enhances the accuracy and efficiency of analytical and machine learning models. This study provides a comprehensive survey of data transformation techniques, categorizing them into key types: data cleaning and preprocessing, normalization and standardization, feature engineering, encoding categorical data, data augmentation, discretization and data aggregation. We analyze their impact on data quality and explore their interdependencies, presenting a structured framework that connects these transformations within the broader data preprocessing workflow. Additionally, we highlight the challenges of implementing transformation methods in large-scale, heterogeneous datasets, including data integration complexities, security concerns, and resource constraints. By synthesizing recent advancements in the field, this research offers a structured reference for data scientists and researchers, guiding them in selecting appropriate transformation strategies based on their specific analytical needs. Future work will focus on developing a complete data cleaning workflow that integrates transformation techniques for large-scale applications, emphasizing automation and scalability in modern analytics.
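The abstract names several transformation families without showing what they compute. As a rough illustration only, the Python sketch below implements textbook versions of some of them: mean imputation for missing values (cleaning), min-max normalization, z-score standardization, one-hot encoding, equal-width discretization, and a grouped sum as a simple aggregation. These are standard definitions, not code from the survey. All function names are hypothetical, and the sketch assumes small in-memory lists of numbers with `None` marking a missing value.

```python
"""Textbook sketches of several transformation types named in the abstract.

Illustrative only: standard definitions, not the survey's implementation.
All function names are hypothetical. Missing values are marked with None.
"""
from collections import defaultdict
from statistics import mean, pstdev


def impute_mean(values):
    """Cleaning: replace None with the mean of the observed values.

    Assumes at least one value is observed (statistics.mean raises otherwise).
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: (x - min) / (max - min), mapping values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardization: (x - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Categorical encoding: one 0/1 indicator column per distinct label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def equal_width_bins(values, k):
    """Discretization: assign each value to one of k equal-width bins (0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum can land exactly on bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def aggregate_sum(keys, values):
    """Aggregation: total of values per group key (like GROUP BY ... SUM)."""
    totals = defaultdict(float)
    for key, value in zip(keys, values):
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    ages = impute_mean([20, None, 40, 60])
    print(ages)                                       # [20, 40, 40, 60]
    print(min_max_scale(ages))                        # [0.0, 0.5, 0.5, 1.0]
    print([round(z, 3) for z in z_score(ages)])       # [-1.414, 0.0, 0.0, 1.414]
    print(one_hot(["red", "blue", "red"]))            # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
    print(equal_width_bins(ages, 2))                  # [0, 1, 1, 1]
    print(aggregate_sum(["a", "b", "a"], [1, 2, 3]))  # {'a': 4.0, 'b': 2.0}
```

Both scalers return zeros for a constant column instead of dividing by zero; that handling is a choice made for this sketch, and real libraries may treat the case differently. Note also that the imputation step runs before scaling here, since min-max and z-score are undefined on missing entries.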
Source journal: Journal of Computer Languages (Computer Science-Computer Networks and Communications)
CiteScore: 5.00
Self-citation rate: 13.60%
Articles published: 36