The role of data transformation in modern analytics: A comprehensive survey
Authors: Sanae Borrohou, Rachida Fissoune, Hassan Badir
Journal of Computer Languages, volume 84, Article 101329 (published 2025-05-20)
DOI: 10.1016/j.cola.2025.101329
https://www.sciencedirect.com/science/article/pii/S2590118425000152
Data transformation is a fundamental step in modern data analytics, enabling the conversion of raw data into structured, high-quality formats suitable for analysis. This process plays a crucial role in data cleaning, integration, and preprocessing, ensuring consistency across diverse data sources while addressing challenges such as missing values, inconsistencies, and redundancy. By applying techniques such as scaling, normalization, encoding, feature extraction, and aggregation, data transformation enhances the accuracy and efficiency of analytical and machine learning models. This study provides a comprehensive survey of data transformation techniques, categorizing them into key types: data cleaning and preprocessing, normalization and standardization, feature engineering, encoding categorical data, data augmentation, discretization and data aggregation. We analyze their impact on data quality and explore their interdependencies, presenting a structured framework that connects these transformations within the broader data preprocessing workflow. Additionally, we highlight the challenges of implementing transformation methods in large-scale, heterogeneous datasets, including data integration complexities, security concerns, and resource constraints. By synthesizing recent advancements in the field, this research offers a structured reference for data scientists and researchers, guiding them in selecting appropriate transformation strategies based on their specific analytical needs. Future work will focus on developing a complete data cleaning workflow that integrates transformation techniques for large-scale applications, emphasizing automation and scalability in modern analytics.
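As a rough illustration of a few of the techniques named in the abstract (handling missing values, scaling, standardization, encoding, and aggregation), the following is a minimal sketch using only the Python standard library. The function names and the toy data are assumptions made for this example; they are not taken from the survey, which may define or group these transformations differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode each categorical label as a 0/1 indicator vector.

    Columns follow the sorted order of the distinct labels.
    """
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def aggregate_mean(keys, values):
    """Group values by key and return the mean of each group."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: mean(vs) for k, vs in groups.items()}


if __name__ == "__main__":
    # Hypothetical toy columns, chosen only to exercise each step.
    ages = impute_mean([20, None, 40])
    print(min_max_scale(ages))
    print(standardize(ages))
    print(one_hot(["red", "blue", "red"]))
    print(aggregate_mean(["a", "b", "a"], [1, 3, 5]))
```

The order shown here (impute first, then scale) reflects one common dependency between steps: scaling statistics such as the minimum, maximum, or mean are only well defined once missing entries have been dealt with.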