Data migration for column family database evolution
Pablo Suárez-Otero, Michael J. Mior, María José Suárez-Cabal, Javier Tuya
Information and Software Technology, Volume 187, Article 107834. Published 2025-07-08.
DOI: 10.1016/j.infsof.2025.107834
https://www.sciencedirect.com/science/article/pii/S0950584925001739
Abstract
Context
Database evolution involves processes such as the evolution of the schema, the adaptation of the application to the new schema, and the migration of data to new or modified schema structures. Data migration is particularly crucial in databases where data repetition is common, such as NoSQL column-family DBMSs. In these systems, data integrity cannot be enforced by the database and must instead be maintained by the application. Data repetition and the absence of database-side integrity enforcement also affect database evolution, because any evolution of the schema requires data migrations to maintain data integrity.
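As a minimal illustration of why data repetition makes migrations necessary (a sketch under stated assumptions, not taken from the paper), the Python snippet below models two hypothetical column-family tables, `books_by_id` and `books_by_author`, as in-memory dictionaries that hold copies of the same data. The table names, columns, and helper functions are all invented for illustration, and no real DBMS is involved.

```python
# Hypothetical denormalized schema: the same logical "book" facts are stored
# twice, once per query pattern. Plain dictionaries stand in for tables.
books_by_id = {
    1: {"title": "Dune", "author": "Frank Herbert"},
    2: {"title": "Emma", "author": "Jane Austen"},
}
books_by_author = {
    ("Frank Herbert", 1): {"title": "Dune"},
    ("Jane Austen", 2): {"title": "Emma"},
}


def find_inconsistencies(by_id, by_author):
    """Return the ids of books whose copy in by_author is missing or differs.

    Only checks one direction (by_id -> by_author); this is the kind of
    check the application has to perform itself in this toy model.
    """
    bad = []
    for book_id, row in by_id.items():
        copy = by_author.get((row["author"], book_id))
        if copy is None or copy["title"] != row["title"]:
            bad.append(book_id)
    return sorted(bad)


def migrate_to_books_by_title(by_id):
    """Populate a newly added (hypothetical) table keyed by title.

    A table added during schema evolution starts empty, so the existing
    data must be copied into it to keep the repeated data consistent.
    """
    return {
        (row["title"], book_id): {"author": row["author"]}
        for book_id, row in by_id.items()
    }


books_by_title = migrate_to_books_by_title(books_by_id)
```

In this toy model, skipping `migrate_to_books_by_title` after adding the new table would leave queries by title returning nothing for books that do exist, which is the kind of inconsistency the abstract refers to.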
Objectives
To ensure data integrity in NoSQL column-family DBMSs during database evolution by providing specific instructions for executing the necessary data migrations.
Methods
We propose MoDEvo, a model-driven engineering approach that provides a data migration model to ensure data integrity for database evolution in column-family DBMSs. This model is then transformed into an executable script that implements the migration procedures.
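The idea of turning a migration model into a script can be sketched roughly as follows. This is not MoDEvo's actual model or generator, whose details the abstract does not give; the `CopyStep` class, the `to_script` function, and the example tables are hypothetical. The sketch assumes a migration step is simply "copy these columns from one table to another", emits a CQL-style read statement plus a parameterized write, and rejects a step whose source table lacks a required column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyStep:
    """Hypothetical migration-model element: copy columns between tables."""
    source_table: str
    target_table: str
    columns: tuple  # column names the target table needs


def to_script(step, schema):
    """Turn one CopyStep into a list of CQL-style statement strings.

    `schema` maps each table name to the set of columns it has.
    Raises ValueError if the source cannot supply a required column,
    i.e. the migration is impossible in this toy model.
    """
    available = schema[step.source_table]
    missing = [c for c in step.columns if c not in available]
    if missing:
        raise ValueError(
            f"cannot migrate {step.source_table} -> {step.target_table}: "
            f"missing columns {missing}"
        )
    cols = ", ".join(step.columns)
    marks = ", ".join("?" for _ in step.columns)
    return [
        f"SELECT {cols} FROM {step.source_table};",
        f"INSERT INTO {step.target_table} ({cols}) VALUES ({marks});",
    ]


# Assumed example schema, invented for illustration.
schema = {
    "books_by_id": {"id", "title", "author"},
    "books_by_title": {"title", "id", "author"},
}
step = CopyStep("books_by_id", "books_by_title", ("title", "id", "author"))
script = to_script(step, schema)
```

A client program would run the generated read statement and bind each returned row to the parameterized write. Checking for missing columns before emitting anything is one simple way to flag a migration that cannot be carried out.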
Results
We evaluate MoDEvo by executing data migrations in case studies obtained from open-source projects where the schema evolved. In this evaluation we use Apache Cassandra, the most popular column-family DBMS. Through this evaluation, we verify that the scripts generated from the data migration model effectively maintain data integrity within the database.
Conclusion
MoDEvo aids database evolution in column-family DBMSs by avoiding the introduction of inconsistencies; it can also detect impossible migrations, thereby preventing errors. There is still room for improvement, such as extending support to databases of other paradigms where data repetition is common and addressing the evolution of client applications alongside schema evolution.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.