XLUM: an open data format for exchange and long-term preservation of luminescence data
S. Kreutzer, Steve Grehl, Michael Höhne, Oliver Simmank, K. Dornich, Grzegorz Adamiec, Christoph Burow, H. Roberts, G. Duller
Geochronology, published 2023-06-06. DOI: 10.5194/gchron-5-271-2023 (https://doi.org/10.5194/gchron-5-271-2023)
Abstract
The concept of open data has become the modern science meme, and major funding bodies and publishers support open data. On a daily basis, however, the
open data mandate frequently encounters technical obstacles, such as a lack of a suitable data format for data sharing and long-term data
preservation. Such issues are often community-specific and best addressed through community-tailored solutions. In Quaternary sciences, luminescence
dating is widely used for constraining the timing of event-based processes (e.g. sediment transport). Every luminescence dating study produces a
vast body of primary data that usually remains inaccessible and incompatible with future studies or adjacent scientific disciplines. To facilitate
data exchange and long-term data preservation (in short, open data) in luminescence dating studies, we propose a new XML-based structured data
format called XLUM. The format applies a hierarchical data storage concept consisting of a root node (node 0), a sample (node 1), a sequence
(node 2), a record (node 3), and a curve (node 4). The curve level holds information on the technical component (e.g. photomultiplier,
thermocouple). A finite number of curves represent a record (e.g. an optically stimulated luminescence curve). Records are part of a sequence
measured for a particular sample. This design concept allows the user to retain information on a technical component level from the measurement
process. The additional storage of related metadata fosters future data mining projects on large datasets. The XML-based format is less
memory-efficient than binary formats; however, its focus is on data exchange and preservation, and hence on long-term format stability by
design. XLUM is inherently stable against future updates and backwards-compatible. We support XLUM through a new R package, xlum,
facilitating the conversion of different formats into the new XLUM format. XLUM is licensed under the MIT licence and hence available
for free to be used in open- and closed-source commercial and non-commercial software and research projects.
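
The five-level hierarchy described above (root, sample, sequence, record, curve) can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (xlum, sample, sequence, record, curve), the attribute names (name, position, recordType, component), and all values are assumptions made for illustration only. They are not taken from the XLUM specification, which should be consulted for the actual schema.

```python
import xml.etree.ElementTree as ET

# NOTE: all element names, attribute names, and values below are
# illustrative assumptions, not the official XLUM schema.


def build_example() -> ET.Element:
    """Build a tiny tree with one node per hierarchy level."""
    root = ET.Element("xlum")                                     # node 0: root
    sample = ET.SubElement(root, "sample", name="SAMPLE-1")       # node 1: sample
    sequence = ET.SubElement(sample, "sequence", position="1")    # node 2: sequence
    record = ET.SubElement(sequence, "record", recordType="OSL")  # node 3: record

    # node 4: one curve per technical component of the measurement
    pmt = ET.SubElement(record, "curve", component="photomultiplier")
    pmt.text = "120 95 80 61"
    thermo = ET.SubElement(record, "curve", component="thermocouple")
    thermo.text = "125.0 125.1 124.9 125.0"
    return root


def path_to_first(root: ET.Element, tag: str):
    """Return the tags from the root down to the first element named `tag`."""
    def walk(node, trail):
        trail = trail + [node.tag]
        if node.tag == tag:
            return trail
        for child in node:
            found = walk(child, trail)
            if found is not None:
                return found
        return None

    return walk(root, [])


def curves_by_component(root: ET.Element) -> dict:
    """Collect the numeric values of every curve, keyed by its component."""
    return {
        curve.get("component"): [float(v) for v in curve.text.split()]
        for curve in root.iter("curve")
    }


if __name__ == "__main__":
    tree = build_example()
    print(ET.tostring(tree, encoding="unicode"))
    print(path_to_first(tree, "curve"))
    print(curves_by_component(tree))
```

In this sketch, a record is represented by two curves, one per technical component, so component-level information from the measurement survives alongside the record itself. A real XLUM file would carry further metadata at each level.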