On Generating SHACL Shapes from Collective Collection of Plant Trait Data

D. R. Saleh, Y. Kartika, Zaenal Akbar, A. Krisnadhi, W. Fatriasari
Published in: Proceedings of the 2022 International Conference on Computer, Control, Informatics and Its Applications
Publication date: 2022-11-22
DOI: 10.1145/3575882.3575945 (https://doi.org/10.1145/3575882.3575945)
Citations: 0

Abstract

Collective data collection has become common in various domains, including biodiversity science. Multiple individuals work on the same biological samples or specimens, using various scientific tools to measure different characteristics. Moreover, the measurements are typically governed by different data collection procedures and protocols. Integrating the data and guaranteeing its quality has therefore become a significant issue. One solution is to adopt the RDF (Resource Description Framework) data model in combination with a language for validating RDF graphs, such as SHACL (Shapes Constraint Language). The RDF data model provides flexibility in accommodating multiple data schemas, while SHACL uses a set of conditions, called shapes, to validate RDF data graphs. The remaining challenge is to find an effective method for defining SHACL shapes that can be used to validate any given RDF data. This work introduces a semi-automatic, database-driven solution for generating SHACL shapes. The solution relies on the database’s internal structure and the values of its data items. It was applied to a trait database of natural fiber plants in Indonesia, where a large number of individual shapes were successfully generated. Furthermore, a qualitative evaluation indicated that the shapes were of appropriate quality. This work contributes to improving the quality of biodiversity data collections, which has become an essential factor in Big Biodiversity Data processing.
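
The abstract describes deriving SHACL shapes from a database's internal structure and from the values of its data items. The paper's actual procedure is not given here, so the following is only a minimal sketch of that general idea, not the authors' method. It assumes a SQLite database, reads column names, declared types, and NOT NULL flags from the schema, and uses the observed minimum and maximum of numeric columns as candidate value ranges. The table `FiberTrait`, its columns, the sample values, the `ex:` namespace, and the helper `generate_shape` are all hypothetical and chosen for illustration.

```python
import sqlite3

# Assumed mapping from SQLite declared column types to XSD datatypes.
XSD_TYPES = {"INTEGER": "xsd:integer", "REAL": "xsd:decimal", "TEXT": "xsd:string"}

PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/trait#> .\n"  # hypothetical namespace
)


def generate_shape(conn, table):
    """Build one SHACL node shape (as Turtle text) for a single table."""
    # Structure: each row is (cid, name, declared type, notnull, default, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    blocks = []
    for _cid, name, decl_type, notnull, _default, _pk in columns:
        datatype = XSD_TYPES.get(decl_type.upper(), "xsd:string")
        # A relational column holds at most one value per row.
        lines = [f"sh:path ex:{name}", f"sh:datatype {datatype}", "sh:maxCount 1"]
        if notnull:
            lines.append("sh:minCount 1")
        if datatype != "xsd:string":
            # Values: use the observed range as a candidate constraint,
            # to be reviewed by a person (the "semi-automatic" part).
            lo, hi = conn.execute(
                f'SELECT MIN("{name}"), MAX("{name}") FROM "{table}"'
            ).fetchone()
            if lo is not None:
                lines.append(f'sh:minInclusive "{lo}"^^{datatype}')
                lines.append(f'sh:maxInclusive "{hi}"^^{datatype}')
        body = " ;\n        ".join(lines)
        blocks.append(f"    sh:property [\n        {body}\n    ]")
    header = f"ex:{table}Shape a sh:NodeShape ;\n    sh:targetClass ex:{table} ;\n"
    return PREFIXES + "\n" + header + " ;\n".join(blocks) + " .\n"


# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FiberTrait ("
    "species TEXT NOT NULL, fiber_length_mm REAL, cellulose_pct REAL)"
)
conn.executemany(
    "INSERT INTO FiberTrait VALUES (?, ?, ?)",
    [("Musa textilis", 3.5, 60.0), ("Agave sisalana", 2.5, 70.0)],
)
shape_ttl = generate_shape(conn, "FiberTrait")
print(shape_ttl)
```

The generated Turtle can be passed to any SHACL validator together with the RDF data. Ranges inferred from observed values describe only the data already collected, so in practice they would likely need review by a domain expert before being used to reject new measurements.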