On Generating SHACL Shapes from Collective Collection of Plant Trait Data
D. R. Saleh, Y. Kartika, Zaenal Akbar, A. Krisnadhi, W. Fatriasari
Proceedings of the 2022 International Conference on Computer, Control, Informatics and Its Applications, published 2022-11-22. DOI: 10.1145/3575882.3575945
Collective data collection has become common in many domains, including biodiversity science. Multiple individuals work on the same biological samples or specimens, using various scientific tools to measure different characteristics. Moreover, the measurements are typically governed by different data collection procedures and protocols. Integrating these data and guaranteeing their quality has therefore become a significant issue. One solution is to adopt the RDF (Resource Description Framework) data model together with a language for validating RDF graphs, such as SHACL (Shapes Constraint Language). The RDF data model is flexible enough to accommodate multiple data schemas, while SHACL validates RDF data graphs against sets of conditions called shapes. The remaining challenge is finding an effective method for defining SHACL shapes that can validate any given RDF data. This work introduces a semi-automatic, database-driven solution for generating SHACL shapes. The solution relies on the internal structure of the database and on the values of its data items. It was applied to a trait database of natural fiber plants in Indonesia, where a large number of individual shapes were successfully generated. Furthermore, a qualitative evaluation indicated that the generated shapes are of appropriate quality. This work contributes to improving the quality of biodiversity data collections, which has become an essential factor in Big Biodiversity Data processing.
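The abstract does not spell out the generation rules, so the following is only a rough illustration of what "database-driven" shape generation can mean. It is a minimal Python sketch under stated assumptions, not the authors' implementation. It reads a table's internal structure (declared column types and NOT NULL flags via SQLite's `PRAGMA table_info`) to derive `sh:datatype`, `sh:minCount` and `sh:maxCount`, and it inspects data values to add an `sh:in` enumeration for text columns that look categorical. The `ex:` namespace, the `specimen` table, its columns and rows, the SQL-to-XSD mapping, the `max_enum` threshold and the "values repeat" heuristic are all hypothetical.

```python
import sqlite3

# Hypothetical namespace for generated shapes and data (not from the paper).
EX = "http://example.org/traits#"
PREFIXES = (
    "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    f"@prefix ex: <{EX}> .\n"
)

# Assumed mapping from SQLite declared column types to XSD datatypes.
SQL_TO_XSD = {"INTEGER": "xsd:integer", "REAL": "xsd:double", "TEXT": "xsd:string"}


def turtle_string(value):
    """Quote a value as a Turtle string literal, escaping backslashes and quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate_shape(conn, table, max_enum=5):
    """Emit a SHACL NodeShape in Turtle for one table (identifiers assumed trusted)."""
    cls = table.capitalize()
    lines = [f"ex:{cls}Shape a sh:NodeShape ;", f"    sh:targetClass ex:{cls} ;"]
    props = []
    # Structure-driven constraints: one property shape per column.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for _cid, name, decl_type, notnull, _default, _pk in columns:
        datatype = SQL_TO_XSD.get(decl_type.upper(), "xsd:string")
        parts = [f"sh:path ex:{name}", f"sh:datatype {datatype}", "sh:maxCount 1"]
        if notnull:
            parts.append("sh:minCount 1")
        # Value-driven constraint: enumerate a text column's values when it has
        # few distinct values and at least one value repeats (looks categorical).
        if datatype == "xsd:string":
            values = [v for (v,) in conn.execute(
                f"SELECT DISTINCT {name} FROM {table} "
                f"WHERE {name} IS NOT NULL ORDER BY {name}")]
            (non_null,) = conn.execute(f"SELECT COUNT({name}) FROM {table}").fetchone()
            if 0 < len(values) <= max_enum and len(values) < non_null:
                parts.append("sh:in ( " + " ".join(map(turtle_string, values)) + " )")
        props.append("    sh:property [ " + " ; ".join(parts) + " ]")
    lines.append(" ;\n".join(props) + " .")
    return "\n".join(lines)


def build_demo_db():
    """Create an in-memory table with made-up illustration rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE specimen (id INTEGER NOT NULL, species TEXT NOT NULL, "
                 "fiber_length REAL, habitat TEXT)")
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?, ?)", [
        (1, "Musa textilis", 2.4, "lowland"),
        (2, "Boehmeria nivea", 1.8, "highland"),
        (3, "Agave sisalana", None, "lowland"),
    ])
    return conn


conn = build_demo_db()
print(PREFIXES + generate_shape(conn, "specimen"))
```

In this sketch `species` is left unconstrained because every value is distinct, while `habitat` receives `sh:in ( "highland" "lowland" )` because its values repeat. A semi-automatic workflow, as the abstract describes, would presumably let a curator review such heuristic constraints before use.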