Using SPARQL to Test for Lattices: application to quality assurance in biomedical ontologies.

Guo-Qiang Zhang, Olivier Bodenreider
Published in: The semantic Web--ISWC ... : ... International Semantic Web Conference ... proceedings. International Semantic Web Conference, vol. 6497, pp. 273-288. Publication date: 2010-01-01.
DOI: https://doi.org/10.1007/978-3-642-17749-1_18
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330995/pdf/nihms-654705.pdf
Citations: 24

Abstract

We present a scalable, SPARQL-based computational pipeline for testing the lattice-theoretic properties of partial orders represented as RDF triples. The use case for this work is quality assurance in biomedical ontologies, one desirable property of which is conformance to lattice structures. At the core of our pipeline is an algorithm called NuMi, which detects the Number of Minimal upper bounds of any pair of elements in a given finite partial order. Our technical contribution is the coding of NuMi completely in SPARQL. To show its scalability, we applied NuMi to the entirety of SNOMED CT, the largest clinical ontology (over 300,000 concepts). Our experimental results have been groundbreaking: for the first time, all non-lattice pairs in SNOMED CT have been identified exhaustively from 34 million candidate pairs using over 2.5 billion queries issued to Virtuoso. The percentage of non-lattice pairs ranges from 0% to 1.66% among the 19 SNOMED CT hierarchies. These non-lattice pairs represent target areas for focused curation by domain experts. RDF, SPARQL and related tooling provide an efficient platform for implementing lattice algorithms on large data structures.
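The idea of counting minimal upper bounds can be illustrated with a small in-memory sketch. This is not the authors' SPARQL implementation of NuMi; it is a plain Python illustration under stated assumptions: the partial order is given as a hypothetical `parents` mapping (child to direct parents, standing in for is-a triples), and a pair is flagged as non-lattice when it has more than one minimal upper bound. The function names and the toy hierarchy are invented for the example.

```python
from itertools import combinations


def upper_bounds(parents, node):
    """Return `node` plus everything reachable by following parent links."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def minimal_upper_bounds(parents, a, b):
    """Return the minimal elements among the common upper bounds of a and b."""
    common = upper_bounds(parents, a) & upper_bounds(parents, b)
    # u is minimal if no other common upper bound v lies strictly below it,
    # i.e. no v != u has u among its own upper bounds.
    return {
        u
        for u in common
        if not any(v != u and u in upper_bounds(parents, v) for v in common)
    }


def non_lattice_pairs(parents):
    """List pairs with more than one minimal upper bound (assumed criterion)."""
    nodes = set(parents)
    for direct_parents in parents.values():
        nodes.update(direct_parents)
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if len(minimal_upper_bounds(parents, a, b)) > 1
    ]


# Hypothetical toy hierarchy: a and b each have two parents, c and d.
toy_parents = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["top"],
    "d": ["top"],
}

if __name__ == "__main__":
    print(minimal_upper_bounds(toy_parents, "a", "b"))
    print(non_lattice_pairs(toy_parents))
```

In the toy hierarchy, `a` and `b` share the upper bounds `c`, `d` and `top`, and neither `c` nor `d` lies below the other, so the pair has two minimal upper bounds and no least upper bound. The sketch recomputes the closure for every check, which is fine for a handful of nodes but would not scale to an ontology of SNOMED CT's size; the abstract reports that the authors instead issue SPARQL queries to Virtuoso.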

