Ontologies in modelling and analysing of big genetic data.

Impact factor: 1.0; JCR quartile: Q3; category: AGRICULTURE, MULTIDISCIPLINARY

Vavilovskii Zhurnal Genetiki i Selektsii. Pub Date: 2024-12-01. DOI: 10.18699/vjgb-24-101

N L Podkolodnyy, O A Podkolodnaya, V A Ivanisenko, M A Marchenko

Volume 28, issue 8, pages 940-949. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11813802/pdf/

Citations: 0

Abstract

To systematize and make effective use of the large volume of experimental data accumulated in bioinformatics and biomedicine, new ontology-based approaches are needed. These include automated methods for the semantic integration of heterogeneous experimental data, methods for building large knowledge bases, and self-explaining deep learning methods for analyzing large heterogeneous data sets. This article briefly presents the features of the subject area (bioinformatics, systems biology, biomedicine) and formal definitions of ontologies and knowledge graphs. It also gives examples of using ontologies to integrate heterogeneous data semantically, to build large knowledge bases, and to interpret the results of deep learning on big data. The Gene Ontology knowledge base is described as an example of a successful project: it includes not only terminological knowledge and Gene Ontology annotations (GOA) but also causal activity models (GO-CAM). This makes it useful for genomic biology, for systems biology, and for interpreting large-scale experimental data. An approach to building large ontologies from design patterns is discussed, using the Ontology of Biological Attributes (OBA) as an example. In OBA, all but a small number of high-level concepts are classified automatically by automated inference over previously created reference ontologies. One of the main problems of deep learning is its lack of interpretability: neural networks often function as "black boxes" that cannot explain their decisions. This paper describes approaches to interpreting deep learning models and presents two examples of self-explaining, ontology-based deep learning models. (1) Deep GONet integrates the Gene Ontology into a hierarchical neural network architecture in which each neuron represents a biological function. Experiments on cancer diagnostic datasets show that Deep GONet is easily interpretable and performs well in distinguishing cancerous from non-cancerous samples. (2) ONN4MST uses biome ontologies to trace the microbial sources of samples, including samples from niches that were previously poorly studied or unknown, and to detect microbial contaminants. ONN4MST can distinguish samples from ontologically similar biomes, thus offering a quantitative way to characterize the evolution of the human gut microbial community. Both examples demonstrate high performance and interpretability, making them valuable tools for analyzing and interpreting big data in biology.
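The idea of a network in which each neuron stands for an ontology term can be illustrated with a minimal sketch. It is not the Deep GONet implementation: the toy term identifiers, gene names, unit weights, and the hard binary mask are all assumptions made for illustration, and the published model may enforce the ontology structure differently. The sketch only shows how restricting connections to annotation and is_a edges makes each activation attributable to a named term.

```python
import math

# Hypothetical toy ontology: child term -> parent terms (is_a edges).
IS_A = {
    "GO:leaf_a": ["GO:mid"],
    "GO:leaf_b": ["GO:mid"],
}
# Hypothetical annotations: gene -> ontology terms it is annotated to.
ANNOTATIONS = {
    "gene1": ["GO:leaf_a"],
    "gene2": ["GO:leaf_b"],
    "gene3": ["GO:leaf_b"],
}


def build_mask(sources, targets, edges):
    """mask[i][j] is 1.0 when source i is linked to target j, else 0.0."""
    return [
        [1.0 if t in edges.get(s, []) else 0.0 for t in targets]
        for s in sources
    ]


def masked_layer(inputs, weights, mask):
    """Forward step: each target sums only its allowed inputs, then tanh."""
    n_out = len(mask[0])
    return [
        math.tanh(
            sum(inputs[i] * weights[i][j] * mask[i][j]
                for i in range(len(inputs)))
        )
        for j in range(n_out)
    ]


genes = ["gene1", "gene2", "gene3"]
leaf_terms = ["GO:leaf_a", "GO:leaf_b"]
mid_terms = ["GO:mid"]

# Connection masks derived from the annotations and the ontology.
gene_to_leaf = build_mask(genes, leaf_terms, ANNOTATIONS)
leaf_to_mid = build_mask(leaf_terms, mid_terms, IS_A)

# Unit weights for illustration only; a real model would learn them.
w1 = [[1.0] * len(leaf_terms) for _ in genes]
w2 = [[1.0] * len(mid_terms) for _ in leaf_terms]

expression = [0.5, -0.2, 0.1]  # made-up expression values for one sample
leaf_act = masked_layer(expression, w1, gene_to_leaf)
mid_act = masked_layer(leaf_act, w2, leaf_to_mid)

# Each activation can be reported under the name of its ontology term.
for term, value in zip(leaf_terms + mid_terms, leaf_act + mid_act):
    print(term, round(value, 3))
```

Because every hidden unit is tied to one term, inspecting `leaf_act` and `mid_act` gives a per-term readout of the sample, which is the kind of interpretability the abstract attributes to ontology-based models.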

Source journal

Vavilovskii Zhurnal Genetiki i Selektsii (AGRICULTURE, MULTIDISCIPLINARY)

CiteScore: 1.90
Self-citation rate: 0.00%
Articles published: 119
Review time: 8 weeks

About the journal: The Vavilov Journal of Genetics and Breeding publishes original research and review articles in all key areas of modern plant, animal and human genetics, genomics, bioinformatics and biotechnology. One of its main objectives is to integrate theoretical and applied research in genetics. Special attention is paid to the most topical areas of modern genetics that address global concerns such as food security and human health.