Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data

IF 2.4 | Zone 3 (Biology) | JCR Q3 (Biochemical Research Methods)
Xinpeng Guo, Yafei Song, Dongyan Xu, Xueping Jin, Xuequn Shang
Current Bioinformatics. Published 2024-02-07. DOI: 10.2174/0115748936276861240109045208 (https://doi.org/10.2174/0115748936276861240109045208)
Citation count: 0

Abstract

Background: Multi-omics analysis of clinical data suffers from too few omics data types and relatively small sample sizes, owing to patient privacy protection, institutional data management requirements, and the relatively large number of features in each omics dataset. This paper describes how to analyze multi-omics pathway relationships using statistical data when clinical data are unavailable. Methods: We propose a novel approach that exploits easily accessible statistics from public databases. The approach introduces phenotypic associations that are not included in clinical data and uses them to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks and predicted the weights of the inter-layer associations in each. The weights from the two two-layer networks were then merged using a hyperparameter β, and k-fold cross-validation was used to evaluate the accuracy of the method. To calculate the weights of the two-layer networks, random walk with restart (RWR) with a fixed restart probability was combined with PBMDA and CIPHER to produce PCRWR, which uses biased weights and achieves improved accuracy. Results: For the RWR with initial weights, the area under the receiver operating characteristic curve (AUC) increased by approximately 7%. Conclusion: Genotype-phenotype correlation networks built from multi-omics statistical data supported an analysis whose results were similar to those of clinical multi-omics analysis.
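The abstract does not give the update rule or the merging formula, so the following is only a minimal sketch of the two generic building blocks it names: a standard random walk with restart with a fixed restart probability, and a β-weighted merge of two score vectors. The function names (`rwr`, `merge_scores`), the convex-combination form of the merge, and the toy graph are all assumptions made for illustration; this is not the authors' PCRWR, which additionally incorporates PBMDA and CIPHER.

```python
def column_normalize(adj):
    """Return a column-stochastic copy of a square adjacency matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Standard random walk with restart (fixed restart probability).

    Iterates p <- (1 - restart) * W p + restart * p0 until the L1 change
    falls below tol, where W is the column-normalized adjacency matrix
    and p0 is uniform over the seed nodes.
    """
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def merge_scores(scores_a, scores_b, beta):
    """Hypothetical merge: convex combination weighted by beta (an assumption)."""
    return [beta * a + (1 - beta) * b for a, b in zip(scores_a, scores_b)]


# Toy path graph 0 - 1 - 2 - 3 (illustrative data, not from the paper).
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores_from_0 = rwr(toy_adj, seeds=[0])
scores_from_3 = rwr(toy_adj, seeds=[3])
merged = merge_scores(scores_from_0, scores_from_3, beta=0.25)
```

In this sketch β = 1 keeps only the first network's scores and β = 0 only the second's; in practice β would be tuned, for example inside the k-fold cross-validation loop the abstract mentions.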
Source journal: Current Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 6.60
Self-citation rate: 2.50%
Articles published: 77
Review time: >12 weeks
Journal description: Current Bioinformatics aims to publish all the latest and outstanding developments in bioinformatics. Each issue contains a series of timely in-depth reviews and mini-reviews, research papers, and guest-edited thematic issues written by leaders in the field, covering a wide range of topics in the integration of biology with computer and information science. The journal focuses on advances in computational molecular and structural biology, encompassing areas such as computing in biomedicine and genomics, computational proteomics and systems biology, and metabolic pathway engineering. Developments in these fields have direct implications for key issues related to health care, medicine, genetic disorders, development of agricultural products, renewable energy, environmental protection, etc.