Large-scale correlation mining for biomolecular network discovery

A. Hero, B. Rajaratnam
DOI: https://doi.org/10.1017/CBO9781316162750.016
Citation count: 2

Abstract

Continuing advances in high-throughput mRNA probing, gene sequencing, and microscopic imaging technology are producing a wealth of biomarker data on many different living organisms and conditions. Scientists hope that increasing amounts of relevant data will eventually lead to a better understanding of the network of interactions between the thousands of molecules that regulate these organisms. Thus, progress in understanding the biological science has become increasingly dependent on progress in understanding the data science. Data mining tools have been of particular relevance since they can sometimes be used to effectively separate the “wheat” from the “chaff,” winnowing the massive amount of data down to a few important data dimensions. Correlation mining is a data mining tool that is particularly useful for probing statistical correlations between biomarkers and recovering properties of their correlation networks. However, since the number of correlations between biomarkers grows quadratically with the number of biomarkers, the scalability of correlation mining in the big data setting becomes an issue. Furthermore, the phase transitions that govern correlation mining discoveries must be understood in order for these discoveries to be reliable and of high confidence. This is especially important at big data scales where the number of samples is fixed and the number of biomarkers becomes unbounded, a sampling regime referred to as the “purely high-dimensional setting.” In this chapter, we discuss some of the main advances and challenges in correlation mining in the context of large-scale biomolecular networks, with a focus on medicine. A new correlation mining application is introduced: discovery of correlation sign flips between edges in a pair of correlation or partial correlation networks. The pair of networks could correspond, respectively, to a disease (or treatment) group and a control group.
This paper is to appear as a chapter in the book Big Data over Networks from Cambridge University Press (ISBN: 9781107099005).
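To make the sign-flip idea in the abstract concrete, the following is a minimal Python sketch, not the chapter's procedure. It assumes two data matrices with the same biomarkers in columns (for example, a disease group and a control group), uses plain sample correlations from NumPy, and applies a fixed magnitude threshold. The function names `correlation_sign_flips` and `n_pairs` and the threshold value are illustrative assumptions; the abstract indicates that reliable thresholds depend on phase-transition behavior, which this sketch does not model.

```python
import numpy as np


def n_pairs(p):
    """Number of distinct biomarker pairs among p biomarkers: p(p-1)/2."""
    return p * (p - 1) // 2


def correlation_sign_flips(x_a, x_b, threshold=0.5):
    """Find variable pairs whose correlation changes sign between two groups.

    x_a, x_b: arrays of shape (n_samples, p) holding the same p variables.
    threshold: assumed, illustrative cutoff on |correlation|; a pair is
        reported only if it exceeds the cutoff in BOTH groups.
    Returns a list of (i, j) index pairs with i < j.
    """
    # Sample correlation matrices, treating columns as variables.
    r_a = np.corrcoef(x_a, rowvar=False)
    r_b = np.corrcoef(x_b, rowvar=False)

    # Keep only edges that are strong in both networks.
    strong = (np.abs(r_a) >= threshold) & (np.abs(r_b) >= threshold)

    # A sign flip means the two correlations have opposite signs.
    flipped = strong & (r_a * r_b < 0)

    # Report each unordered pair once (strict upper triangle).
    i, j = np.nonzero(np.triu(flipped, k=1))
    return list(zip(i.tolist(), j.tolist()))
```

The sketch forms both full p × p correlation matrices, so its memory and time grow with the p(p-1)/2 pairs counted by `n_pairs`. That quadratic growth is the scalability concern the abstract raises, and it makes this direct approach practical only for a moderate number of biomarkers.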