Learning in Biomedicine and Bioinformatics Using Affinity Propagation

Sixth International Conference on Machine Learning and Applications (ICMLA 2007). DOI: 10.1109/ICMLA.2007.127

B. Frey

Citations: 7

Abstract

Data sets arising in biomedicine and bioinformatics are often huge and combine quite different types of data (e.g., sequence data and microarray measurements). Consequently, standard machine learning techniques usually cannot be applied directly. In this talk, I will describe an algorithm called affinity propagation and discuss why it offers flexibility in analyzing the kinds of data sets arising in bioinformatics and biomedicine. I'll describe applications in whole-genome transcript detection using microarrays, image segmentation, text analysis and motif discovery. Affinity propagation can be implemented in a couple dozen lines of MATLAB or C and is suitable for distributed computing environments, making it attractive for high-throughput computations.

Research for new biomarkers usually begins with a literature review to identify the mechanisms of action and to define a set of biomarkers that can jointly be used as a panel to characterize the type and stage of a disease. However, the manual search for biomarkers is an increasingly difficult task, since the body of publications is steadily growing in volume and broadening in complexity and diversity. The PubMed database of publications in biomedical science lists more than 6 million articles from the last 10 years. Currently more than 600,000 publications are added to the knowledge base every year, making a manual search for information a time-consuming task. Even for a single disease such as lung cancer, several thousand related publications appear every year (in 2007, more than 300 per month on average for lung cancer). To address this challenging task, we have developed a system that can identify structural and longitudinal patterns in the biomedical literature data that support the understanding of trends and relationships between diseases and biomarkers over time.

We believe that time information is important, since it helps in tracking (1) when a biomarker was discovered and how important it has become for the understanding of the disease over time; (2) whether a biomarker has been "replaced" or complemented by another, more informative biomarker; and (3) at what time an emerging biomarker appears that will become relevant for a disease on a broader basis.
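
As an illustration of the abstract's remark that affinity propagation fits in a couple dozen lines, the following is a minimal sketch in Python (chosen here for readability; the talk mentions MATLAB and C). It uses the commonly described responsibility/availability message updates with damping. The function name `affinity_propagation`, the parameters `damping` and `iterations`, the toy points, the negative squared distance similarity, and the preference value are all assumptions made for this example and are not taken from the talk.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point i, the index of the exemplar it selects.

    S[i][k] is the similarity of point i to candidate exemplar k;
    the diagonal S[k][k] holds the "preference" for k being an exemplar.
    Requires at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new

        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) <- sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Each point selects the candidate k that maximizes a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data (made up): two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
preference = -10.0  # hypothetical value; lower preferences give fewer exemplars
S = [[preference if i == k else -(x - y) ** 2
      for k, y in enumerate(points)]
     for i, x in enumerate(points)]
labels = affinity_propagation(S)
print(labels)
```

The only input is a matrix of pairwise similarities, so the same routine could in principle be applied to sequence scores, expression profiles, or text similarities, which is the flexibility the abstract points to. This sketch runs a fixed number of iterations and uses dense O(n²) loops; a practical implementation would add a convergence check and exploit sparsity.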
