Learning in Biomedicine and Bioinformatics Using Affinity Propagation
B. Frey
Sixth International Conference on Machine Learning and Applications (ICMLA 2007)
DOI: 10.1109/ICMLA.2007.127 (https://doi.org/10.1109/ICMLA.2007.127)
Citations: 7
Abstract
Data sets arising in biomedicine and bioinformatics are often huge and consist of quite different types of data (e.g., sequence data and microarray measurements). Consequently, standard machine learning techniques usually cannot be applied directly. In this talk, I will describe an algorithm called affinity propagation and discuss why it offers flexibility in analyzing the kinds of data sets arising in bioinformatics and biomedicine. I'll describe applications in the areas of whole-genome transcript detection using microarrays, image segmentation, text analysis and motif discovery. Affinity propagation can be implemented in a couple dozen lines of MATLAB or C and is suitable for distributed computing environments, making it attractive for high-throughput computations.
Research on new biomarkers usually begins with a literature review to identify the mechanisms of action and to define a set of biomarkers that can jointly be used as a panel to characterize the type and stage of a disease. However, the manual search for biomarkers is an increasingly difficult task, since the literature is steadily growing in volume and broadening in complexity and diversity. The PubMed database of publications in biomedical science lists more than 6 million articles from the last 10 years. Currently, more than 600k publications are added to the knowledge base every year, making a manual search for information a time-consuming task. Even for a single disease such as lung cancer, several thousand related publications appear every year (in 2007, more than 300 per month on average). To address this challenging task, we have developed a system that can identify structural and longitudinal patterns in the biomedical literature data that support the understanding of trends and relationships between diseases and biomarkers over time.
We believe that temporal information is important, since it helps in tracking (i) when a biomarker was discovered and how important it has become for the understanding of the disease over time; (ii) whether a biomarker has been "replaced" or complemented by another, more informative biomarker; and (iii) at what point we can see an emerging biomarker that will become relevant for a disease on a broader basis.
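The abstract's remark that affinity propagation fits in a couple dozen lines of MATLAB or C can be illustrated with a minimal NumPy sketch of the standard responsibility/availability message updates. This is not the author's code: the function name `affinity_propagation`, the `damping` and `max_iter` defaults, and the toy data are assumptions chosen for illustration, and the sketch omits the convergence check and tie-breaking noise that practical implementations typically add.

```python
import numpy as np


def affinity_propagation(S, damping=0.8, max_iter=300):
    """Cluster by message passing on an n x n similarity matrix S.

    S[i, k] scores how well point k would serve as exemplar for point i;
    the diagonal S[k, k] holds the "preferences".
    Returns (exemplar_indices, labels). Hypothetical helper for illustration.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1.0 - damping) * A_new

    # Points whose self-availability plus self-responsibility is positive
    # are taken as exemplars; every other point joins its most similar one.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Toy usage (assumed data): two well-separated groups on a line.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(x[:, None] - x[None, :]) ** 2           # negative squared distance
off_diag = S[~np.eye(len(x), dtype=bool)]
np.fill_diagonal(S, np.median(off_diag))      # shared preference = median similarity
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

Because the routine only needs a similarity matrix, the same sketch applies to any data type for which pairwise similarities can be defined, which is one way to read the flexibility claimed in the abstract. Lower preferences on the diagonal generally yield fewer exemplars.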