Text Mining and Ontology Applications in Bioinformatics and GIS

Sixth International Conference on Machine Learning and Applications (ICMLA 2007). Pub Date: 2007-12-13. DOI: 10.1109/ICMLA.2007.122

S. Navathe


Citations: 34

Abstract


Text Mining and Ontology Applications in Bioinformatics and GIS

Informatics and computers have not yet become as pervasive in chemistry as they have in physics and biology. Drawing analogies from bioinformatics, key ingredients for progress in chemoinformatics are the availability of large, annotated databases of compounds and reactions, data structures and algorithms to efficiently search these databases, and computational methods to predict the physical, chemical, and biological properties of new compounds and reactions. We will describe the development of: (1) a large public database of compounds and reactions (ChemDB); (2) machine learning kernel methods to predict molecular properties; and (3) the applications of these methods to drug screening/design problems and the identification of new drug leads against a major disease. More broadly, we will discuss some of the challenges and opportunities for computer science, AI, and machine learning in chemistry.

Abstract: This talk will present some general problem areas and solutions in two application areas of machine learning: bioinformatics and Geographic Information Systems (GIS). The bioinformatics arena is very broad and encompasses many problems, such as gene finding in sequences, molecular pathway construction, and protein structure prediction. We will outline our research on finding important keywords in the biomedical literature by statistical analysis and some natural language analysis. We have also incorporated ontologies such as UMLS (Unified Medical Language System) to determine relationships among biological and medical concepts. The primary goal of this work has been to interpret the long lists of genes derived from microarray experiments that are used to understand and treat diseases. We are able to cluster genes based on their functional similarity. We have also used lists of keywords as feature vectors to drive SVM models for literature classification. In particular, we have dealt with the classification of relevant literature for public health at the CDC (Centers for Disease Control). We will briefly explain the discovery of biomarkers for cancer using a technique that combines SVM and gene ontology.
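The abstract does not say which statistical measure was used to find important keywords. As a minimal sketch only, the following Python scores terms with TF-IDF, a common stand-in and not necessarily the author's method. The function name `tfidf_keywords`, the stop-word list, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not from the talk).
STOP_WORDS = {"in", "for", "and", "of", "the"}


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [
        [tok for tok in doc.lower().split() if tok not in STOP_WORDS]
        for doc in docs
    ]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])
            continue
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Toy "abstracts", invented for illustration only.
docs = [
    "gene expression in tumor tissue gene",
    "protein structure prediction for protein folding",
    "gene pathway and protein interaction",
]
keywords = tfidf_keywords(docs, top_k=2)
```

Terms that occur in several documents (here `gene` and `protein`) are down-weighted, so the selected keywords are the ones that distinguish a document from the rest. Keyword lists of this kind could then serve as the feature vectors the abstract mentions for SVM-based literature classification.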

