Data and Text Mining in Bioinformatics: Latest Papers

Predicting baby feeding method from unstructured electronic health record data

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390075

A. Rao, K. Maiden, Ben Carterette, Deborah B. Ehrenthal

Citations: 7

Session details: Mining clinical data and text

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/3260181

Hua Xu

Citations: 0

Inferring appropriate eligibility criteria in clinical trial protocols without labeled data

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390074

Angelo C. Restificar, S. Ananiadou

Abstract: We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. A user first identifies a small set of sample documents D' (|D'| << |D|) as relevant, e.g., via a user query interface. Given D', our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this through a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value: the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value, or score. Results from our experiments indicate that our proposed method performs 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are, on average, 75% and 70% similar (for inclusion and exclusion criteria, respectively) to the correct eligibility criteria.

Citations: 13
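The three-step scoring procedure in this abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. LDA (step 1) is assumed to have already produced a topic-proportion vector for each sample document. The per-topic logistic regression models (step 2) are represented by hypothetical pre-trained (weights, bias) pairs over a hypothetical numeric feature vector for each criterion. The sample topic proportions are aggregated by a simple mean, which is also our assumption. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one (weights, bias) logistic model per LDA topic.
TopicModel = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def topic_membership(features: Sequence[float], models: Sequence[TopicModel]) -> List[float]:
    """Step 2 (sketch): P(criterion belongs to topic k), one logistic model per topic."""
    return [sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            for weights, bias in models]


def mean_topic_proportions(sample_thetas: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate the LDA topic proportions of the sample documents D' (assumed: simple mean)."""
    n = len(sample_thetas)
    return [sum(theta[k] for theta in sample_thetas) / n
            for k in range(len(sample_thetas[0]))]


def expected_value(membership: Sequence[float], proportions: Sequence[float]) -> float:
    """Step 3: probability-weighted sum of the sample topic proportions."""
    return sum(p * t for p, t in zip(membership, proportions))


def rank_criteria(candidates, models, sample_thetas):
    """Return (criterion_text, score) pairs, highest expected value first."""
    proportions = mean_topic_proportions(sample_thetas)
    scored = [(text, expected_value(topic_membership(feats, models), proportions))
              for text, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy inputs (hypothetical): two topics, with topic 0 dominating the samples.
    samples = [[0.8, 0.2], [0.6, 0.4]]
    models = [([2.0, 0.0], -1.0), ([0.0, 2.0], -1.0)]
    candidates = [("criterion A", [1.0, 0.0]), ("criterion B", [0.0, 1.0])]
    for text, score in rank_criteria(candidates, models, samples):
        print(f"{text}: {score:.3f}")
```

With these toy inputs, criterion A leans toward topic 0, the topic that dominates the samples. It scores about 0.592 and ranks above criterion B, which scores about 0.408.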

Indexing methods for efficient protein 3D surface search

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390078

Sungchul Kim, Lee Sael, Hwanjo Yu

Abstract: This paper exploits efficient indexing techniques for protein structure search, where protein structures are represented as vectors by the 3D-Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein's tertiary structure as a vector, and this simplified representation accelerates structural search. However, further speed-up is needed for scenarios in which multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly enhance search speed. In addition, we introduce an extended approach to protein structure search based on indexing techniques that use a characteristic of the 3DZD. In the extended approach, the index structure is constructed using only the first few components of each 3DZD. To find the top-k similar structures, the top 10 × k similar structures are first selected using the reduced index structure. The top-k structures are then selected from these by measuring similarity on their full 3DZDs. With the indexing techniques, search time was reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using extended iKernel.

Citations: 7
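The extended two-stage search described in this abstract can be sketched as a filter-and-refine procedure. This is a minimal sketch under stated assumptions, not the authors' method. A brute-force scan over truncated vectors stands in for the reduced iDistance or iKernel index, and Euclidean distance is assumed as the similarity measure. The toy four-component vectors are hypothetical stand-ins for real 3DZDs. The `expansion` parameter corresponds to the factor of 10 in the abstract.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# (protein_id, descriptor vector); toy values stand in for real 3DZDs.
Entry = Tuple[str, Sequence[float]]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed similarity measure; smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_top_k(query: Sequence[float], database: Sequence[Entry], k: int,
                    prefix_len: int, expansion: int = 10) -> List[str]:
    """Filter on the first prefix_len components, then refine with the full vectors.

    The brute-force scan in stage 1 stands in for a reduced iDistance/iKernel index.
    """
    q_prefix = query[:prefix_len]
    # Stage 1: pick expansion * k candidates using truncated descriptors only.
    candidates = heapq.nsmallest(
        expansion * k, database, key=lambda entry: distance(q_prefix, entry[1][:prefix_len])
    )
    # Stage 2: re-rank the candidates with the full descriptors and keep the top k.
    best = heapq.nsmallest(k, candidates, key=lambda entry: distance(query, entry[1]))
    return [pid for pid, _ in best]


if __name__ == "__main__":
    db = [("p1", [0, 0, 0, 0]), ("p2", [0, 0, 5, 5]), ("p3", [1, 1, 0, 0]), ("p4", [9, 9, 9, 9])]
    print(two_stage_top_k([0, 0, 0, 0], db, k=2, prefix_len=2, expansion=2))
```

The toy call prints `['p1', 'p3']`. On its first two components, p2 is identical to the query, but it is dropped once full descriptors are compared. With `expansion=1`, the same call returns `['p1', 'p2']` instead, because the true second neighbour never enters the candidate pool. This is why the pool is enlarged before refinement.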

Lexicon-free and context-free drug names identification methods using hidden Markov models and pointwise mutual information

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390072

Jacek Małyszko, A. Filipowska

Citations: 3

Session details: Mining biological data and text

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/3260182

Min Song

Citations: 0

Extracting structured information from free-text medication prescriptions using dependencies

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390076

Andrew D. MacKinlay, Karin M. Verspoor

Citations: 8

Rule-based whole body modeling for analyzing multi-compound effects

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390083

W. Hwang, Y. Hwang, Sunjae Lee, Doheon Lee

Abstract: Robustness, redundancy, and crosstalk in biological systems have been reported as essential reasons for the limited efficacy and unexpected side effects of drugs. Many pharmaceutical laboratories have begun developing multi-compound drugs to remedy this situation, and some of these have shown successful clinical results. Simultaneous application of multiple compounds could increase efficacy as well as reduce side effects through pharmacodynamic and pharmacokinetic interactions. However, such an approach incurs an overwhelming cost in preclinical experiments and tests, because the number of possible combinations of compound dosages increases exponentially. Computer model-based experiments have been emerging as one of the most promising solutions for coping with such complexity. There have been many efforts to model specific molecular pathways using qualitative and quantitative formalisms. However, these models suffer from unexpected results caused by distant interactions that lie beyond their localized scope.

Here we propose a rule-based whole-body modeling platform. We have tested this platform with a Type 2 diabetes (T2D) model, which involves the malfunction of numerous organs, such as the pancreas, circulatory system, liver, and muscle. We extracted 117 T2D-related rules by manual curation from the literature and from different types of existing models. Our simulation results show the effect pathways of T2D drugs and how drug combinations could work at the whole-body scale. We expect the platform to provide insight into identifying effective drug combinations and their mechanisms for drug development.

Citations: 3
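The abstract does not specify the rule formalism. The following is therefore only a minimal sketch of how a qualitative, rule-based model can expose a combination effect that neither compound shows alone. Everything in it is an assumption: Boolean entity states, synchronous rule firing until a fixed point, and four toy rules. The rules act on hypothetical organ-qualified entities (`liver.process_L`, `muscle.uptake_M`, `blood.marker_G`) and hypothetical compounds (`compound_A`, `compound_B`). None of them are the 117 curated T2D rules, and none describe real pharmacology.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

State = Dict[str, bool]


@dataclass(frozen=True)
class Rule:
    """IF all `requires` are active AND none of `blocks` is active THEN set `target` to `effect`."""
    requires: Tuple[str, ...]
    target: str
    effect: bool
    blocks: Tuple[str, ...] = ()


def fires(rule: Rule, state: State) -> bool:
    return (all(state.get(e, False) for e in rule.requires)
            and not any(state.get(e, False) for e in rule.blocks))


def simulate(initial: State, rules: Sequence[Rule], max_steps: int = 50) -> State:
    """Apply all rules synchronously until the state stops changing (or max_steps is hit)."""
    state = dict(initial)
    for _ in range(max_steps):
        # Evaluate every rule on the previous state; on conflicts, later rules win.
        updates = {r.target: r.effect for r in rules if fires(r, state)}
        new_state = {**state, **updates}
        if new_state == state:
            break
        state = new_state
    return state


# Hypothetical toy rules spanning organs; NOT the paper's curated T2D rules.
RULES = [
    Rule(("compound_A",), "liver.process_L", False),
    Rule(("compound_B",), "muscle.uptake_M", True),
    Rule(("liver.process_L",), "blood.marker_G", True),
    Rule(("muscle.uptake_M",), "blood.marker_G", False, blocks=("liver.process_L",)),
]

# Hypothetical "disease" baseline.
BASELINE = {"liver.process_L": True, "muscle.uptake_M": False, "blood.marker_G": True}

if __name__ == "__main__":
    for drugs in (["compound_A"], ["compound_B"], ["compound_A", "compound_B"]):
        final = simulate({**BASELINE, **{d: True for d in drugs}}, RULES)
        print(drugs, "-> blood.marker_G =", final["blood.marker_G"])
```

In this toy model, `blood.marker_G` stays active under either compound alone and switches off only when both are applied. The reason is that the last rule needs one compound to activate its condition and the other to remove its blocker. Propagating effects across organ-level rules in this way is the kind of distant interaction that the abstract says localized pathway models miss.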

Protein complex prediction via bottleneck-based graph partitioning

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390079

Jaegyoon Ahn, D. Lee, Youngmi Yoon, Yunku Yeu, Sanghyun Park

Citations: 3

Clinical entity recognition using structural support vector machines with rich features

Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390073

Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu

Abstract: Named entity recognition (NER) is an important task in natural language processing (NLP) of clinical text. Two typical machine learning algorithms have been widely applied to NER tasks, including clinical entity recognition. These are Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which are based on large-margin theory. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, has not been investigated for clinical text processing. In this study, we applied the SSVMs algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge. The task was to recognize entities of medical problems, treatments, and tests in hospital discharge summaries. We used the same training (N = 27,837) and test (N = 45,009) sets as the challenge. With the same features, the SSVMs-based NER system required less training time and achieved better performance than the CRFs-based system for clinical entity recognition. Our study also demonstrated that rich features, such as unsupervised word representations, improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved its highest F-measure, 85.74%, on the test set of the 2010 i2b2 NLP challenge, outperforming the best system reported in the challenge by 0.5%.

Citations: 58
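The F-measure reported in this abstract is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The following minimal sketch shows entity-level scoring. It assumes exact matching of both span and entity type, and it represents entities as token offsets; both choices are our assumptions. It is not the official i2b2 evaluation script. The entity types mirror the three concept classes named in the abstract: medical problems, treatments, and tests.

```python
from typing import Set, Tuple

# Assumed entity representation: (start_token, end_token, type), end exclusive.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted entity counts only if its span and type both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 2, "problem"), (5, 6, "treatment"), (8, 9, "test")}
    # The second prediction has the right span but the wrong type, so it is not counted.
    predicted = {(0, 2, "problem"), (5, 6, "test"), (8, 9, "test")}
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

The toy example prints `P=0.667 R=0.667 F=0.667`. Under exact matching, a correct span labelled with the wrong concept type counts as an error for both precision and recall.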