Data and Text Mining in Bioinformatics: Latest Publications

Predicting baby feeding method from unstructured electronic health record data
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390075
A. Rao, K. Maiden, Ben Carterette, Deborah B. Ehrenthal
Abstract: Obesity is one of the most important health concerns in the United States and plays an important role in rising rates of chronic health conditions and health care costs. The percentage of the US population affected by childhood and adult obesity has been on a constant upward linear trend for the past few decades. According to the Centers for Disease Control and Prevention, 35.7% of US adults and 17% of children aged 2-19 years are obese. Researchers and health care providers studying obesity, in the US and the rest of the world, are interested in the factors that affect it. One factor potentially related to the development of obesity is the type of feeding provided to babies. In this work we describe an electronic health record (EHR) data set of babies in which the feeding method is contained in the narrative portion of the record. We compare five supervised machine learning algorithms for predicting feeding method as a discrete value based on the text in that field. We also compare these algorithms in terms of their classification error and the prediction probability estimates they generate.
Citations: 7
Session details: Mining clinical data and text
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/3260181
Hua Xu
Citations: 0
Indexing methods for efficient protein 3D surface search
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390078
Sungchul Kim, Lee Sael, Hwanjo Yu
Abstract: This paper exploits efficient indexing techniques for protein structure search where protein structures are represented as vectors by the 3D-Zernike Descriptor (3DZD). The 3DZD compactly represents the surface shape of a protein tertiary structure as a vector, and this simplified representation accelerates structural search. However, further speed-up is needed for scenarios where multiple users access the database simultaneously. We address this need by applying two indexing techniques, iDistance and iKernel, to the 3DZDs. The results show that both iDistance and iKernel significantly increase search speed. In addition, we introduce an extended approach for protein structure search based on indexing techniques that exploit a characteristic of the 3DZD. In the extended approach, the index structure is constructed using only the first few numbers in each 3DZD. To find the top-k similar structures, the top 10 × k similar structures are first selected using the reduced index structure, and the top-k structures are then selected from these using the similarity measure on their full 3DZDs. With the indexing techniques, search time was reduced by 69.6% using iDistance, 77% using iKernel, 77.4% using extended iDistance, and 87.9% using the extended iKernel method.
Citations: 7
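The two-stage search described in the abstract above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: it uses a brute-force scan in place of the iDistance and iKernel indexes, it assumes Euclidean distance as the similarity measure (the abstract does not name one), and the names `two_stage_top_k`, `prefix_len`, and `factor` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_stage_top_k(query, descriptors, k, prefix_len, factor=10):
    """Return indices of the k descriptors closest to `query`.

    Stage 1 ranks every descriptor using only its first `prefix_len`
    numbers and keeps the best `factor * k` candidates.
    Stage 2 re-ranks those candidates using the full vectors.
    """
    q_prefix = query[:prefix_len]
    # Stage 1: cheap filter on the truncated descriptors.
    candidates = heapq.nsmallest(
        factor * k,
        range(len(descriptors)),
        key=lambda i: euclidean(q_prefix, descriptors[i][:prefix_len]),
    )
    # Stage 2: exact ranking of the surviving candidates.
    return heapq.nsmallest(
        k, candidates, key=lambda i: euclidean(query, descriptors[i])
    )
```

The filter stage trades a small risk of missing a true neighbour for much cheaper comparisons; a larger `factor` lowers that risk at the cost of more full-vector comparisons in stage 2.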
Inferring appropriate eligibility criteria in clinical trial protocols without labeled data
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390074
Angelo C. Restificar, S. Ananiadou
Abstract: We consider the user task of designing clinical trial protocols and propose a method that outputs the most appropriate eligibility criteria from a potentially huge set of candidates. Each document d in our collection D is a clinical trial protocol, which itself contains a set of eligibility criteria. Given a small set of sample documents D', |D'| << |D|, that a user has initially identified as relevant (e.g., via a user query interface), our scoring method automatically suggests eligibility criteria from D by ranking them according to how appropriate they are to the clinical trial protocol currently being designed. We view a document as a mixture of latent topics, and our method exploits this by applying a three-step procedure. First, we infer the latent topics in the sample documents using Latent Dirichlet Allocation (LDA) [3]. Next, we use logistic regression models to compute the probability that a given candidate criterion belongs to a particular topic. Lastly, we score each criterion by computing its expected value, the probability-weighted sum of the topic proportions inferred from the set of sample documents. Intuitively, the greater the probability that a candidate criterion belongs to the topics that are dominant in the samples, the higher its expected value or score. Results from our experiments indicate that our proposed method is 8 and 9 times better (for inclusion and exclusion criteria, respectively) than randomly choosing from a set of candidates obtained from relevant documents. In user simulation experiments, we were able to automatically construct eligibility criteria that are on average 75% and 70% (for inclusion and exclusion criteria, respectively) similar to the correct eligibility criteria.
Citations: 13
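The final scoring step in the abstract above (the probability-weighted sum of topic proportions) can be sketched as below. This is only an illustration under stated assumptions: the LDA topic proportions and the per-topic logistic regression probabilities are taken as already computed inputs, and the names `expected_score` and `rank_criteria` are hypothetical, not taken from the paper.

```python
def expected_score(topic_probs, topic_proportions):
    """Score one candidate criterion.

    topic_probs[t]       -- assumed P(criterion belongs to topic t),
                            e.g. from a per-topic logistic regression model.
    topic_proportions[t] -- assumed proportion of topic t inferred from the
                            sample documents, e.g. by LDA.
    """
    if len(topic_probs) != len(topic_proportions):
        raise ValueError("one probability is required per topic")
    return sum(p * w for p, w in zip(topic_probs, topic_proportions))


def rank_criteria(candidates, topic_proportions):
    """Return (criterion, score) pairs sorted from highest to lowest score.

    candidates maps each criterion's text to its list of topic probabilities.
    """
    scored = [
        (text, expected_score(probs, topic_proportions))
        for text, probs in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A criterion that probably belongs to a topic dominating the sample documents receives a high score, which matches the intuition stated in the abstract.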
Lexicon-free and context-free drug names identification methods using hidden markov models and pointwise mutual information
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390072
Jacek Małyszko, A. Filipowska
Abstract: The paper concerns the extraction of medicine names from free-text documents written in Polish. Lexicon-based approaches cannot identify unknown or misspelled medicine names. In this paper, we present the results of experiments with two methods: a Hidden Markov Model (HMM) and a Pointwise Mutual Information (PMI)-based approach. The goal of the experiment was to identify medicine names without using a lexicon or contextual information. The results show that the HMM may be used as one of several steps in drug name identification (with an F-score slightly below 70% on the test set), while PMI can help increase the precision of the results achieved using the HMM, but with a significant loss in recall.
Citations: 3
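For readers unfamiliar with the measure, the standard definition of pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), can be computed from counts as sketched below. The abstract does not say which units (characters, character n-grams, or words) the authors compute PMI over, so this sketch assumes generic co-occurrence counts; the function name `pmi` and its parameters are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, in bits, estimated from counts.

    count_xy -- number of observations where x and y occur together
    count_x  -- number of observations containing x
    count_y  -- number of observations containing y
    total    -- total number of observations
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * total / (count_x * count_y).
    return math.log2((count_xy * total) / (count_x * count_y))
```

A value above zero means the two items co-occur more often than they would if they were independent, and a value of zero means they co-occur exactly as often as independence predicts.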
Session details: Mining biological data and text
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/3260182
Min Song
Citations: 0
Extracting structured information from free-text medication prescriptions using dependencies
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390076
Andrew D. MacKinlay, Karin M. Verspoor
Abstract: We explore an information extraction task where the goal is to determine the correct values for fields relevant to prescription drug administration, such as dosage amount, frequency, and route. The data set is a collection of prescriptions from a long-term health-care facility, a small subset of which we have manually annotated with values for these fields. We first examine a rule-based approach to the task that uses a dependency parse of the prescription, achieving accuracies of 60-95% on individual fields and 67.5% when all fields of the prescription are considered together. The outputs of such a system have potential applications in detecting irregularities in dosage delivery.
Citations: 8
Rule-based whole body modeling for analyzing multi-compound effects
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390083
W. Hwang, Y. Hwang, Sunjae Lee, Doheon Lee
Abstract: Several fundamental reasons, including the robustness, redundancy, and crosstalk of biological systems, have been reported to explain the limited efficacy and unexpected side effects of drugs. Many pharmaceutical laboratories have begun to develop multi-compound drugs to remedy this situation, and some have shown successful clinical results. Simultaneous application of multiple compounds could increase efficacy as well as reduce side effects through pharmacodynamic and pharmacokinetic interactions. However, such an approach incurs an overwhelming cost in preclinical experiments and tests, because the number of possible combinations of compound dosages increases exponentially. Computer model-based experiments have been emerging as one of the most promising ways to cope with this complexity. Although there have been many efforts to model specific molecular pathways using qualitative and quantitative formalisms, they suffer from unexpected results caused by distant interactions beyond their localized models.
Here we propose a rule-based whole-body modeling platform. We have tested this platform with a Type 2 diabetes (T2D) model, which involves the malfunction of numerous organs such as the pancreas, circulatory system, liver, and muscle. We extracted 117 T2D-related rules by manual curation from the literature and from different types of existing models. The results of our simulation show the drug effect pathways of T2D drugs and how combinations of drugs could work on the whole-body scale. We expect that the platform will provide insight for identifying effective drug combinations and their mechanisms in drug development.
Citations: 3
Protein complex prediction via bottleneck-based graph partitioning
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390079
Jaegyoon Ahn, D. Lee, Youngmi Yoon, Yunku Yeu, Sanghyun Park
Abstract: Detecting protein complexes is one of the essential and fundamental tasks in understanding various biological functions and processes. Precise identification of protein complexes is therefore indispensable. For more precise detection of protein complexes, we propose a novel data structure that employs bottleneck proteins as partitioning points. The partitioning process allows overlap between the resulting protein complexes. We applied our algorithm to several PPI (Protein-Protein Interaction) networks of Saccharomyces cerevisiae and Homo sapiens, and validated our results using public databases of protein complexes. Our algorithm produced overlapping protein complexes with a significantly improved F1 score, which comes from higher precision.
Citations: 3
Clinical entity recognition using structural support vector machines with rich features
Data and Text Mining in Bioinformatics Pub Date: 2012-10-29 DOI: 10.1145/2390068.2390073
Buzhou Tang, Yonghui Wu, Min Jiang, Hua Xu
Abstract: Named entity recognition (NER) is an important task for natural language processing (NLP) of clinical text. Conditional Random Fields (CRFs), a sequential labeling algorithm, and Support Vector Machines (SVMs), which are based on large margin theory, are two typical machine learning algorithms that have been widely applied to NER tasks, including clinical entity recognition. However, Structural Support Vector Machines (SSVMs), an algorithm that combines the advantages of both CRFs and SVMs, have not been investigated for clinical text processing. In this study, we applied the SSVMs algorithm to the Concept Extraction task of the 2010 i2b2 clinical NLP challenge, which was to recognize entities of medical problems, treatments, and tests from hospital discharge summaries. Using the same training (N = 27,837) and test (N = 45,009) sets as in the challenge, our evaluation showed that the SSVMs-based NER system required less training time while achieving better performance than the CRFs-based system for clinical entity recognition when the same features were used. Our study also demonstrated that rich features such as unsupervised word representations improved the performance of clinical entity recognition. When rich features were integrated with SSVMs, our system achieved its highest F-measure of 85.74% on the test set of the 2010 i2b2 NLP challenge, which outperformed the best system reported in the challenge by 0.5%.
Citations: 58