Data and Text Mining in Bioinformatics: Latest Articles

Finding associations among SNPs for prostate cancer using collaborative filtering
Data and Text Mining in Bioinformatics Pub Date : 2012-10-29 DOI: 10.1145/2390068.2390080
Rohit Kugaonkar, A. Gangopadhyay, Y. Yesha, A. Joshi, Y. Yesha, M. Grasso, Mary Brady, N. Rishe
Abstract: Prostate cancer is the second leading cause of cancer-related deaths among men. Because prostate cancer grows slowly, less aggressive cancers sometimes do not require surgical treatment. Recent debates over prostate-specific antigen (PSA) screening have drawn new attention to prostate cancer. Genome-based screening can potentially help in assessing the risk of developing prostate cancer. Because of the complicated nature of prostate cancer, studying the entire genome is essential to find genomic traits. Because studying all Single Nucleotide Polymorphisms (SNPs) is costly, it is essential to find tag SNPs that can represent other SNPs. Earlier methods that find tag SNPs from associations between SNPs either use SNP location information or are based on data from very few SNP markers in each sample. Our study is based on 2300 samples with 550,000 SNPs each. We have not used SNP location information or any predefined standard cut-offs to find tag SNPs. Our approach uses collaborative filtering methods to find pairwise associations among SNPs and thus list the top-N tag SNPs. We found 25 tag SNPs that have the highest similarities to other SNPs. In addition, we found 16 more SNPs that are highly correlated with the known high-risk SNPs associated with prostate cancer. We used some of these newly found SNPs with 5 different classification algorithms and observed some improvement in prostate cancer prediction accuracy over using the original known high-risk SNPs.
Citations: 4
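The abstract above does not say which collaborative filtering variant or similarity measure the authors used. As a minimal sketch only, the following assumes an item-based scheme in which each SNP is treated as an "item", samples are "users", and pairwise association is measured by cosine similarity over minor-allele counts. The function names, the toy genotype data, and the ranking by mean similarity are all illustrative assumptions, not the paper's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length genotype vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no similarity signal
    return dot / (norm_u * norm_v)


def top_n_tag_snps(genotypes, n):
    """Rank SNPs by their mean similarity to all other SNPs (assumed criterion).

    genotypes: dict mapping a SNP id to a list of minor-allele counts
               (0, 1, or 2), one entry per sample.
    """
    ids = list(genotypes)
    scores = {}
    for i in ids:
        sims = [cosine(genotypes[i], genotypes[j]) for j in ids if j != i]
        scores[i] = sum(sims) / len(sims) if sims else 0.0
    return sorted(ids, key=lambda s: scores[s], reverse=True)[:n]


# Hypothetical toy data: 4 SNPs genotyped in 4 samples.
toy = {
    "snp_a": [0, 1, 2, 1],
    "snp_b": [0, 1, 2, 1],
    "snp_c": [2, 1, 0, 1],
    "snp_d": [0, 2, 2, 0],
}
print(top_n_tag_snps(toy, 2))
```

With 550,000 SNPs, an all-pairs loop like this would be far too slow; it is meant only to show the shape of the computation.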
Detecting type 2 diabetes causal single nucleotide polymorphism combinations from a genome-wide association study dataset with optimal filtration
Data and Text Mining in Bioinformatics Pub Date : 2012-10-29 DOI: 10.1145/2390068.2390070
Chiyong Kang, Hyeji Yu, G. Yi
Abstract: The identification of causal single nucleotide polymorphisms (SNPs) for complex diseases like type 2 diabetes (T2D) is a challenge because of the low statistical power of individual markers from a genome-wide association study (GWAS). SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity. Hence, we aim to detect T2D causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. The selected SNPs with SNP combinations are mapped with multi-dimensional levels of T2D-related information and gene set enrichment analysis (GSEA). A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset is selected, with an error rate of 10.25%. Matching with known disease genes and gene sets revealed the relationships between T2D and SNP combinations. We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
Citations: 2
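The abstract above mentions Bonferroni thresholds as one of the filters being compared. As a rough illustration of that single step only (not the LD pruning, the p-value range-based thresholds, or the random forest stage), a Bonferroni cut-off divides the family-wise significance level by the number of tests. The SNP identifiers and p-values below are hypothetical.

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance cut-off under the Bonferroni correction."""
    if num_tests <= 0:
        raise ValueError("num_tests must be positive")
    return alpha / num_tests


def filter_snps(p_values, alpha=0.05):
    """Keep SNPs whose p-value passes the Bonferroni cut-off.

    p_values: dict mapping a SNP id to its association p-value.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p <= threshold]


# Hypothetical p-values for four SNPs; the cut-off is 0.05 / 4 = 0.0125.
example = {"snp_1": 0.001, "snp_2": 0.02, "snp_3": 0.2, "snp_4": 0.012}
print(filter_snps(example))
```

Varying `alpha` gives a family of thresholds whose downstream error rates could then be compared, which is the kind of comparison the abstract describes.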
High precision rule based PPI extraction and per-pair basis performance evaluation
Data and Text Mining in Bioinformatics Pub Date : 2012-10-29 DOI: 10.1145/2390068.2390082
Junkyu Lee, Seongsoon Kim, Sunwon Lee, Kyubum Lee, Jaewoo Kang
Abstract: Virtually all current PPI extraction studies focus on improving F-score, aiming to balance precision and recall. However, in many realistic scenarios involving large corpora, one can benefit more from an extremely high-precision PPI extraction tool than from a high-recall counterpart. We also argue that the current "per-instance" basis performance evaluation method should be revisited. To address these problems, we introduce a new rule-based PPI extraction method equipped with a set of ultra-high-precision extraction rules. We also propose a new "per-pair" basis performance metric, which is more pragmatic in practice. The proposed PPI extraction method achieves 95-96% per-pair and 94-97% per-instance precision on the AIMed benchmark corpus.
Citations: 7
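The abstract above does not define the two evaluation bases precisely. The sketch below assumes that a "per-instance" evaluation scores every extracted mention (a sentence plus two protein names) separately, while a "per-pair" evaluation collapses mentions into unique unordered protein pairs before scoring. The tuple layout, function names, and data are illustrative assumptions.

```python
def precision(predicted, gold):
    """Fraction of distinct predicted items that also appear in the gold set."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(gold)) / len(predicted)


def to_pairs(instances):
    """Collapse (sentence_id, protein_a, protein_b) mentions into unordered pairs."""
    return {frozenset((a, b)) for _, a, b in instances}


# Hypothetical extractions: the same P1-P2 interaction is found three times,
# and one spurious P3-P4 interaction is found once.
predicted = [
    ("s1", "P1", "P2"),
    ("s2", "P1", "P2"),
    ("s3", "P2", "P1"),
    ("s4", "P3", "P4"),
]
gold = [("s1", "P1", "P2"), ("s2", "P1", "P2"), ("s3", "P2", "P1")]

per_instance = precision(predicted, gold)
per_pair = precision(to_pairs(predicted), to_pairs(gold))
print(per_instance, per_pair)
```

Under these assumptions the two bases can diverge: repeated correct mentions of one pair raise the per-instance figure without changing the per-pair figure.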
TNMCA: generation and application of network motif based inference models for drug repositioning
Data and Text Mining in Bioinformatics Pub Date : 2012-10-29 DOI: 10.1145/2390068.2390081
Jaejoon Choi, Kwangmin Kim, Min-Keun Song, Doheon Lee
Abstract: With the growth of public biomedical data, Undiscovered Public Knowledge (UPK, proposed by Swanson) has become an important research topic in biology. Drug repositioning, which infers alternative indications for approved drugs, is one of the best-known UPK tasks. Many researchers have tried to find novel candidates among existing drugs, but these earlier approaches are not fully automated: they require manual adjustment for each task and cannot cover diverse biomedical entities. They are also limited in inference, handling only predefined cases with a limited set of patterns. In this paper, we propose the Typed Network Motif Comparison Algorithm (TNMCA) to discover novel drug indications using topological patterns in the data. Typed network motifs (TNMs) are connected subgraphs of the data that store the types of the data rather than their values. While previous research depends on the ABC model (or extensions of it), TNMCA uses more general patterns as its inference models. TNMCA can also infer not only the existence of an interaction but also its type. TNMCA is suited to multi-level biomedical interaction data because TNMs depend on the different types of entities and relations. We apply TNMCA to a public database, the Comparative Toxicogenomics Database (CTD), to validate our method. The results show that TNMCA infers meaningful indications with higher performance (AUC=0.7469) than the ABC model (AUC=0.7050).
Citations: 3
Session details: Keynote address
Data and Text Mining in Bioinformatics Pub Date : 2011-10-24 DOI: 10.1145/3260180
Doheon Lee
Citations: 0
Dynamic concept ontology construction for PubMed queries
Data and Text Mining in Bioinformatics Pub Date : 2010-10-26 DOI: 10.1145/1871871.1871885
Jinoh Oh, Taehoon Kim, Sun Park, Wook-Shin Han, Hwanjo Yu
Abstract: Exploring PubMed to find relevant information is challenging and time-consuming, as PubMed typically returns a large list of articles in response to a query. Existing work on improving search quality on PubMed has focused on helping with query formulation, clustering the results, or ranking by relevance. This paper proposes a novel system that dynamically constructs a concept ontology from the search results and visualizes concepts related to the query in the form of an ontology. The concept ontology can make PubMed search more effective by detecting related concepts and the relations among them that are hidden in the documents. The ontology can broaden the user's knowledge by recommending new concepts the user did not expect, and it also serves to narrow down the search results by recommending additional query terms. The ontology is constructed in real time in response to a query and is integrated within our PubMed search engine, called RefMED. Our system is accessible at "http://dm.hwanjoyu.org/refmed".
Citations: 0
DrugNerAR: linguistic rule-based anaphora resolver for drug-drug interaction extraction in pharmacological documents
Data and Text Mining in Bioinformatics Pub Date : 2009-11-06 DOI: 10.1145/1651318.1651324
Isabel Segura-Bedmar, Mario Crespo, César de Pablo-Sánchez, Paloma Martínez
Abstract: DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in pharmacological literature. This development is part of a larger, innovative study of automatic drug-drug interaction extraction. In addition, a corpus has been developed in order to analyze the phenomena and evaluate the current approach. The system applies a set of linguistic rules inspired by Centering Theory to the analysis provided by a biomedical syntactic parser. Semantic information provided by the Unified Medical Language System (UMLS) is also integrated in order to improve the recognition and resolution of nominal drug anaphors. This linguistic rule-based approach shows very promising results for the challenge of accounting for anaphoric expressions in pharmacological texts.
Citations: 9
Mining cancer genes with running-sum statistics
Data and Text Mining in Bioinformatics Pub Date : 2009-11-06 DOI: 10.1145/1651318.1651326
Inho Park, Kwang-H. Lee, Doheon Lee
Abstract: In this paper, we propose a new method to detect candidate cancer genes for developing molecular biomarkers or therapeutic targets from cancer microarray datasets. To resolve the problems that the molecular heterogeneity of cancers causes for gene prioritization, our proposed method is intended to identify genes that are over- or under-expressed not only across all cancer samples but also in a subgroup of cancer samples. To this end, we propose the RS score for gene ranking, calculated with a weighted running-sum statistic on the ordered list of expression values of each gene. We apply the proposed method to publicly available prostate cancer microarray datasets, showing that it can identify previously well-known prostate cancer associated genes such as ERG, HPN, and AMACR at the top of the list of candidate genes. Embedding samples, represented as vectors of the expression values of the top 20 genes, into a two-dimensional space using the commute time embedding shows the distinction between normal samples and cancer samples in the independent test datasets as well as in the training datasets. We further evaluate the proposed method by estimating classification performance on the independent test datasets, where it shows better classification performance than the other cancer outlier profile approaches.
Citations: 2
LITSEEK: public health literature search by metadata enhancement with external knowledge bases
Data and Text Mining in Bioinformatics Pub Date : 2009-11-06 DOI: 10.1145/1651318.1651337
P. Prabhu, S. Navathe, Stephen Tyler, V. Dasigi, N. Narkhede, Balaji Palanisamy
Abstract: Biomedical literature is an important source of information in any researcher's investigation of genes, risk factors, diseases and drugs. Often the information sought by public health researchers is distributed across multiple disparate sources that may include publications from PubMed; genomic, proteomic and pathway databases; gene expression and clinical resources; and biomedical ontologies. The unstructured nature of this information makes it difficult to find the relevant parts manually, and comprehensive knowledge is even more difficult to synthesize automatically. In this paper we report on LITSEEK (LITerature Search by metadata Enhancement with External Knowledgebases), a system we have developed for researchers at the Centers for Disease Control (CDC) to enable them to search the HuGE (Human Genome for Epidemiology) database of PubMed articles from a pharmacogenomic perspective. Besides analyzing text using TFIDF ranking and indexing of the important terms, the proposed system incorporates an automatic consultation with PharmGKB, a human-curated knowledge base about drugs, related diseases and genes, as well as with the Gene Ontology, a human-curated, well-accepted ontology. We highlight the main components of our approach and illustrate how the search is enhanced by incorporating additional concepts in terms of genes/drugs/diseases (called metadata for ease of reference) from PharmGKB. Various measurements are reported with respect to the addition of these metadata terms. Preliminary results in terms of precision, based on expert user feedback from CDC, are encouraging. Further evaluation of the search procedure by actual researchers is under way.
Citations: 2
The challenge of high recall in biomedical systematic search
Data and Text Mining in Bioinformatics Pub Date : 2009-11-06 DOI: 10.1145/1651318.1651338
Sarvnaz Karimi, J. Zobel, Stefan Pohl, Falk Scholer
Abstract: Clinical systematic reviews are based on expert, laborious search of well-annotated literature. Boolean search on bibliographic databases, such as MEDLINE, continues to be the preferred discovery method, but the size of these databases, now approaching 20 million records, makes it impossible to fully trust these searching methods. We are investigating the trade-offs between Boolean and ranked retrieval. Our findings show that although Boolean search has limitations, it is not obvious that ranking is superior, and illustrate that a single query cannot be used to resolve an information need. Our experiments show that a combination of less complicated Boolean queries and ranked retrieval outperforms either of them individually, leading to possible time savings over the current process.
Citations: 21