Data and Text Mining in Bioinformatics: Latest Publications

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665984

G. Heo, Keeheon Lee, Min Song

Abstract: With recent advances in information technology, the number of publications is increasing exponentially. In response, demand has surged for replacing manual text processing with automatic text processing. About three decades ago, Swanson proposed the ABC model [1], built on text mining, as a part of literature-based knowledge discovery for finding new possible biomedical hypotheses. Clinical scholars subsequently demonstrated the effectiveness of the hypotheses found by the ABC model [2]. This success led scholars to try various literature-based knowledge discovery approaches [3, 4, 5]. However, these attempts are not fully automated; they are hybrids of automatic and manual processes, and the manual process requires the intervention of experts. In addition, they consider only a single perspective. Even attempts involving network theory have difficulty understanding the entire network structure of the relationships among concepts and interpreting that structure systematically [6, 7]. This study therefore proposes a novel approach that discovers various relationships by extending the intermediate concept B to a multi-leveled concept. By applying a graph-based path-finding method based on co-occurrence and the relational entities among concepts, we attempt to systematically analyze and investigate the relationships between two concepts, a source node and a target node, across all paths. As our baseline, we use the result of Swanson's work [8], which suggested blood viscosity, platelet aggregability, and vasoconstriction as the intermediate concepts or terms between Raynaud's disease and fish oils. We compared our intermediate concepts with Swanson's. This study provides distinct perspectives for literature-based discovery by not only discovering meaningful relationships among concepts in biomedical literature through graph-based path inference but also being able to generate feasible new hypotheses.
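The multi-level extension of the intermediate concept B described in the abstract above can be pictured as enumerating paths in a co-occurrence graph. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`build_cooccurrence_graph`, `find_paths`), the `max_intermediates` parameter, and the toy document list (including the placeholder "concept X") are all hypothetical, and the relational-entity information mentioned in the abstract is not modeled.

```python
from collections import defaultdict


def build_cooccurrence_graph(documents):
    """Link every pair of concepts that co-occur in the same document."""
    graph = defaultdict(set)
    for concepts in documents:
        unique = sorted(set(concepts))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def find_paths(graph, source, target, max_intermediates=2):
    """List simple paths source -> B1 -> ... -> Bk -> target, 1 <= k <= max_intermediates."""
    paths = []

    def extend(path):
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in path:
                continue  # keep paths simple (no repeated concept)
            if nxt == target:
                if len(path) >= 2:  # require at least one intermediate concept
                    paths.append(path + [nxt])
            elif len(path) - 1 < max_intermediates:
                extend(path + [nxt])

    extend([source])
    return paths


# Toy concept lists, one per document; "concept X" is a made-up placeholder.
documents = [
    ["Raynaud's disease", "blood viscosity"],
    ["blood viscosity", "fish oils"],
    ["Raynaud's disease", "platelet aggregability"],
    ["platelet aggregability", "fish oils"],
    ["Raynaud's disease", "vasoconstriction"],
    ["vasoconstriction", "concept X"],
    ["concept X", "fish oils"],
]
graph = build_cooccurrence_graph(documents)
for path in find_paths(graph, "Raynaud's disease", "fish oils"):
    print(" -> ".join(path))
```

With `max_intermediates=1` the sketch reduces to the classic single-B setting; raising the limit admits longer chains of intermediate concepts, at the cost of many more candidate paths to rank.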

Citations: 5

Detecting Phosphorylation Determined Active Protein Interaction Network during Cancer Development by Robust Network Component Analysis

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665991

T. Zeng, Ziming Wang, Luonan Chen

Abstract: Motivation: In recent disease studies, many key pathogenic genes/proteins have been found to show no significant differential expression, so they tend to be disregarded in conventional differential expression analysis or network analysis. Meanwhile, activity estimated in dry experiments, rather than expression measured in wet experiments, has been proposed to effectively estimate the actual regulatory power of such important biomolecules, e.g. transcription factors. However, it is still unknown which hidden factor (e.g. phosphorylation) determines this kind of virtual regulatory power, i.e. activity, and how it does so [1]. For the study of cancer development in particular, it is urgent to reconstruct the active protein interaction network and detect the underlying phosphorylation pattern in a dynamic manner [2-7]. Methods: Based on the c-Myc mouse model of liver cancer, we first collected protein expression and protein phosphorylation data at several developmental time points. Then, we constructed a rough protein interaction network as background using conditional mutual information. Next, we improved the robustness of conventional network component analysis and used this advanced approach, RNCA (Robust Network Component Analysis), to simultaneously reconstruct the time-dependent protein interaction networks and estimate the activity of target proteins at different times. Finally, considering the different experimental quality of protein expression and phosphorylation data, we used canonical correlation analysis to detect the maximal correlation between the expression and phosphorylation of a group of proteins (e.g. a protein network module), which could reveal the active protein sub-network and phosphorylation as its determining factor. Results: In the preliminary study, we evaluated the robustness of RNCA by comparing it with other conventional methods. On real biological data, we found the rewired protein interaction network during cancer development, its corresponding active proteins, and protein phosphorylation as their driver. This work can be further used in the early diagnosis of diseases by edge biomarkers [1-2], network biomarkers [3-4] and dynamical network biomarkers [5-7].

Citations: 0

An Exploration of the Collaborative Networks for Clinical and Academic Domains in AIDS Research: A Spatial Scientometric Approach

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665982

Y. Jeong, Dahee Lee, Min Song

Abstract: This study investigates worldwide collaborative networks from a geographical perspective, based on clinical tests (CT) and academic research (AR) on acquired immunodeficiency syndrome (AIDS). By applying text mining techniques to AIDS-related documents, we extract spatial information and discover co-location pairs for each type of research at two levels: the national level and the city level. Co-location networks for CT and AR are analyzed using network features, visualization, and nodes ranked highly by betweenness centrality. The results reveal that the CT network is more densely compact, with about twice as many nodes as the AR network. At the national level, the AR network is rather focused on the United States, while the CT network is more spread out across the world. At the city level, collaborative work is more active among closely located cities in the AR network than in the CT network (see Figure 1). The AR network has core collaboration centers mainly situated in the United States and Europe, whereas those of the CT network also include Asian and African cities. Overall, our study points out the differences between the collaborative networks for CT and AR, which contributes to understanding research trends, including productivity analysis of collaborative work in relation to regional aspects.
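To make the co-location network and the betweenness centrality ranking in the abstract above concrete, here is a small sketch under stated assumptions. It builds an undirected graph from per-document location lists and computes unnormalized betweenness with Brandes' algorithm for unweighted graphs. The function names and the four toy records are hypothetical and carry no findings from the study; the authors' actual tooling is not stated in the source.

```python
from collections import defaultdict, deque
from itertools import combinations


def colocation_graph(records):
    """Connect every pair of locations that appear together on one document."""
    graph = defaultdict(set)
    for locations in records:
        for a, b in combinations(sorted(set(locations)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths starting at `source`.
        order = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was visited from both of its endpoints.
    return {node: value / 2 for node, value in score.items()}


# Toy records: the cities listed on each (hypothetical) document.
records = [
    ["Seoul", "Boston"],
    ["Boston", "London"],
    ["Boston", "Nairobi"],
    ["London", "Paris"],
]
centrality = betweenness_centrality(colocation_graph(records))
for city, value in sorted(centrality.items(), key=lambda item: -item[1]):
    print(city, value)
```

Nodes with high betweenness sit on many shortest paths between other locations, which is one way to read the "core collaboration centers" mentioned in the abstract.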

Citations: 0

Visualization of Zoomable Network for Multi-Compounds and Multi-Targets Analysis

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665988

Jaesub Park, Jaeho Kim, Junseok Park, Sunghwa Bae, Hyungseok Kim, Doheon Lee

Citations: 0

Inference of Disease E3s from Integrated Functional Relation Network

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665979

Bumki Min, G. Yi

Citations: 1

TILD: A Strategy to Identify Cancer-related Genes Using Title Information in Literature Data

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665992

Jeongwoo Kim, H. Kim, Yunku Yeu, Mincheol Shin, Sanghyun Park

Abstract: Since the genome project of the 1990s, gene-related research has progressed. These studies revealed that genes can cause disease and that the relations between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from the online database and extracted genes using text mining. Next, we classified the extracted genes into two kinds using title information: genes that appear in the title are classified as hub genes, whereas genes that appear in the body are classified as sub genes, which are connected to the hub genes. We repeated this process for each paper to construct a cancer-specific local gene network. In the last step, we constructed a global cancer-specific gene network by integrating all local gene networks and calculated a score for each gene based on an analysis of the global gene network. We assumed that genes in the title have meaningful relations with cancer and that other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by each approach. Our approach found more cancer-related genes than comparable methods.
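The hub/sub construction described in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions rather than the TILD implementation: the abstract does not say how the per-gene score is computed, so the weighted-degree score below is a stand-in, and the function names and the placeholder gene identifiers (`GENE_A` to `GENE_D`) are hypothetical.

```python
from collections import defaultdict


def local_network(title_genes, body_genes):
    """Edges from each title (hub) gene to each body-only (sub) gene of one paper."""
    hubs = set(title_genes)
    subs = set(body_genes) - hubs
    return {(hub, sub) for hub in hubs for sub in subs}


def global_network(papers):
    """Merge local networks, counting how many papers support each hub-sub edge."""
    edge_weight = defaultdict(int)
    for title_genes, body_genes in papers:
        for edge in local_network(title_genes, body_genes):
            edge_weight[edge] += 1
    return edge_weight


def score_genes(edge_weight):
    """Assumed score: total weight of the edges touching a gene (weighted degree)."""
    score = defaultdict(int)
    for (hub, sub), weight in edge_weight.items():
        score[hub] += weight
        score[sub] += weight
    return dict(score)


# Toy input: (genes found in the title, genes found in the body) per paper.
papers = [
    (["GENE_A"], ["GENE_B", "GENE_C"]),
    (["GENE_C"], ["GENE_A", "GENE_D"]),
    (["GENE_A"], ["GENE_B"]),
]
scores = score_genes(global_network(papers))
for gene, value in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(gene, value)
```

Keeping the hub-to-sub direction in the edge key preserves which gene appeared in a title, so a different score (for example one that favors hub appearances) could be substituted without changing the network construction.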

Citations: 0

Identification of Coexpressed Gene Modules across Multiple Brain Diseases by a Biclustering Analysis on Integrated Gene Expression Data

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665978

Kihoon Cha, Kimin Oh, Taeho Hwang, G. Yi

Abstract: It has been reported that several brain diseases can share symptoms at the clinical level, suggesting the necessity and possibility of developing therapeutics. In this paper, we carried out an integrated gene expression analysis on several microarray datasets of neurodegenerative diseases and psychiatric disorders to discover what is unique and what is common in their molecular basis. First, we selected and combined three sets of microarray data covering eight brain diseases. Second, we applied a correlation-based biclustering approach, BICLIC [1], to efficiently identify coexpressed gene modules that are correlated in individual brain diseases or in combinations of multiple brain diseases. Third, Gene Ontology-based functional enrichment analysis was performed to analyze the functional characteristics of the identified cross-disease and disease-specific modules. With this approach, we could examine various sets of genes that are significantly correlated in both single and multiple diseases. As a result, a total of 4,307 coexpressed gene modules turned out to be common to two or more brain diseases. Among them, eight modules comprising different combinations of a total of 16 genes were correlated in more than seven brain diseases. The functional analysis showed that the multi-disease modules were more strongly associated with higher brain functions, such as cognitive functions, than single-disease-specific modules. The results of this study provide valuable resources for further investigating the key molecular players affecting brain diseases in both a transnosological and a disease-specific manner.

Citations: 4

Discriminatory Analysis of Alzheimer's Disease through Pathway Activity Inference in the Resting-State Brain

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665971

Jongan Lee, Younghoon Kim, Y. Jeong, D. Na, Jong-Won Kim, Kwang-H. Lee, Doheon Lee

Citations: 1

Identification of a Specific Base Sequence of Pathogenic E. Coli through a Genomic Analysis

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665981

Soobok Joe, Hojung Nam

Abstract: E. coli sequence type 131 (ST131) is one of the pathogens that cause resistant infections. Comparative genome analyses allow the virulence factors of pathogens to be interpreted. In this study, we therefore analyze the genomic differences between the pathogenic E. coli ST131 and the non-pathogenic E. coli K-12. We identify the genomic differences between 96 E. coli ST131 strains and E. coli K-12 in gene elements and their non-coding regulatory elements. Using next-generation whole-genome sequencing data, we investigated genetic variations in protein-coding regions and their regulatory regions. After alignment of the sequence reads, large numbers of single nucleotide variants (SNVs) were observed in the regulatory and protein-coding sequences. In the regulatory regions, we found strongly conserved regions, in this case ribosome binding sites. In the gene regions, we found conserved start and stop codons, with a specific position commonly varying in each codon. Apart from these well-conserved regions, other variations were randomly distributed in the regulatory regions. Even regions with well-known conserved sequences, such as the -10 and -35 elements of the promoter, had a similar level of variation. In summary, we found genomic variations between the pathogenic E. coli ST131 strain and the non-pathogenic E. coli K-12, and we determined the numbers of sequence variations in both the protein-coding regions and the regulatory regions. However, we found that the effects of variations on the protein-coding regions are less significant than those on the regulatory regions.
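As a rough illustration of the SNV counting described in the abstract above, the sketch below compares a reference sequence with one already-aligned sample and tallies mismatches per annotated region. It assumes the read alignment has been done elsewhere; the function names, the two short sequences, and the region coordinates are invented for the example and are not E. coli data.

```python
def find_snvs(reference, sample):
    """Return (position, ref_base, alt_base) for each mismatch in two aligned sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt and ref != "-" and alt != "-"  # gaps are not SNVs
    ]


def snvs_per_region(snvs, regions):
    """Count SNVs falling inside each named half-open region (start, end)."""
    counts = {name: 0 for name in regions}
    for pos, _ref, _alt in snvs:
        for name, (start, end) in regions.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


# Toy data: a 10-base upstream stretch followed by a 12-base coding stretch.
reference = "AAGGAGGTTT" + "ATGGCTAAATAA"
sample = "ATGGAGGTCT" + "ATGGCAAAATAA"
regions = {"regulatory": (0, 10), "protein_coding": (10, 22)}

snvs = find_snvs(reference, sample)
print(snvs)
print(snvs_per_region(snvs, regions))
```

Repeating the per-region tally across many strains and dividing by region length would give variation densities that can be compared between regulatory and protein-coding regions.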

Citations: 1

Mining the Main Health Trend of the General Public based on Opinion Mining of Korean Blogsphere

Data and Text Mining in Bioinformatics Pub Date : 2014-11-07 DOI: 10.1145/2665970.2665985

Yong-il Lee, Sang-Hyob Nam, Jaeseung Jeong

Abstract: Social media has become a reasonable standard for understanding public thought. People increasingly use internet media and SNS (Twitter, Facebook, blogs, etc.) to share opinions, news, advice, interests, moods, concerns, criticism, facts, rumors, and everything else. As a result, public health research has begun to undergo a big change. Traditional public health studies have depended only on regular clinical reports by health professionals. Such reports are of limited practical use, and the general public has much difficulty understanding health information, even their own. Nowadays, over one billion people publish their ideas about many topics, including health conditions, minute by minute. SNS provides researchers with the freshest source of public health conditions on a global scale, and much of that data is public and available for mining. This article therefore pursues an application of opinion mining for detecting public trends and finding valuable opinions among massive amounts of information. The core of this research is analyzing the adjectives in opinions. Our assumption is that many adjective expressions imply the deep and sincere meaning of the author. This is applicable to both filtering low-value postings and tracking high-value postings simultaneously, and it is a simple and feasible criterion. The opinion mining process includes Korean morpheme analysis, opinion extraction, opinion tagging, and positive/negative score evaluation. The aim of our research is to analyze Korean blog postings.
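The adjective-centered filtering and scoring outlined in the abstract above might look roughly like the sketch below. It assumes the morpheme analysis step has already produced (token, tag) pairs; the tag name `"ADJ"`, the tiny polarity lexicon, the `min_adjectives` threshold, and the English placeholder tokens are all hypothetical choices for illustration, since the abstract does not specify them.

```python
# Hypothetical polarity lexicon; a real system would use a curated Korean resource.
POLARITY = {"refreshing": 1, "effective": 1, "painful": -1, "exhausting": -1}


def adjectives(tagged_post):
    """Keep only tokens tagged as adjectives ("ADJ" is an assumed tag name)."""
    return [word for word, tag in tagged_post if tag == "ADJ"]


def score_post(tagged_post, lexicon=POLARITY):
    """Count adjectives and their positive / negative polarity in one post."""
    adjs = adjectives(tagged_post)
    return {
        "adjectives": len(adjs),
        "positive": sum(1 for a in adjs if lexicon.get(a, 0) > 0),
        "negative": sum(1 for a in adjs if lexicon.get(a, 0) < 0),
    }


def split_by_value(posts, min_adjectives=2):
    """Separate posts with enough adjectives (high value) from the rest (low value)."""
    high = [p for p in posts if len(adjectives(p)) >= min_adjectives]
    low = [p for p in posts if len(adjectives(p)) < min_adjectives]
    return high, low


# Toy posts as (token, tag) pairs, standing in for morpheme analyzer output.
posts = [
    [("yoga", "NOUN"), ("refreshing", "ADJ"), ("but", "CONJ"), ("exhausting", "ADJ")],
    [("buy", "VERB"), ("vitamins", "NOUN"), ("here", "ADV")],
    [("treatment", "NOUN"), ("effective", "ADJ"), ("painful", "ADJ"), ("long", "ADJ")],
]
high, low = split_by_value(posts)
for post in high:
    print(score_post(post))
```

A single adjective count serves both purposes named in the abstract: posts below the threshold are filtered out as low value, and the remaining posts are the ones tracked and scored.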

Citations: 1