M. Abouelhoda, R. Giegerich, B. Behzadi, J. Steyaert
{"title":"Alignment of Minisatellite Maps: A Minimum Spanning Tree-based Approach","authors":"M. Abouelhoda, R. Giegerich, B. Behzadi, J. Steyaert","doi":"10.1142/9781848161092_0028","DOIUrl":"https://doi.org/10.1142/9781848161092_0028","url":null,"abstract":"In addition to the well-known edit operations, the alignment of minisatellite maps includes duplication events. We model these duplications using a special kind of spanning trees and deduce an optimal duplication scenario by computing the respective minimum spanning tree. Based on best duplication scenarios for all substrings of the given sequences, we compute an optimal alignment of two minisatellite maps. Our algorithm improves upon the previously developed algorithms in the generality of the model, in alignment quality and in space-time efficiency. Using this algorithm, we derive evidence that there is a directional bias in the growth of minisatellites of the MSY1 dataset.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"88 1","pages":"261-272"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83841317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jin-Dong Kim, Tomoko Ohta, Kanae Oda, Junichi Tsujii
{"title":"From Text to Pathway: Corpus Annotation for Knowledge Acquisition from Biomedical Literature","authors":"Jin-Dong Kim, Tomoko Ohta, Kanae Oda, Junichi Tsujii","doi":"10.1142/9781848161092_0019","DOIUrl":"https://doi.org/10.1142/9781848161092_0019","url":null,"abstract":"We present a new direction of research, which deploys Text Mining technologies to construct and maintain data bases organized in the form of pathway, by associating parts of papers with relevant portions of a pathway and vice versa. In order to materialize this scenario, we present two annotated corpora. The first, Event Annotation, identifies the spans of text in which biological events are reported, while the other, Pathway Annotation, associates portions of papers with specific parts in a pathway.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"35 1","pages":"165-176"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80663458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of Population Allele Frequencies from Small Samples Containing Multiple Generations","authors":"D. Konovalov, D. Heg","doi":"10.1142/9781848161092_0033","DOIUrl":"https://doi.org/10.1142/9781848161092_0033","url":null,"abstract":"Estimations of population genetic parameters like allele frequencies, heterozygosities, inbreeding coefficients and genetic distances rely on the assumption that all sampled genotypes come from a randomly interbreeding population or sub-population. Here we show that small cross-generational samples may severely affect estimates of allele frequencies, when a small number of progenies dominate the next generation or the sample. A new estimator of allele frequencies is developed for such cases when the kin structure of the focal sample is unknown and has to be assessed simultaneously. Using Monte Carlo simulations it was demonstrated that the new estimator delivered significant improvement over the conventional allele counting estimator.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"87 1","pages":"321-332"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84068953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Protein Sequences Based on Word Segmentation Methods","authors":"Yang Yang, Bao-Liang Lu, Wen-Yun Yang","doi":"10.1142/9781848161092_0020","DOIUrl":"https://doi.org/10.1142/9781848161092_0020","url":null,"abstract":"Protein sequences contain great potential revealing protein function, structure families and evolution information. Classifying protein sequences into different functional groups or families based on their sequence patterns has attracted lots of research efforts in the last decade. A key issue of these classification systems is how to interpret and represent protein sequences, which largely determines the performance of classifiers. Inspired by text classification and Chinese word segmentation techniques, we propose a segmentation-based feature extraction method. The extracted features include selected words, i.e., substrings of the sequences, and also motifs specified in public database. They are segmented out and their occurrence frequencies are recorded as the feature vector values. We conducted experiments on two protein data sets. One is a set of SCOP families, and the other is GPCR family. Experiments in classification of SCOP protein families show that the proposed method not only results in an extremely condensed feature set but also achieves higher accuracy than the methods based on whole k-spectrum feature space. And it also performs comparably to the most powerful classifiers for GPCR level I and level II subfamily recognition with 92.6 and 88.8% accuracy, respectively.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"26 1","pages":"177-186"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72818721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Similarity Definition over Gene Ontology by Further Mining of the Information Content","authors":"Yuan-Peng Li, Bao-Liang Lu","doi":"10.1142/9781848161092_0018","DOIUrl":"https://doi.org/10.1142/9781848161092_0018","url":null,"abstract":"The similarity of two gene products can be used to solve many problems in information biology. Since one gene product corresponds to several GO (Gene Ontology) terms, one way to calculate the gene product similarity is to use the similarity of their GO terms. This GO term similarity can be defined as the semantic similarity on the GO graph. There are many kinds of similarity definitions of two GO terms, but the information of the GO graph is not used efficiently. This paper presents a new way to mine more information of the GO graph by regarding edge as information content and using the information of negation on the semantic graph. A simple experiment is conducted and, as a result, the accuracy increased by 8.3 percent in average, compared with the traditional method which uses node as information source.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"30 1","pages":"155-164"},"PeriodicalIF":0.0,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87979725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Strategy of Geometrical Biclustering for Microarray Data Analysis","authors":"Hongya Zhao, Alan Wee-Chung Liew, Hong Yan","doi":"10.1142/9781860947995_0008","DOIUrl":"https://doi.org/10.1142/9781860947995_0008","url":null,"abstract":"In this paper, we present a new biclustering algorithm to provide the geometrical interpretation of similar microarray gene expression profiles. Different from standard clustering analyses, biclustering methodology can perform simultaneous classification on the row and column dimensions of a data matrix. The main object of the strategy is to reveal the submatrix, in which a subset of genes exhibits a consistent pattern over a subset of conditions. However, the search for such subsets is a computationally complex task. We propose a new algorithm, based on the Hough transform in the column-pair space to perform pattern identification. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our simulation studies show that the method is robust to noise and computationally efficient. Furthermore, we have applied it to a large database of gene expression profiles of multiple human organs and the resulting biclusters show clear biological meanings.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"39 1","pages":"47-56"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74905469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complexities and Algorithms for Glycan Structure Sequencing using Tandem Mass Spectrometry","authors":"B. Shan, B. Ma, Kaizhong Zhang, G. Lajoie","doi":"10.1142/9781860947995_0032","DOIUrl":"https://doi.org/10.1142/9781860947995_0032","url":null,"abstract":"Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan structure sequencing, which is to determine the primary structure of a glycan using MS/MS spectrometry, remains one of the most important tasks in proteomics. Analogous to the peptide de novo sequencing, the glycan de novo sequencing is to determine the structure without the aid of a known glycan database. We show in this paper that glycan de novo sequencing is NP-hard. We then provide a heuristic algorithm and develop a software program to solve the problem in practical cases. Experiments on real MS/MS data of glycopeptides demonstrate that our heuristic algorithm gives satisfactory results on practical data.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"50 1","pages":"297-306"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75004251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact and Heuristic Approaches for Identifying Disease-Associated SNP Motifs","authors":"Gaofeng Huang, P. Jeavons, D. Kwiatkowski","doi":"10.1142/9781860947995_0020","DOIUrl":"https://doi.org/10.1142/9781860947995_0020","url":null,"abstract":"A Single Nucleotide Polymorphism (SNP) is a small DNA variation which occurs naturally between different individuals of the same species. Some combinations of SNPs in the human genome are known to increase the risk of certain complex genetic diseases. This paper formulates the problem of identifying such disease-associated SNP motifs as a combinatorial optimization problem and shows it to be NP-hard. Both exact and heuristic approaches for this problem are developed and tested on simulated data and real clinical data. Computational results are given to demonstrate that these approaches are sufficiently effective to support ongoing biological research.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"30 1","pages":"175-184"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77213011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Effective Promoter Detection Method using the Adaboost Algorithm","authors":"Xudong Xie, Shuanhu Wu, K. Lam, Hong Yan","doi":"10.1142/9781860947995_0007","DOIUrl":"https://doi.org/10.1142/9781860947995_0007","url":null,"abstract":"In this paper, an effective promoter detection algorithm, which is called PromoterExplorer, is proposed. In our approach, various features, i.e. local distribution of pentamers, positional CpG island features and digitized DNA sequence, are combined to build a high-dimensional input vector. A cascade AdaBoost based learning procedure is adopted to select the most “informative” or “discriminating” features to build a sequence of weak classifiers. A number of weak classifiers construct a strong classifier, which can achieve a better performance. In order to reduce the false positive, a cascade structure is used for detection. PromoterExplorer is tested based on large-scale DNA sequences from different databases, including EPD, Genbank and human chromosome 22. The proposed method consistently outperforms PromoterInspector and Dragon Promoter Finder.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"64 1","pages":"37-46"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83453504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flow Model of the Protein-protein Interaction Network for Finding Credible Interactions","authors":"Kinya Okada, K. Asai, Masanori Arita","doi":"10.1142/9781860947995_0034","DOIUrl":"https://doi.org/10.1142/9781860947995_0034","url":null,"abstract":"Large-scale protein-protein interactions (PPIs) detected by yeast-two-hybrid (Y2H) systems are known to contain many false positives. The separation of credible interactions from background noise is still an unavoidable task. In the present study, we propose the relative reliability score for PPI as an intrinsic characteristic of global topology in the PPI networks. Our score is calculated as the dominant eigenvector of an adjacency matrix and represents the steady state of the network flow. By using this reliability score as a cut-off threshold from noisy Y2H PPI data, the credible interactions were extracted with better or comparable performance of previously proposed methods which were also based on the network topology. The result suggests that the application of the network-flow model to PPI data is useful for extracting credible interactions from noisy experimental data.","PeriodicalId":74513,"journal":{"name":"Proceedings of the ... Asia-Pacific bioinformatics conference","volume":"27 1","pages":"317-326"},"PeriodicalIF":0.0,"publicationDate":"2007-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74766164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}