IPSJ Transactions on Bioinformatics: Latest Articles

Sparse Learner Boosting for Gene Expression Data
IPSJ Transactions on Bioinformatics Pub Date : 2010-01-01 DOI: 10.2197/IPSJTBIO.3.54
M. Pritchard
{"title":"Sparse Learner Boosting for Gene Expression Data","authors":"M. Pritchard","doi":"10.2197/IPSJTBIO.3.54","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.3.54","url":null,"abstract":"Gene expression analysis is commonly used to analyze millions of gene expression data points. Challenging in this process has been the development of appropriate statistical methods for high-dimensional data. We propose Sparse Learner Boosting for gene expression data analysis. Boosting is performed to minimize the loss function, although this process can cause overfitting when a large number of variables are present. Ordinary boosting utilizes all of the potential weak learners in a given data set and constructs a decision rule. The fundamental idea of Sparse Learner Boosting is to reduce the complexity of the decision rule by using fewer weak learners than is usually required. This reduction prevents overfitting and improves performance during classification. Numerical studies support this modification for high-dimensional data, such as that obtained from gene expression analysis. We show that the proposed modification improves the performance of ordinary boosting methods.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"3 1","pages":"54-61"},"PeriodicalIF":0.0,"publicationDate":"2010-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.3.54","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68502206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Support vector machine prediction of N- and O-glycosylation sites using whole sequence information and subcellular localization
IPSJ Transactions on Bioinformatics Pub Date : 2009-12-01 DOI: 10.2197/IPSJTBIO.2.25
Kenta Sasaki, Nobuyoshi Nagamine, Y. Sakakibara
{"title":"Support vector machine prediction of N-and O-glycosylation sites using whole sequence information and subcellular localization","authors":"Kenta Sasaki, Nobuyoshi Nagamine, Y. Sakakibara","doi":"10.2197/IPSJTBIO.2.25","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.2.25","url":null,"abstract":"Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called “the third chain of the living organism”. About half of all proteins are estimated to be glycosylated based on the SWISS-PROT database. Glycosylation is one of the most important post-translational modifications, affecting many critical functions of proteins, including cellular communication, and their tertiary structure. In order to computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) model, which utilize local information, general protein information and/or subcellular localization in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins. Results: In our computational experiment, the model integrating three kinds of information achieved about 90% accuracy in predictions of both N-glycosylation and O-glycosylation sites. Moreover, our model was applied to a protein whose glycosylation sites had not been previously identified and we succeeded in showing that the glycosylation sites predicted by our model were structurally reasonable. Conclusions: In the present study, we developed a comprehensive and effective computational method that detects glycosylation sites. 
We conclude that our method is a comprehensive and effective computational prediction method that is applicable at a genome-wide level.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"2 1","pages":"25-35"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.2.25","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68502318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
A Modified Algorithm for Sequence Alignment Using Ant Colony System
IPSJ Transactions on Bioinformatics Pub Date : 2009-12-01 DOI: 10.2197/IPSJTBIO.2.63
A. Mikami, Jianming Shi
{"title":"A Modified Algorithm for Sequence Alignment Using Ant Colony System","authors":"A. Mikami, Jianming Shi","doi":"10.2197/IPSJTBIO.2.63","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.2.63","url":null,"abstract":"In this study, we use the Ant Colony System (ACS) to develop a heuristic algorithm for sequence alignment. This algorithm is certainly an improvement on ACS-MultiAlignment, which was proposed in 2005 for predicting major histocompatibility complex (MHC) class II binders. The numerical experiments indicate that this algorithm is as much as 2,900 times faster than the original ACS-MultiAlignment algorithm. We also compare this algorithm to other approaches, such as the Gibbs sampling algorithm, using numerical experiments. The results show that our algorithm finds the best value more promptly than the Gibbs approach.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"2 1","pages":"63-73"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.2.63","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68501961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Reaction Similarities Focusing Substructure Changes of Chemical Compounds and Metabolic Pathway Alignments
IPSJ Transactions on Bioinformatics Pub Date : 2009-03-24 DOI: 10.2197/IPSJTBIO.2.15
Y. Tohsato, Yuki Nishimura
{"title":"Reaction Similarities Focusing Substructure Changes of Chemical Compounds and Metabolic Pathway Alignments","authors":"Y. Tohsato, Yuki Nishimura","doi":"10.2197/IPSJTBIO.2.15","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.2.15","url":null,"abstract":"Comparative analyses of enzymatic reactions provide important information on both evolution and potential pharmacological targets. Previously, we focused on the structural formulae of compounds, and proposed a method to calculate enzymatic similarities based on these formulae. However, with the proposed method it is difficult to measure the reaction similarity when the formulae of the compounds constituting each reaction are completely different. The present study was performed to extract substructures that change within chemical compounds using the RPAIR data in KEGG. Two approaches were applied to measure the similarity between the extracted substructures: a fingerprint-based approach using the MACCS key and the Tanimoto/Jaccard coefficients; and the Topological Fragment Spectra-based approach that does not require any predefined list of substructures. Whether the similarity measures can detect similarity between enzymatic reactions was evaluated. Using one of the similarity measures, metabolic pathways in Escherichia coli were aligned to confirm the effectiveness of the method.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"7 1","pages":"15-24"},"PeriodicalIF":0.0,"publicationDate":"2009-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.2.15","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68502257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
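The abstract above names the Tanimoto/Jaccard coefficient as one way to compare extracted substructures. As a minimal sketch of that coefficient only (not the authors' implementation), the example below treats a fingerprint as a set of on-bit positions. The function name `tanimoto`, the variable names, and the bit positions are hypothetical and are not real MACCS keys; returning 1.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B| for two sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical on-bit positions for two substructure fingerprints.
fp_substrate = {3, 17, 42, 88, 120}
fp_product = {3, 17, 42, 90}

# 3 shared bits out of 6 distinct bits in total.
print(tanimoto(fp_substrate, fp_product))  # 0.5
```

A value of 1.0 means the two fingerprints set exactly the same bits, and 0.0 means they share none.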
Selection of Effective Sentences from a Corpus to Improve the Accuracy of Identification of Protein Names
IPSJ Transactions on Bioinformatics Pub Date : 2009-01-01 DOI: 10.2197/IPSJTBIO.2.93
Kazunori Miyanishi, Tomonobu Ozaki, T. Ohkawa
{"title":"Selection of Effective Sentences from a Corpus to Improve the Accuracy of Identification of Protein Names","authors":"Kazunori Miyanishi, Tomonobu Ozaki, T. Ohkawa","doi":"10.2197/IPSJTBIO.2.93","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.2.93","url":null,"abstract":"As the number of documents about protein structural analysis increases, a method of automatically identifying protein names in them is required. However, the accuracy of identification is not high if the training data set is not large enough. We consider a method to extend a training data set based on machine learning using an available corpus. Such a corpus usually consists of documents about a certain kind of organism species, and documents about different kinds of organism species tend to have different vocabularies. Therefore, depending on the target document or corpus, it is not effective for the accurate identification to simply use a corpus as a training data set. In order to improve the accuracy, we propose a method to select sentences that have a positive effect on identification and to extend the training data set with the selected sentences. In the proposed method, a portion of a set of tagged sentences is used as a validation set. The process to select sentences is iterated using the result of the identification of protein names in a validation set as feedback. In the experiment, compared with the baseline, a method without a corpus, with a whole corpus, or with a part of a corpus chosen at random, the accuracy of the proposed method was higher than any baseline method. 
Thus, it was confirmed that the proposed method selected effective sentences.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"2 1","pages":"93-100"},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.2.93","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68502015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Nonmetric Distances for Barcode of Life
IPSJ Transactions on Bioinformatics Pub Date : 2008-01-01 DOI: 10.2197/IPSJTBIO.1.35
H. Akiba, Y-h. Taguchi
{"title":"Nonmetric Distances for Barcode of Life","authors":"H. Akiba, Y-h. Taguchi","doi":"10.2197/IPSJTBIO.1.35","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.1.35","url":null,"abstract":"The Barcode of Life (BOL) project[4] aims to make species easier to recognize. Although it is often troublesome to define what a species is, BOL can define species by simple DNA sequences. When it works, we do not have to consult any information other than DNA sequences to decide whether two individuals belong to the same species. If they share the same BOL, they undoubtedly belong to the same species. In contrast, it is usually difficult to define what the higher clades are. We cannot expect individuals that belong to the same higher clade to share the same BOL. Instead, we have to find how the BOLs of individuals that belong to distinct higher clades differ from each other. In this poster, we demonstrate how a nonmetric measure of distance between BOLs makes it easier to recognize whether individuals belong to a common higher clade. We also show that the usual hierarchical clustering, such as the NJ method, is not suitable for visualizing relationships expressed by a nonmetric measure, and propose the use of nonmetric multidimensional scaling (nMDS)[1, 2].","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":"35-41"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.1.35","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68500635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Linear Time Algorithm that Infers Hidden Strings from Their Concatenations
IPSJ Transactions on Bioinformatics Pub Date : 2008-01-01 DOI: 10.2197/IPSJTBIO.1.13
Tomohiro Yasuda
{"title":"A Linear Time Algorithm that Infers Hidden Strings from Their Concatenations","authors":"Tomohiro Yasuda","doi":"10.2197/IPSJTBIO.1.13","DOIUrl":"https://doi.org/10.2197/IPSJTBIO.1.13","url":null,"abstract":"Let T be a set of hidden strings and S be a set of their concatenations. We address the problem of inferring T from S. Any formalization of the problem as an optimization problem would be computationally hard, because it is NP-complete even to determine whether there exists T smaller than S, and because it is also NP-complete to partition only two strings into the smallest common collection of substrings. In this paper, we devise a new algorithm that infers T by finding common substrings in S and splitting them. This algorithm is scalable and can be completed in O(L)-time regardless of the cardinality of S, where L is the sum of the lengths of all strings in S. In computational experiments, 40,000 random concatenations of randomly generated strings were successfully decomposed, and the effectiveness of our method for this problem was compared with that of multiple sequence alignment programs. We also present the result of a preliminary experiment against the transcriptome of Homo sapiens and describe problems in applications where real large-scale cDNA sequences are analyzed.","PeriodicalId":38959,"journal":{"name":"IPSJ Transactions on Bioinformatics","volume":"1 1","pages":"13-22"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.2197/IPSJTBIO.1.13","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68500499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0