International Journal of Data Mining and Bioinformatics: Latest Articles

A novel random forests-based feature selection method for microarray expression data analysis
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070852
Dengju Yao, Jing Yang, Xiaojuan Zhan, Xiaorong Zhan, Zhiqiang Xie
{"title":"A novel random forests-based feature selection method for microarray expression data analysis","authors":"Dengju Yao, Jing Yang, Xiaojuan Zhan, Xiaorong Zhan, Zhiqiang Xie","doi":"10.1504/IJDMB.2015.070852","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070852","url":null,"abstract":"High-dimensional data and a large number of redundancy features in bioinformatics research have created an urgent need for feature selection. In this paper, a novel random forests-based feature selection method is proposed that adopts the idea of stratifying feature space and combines generalised sequence backward searching and generalised sequence forward searching strategies. A random forest variable importance score is used to rank features, and different classifiers are used as a feature subset evaluating function. The proposed method is examined on five microarray expression datasets, including leukaemia, prostate, breast, nervous and DLBCL, and the average accuracies of the SVM classifier in these datasets are 100%, 95.24%, 85%, 91.67%, and 91.67%, respectively. The results show that the proposed method could not only improve the classification accuracy but also greatly reduce the computation time of the feature selection process.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070852","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
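The abstract above describes ranking features by a random forest importance score and then searching backward over feature subsets, with a classifier acting as the subset evaluating function. A minimal sketch of that backward search is given here, assuming the importance scores have already been computed. The names `backward_select`, `evaluate`, and `step` are hypothetical, the toy scores are made up, and this is not the authors' implementation (it omits the stratified feature space and the forward search).

```python
from typing import Callable, Dict, List, Sequence


def backward_select(
    importance: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 2,
) -> List[str]:
    """Drop the `step` least important features at a time for as long as
    the evaluation score does not get worse."""
    # Rank features from most to least important.
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_score = evaluate(ranked)
    while len(ranked) > step:
        candidate = ranked[:-step]      # remove the lowest-ranked block
        score = evaluate(candidate)
        if score < best_score:          # score dropped: stop searching
            break
        ranked, best_score = candidate, score
    return ranked


# Toy importance scores (invented for illustration only).
toy_scores = {"g1": 0.40, "g2": 0.30, "g3": 0.15, "g4": 0.10, "g5": 0.03, "g6": 0.02}
toy_informative = {"g1", "g2", "g3"}


def toy_accuracy(features: Sequence[str]) -> float:
    """Stand-in for a cross-validated classifier accuracy."""
    hits = len(toy_informative & set(features))
    return hits / len(toy_informative) - 0.01 * len(features)


print(backward_select(toy_scores, toy_accuracy))
```

In practice `evaluate` would wrap a cross-validated classifier such as an SVM; removing features in blocks rather than one by one is what keeps the number of classifier evaluations small.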
Assessing protein-protein interactions based on the semantic similarity of interacting proteins
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070842
Guangyu Cui, Byungmin Kim, Saud Alguwaizani, Kyungsook Han
{"title":"Assessing protein-protein interactions based on the semantic similarity of interacting proteins","authors":"Guangyu Cui, Byungmin Kim, Saud Alguwaizani, Kyungsook Han","doi":"10.1504/IJDMB.2015.070842","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070842","url":null,"abstract":"The Gene Ontology (GO) has been used in estimating the semantic similarity of proteins since it has the largest and reliable vocabulary of gene products and characteristics. We developed a new method which can assess Protein-Protein Interactions (PPI) using the branching factor and information content of the common ancestor of interacting proteins in the GO hierarchy. We performed a comparative evaluation of the measure with other GO-based similarity measures and evaluation results showed that our method outperformed others in most GO domains.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070842","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
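The abstract above builds its measure on the information content of the common ancestor of two proteins' terms in the GO hierarchy. The sketch below shows only that ingredient, using the standard Resnik-style similarity (information content of the most informative common ancestor) on a tiny invented ontology. The term names, annotation counts, and function names are all hypothetical, and the branching-factor component of the authors' measure is not reproduced.

```python
import math
from typing import Dict, Set

# Toy ontology: term -> set of parent terms (hypothetical identifiers).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Invented annotation counts per term, already including descendants.
COUNTS: Dict[str, int] = {
    "root": 100, "binding": 40, "catalysis": 60,
    "dna_binding": 10, "rna_binding": 30,
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term: str) -> float:
    """IC(t) = -log p(t), where p(t) is the annotation frequency of t."""
    return -math.log(COUNTS[term] / COUNTS["root"])


def resnik(term_a: str, term_b: str) -> float:
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


print(resnik("dna_binding", "rna_binding"))
print(resnik("dna_binding", "catalysis"))
```

Two terms that share only the root score zero, because the root annotates everything and therefore carries no information.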
TrieAMD: a scalable and efficient apriori motif discovery approach
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070833
Isra M. Al-Turaiki, G. Badr, H. Mathkour
{"title":"TrieAMD: a scalable and efficient apriori motif discovery approach","authors":"Isra M. Al-Turaiki, G. Badr, H. Mathkour","doi":"10.1504/IJDMB.2015.070833","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070833","url":null,"abstract":"Motif discovery is the problem of finding recurring patterns in biological sequences. It is one of the hardest and long-standing problems in bioinformatics. Apriori is a well-known data-mining algorithm for the discovery of frequent patterns in large datasets. In this paper, we apply the Apriori algorithm and use the Trie data structure to discover motifs. We propose several modifications so that we can adapt the classic Apriori to our problem. Experiments are conducted on Tompa's benchmark to investigate the performance of our proposed algorithm, the Trie-based Apriori Motif Discovery (TrieAMD). Results show that our algorithm outperforms all of the tested tools on real datasets for the average sensitivity measure, which means that our approach is able to discover more motifs. In terms of specificity, the performance of our algorithm is comparable to the other tools. The results also confirm both linear time and linear space scalability of the algorithm.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070833","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
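The abstract above combines the Apriori algorithm with a trie to find recurring patterns in sequences. A minimal sketch of that combination follows, restricted to exact-match contiguous patterns: candidates are grown level by level in a trie, and a candidate is counted only if its shorter suffix was already found frequent (the Apriori pruning step). `TrieNode`, `count_support`, and `frequent_patterns` are hypothetical names; the modifications the authors propose for real motif discovery are not described in the abstract and are not reproduced here.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One trie node per frequent pattern; the path from the root spells it."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.support = 0  # number of sequences containing this pattern


def count_support(pattern: str, sequences: List[str]) -> int:
    """Count how many sequences contain the pattern at least once."""
    return sum(1 for seq in sequences if pattern in seq)


def frequent_patterns(
    sequences: List[str],
    min_support: int,
    max_len: int,
    alphabet: str = "ACGT",
) -> Dict[str, int]:
    """Level-wise search for patterns present in >= min_support sequences."""
    root = TrieNode()
    frontier: List[Tuple[str, TrieNode]] = [("", root)]
    found: Dict[str, int] = {}
    for _ in range(max_len):
        next_frontier: List[Tuple[str, TrieNode]] = []
        for prefix, node in frontier:
            for symbol in alphabet:
                pattern = prefix + symbol
                # Apriori pruning: the prefix is frequent by construction,
                # so only the suffix needs to be checked before counting.
                if len(pattern) > 1 and pattern[1:] not in found:
                    continue
                support = count_support(pattern, sequences)
                if support >= min_support:
                    child = TrieNode()
                    child.support = support
                    node.children[symbol] = child
                    found[pattern] = support
                    next_frontier.append((pattern, child))
        frontier = next_frontier
    return found


print(frequent_patterns(["ACGT", "TACG", "GGAC"], min_support=2, max_len=3))
```

Because only frequent nodes are ever extended, the trie stays small when most candidates are infrequent, which is the source of the memory savings usually associated with this design.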
Mitigating bias in planning two-colour microarray experiments
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070838
Nilgun Ferhatosmanoglu, T. Allen, Ümit V. Çatalyürek
{"title":"Mitigating bias in planning two-colour microarray experiments","authors":"Nilgun Ferhatosmanoglu, T. Allen, Ümit V. Çatalyürek","doi":"10.1504/IJDMB.2015.070838","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070838","url":null,"abstract":"Two-colour microarrays are used to study differential gene expression on a large scale. Experimental planning can help reduce the chances of wrong inferences about whether genes are differentially expressed. Previous research on this problem has focused on minimising estimation errors (according to variance-based criteria such as A-optimality) on the basis of optimistic assumptions about the system studied. In this paper, we propose a novel planning criterion to evaluate existing plans for microarray experiments. The proposed criterion is 'Generalised-A Optimality' that is based on realistic assumptions that include bias errors. Using Generalised-A Optimality, the reference-design approach is likely to yield greater estimation accuracy in specific situations in which loop designs had previously seemed superior. However, hybrid designs are likely to offer higher estimation accuracy than reference, loop and interwoven designs having the same number of samples and slides. These findings are supported by data from both simulated and real microarray experiments.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070838","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
An integrated strategy for functional analysis of microbial communities based on gene ontology and 16S rRNA gene
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070841
Suping Deng, De-shuang Huang
{"title":"An integrated strategy for functional analysis of microbial communities based on gene ontology and 16S rRNA gene","authors":"Suping Deng, De-shuang Huang","doi":"10.1504/IJDMB.2015.070841","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070841","url":null,"abstract":"In order to analyse the similarity among microbial communities on functional state after assigning 16S rRNA sequences from all microbial communities to species. It's an important addition to the species-level relationship between two compared communities and can quantify their differences in function. We downloaded all functional annotation data of several microbiotas. It's developed to identify the functional distribution and the significantly enriched functional categories of microbial communities. We analysed the similarity between two microbial communities on functional state. In the experimental results, it shows that the semantic similarity can quantify the difference between two compared species on function level. It can analyse the function of microbial communities by gene ontology based on 16S rRNA gene. Exploration of the function relationship between two sets of species assemblages will be a key result of microbiome studies and may provide new insights into assembly of a wide range of ecosystems.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070841","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Gene function prediction with knowledge from gene ontology
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070840
Ying Shen, Lin Zhang
{"title":"Gene function prediction with knowledge from gene ontology","authors":"Ying Shen, Lin Zhang","doi":"10.1504/IJDMB.2015.070840","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070840","url":null,"abstract":"Gene function prediction is an important problem in bioinformatics. Due to the inherent noise existing in the gene expression data, the attempt to improve the prediction accuracy resorting to new classification techniques is limited. With the emergence of Gene Ontology (GO), extra knowledge about the gene products can be extracted from GO and facilitates solving the gene function prediction problem. In this paper, we propose a new method which utilises GO information to improve the classifiers' performance in gene function prediction. Specifically, our method learns a distance metric under the supervision of the GO knowledge using the distance learning technique. Compared with the traditional distance metrics, the learned one produces a better performance and consequently classification accuracy can be improved. The effectiveness of our proposed method has been corroborated by the extensive experimental results.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070840","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
DNA sequence and structure properties analysis reveals similarities and differences to promoters of stress responsive genes in Arabidopsis thaliana
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-07-01 DOI: 10.1504/IJDMB.2015.070832
P. Zhu, Yanhong Zhou, Libin Zhang, Chuang Ma
{"title":"DNA sequence and structure properties analysis reveals similarities and differences to promoters of stress responsive genes in Arabidopsis thaliana","authors":"P. Zhu, Yanhong Zhou, Libin Zhang, Chuang Ma","doi":"10.1504/IJDMB.2015.070832","DOIUrl":"https://doi.org/10.1504/IJDMB.2015.070832","url":null,"abstract":"Understanding regulatory mechanisms of stress response in plants has important biological and agricultural significances. In this study, we firstly compiled a set of genes responsive to different stresses in Arabidopsis thaliana and then comparatively analysed their promoters at both the DNA sequence and three-dimensional structure levels. Amazingly, the comparison revealed that the profiles of several sequence and structure properties vary distinctly in different regions of promoters. Moreover, the content of nucleotide T and the profile of B-DNA twist are distinct in promoters from different stress groups, suggesting Arabidopsis genes might exploit different regulatory mechanisms in response to various stresses. Finally, we evaluated the performance of two representative promoter predictors including EP3 and PromPred. The evaluation results revealed their strengths and weakness for identifying stress-related promoters, providing valuable guidelines to accelerate the discovery of novel stress-related promoters and genes in plants.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJDMB.2015.070832","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"66730639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A mixture of physicochemical and evolutionary-based feature extraction approaches for protein fold recognition.
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.066359
Abdollah Dehzangi, Alok Sharma, James Lyons, Kuldip K Paliwal, Abdul Sattar
{"title":"A mixture of physicochemical and evolutionary-based feature extraction approaches for protein fold recognition.","authors":"Abdollah Dehzangi, Alok Sharma, James Lyons, Kuldip K Paliwal, Abdul Sattar","doi":"10.1504/ijdmb.2015.066359","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.066359","url":null,"abstract":"Recent advancement in the pattern recognition field stimulates enormous interest in Protein Fold Recognition (PFR). PFR is considered as a crucial step towards protein structure prediction and drug design. Despite all the recent achievements, the PFR still remains as an unsolved issue in biological science and its prediction accuracy still remains unsatisfactory. Furthermore, the impact of using a wide range of physicochemical-based attributes on the PFR has not been adequately explored. In this study, we propose a novel mixture of physicochemical and evolutionary-based feature extraction methods based on the concepts of segmented distribution and density. We also explore the impact of 55 different physicochemical-based attributes on the PFR. Our results show that by providing more local discriminatory information as well as obtaining benefit from both physicochemical and evolutionary-based features simultaneously, we can enhance the protein fold prediction accuracy up to 5% better than previously reported results found in the literature.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.066359","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33973465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation.
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.067322
Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De
{"title":"Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation.","authors":"Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De","doi":"10.1504/ijdmb.2015.067322","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.067322","url":null,"abstract":"Performance of clustering algorithms is largely dependent on selected similarity measure. Efficiency in handling outliers is a major contributor to the success of a similarity measure. Better the ability of similarity measure in measuring similarity between genes in the presence of outliers, better will be the performance of the clustering algorithm in forming biologically relevant groups of genes. In the present article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concepts of Relative Sample Outlier (RSO). We formulate new similarity, called Weighted Sample Similarity (WSS), incorporated in Euclidean distance and Pearson correlation coefficient and then use them in various clustering and biclustering algorithms to group different gene expression profiles. Our results suggest that WSS improves performance, in terms of finding biologically relevant groups of genes, of all the considered clustering algorithms.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067322","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34039166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
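The abstract above incorporates per-sample weights into the Pearson correlation coefficient so that outlying samples contribute less to the similarity between two genes. The sketch below shows a generic weighted Pearson correlation, assuming the weights are supplied by the caller. How the authors derive weights from Relative Sample Outliers is not stated in the abstract, so the weights and the name `weighted_pearson` here are purely illustrative.

```python
import math
from typing import Sequence


def weighted_pearson(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Pearson correlation in which sample i contributes with weight w[i]."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Two invented expression profiles that agree except in the last sample.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 2.0, 3.0, 4.0, -20.0]

print(weighted_pearson(gene_a, gene_b, [1, 1, 1, 1, 1]))  # outlier dominates
print(weighted_pearson(gene_a, gene_b, [1, 1, 1, 1, 0]))  # outlier ignored
```

With equal weights the function reduces to the ordinary Pearson coefficient, so the weighting can be introduced into an existing clustering pipeline without changing its interface.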
Ensemble of sparse classifiers for high-dimensional biological data.
IF 0.3, Zone 4, Biology
International Journal of Data Mining and Bioinformatics Pub Date : 2015-01-01 DOI: 10.1504/ijdmb.2015.069416
Sunghan Kim, Fabien Scalzo, Donatello Telesca, Xiao Hu
{"title":"Ensemble of sparse classifiers for high-dimensional biological data.","authors":"Sunghan Kim, Fabien Scalzo, Donatello Telesca, Xiao Hu","doi":"10.1504/ijdmb.2015.069416","DOIUrl":"https://doi.org/10.1504/ijdmb.2015.069416","url":null,"abstract":"Biological data are often high in dimension while the number of samples is small. In such cases, the performance of classification can be improved by reducing the dimension of data, which is referred to as feature selection. Recently, a novel feature selection method has been proposed utilising the sparsity of high-dimensional biological data where a small subset of features accounts for most variance of the dataset. In this study we propose a new classification method for high-dimensional biological data, which performs both feature selection and classification within a single framework. Our proposed method utilises a sparse linear solution technique and the bootstrap aggregating algorithm. We tested its performance on four public mass spectrometry cancer datasets along with two other conventional classification techniques such as Support Vector Machines and Adaptive Boosting. The results demonstrate that our proposed method performs more accurate classification across various cancer datasets than those conventional classification techniques.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.3,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.069416","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34123510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5