Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics: Latest Articles

Breaking Ties in Weighted Interactomes
Ibrahim Youssef, Anna M. Ritz
DOI: https://doi.org/10.1145/3107411.3108228
Published: 2017-08-20
Abstract: Automatic signaling pathway reconstruction methods that use protein-protein interactomes are hindered by unit-weighted or coarsely weighted edges, which lead to many paths sharing the same cost. We propose using gene expression as an orthogonal dataset from the pathway of interest to re-prioritize tied paths. The proposed method promotes or demotes paths based on the number of inferred true pathway edges they contain. In general, it can be applied to any signaling pathway reconstruction method that outputs an ordered list of paths, trees, or other subgraphs.
Citations: 0
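To make the tie-breaking idea concrete, here is a minimal Python sketch. It assumes each reconstructed path is given as a (cost, node list) pair and that the edges inferred from gene expression are already available as a set of directed node pairs. The abstract does not say how those edges are inferred, so `inferred_edges` and the function name `rerank_tied_paths` are hypothetical. Paths stay ordered by cost. Only paths with equal cost are reordered, by how many inferred edges they contain.

```python
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
ScoredPath = Tuple[float, List[str]]  # (path cost, ordered node list)


def rerank_tied_paths(paths: Sequence[ScoredPath],
                      inferred_edges: FrozenSet[Edge]) -> List[ScoredPath]:
    """Order paths by cost; among equal-cost paths, promote those with more inferred edges."""

    def support(nodes: List[str]) -> int:
        # Number of consecutive (u, v) edges on the path that are in the inferred set.
        return sum((u, v) in inferred_edges for u, v in zip(nodes, nodes[1:]))

    # Sort key: ascending cost, then descending support. sorted() is stable,
    # so paths tied on both keys keep the order the reconstruction method produced.
    return sorted(paths, key=lambda p: (p[0], -support(p[1])))


paths = [(2.0, ["A", "B", "C"]), (2.0, ["A", "D", "C"]), (3.0, ["A", "E", "F", "C"])]
inferred = frozenset({("A", "D"), ("D", "C")})
for cost, nodes in rerank_tied_paths(paths, inferred):
    print(cost, "->".join(nodes))
# 2.0 A->D->C
# 2.0 A->B->C
# 3.0 A->E->F->C
```

Because the sort is stable, the sketch only breaks exact ties. It never moves a cheaper path behind a more expensive one.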
ACM-BCB '17 Tutorial: Robotics-inspired Algorithms for Modeling Protein Structures and Motions
Kevin Molloy, David Morris, Amarda Shehu
DOI: https://doi.org/10.1145/3107411.3107493
Published: 2017-08-20
Abstract: With biomolecular structure recognized as central to understanding mechanisms in the cell, computational chemists and biophysicists have spent significant effort on modeling structure and dynamics. Although significant advances have been made, particularly in the design of sophisticated energetic models and molecular representations, such efforts are experiencing diminishing returns. One of the culprits is low exploration capability. This impasse has attracted AI researchers, who have adapted robot motion planning algorithms to model biomolecular structures and motions. This tutorial introduces students and researchers to robotics-inspired treatments and methodologies for understanding and elucidating the role of structure and dynamics in the function of biomolecules. The presentation is enhanced by open-source software developed in the Shehu Computational Biology laboratory. Through its plug-and-play capabilities, the software allows researchers to enter a new research domain and drive further research. The hands-on approach of the tutorial benefits both students and senior researchers keen to contribute to computational structural biology.
Citations: 0
Classification and Prediction of Antimicrobial Peptides Using N-gram Representation and Machine Learning
M. Othman, Sujay Ratna, Anant Tewari, Anthony M. Kang, Katherine Du, I. Vaisman
DOI: https://doi.org/10.1145/3107411.3108215
Published: 2017-08-20
Abstract: Current antibiotic treatments for infectious diseases are drastically losing effectiveness, because the organisms they target have developed resistance to the drugs over time. In the United States, antibiotic-resistant bacterial infections result in more than 23,000 deaths annually, and morbidity rates are much higher. Antimicrobial peptides (AMPs) are a promising alternative to current antibiotic treatments. AMPs are short sequences of amino acid residues that have been experimentally shown to inhibit the propagation of pathogens. In this study, we demonstrate that an N-gram representation of AMP sequences using a reduced amino acid alphabet, combined with machine learning (ML) methods, provides simple and efficient AMP classification with performance comparable to more complex algorithms. All AMP sequences were retrieved from public data sources. Our AMP set consists of 7760 sequences, regardless of AMP subclass. We also used class-specific AMP sets (antibacterial, antiviral, antifungal, and antiparasitic). We created a raw negative set of 20258 non-antimicrobial peptides (non-AMPs) using sequence fragments from annotated protein sequence databases. Classification of all AMPs against non-AMP sequences achieved a maximum accuracy of 85.0% using frequency N-gram analysis and the Random Forest (RF) model with 10-fold cross-validation. The datasets ranged from 200 to 7760 sequences per class. Classification using more specific AMP classes was conducted next. First, classification of antibacterial peptides (ABPs) against non-ABP sequences achieved an accuracy of up to 100%, depending on the ML algorithm and alphabet reduction used. Classification of ABP against antiviral peptide (AVP) sequences yielded a maximum accuracy of 81.8%, AVP against non-AVP 80.7%, and AVP against antifungal peptide (AFP) sequences 80.5%. Several trends are common across multiple experiment series. Random Forest frequently outperforms other algorithms. The optimal size of the reduced alphabet is either 3 or 4 letters. Reduction to 2 letters leads to a significant drop in accuracy, while reduction to 5 or more letters provides no noticeable gain in classification accuracy. The results of this study indicate that N-gram-based classification of AMPs is a promising approach. It has strong potential to provide important insights into AMP mechanisms and to support the computational design of new AMPs.
Citations: 1
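The reduced-alphabet N-gram features described in the abstract above can be sketched in Python as follows. The 3-letter grouping (hydrophobic, polar, charged) is an illustrative assumption and is not the reduced alphabet the authors used. The feature vector is the relative frequency of every possible N-gram over that alphabet, listed in a fixed order.

```python
from collections import Counter
from itertools import product
from typing import List

# Illustrative 3-letter reduced alphabet (an assumption, not the authors' grouping):
# h = hydrophobic, p = polar/neutral, c = charged.
REDUCED = {aa: "h" for aa in "ACFILMVW"}
REDUCED.update({aa: "p" for aa in "GNPQSTY"})
REDUCED.update({aa: "c" for aa in "DEHKR"})


def reduce_sequence(seq: str) -> str:
    """Map each standard residue to its group letter (KeyError on non-standard codes)."""
    return "".join(REDUCED[aa] for aa in seq.upper())


def ngram_frequencies(seq: str, n: int = 2) -> List[float]:
    """Relative frequency of every possible n-gram over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    letters = sorted(set(REDUCED.values()))  # ['c', 'h', 'p']
    vocab = ["".join(t) for t in product(letters, repeat=n)]  # fixed feature order
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0 for g in vocab]


print(reduce_sequence("KLAK"))         # chhc
print(len(ngram_frequencies("KLAK")))  # 9
```

Vectors like these could then be passed to any classifier, such as the Random Forest the authors found to perform well. With 3 letters and n = 2 there are only 9 features per peptide, which shows how compact a small alphabet keeps the representation.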
SparkGA: A Spark Framework for Cost Effective, Fast and Accurate DNA Analysis at Scale
Hamid Mushtaq, Frank Liu, Carlos H. A. Costa, Gang Liu, H. P. Hofstee, Z. Al-Ars
DOI: https://doi.org/10.1145/3107411.3107438
Published: 2017-08-20
Abstract: In recent years, the cost of NGS (Next Generation Sequencing) technology has dropped dramatically, making it a viable method for diagnosing genetic diseases. The large amount of data generated by NGS, usually on the order of hundreds of gigabytes per experiment, has to be analyzed quickly to generate meaningful variant results. The GATK best practices pipeline from the Broad Institute is one of the most popular computational pipelines for DNA analysis. However, many components of the GATK pipeline are not very parallelizable. In this paper, we present a parallel implementation of a DNA analysis pipeline based on the big data Apache Spark framework. This implementation is highly scalable and parallelizes computation using data-level parallelism as well as load balancing techniques. To reduce analysis cost, the framework can run on nodes with as little as 16 GB of memory. For whole genome sequencing experiments, we show that the runtime can be reduced to about 1.5 hours on a 20-node cluster with an accuracy of up to 99.9981%. Our solution is about 71% faster than other state-of-the-art solutions while also being more accurate. The source code of the software described in this paper is publicly available at https://github.com/HamidMushtaq/SparkGA1.git.
Citations: 31
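The SparkGA abstract credits part of the framework's scalability to load balancing. The sketch below is a generic illustration only. It is not SparkGA's actual algorithm, which the abstract does not describe. It uses the classic greedy longest-processing-time heuristic in Python: data chunks are sorted by estimated size, and each chunk goes to the currently least-loaded worker. Chunk names and sizes are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def balance_chunks(chunk_sizes: Dict[str, int], workers: int) -> List[List[str]]:
    """Greedy LPT: give each chunk (largest first) to the least-loaded worker."""
    heap: List[Tuple[int, int]] = [(0, w) for w in range(workers)]  # (load, worker id)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(workers)]
    # Largest chunks first; names break size ties so the result is deterministic.
    for name, size in sorted(chunk_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (load + size, w))
    return assignment


sizes = {"chr1": 8, "chr2": 7, "chr3": 6, "chr4": 5, "chr5": 4}  # hypothetical units
print(balance_chunks(sizes, workers=2))
# [['chr1', 'chr4', 'chr5'], ['chr2', 'chr3']]
```

In this toy run the two workers end up with loads of 17 and 13 units. Assigning the largest pieces first is what keeps the final imbalance small.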
Challenges in Prediction of different Cancer Stages using Gene Expression Profile of Cancer Patients
Sherry Bhalla, Suresh Sharma, Gajendra P.S. Raghava
DOI: https://doi.org/10.1145/3107411.3108211
Published: 2017-08-20
Abstract: Despite the plethora of gene expression-based cancer biomarkers in the scientific literature, few make their way to the clinic. Several past efforts to predict cancer biomarkers have had very limited success. One of the challenges in cancer biology is predicting cancer at an early stage. The success of the therapies used to treat cancer patients depends on correct identification of the stage or progression of cancer. Despite tremendous progress in genomics and proteomics, the performance of stage classification has not improved substantially. Recently, our group developed CancerCSP, a server with prediction models that discriminate early and late stage clear cell renal cell carcinoma (ccRCC) samples based on gene expression profiles. We achieved a maximum accuracy of 72.64% with an ROC value of 0.81, even though we tried state-of-the-art techniques to improve the performance of our models. This raises the question of why the models fail to discriminate early and late stage ccRCC patients with high accuracy. In this poster, we analyze ccRCC samples obtained from The Cancer Genome Atlas (TCGA) data portal to understand why the stage classification models fail. First, we performed a bin-wise analysis of the top 20 genes that discriminate early and late stage samples with the highest ROC (single-gene models using a threshold). The expression of each gene overlapped significantly between early and late stage samples. Although the number of early and late stage samples varied across gene expression bins, this variation was not sufficient to classify both types of samples with high accuracy. For example, the gene NR3C2 had a maximum ROC of 0.67 at an expression (log RSEM) of 7.61. Nearly 70% of early stage patients were above this threshold, making NR3C2 an average expression marker. However, nearly 55% of late stage patients were also above it, which increased the false positives. Second, hierarchical clustering of ccRCC samples using 64 gene expression features selected with Weka showed weak concordance with pathological stage. K-means clustering of patients into four groups produced four separable clusters, but these clusters were not associated with pathological stage. These observations led to the conclusion that molecular parameters do not always agree with histopathological features. The third analysis identified patients who were not predicted correctly by any of the four machine-learning algorithms (SVM, Random Forest, SMO, and Naïve Bayes). Many samples were misclassified by all four methods. The false positives and false negatives belonged to distinct clusters obtained through clustering. This further points to the interspersed nature of the data with respect to the histopathological stages of cancer. We reach the conclusion that expression profile of
Citations: 0
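The single-gene threshold analysis in the abstract above can be illustrated with a short Python sketch. It computes the ROC AUC of one gene's expression values using the rank-based (Mann-Whitney) formulation. It also computes the fraction of each class lying at or above a chosen threshold. Treating early stage as the positive class (label 1) is an assumption. The toy values are invented for illustration and are not taken from TCGA.

```python
from typing import List, Sequence, Tuple


def by_class(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC = P(random positive scores above random negative); ties count 0.5."""
    pos, neg = by_class(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_rates(scores: Sequence[float], labels: Sequence[int],
                    threshold: float) -> Tuple[float, float]:
    """Fraction of positives (TPR) and negatives (FPR) at or above the threshold."""
    pos, neg = by_class(scores, labels)
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


# Toy log-expression values (invented); label 1 = early stage, 0 = late stage.
expr = [8.0, 7.7, 7.0, 7.8, 7.2]
stage = [1, 1, 1, 0, 0]
print(auc(expr, stage))                    # 0.5
print(threshold_rates(expr, stage, 7.61))  # (0.6666666666666666, 0.5)
```

In the toy data, two thirds of "early" samples and half of "late" samples lie above 7.61. The threshold therefore catches many positives but also many negatives. This mirrors the kind of overlap the authors report for NR3C2.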
SMILE: A Novel Procedure for Subcellular Module Identification with Localization Expansion
Lixin Cheng, Pengfei Liu, K. Leung
DOI: https://doi.org/10.1145/3107411.3110415
Published: 2017-08-20
Abstract: We propose a novel procedure, Subcellular Module Identification with Localization Expansion (SMILE), to identify super modules. These super modules consist of several subcellular modules performing specific biological functions across cell compartments. We compared them with modules identified from the global protein interaction networks in both the ComPPI and InWeb_InBioMap protein interaction datasets. The super modules identified by SMILE are more functionally diverse and have been verified to be more strongly associated with known protein complexes and biological pathways. Our results reveal that subcellular localization is a principal feature of functional modules and offers important guidance for detecting biologically meaningful results.
Citations: 9
Session details: Session 3: Proteins and RNA Structure, Dynamics, and Analysis I
F. Jagodzinski
DOI: https://doi.org/10.1145/3254546
Published: 2017-08-20
Citations: 0
Pathway Enrichment Analysis for Untargeted Metabolomics
V. Porokhin, Xinmeng Li, S. Hassoun
DOI: https://doi.org/10.1145/3107411.3108233
Published: 2017-08-20
Abstract: Metabolomics-based studies have provided critical insights across many applications. Untargeted metabolomics now offers researchers an opportunity to collect information about thousands of small molecules in bulk. However, taking advantage of this development requires accurate identification of metabolites and their biological significance in a given sample, which unfortunately remains difficult. Pathway enrichment is a powerful method that can help address these tasks, but existing techniques intended for gene enrichment analysis are not directly applicable to untargeted metabolomics. In this poster we address the following problem: given a network model of the biological sample and a likelihood score for observing each metabolite (node) within the network, compute the enrichment of pathways within the network model. We approach this challenge as an optimization problem, where a solution is defined as a particular assignment of mass features to candidate metabolites. The method generates possible assignments of features to compounds using in silico fragmentation tools (e.g., MetFrag [1], CFM-ID [2], and CSI:FingerID [3]) and spectral databases (e.g., MassBank [4]). It then attempts to iteratively improve a candidate solution. By developing this method, we enable the use of pathway enrichment as an effective means of metabolite identification in untargeted metabolomics.
Citations: 0
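The optimization view described in the abstract above assigns each mass feature to one candidate metabolite and then improves that assignment iteratively. It can be sketched as simple hill climbing in Python. The objective used here is a hypothetical stand-in: summed candidate likelihoods plus a bonus for every additional assigned metabolite that shares a pathway. The abstract does not specify the authors' scoring function or search procedure, and the feature, metabolite, and pathway names are placeholders.

```python
from typing import Dict, Set

Likelihoods = Dict[str, Dict[str, float]]  # feature -> {candidate metabolite: score}
Pathways = Dict[str, Set[str]]             # pathway -> member metabolites


def score(assign: Dict[str, str], likelihood: Likelihoods,
          pathways: Pathways, weight: float) -> float:
    """Hypothetical objective: likelihood of chosen candidates + pathway coherence bonus."""
    chosen = set(assign.values())
    # Each pathway earns `weight` for every assigned member beyond the first.
    coherence = sum(max(0, len(chosen & members) - 1) for members in pathways.values())
    return sum(likelihood[f][m] for f, m in assign.items()) + weight * coherence


def hill_climb(likelihood: Likelihoods, pathways: Pathways,
               weight: float = 1.0, max_rounds: int = 100) -> Dict[str, str]:
    # Start from the single most likely candidate for every feature.
    assign = {f: max(cands, key=cands.get) for f, cands in likelihood.items()}
    best = score(assign, likelihood, pathways, weight)
    for _ in range(max_rounds):
        improved = False
        for f in sorted(likelihood):
            for m in sorted(likelihood[f]):
                if m == assign[f]:
                    continue
                trial = {**assign, f: m}  # change one feature's assignment
                s = score(trial, likelihood, pathways, weight)
                if s > best + 1e-12:
                    assign, best, improved = trial, s, True
        if not improved:  # local optimum: no single change helps
            break
    return assign


likelihood = {"F1": {"M1": 0.6, "M2": 0.5}, "F2": {"M3": 0.7, "M4": 0.4}}
pathways = {"P1": {"M2", "M3"}, "P2": {"M1", "M4"}}
print(hill_climb(likelihood, pathways))  # {'F1': 'M2', 'F2': 'M3'}
```

The search starts from the most likely candidate for each feature (M1 and M3). It then switches F1 to M2, because the pathway bonus outweighs the small loss in likelihood. Hill climbing like this can stop at a local optimum, so a real method might need restarts or a different search strategy.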
Folding Large Proteins by Ultra-Deep Learning
Jinbo Xu
DOI: https://doi.org/10.1145/3107411.3107456
Published: 2017-08-20
Abstract: Ab initio protein folding is one of the most challenging problems in computational biology. The popular fragment assembly method can fold only some small proteins. Contact-assisted folding has recently made some progress, but it requires accurate contact prediction. Existing methods achieve accurate contact prediction only on proteins with a very large number (>500 or 1000) of sequence homologs. To handle proteins without so many sequence homologs, we have developed a novel deep learning model for contact prediction that concatenates two deep residual neural networks (ResNet). ResNet performed best in the 2015 computer vision challenges. The first ResNet conducts a convolutional transformation of 1-dimensional features. The second conducts a convolutional transformation of 2-dimensional information, including the output of the first. Experimental results suggest that our deep learning method greatly outperforms existing contact prediction methods and doubles the accuracy of pure co-evolution methods on proteins without many sequence homologs. Our method ranked 1st in terms of total F1 score in the latest CASP competition (CASP12), although it was not fully implemented at that time (May-July 2016). Our predicted contacts also lead to much more accurate contact-assisted folding. Since October 2016, our fully automated web server implementing this method has been blindly tested in the weekly CAMEO benchmark, which can be interpreted as a fully automated CASP. In that benchmark, the server has successfully folded many large hard targets (up to 600 residues) that lack good templates and many sequence homologs. Our large-scale benchmark indicates that ab initio folding based on predicted contacts can now correctly fold more than 2/3 of randomly chosen proteins. We have also applied this method to membrane protein contact prediction, with very good results in terms of both contact prediction accuracy and folding. An important finding is that our deep model works very well on membrane protein contact prediction and folding even when trained only on non-membrane proteins. This is because the model learns to predict contacts from contact occurrence patterns, which are shared between membrane and non-membrane proteins, rather than from sequence similarity. This method can also be extended to protein-protein interaction prediction, protein complex prediction, and protein docking. Our web server implementing this method is publicly available at http://raptorx.uchicago.edu/ContactMap/. For technical details and results, please see our papers [1-2].
Citations: 1
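The abstract above describes two pieces of a contact-prediction workflow, and both can be sketched in Python.

The first function shows one common way to turn per-residue (1D) features into pairwise (2D) features: it concatenates the feature vectors of residues i and j. The abstract only says that the second ResNet consumes 2D information including the first network's output, so this exact conversion is an assumption.

The second function computes top-k precision over residue pairs separated by at least `min_sep` positions, a standard way to score predicted contacts. The separation cutoff of 6 is an illustrative default.

```python
from typing import List, Set, Tuple


def outer_concatenate(seq_feats: List[List[float]]) -> List[List[List[float]]]:
    """Turn per-residue features (L x d) into pairwise features (L x L x 2d).

    Entry [i][j] is residue i's feature vector followed by residue j's.
    """
    n = len(seq_feats)
    return [[seq_feats[i] + seq_feats[j] for j in range(n)] for i in range(n)]


def top_k_precision(prob: List[List[float]], true_contacts: Set[Tuple[int, int]],
                    k: int, min_sep: int = 6) -> float:
    """Fraction of the k highest-scoring pairs (i < j, j - i >= min_sep) that are true contacts.

    `true_contacts` holds (i, j) index pairs with i < j.
    """
    n = len(prob)
    pairs = [(prob[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))  # highest probability first
    top = pairs[:k]
    if not top:
        return 0.0
    return sum((i, j) in true_contacts for _, i, j in top) / len(top)


print(outer_concatenate([[1.0], [2.0]])[0][1])  # [1.0, 2.0]
```

In contact-prediction benchmarks, k is usually set relative to the sequence length L, for example L/5 or L.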
Mining Faces from Biomedical Literature using Deep Learning
M. Dawson, Andrew Zisserman, C. Nellåker
DOI: https://doi.org/10.1145/3107411.3107476
Published: 2017-08-20
Abstract: Access to large, labelled sets of relevant images is crucial for developing and testing biomedical imaging algorithms. Using images found in biomedical research articles would go some way towards solving this problem. However, this approach critically depends on being able to identify the most relevant images from very large sets of potentially useful figures. In this paper, a deep convolutional neural network (CNN) classifier is trained using only synthetic data to rapidly and accurately label raw images taken from biomedical articles. We apply this method to detecting faces in biomedical images. We show that the classifier retrieves figures containing faces with an average precision of 94.8% from a dataset of over 31,000 images taken from articles held in the PubMed database. We then demonstrate the utility of the classifier through a case study, in which it aids the mining of photographs of patients with rare genetic disorders from targeted articles. This approach is readily adaptable to the retrieval of other categories of biomedical images.
Citations: 1
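Average precision, the retrieval metric reported in the abstract above, can be computed from a ranked list of classifier outputs. The Python sketch below averages precision@k over the ranks that hold a true face image. It normalizes by the number of positives present in the list, which matches the usual definition when every relevant image is ranked. Function names and toy scores are illustrative.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP of a ranking by descending score; labels are 1 (face) or 0 (no face)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / hits if hits else 0.0


# Positives at ranks 1, 3 and 4: AP = (1/1 + 2/3 + 3/4) / 3, about 0.806.
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0]))
```

Because AP rewards placing positives near the top, a classifier can score well on this metric even if a few faces are ranked low in a very large image collection.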