M. Ohsaki, Hayato Sasaki, Naoya Kishimoto, S. Katagiri, P. Then
{"title":"Discovery of Sets and Representatives of Variables in Co-nonlinear Relationships by Neural Network Regression and Group Lasso","authors":"M. Ohsaki, Hayato Sasaki, Naoya Kishimoto, S. Katagiri, P. Then","doi":"10.1109/BIBM.2018.8621207","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621207","url":null,"abstract":"In regression and classification, dependences among input variables reduce prediction performance and reliability and lead to the misidentification of contributing input variables. Clarifying variable dependences is therefore necessary, both to address these issues and for knowledge discovery. This study aims to discover the sets and representatives of co-nonlinear variables while ensuring a high capability for modeling nonlinearity and high reproducibility, without a combinatorial explosion of variables. Our proposed method achieves this by combining neural network regression, group lasso, and complementary aggregation of regression results. We conducted experiments to examine the fundamental effectiveness of the proposed method, using synthetic data whose co-nonlinearities were known. As a result, the proposed method succeeded in discovering the sets and representatives of co-nonlinear variables, and did so robustly against noise added to the variables.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129214641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BIBM 2018 Main Conference Program Committee Members","authors":"","doi":"10.1109/bibm.2018.8621561","DOIUrl":"https://doi.org/10.1109/bibm.2018.8621561","url":null,"abstract":"","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129253900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ming Xiao, Jiakun Li, Song Hong, Yongtao Yang, Junhua Li, Jian-xin Wang, Jian Yang, W. Ding, Le Zhang
{"title":"K-mer Counting: memory-efficient strategy, parallel computing and field of application for Bioinformatics","authors":"Ming Xiao, Jiakun Li, Song Hong, Yongtao Yang, Junhua Li, Jian-xin Wang, Jian Yang, W. Ding, Le Zhang","doi":"10.1109/BIBM.2018.8621325","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621325","url":null,"abstract":"K-mer counting is currently an important algorithm in bioinformatics research. This review first lists the major application fields of k-mer counting in bioinformatics. Next, we introduce the memory-efficient strategies commonly used by k-mer counting tools, because their large memory requirement is a bottleneck. We then describe the current parallel computing technologies for k-mer counting tools. Finally, we discuss future directions for k-mer counting.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125528488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guitao Cao, Tiantian Huang, Kai Hou, W. Cao, Peng Liu, Jiawei Zhang
{"title":"3D Convolutional Neural Networks Fusion Model for Lung Nodule Detection on Clinical CT Scans","authors":"Guitao Cao, Tiantian Huang, Kai Hou, W. Cao, Peng Liu, Jiawei Zhang","doi":"10.1109/BIBM.2018.8621468","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621468","url":null,"abstract":"Accurate automatic pulmonary nodule detection plays an important role in lung cancer diagnosis and early treatment. We propose a three-dimensional (3D) Convolutional Neural Networks (ConvNets) fusion model for lung nodule detection on clinical CT scans. Two 3D ConvNets models are trained separately without any pre-trained weights. One is trained on the LUng Nodule Analysis 2016 dataset (LUNA) plus additional augmented data to learn the nodules’ representative features in volumetric space. Because this may cause overfitting, we train another network on the original data and fuse the results of the two best-performing models to reduce this risk. Both use a reshaped objective function to address the class imbalance problem and to differentiate hard samples from easy samples. More importantly, CT scans of 335 patients from the hospital are further used to evaluate and help optimize the performance of our approach in a real clinical setting, and we develop a system based on this method. Experimental results show a sensitivity of 95.1% at 8 false positives per scan in Free-response Receiver Operating Characteristic (FROC) curve analysis, and our system generalizes well to clinical data.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126844818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Arisi, P. Bertolazzi, Eleonora Cappelli, F. Conte, Fabio Cumbo, G. Fiscon, M. Sonnessa, F. Taglino
{"title":"An ontology-based approach to improve data querying and organization of Alzheimer’s Disease data","authors":"I. Arisi, P. Bertolazzi, Eleonora Cappelli, F. Conte, Fabio Cumbo, G. Fiscon, M. Sonnessa, F. Taglino","doi":"10.1109/BIBM.2018.8621524","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621524","url":null,"abstract":"Recent advances in biotechnology and IT have led to an ever-increasing availability of public biomedical data distributed in large databases. Analyzing this huge volume of data is a challenging task because of its complexity, its high heterogeneity, and its many correlated factors. In the field of neurodegenerative diseases, recent years have witnessed the creation of specialized databases such as the international project ADNI (Alzheimer’s Disease Neuroimaging Initiative). The main obstacles to fully exploiting this database relate to the querying, integration, and analysis of the data themselves. Here, we aim to develop a detailed ontology for clinical multidimensional datasets from the ADNI repository in order to simplify data access and to obtain new diagnostic knowledge about Alzheimer’s Disease.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126505082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Ahn, Taewan Goo, Chan-hee Lee, Sungmin Kim, Kyullhee Han, Sangick Park, T. Park
{"title":"Deep Learning-based Identification of Cancer or Normal Tissue using Gene Expression Data","authors":"T. Ahn, Taewan Goo, Chan-hee Lee, Sungmin Kim, Kyullhee Han, Sangick Park, T. Park","doi":"10.1109/BIBM.2018.8621108","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621108","url":null,"abstract":"Background: Deep learning has shown outstanding performance in recognition and classification problems. As increasing amounts of cancer and normal gene expression data become publicly available, deep learning may become an integral component of efficiently finding specific patterns within massive datasets. Thus, we aim to address the extent to which the machine can learn to recognize cancer. We integrated cancer and normal tissue data from the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET), and Genotype-Tissue Expression (GTEx) databases, including 13,406 cancer and 12,842 normal gene expression data from 24 different tissues. We first trained the deep neural network (DNN) to discriminate between cancer and normal samples using various gene selection strategies and therapeutic target genes from commercial cancer panels and genes in NCI-curated cancer pathways. We also suggest a systematic analysis method to interpret the trained deep neural network. We applied the method to find the genes that contribute most to classifying cancer in an individual sample. Result: The best trained DNN could classify cancer and normal data with accuracy of 0.997 in the training data set of 13,123 (cancer: 6,703, normal: 6,402) samples. In the independent test set comprising 13,125 (cancer: 6,703, normal: 6,422) samples, the DNN model achieved 0.979 accuracy. Using the same training and test data, our DNN showed better performance than other conventional prediction methods, followed by the support vector machine approach. For interpretation, we propose a method that can extract a gene’s contribution to an individual sample’s cancer probability from the trained DNN. This method distinguished samples dependent on one or a few genes, suggesting these samples are possibly “oncogene addicted”. Conclusion: A deep learning approach in conjunction with our interpretation method is not only a useful tool to identify cancer from gene expression data but can also contribute toward understanding the complex nature of cancer based on large public data.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122570824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Chimeh, Peter Heywood, M. Pennisi, F. Pappalardo, P. Richmond
{"title":"Parallel Pair-Wise Interaction for Multi-Agent Immune Systems Modelling","authors":"M. Chimeh, Peter Heywood, M. Pennisi, F. Pappalardo, P. Richmond","doi":"10.1109/BIBM.2018.8621404","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621404","url":null,"abstract":"Agent Based Modelling (ABM) is an approach for modelling dynamic systems and studying complex and emergent behaviour. ABM is a very common technique in the biological domain due to the high demand for large-scale analysis tools to collect and interpret information to solve biological problems. However, simulating large-scale cellular-level models (i.e. with a large number of agents/entities) requires a high degree of computational power, which is achievable through parallel computing methods such as Graphics Processing Units (GPUs). The use of parallel approaches in ABMs is growing rapidly, specifically when modelling continuous-space (particle-based) systems. Parallel implementation of particle-based simulation within continuum space where agents contain quantities of chemicals/substances is very challenging. Pair-wise interactions are a different abstraction from continuous-space (particle) models, one that is commonly used for immune system modelling. This paper describes an approach to parallelising the key component of biological and immune system models (pair-wise interactions) within an ABM. Our performance results demonstrate the applicability of this method to a broader class of biological systems with the same type of cell interactions and show that it can be used as the basis for developing complete immune system models on parallel hardware.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126682895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration of dysregulated lncRNA-mRNA network from the RNA-seq data of rats induced by three different synthetic cytotoxic compounds","authors":"D. Leng, Chen Huang, J. Lei, Shixue Sun, XD Zhang","doi":"10.1109/BIBM.2018.8621456","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621456","url":null,"abstract":"Accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) play important roles in the initiation and development of human diseases. However, the mechanisms by which lncRNAs regulate their targets remain unclear. In this study, we performed a multi-step computational analysis to construct dysregulated lncRNA-mRNA networks from RNA-seq data of rats treated with three different synthetic cytotoxic compounds (carbon tetrachloride, chloroform, and thioacetamide). We systematically integrated lncRNA and mRNA expression profiles and lncRNA-mRNA regulatory interactions. The constructed interaction network exhibited biological network characteristics, and functional analysis demonstrated that the networks were specific to the inducing compounds. Additionally, we identified some lncRNA-mRNA modules. This study provides new insight into the lncRNA-mRNA regulatory mechanisms in rats treated with these three compounds and will facilitate the discovery of candidate diagnostic and prognostic biomarkers for related diseases.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114242665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boundary Detection by Determining the Difference of Classification Probabilities of Sequences: Topic Segmentation of Clinical Notes","authors":"W. Ruan, Won-sook Lee","doi":"10.1109/BIBM.2018.8621195","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621195","url":null,"abstract":"Topic segmentation of clinical notes is a significant issue in the information retrieval domain that could effectively help the process of diagnosis. In this study, we propose a topic segmentation method for clinical notes that detects boundaries from the difference in classification probabilities of sequences. Using 1127 plain-text clinical notes collected from I2B2, we experiment on 5 topics: medications, history, hospital course, laboratories and physical exams. Naive Bayes and linear SVM models with selected bag-of-words (BOW) features are employed to train Topic Score Predictors that assign each sequence a 5-dimensional vector $v_{i}$ in which each element represents the probability of the sequence belonging to the corresponding class. By analyzing the vector $\\rho = [v_{1}, v_{2}, \\cdots, v_{i}]$, boundaries are detected by finding the locations where topic scores change rapidly. The widely used Windiff, $P_{k}$ and $F_{1}$ Score metrics are used to evaluate our system. The segmenter based on Naive Bayes outperforms the one based on the SVM model, achieving 0.1468 for Windiff, 0.1221 for $P_{k}$, and an averaged $F_{1}$ Score over 0.90.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121155677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Else-Tree Classifier for Minimizing Misclassification of Biological Data","authors":"Truong X. Tran, M. Pusey, R. S. Aygün","doi":"10.1109/BIBM.2018.8621322","DOIUrl":"https://doi.org/10.1109/BIBM.2018.8621322","url":null,"abstract":"Misclassification has a high cost in biological research studies such as protein crystallization. For drug development, the 3D structure of a protein is obtained by first crystallizing the protein. Hence, missing a crystalline condition may hinder the development of a drug. It is important to develop classification algorithms that avoid or minimize misclassifications. Traditional decision tree classifiers are based on an impurity measure that identifies the most informative attribute to be selected at the early levels of a decision tree. The class labels are chosen based on the majority of class labels at a leaf node. We introduce a novel decision tree classifier, else-tree, by analyzing pure regions or ranges of an attribute per class. After identifying the longest or most populated contiguous range per class, the rest of the ranges are fed into the else branch of the decision tree. Only conflicting or doubtful samples are passed to the lower levels of the decision tree. It does not necessarily assign a class to samples that are difficult to classify. We have used our protein crystallization trials data and three other publicly available datasets to evaluate else-tree. The experiments show that the else-tree may reduce the misclassification rate to 0% by labeling difficult samples as undecided when the training set is a good representation of the dataset.","PeriodicalId":108667,"journal":{"name":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116204181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}