{"title":"A Particle Swarm Optimization based Hybrid Recommendation System","authors":"R. Behera, S. Dash","doi":"10.4018/IJKDB.2016070101","DOIUrl":"https://doi.org/10.4018/IJKDB.2016070101","url":null,"abstract":"Due to the rapid digital explosion, users show interest in finding suggestions regarding a particular topic before making any decision. Nowadays, movie recommendation systems are an emerging area; they suggest movies based on user profiles. Many researchers are working on supervised or semi-supervised ensemble-based machine learning approaches for matching more appropriate profiles and suggesting related movies. In this paper a hybrid recommendation system is proposed which includes both collaborative and content-based filtering to design a profile matching algorithm. A nature-inspired Particle Swarm Optimization technique is applied to fine-tune the profile matching algorithm by assigning it to multiple agents or particles with some initial random guess. The accuracy of the model will be judged by comparing it with a Genetic algorithm.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123859467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Swarm Intelligence Techniques in Diabetes Disease Risk Prediction","authors":"Sushruta Mishra, B. K. Mishra, Soumya Sahoo, Bijayalaxmi Panda","doi":"10.4018/IJKDB.2016070103","DOIUrl":"https://doi.org/10.4018/IJKDB.2016070103","url":null,"abstract":"Diabetes has affected over 246 million people worldwide and by 2025 it is expected to rise to over 380 million. With the rise of information technology and its continued advent into the medical and healthcare sector, different symptoms of diabetes are being documented. The techniques inspired from the distributed collective behavior of social colonies have shown worth and excellence in dealing with complex optimization problems and are becoming more popular nowadays. It can be used as an effective problem solving tool for identifying diabetes disease risks. This paper aims at finding solutions to diagnose the disease by analyzing the patterns found in data through various swarm optimization techniques by employing Support Vector Machines and Naive Bayes algorithms. It proposes a quicker and more efficient technique of diagnosing the disease, leading to timely treatment of the patients.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129170443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Latent Feature Model Approach to Biclustering","authors":"J. Caldas, Samuel Kaski","doi":"10.4018/IJKDB.2016070102","DOIUrl":"https://doi.org/10.4018/IJKDB.2016070102","url":null,"abstract":"Biclustering is the unsupervised learning task of mining a data matrix for useful submatrices, for instance groups of genes that are co-expressed under particular biological conditions. As these submatrices are expected to partly overlap, a significant challenge in biclustering is to develop methods that are able to detect overlapping biclusters. The authors propose a probabilistic mixture modelling framework for biclustering biological data that lends itself to various data types and allows biclusters to overlap. Their framework is akin to the latent feature and mixture-of-experts model families, with inference and parameter estimation being performed via a variational expectation-maximization algorithm. The model compares favorably with competing approaches, both in a binary DNA copy number variation data set and in a miRNA expression data set, indicating that it may potentially be used as a general-problem solving tool in biclustering.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133763655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome Sequence Analysis in Distributed Computing using Spark","authors":"S. Ap., Pooja Mehta, J. Anuradha, B. Tripathy","doi":"10.4018/IJKDB.2015070103","DOIUrl":"https://doi.org/10.4018/IJKDB.2015070103","url":null,"abstract":"Integration of Computer Science with Bio Science has led to the new field of Computational Biology, which has created an opportunity to speed up the process of analyzing bio-data. DNA sequence analysis, especially finding the base pairs, helps in identifying the order of nucleotides present in all living beings; it also helps in forensics for DNA profiling and parentage testing. This sequence analysis has been a challenging task in Computational Biology due to the large volumes of data and the need for more computational resources. Using a distributed file system with distributed computation of tasks can be one of the solutions to the above problem. In this paper, the authors use Spark, a query engine for large-scale data processing, to analyze the DNA sequence and extract the base pairs; they also try to improve base pair extraction with improvised algorithms.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124797841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Test Data Generation using Metaheuristic Cuckoo Search Algorithm","authors":"M. Panda, P. Sarangi, S. Dash","doi":"10.4018/IJKDB.2015070102","DOIUrl":"https://doi.org/10.4018/IJKDB.2015070102","url":null,"abstract":"The proposed work emphasizes the automated process of test data generation for unit testing of structured programs, targeting complete path coverage of the software under test. In recent years, Cuckoo Search (CS) has been successfully applied in many engineering applications because of its high convergence rate to the global solution. The authors, motivated by the performance of Cuckoo Search, used it to generate test suites for the standard benchmark problems, covering the entire search space of the input data in fewer iterations. The experimental results reveal that the proposed approach covers the entire search space, generating test data for all feasible paths of the problem in a small number of generations. It is observed that the proposed approach gives promising results and outperforms other reported algorithms, and it can be an alternative approach in the field of test data generation.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130358079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequence Analysis of a Subset of Plasma Membrane Raft Proteome Containing CXXC Metal Binding Motifs: Metal Binding Proteins","authors":"S. Sahu, H. Behuria, Sangam Gupta, Babita Sahoo","doi":"10.4018/IJKDB.2015070101","DOIUrl":"https://doi.org/10.4018/IJKDB.2015070101","url":null,"abstract":"In an attempt to identify the metal sensing proteins localized to the mammalian plasma membrane, the authors screened a list of 300 raft-associated proteins that are involved in cellular signaling mechanisms by searching for the presence of metal thionin CXXC motifs. 50 proteins were found to possess CXXC motifs that could act as potential metal sensing proteins. The authors determined membrane topologies of the above CXXC motif containing proteins using TM-pred and analyzed the positions of their transmembrane (TM) domains using Bio-edit software. Based on the topology of CXXC domains, the authors classified all the raft-associated metal sensing proteins into six categories. They are (i) Exoplasmic tails with CXXC motif, (ii) Exoplasmic loops with CXXC motif, (iii) Cytosolic tails with CXXC motif, (iv) Cytosolic loop with CXXC motif, (v) TM domains with CXXC motifs, (vi) Proteins with multiple topologies of CXXC motif. The authors' study will lead to understanding of the raft-mediated mechanism of heavy metal sensing and signaling in mammalian cells.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130750148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis of Microarray Data Classification using Machine Learning Techniques","authors":"S. Pani, B. Ratha, Ajay Kumar Mishra","doi":"10.4018/IJKDB.2015070104","DOIUrl":"https://doi.org/10.4018/IJKDB.2015070104","url":null,"abstract":"DNA microarray technology permits simultaneous monitoring and determination of thousands of gene expression activation levels in a single experiment. Data mining techniques such as classification are extensively used on microarray data for medical diagnosis and gene analysis. However, the high dimensionality of the data affects the performance of classification and prediction. Consequently, a key issue in microarray data is feature selection and dimensionality reduction in order to achieve better classification and predictive accuracy. There are several machine learning approaches available for feature selection. In this study, the authors use Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) to find the performance of several popular classifiers on a set of microarray datasets. Experimental results conclude that feature selection affects the performance.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131294259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pharmacy Data Integrity for Optimal Analytics","authors":"C. Butler","doi":"10.4018/IJKDB.2014070103","DOIUrl":"https://doi.org/10.4018/IJKDB.2014070103","url":null,"abstract":"The use and creation of data continues to proliferate, with each year seeing further reliance on information systems to support practitioners and organizations. This surge is expected to continue at an even faster pace with the increased use of inexpensive storage methods for any type of data, plus greater reliance on computerized clinical decision-making tools. Yet data integrity remains essential to every organization and to every healthcare practitioner in order to ensure the correct use of patient information to optimize care. It provides the assurance that the data you see every day is the same as it was the day before. It promises that the drug dosage regimen \"QID,\" whether you define it as four times daily or four times daily with meals and at bedtime, is applied using the same parameters for every patient as you define a patient, across every day or any time period, as you define day in your health care setting. It also means that referential connections between data values must be consistent. When a specific patient takes a specific combination of drug products, referential integrity must be applied to ensure the correct products, drug ingredients and strengths are recognized as being received by that patient. Definitions about data and their referential relationships must be made by the business person (the practitioner), rather than by information technology (IT). Only by doing this can appropriate business rules be applied by a database, which manages the information used in electronic medical records. Once a decision is made about what a datum represents, and how it relates to other data, whether by an individual or a group, it is imperative that the decision remain consistent over time. Should the definition evolve, it is also imperative that that evolution be tracked. Thus, organizations must establish governance committees to maintain consistency both across an organization and across time. Governance committees must have the highest level of authority to ensure that rules are not overridden on a casual, intermittent basis. Once business rules for data have been established, use of a relational database provides one of the strongest tools for ensuring that data integrity is maintained. This paper explores the concepts serving as the foundation for today's relational database management systems. A top-down approach is described using an Entity-Relationship diagram that can be used to create a relational model for implementation in a relational database management system. A bottom-up approach is described using functional dependencies and normalization. A pharmacist should be able to apply these concepts in cooperation with a database architect to ensure the appropriate, consistent use of drug data within an organization. A pharmacist must be able to validate all drug information being used across the organization in order to minimize medication errors and optimize patient care. Only by being the subject matter expert on governance committ","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126161620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Data Visualization to Understand Data Better: Case Studies in Healthcare System","authors":"Zhecheng Zhu, B. Heng, K. Teow","doi":"10.4018/IJKDB.2014070101","DOIUrl":"https://doi.org/10.4018/IJKDB.2014070101","url":null,"abstract":"This paper focuses on interactive data visualization techniques and their applications in healthcare systems. Interactive data visualization is a collection of techniques translating data from its numeric format to graphic presentation dynamically for easy understanding and visual impact. Compared to conventional static data visualization techniques, interactive data visualization techniques allow users to self-explore the entire data set by instant slice and dice and quick switching among multiple data sources. Adjustable granularity of interactive data visualization allows for both detailed micro information and aggregated macro information to be displayed in a single chart. Animated transition adds extra visual impact that describes how a system transitions from one state to another. When applied to healthcare systems, interactive visualization techniques are useful in areas such as information integration, flow or trajectory presentation and location related visualization. In this paper, three case studies are shared to illustrate how interactive data visualization techniques are applied to various aspects of healthcare systems. The first case study shows a pathway visualization representing longitudinal disease progression of a patient cohort. The second case study shows a dashboard profiling different patient cohorts from multiple perspectives. The third case study shows an interactive map illustrating patient geographical distribution at adjustable granularity. All three case studies illustrate that interactive data visualization techniques help quick information access, fast knowledge sharing and better decision making in healthcare systems.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129235282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gene Set- and Pathway- Centered Knowledge Discovery Assigns Transcriptional Activation Patterns in Brain, Blood, and Colon Cancer: A Bioinformatics Perspective","authors":"L. Nersisyan, H. Löffler-Wirth, A. Arakelyan, H. Binder","doi":"10.4018/IJKDB.2014070104","DOIUrl":"https://doi.org/10.4018/IJKDB.2014070104","url":null,"abstract":"Genome-wide 'omics'-assays provide a comprehensive view on the molecular landscapes of healthy and diseased cells. Bioinformatics traditionally pursues a 'gene-centered' view by extracting lists of genes differentially expressed or methylated between healthy and diseased states. Biological knowledge mining is then performed by applying gene set techniques using libraries of functional gene sets obtained from independent studies. This analysis strategy neglects two facts: (i) that different disease states can be characterized by a series of functional modules of co-regulated genes and (ii) that the topology of the underlying regulatory networks can induce complex expression patterns that require analysis methods beyond traditional gene set techniques. The authors here provide a knowledge discovery method that overcomes these shortcomings. It combines machine learning using self-organizing maps with pathway flow analysis. It extracts and visualizes regulatory modes from molecular omics data, maps them onto selected pathways and estimates the impact of pathway-activity changes. The authors illustrate the performance of the gene set and pathway signal flow methods using expression data of oncogenic pathway activation experiments and of patient data on glioma, B-cell lymphoma and colorectal cancer.","PeriodicalId":160270,"journal":{"name":"Int. J. Knowl. Discov. Bioinform.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129505441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}