{"title":"A PubMed Meta Search Engine Based on Biomedical Entity Mining","authors":"Andreas Kanavos, E. Theodoridis, A. Tsakalidis","doi":"10.1109/DEXA.2014.32","DOIUrl":"https://doi.org/10.1109/DEXA.2014.32","url":null,"abstract":"Biomedical knowledge stored on the web is increasing significantly as most biomedical research papers are published online. Biomedical entity extraction is a crucial procedure for efficient text analysis and retrieval. PubMed is a very popular indexing engine for life sciences and biomedical research. Being a free database, it primarily accesses the MEDLINE database of references and abstracts on life sciences and biomedical topics. In this work, we propose a metasearch engine over PubMed, which classifies PubMed results according to their specific topic and the extracted biomedical entities. This method helps researchers to browse and search in the retrieved results. In order to provide more accurate clustering results, we utilize the biomedical ontology MeSH as well as RxNorm, a tool for supporting semantic interoperation between drug terminologies and pharmacy knowledge base systems. Finally, we embed the proposed methodology in an online system.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115891013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Realistic User Behavior Modeling for Energy Saving in Residential Buildings","authors":"Jorge Martínez Gil, Georgios C. Chasparis, B. Freudenthaler, T. Natschläger","doi":"10.1109/DEXA.2014.38","DOIUrl":"https://doi.org/10.1109/DEXA.2014.38","url":null,"abstract":"Due to the high costs of live research, performance simulation has become a widely accepted method of assessment for the quality of proposed solutions in this field. Additionally, being able to simulate the behavior of the future occupants of a residential building can be very useful since it can support both design-time and run-time decisions leading to reduced energy consumption through, e.g., the design of model predictive controllers that incorporate user behavior predictions. In this work, we provide a framework for simulating user behavior in residential buildings. In fact, we are interested in how to deal with all user behavior aspects so that these computer simulations can provide a realistic framework for testing alternative policies for energy saving.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132852724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Approach for Phylogenetic Tree Construction Based on Minimal Absent Words","authors":"Supaporn Chairungsee","doi":"10.1109/DEXA.2014.21","DOIUrl":"https://doi.org/10.1109/DEXA.2014.21","url":null,"abstract":"An absent word (or a forbidden word) is a word that does not appear in a given sequence. It is a minimal absent word if all its proper factors occur in the given sequence. In this paper, we propose a linear-time algorithm to compute the minimal absent words of a DNA sequence using a suffix automaton. This method outputs the whole set of minimal absent words. We apply a Neighbor-Joining method to construct a phylogenetic tree based on the minimal absent words.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"48 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120988446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"eXsight: An Analytical Framework for Quantifying Financial Loss in the Aftermath of Catastrophic Events","authors":"M. Coelho, A. Rau-Chaplin","doi":"10.1109/DEXA.2014.45","DOIUrl":"https://doi.org/10.1109/DEXA.2014.45","url":null,"abstract":"In this paper we explore the design of an analytical framework for quantifying financial loss in the aftermath of catastrophic events. The idea is to aggregate the thousands of exposure databases received by a single reinsurer into a giant loosely structured exposure portfolio and then use Big Data analysis technology, originally developed in the context of web-scale analytics, to rapidly perform natural but ad-hoc loss analysis immediately after an event. As in many situational analysis problems, the challenge here is to work with both categorical and geospatial data, deal with partial data often at varying levels of aggregation, integrate data from many sources, and provide an analysis framework in which analyses can be rapidly performed in the hours, days, and weeks immediately after an event.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122658740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein Data Modelling for Concurrent Sequential Patterns","authors":"Jing Lu, M. Keech, Cuiqing Wang","doi":"10.1109/DEXA.2014.19","DOIUrl":"https://doi.org/10.1109/DEXA.2014.19","url":null,"abstract":"Protein sequences from the same family typically share common patterns which imply their structural function and biological relationship. The challenge of identifying protein motifs is often addressed through mining frequent item sets and sequential patterns, where post-processing is a useful technique. Earlier work has shown that Concurrent Sequential Patterns mining can be applied in bioinformatics, e.g. to detect frequently occurring concurrent protein sub-sequences. This paper presents a companion approach to data modelling and visualisation, applying it to real-world protein datasets from the PROSITE and NCBI databases. The results show the potential for graph-based modelling in representing the integration of higher level patterns common to all or nearly all of the protein sequences.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126528597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigation of Latent Semantic Analysis for Clustering of Czech News Articles","authors":"Michal Rott, P. Cerva","doi":"10.1109/DEXA.2014.54","DOIUrl":"https://doi.org/10.1109/DEXA.2014.54","url":null,"abstract":"This paper studies the use of Latent Semantic Analysis (LSA) for automatic clustering of Czech news articles. We show that LSA is capable of yielding good results in this task as it allows us to reduce the problem of synonymy. This is a very important factor particularly for Czech, which belongs to a group of highly inflective and morphologically rich languages. The experimental evaluation of our clustering scheme and investigation of LSA is performed on query- and category-based test sets. The obtained results demonstrate that the automatic system yields values of the Rand index that are absolutely lower -- by 20% -- than the accuracy of human cluster annotations. We also show which similarity metric should be used for cluster merging and the effect of dimension reduction on clustering accuracy.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123486154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein Motif Retrieval by Secondary Structure Element Geometry and Biological Features Saliency","authors":"V. Cantoni, M. Ferretti, Nicola Pellicanò, Jennifer Vandoni, M. Musci, Nahumi Nugrahaningsih","doi":"10.1109/DEXA.2014.22","DOIUrl":"https://doi.org/10.1109/DEXA.2014.22","url":null,"abstract":"This paper presents an approach to detect the presence of a given motif in proteins or in the Protein Data Bank (PDB). The approach is based on the geometrical arrangement of secondary structure elements (SSEs) in 3D space. A motif is represented as a set of SSEs in their specific positions related to a local reference system (LRS). Exploiting the SSE biological feature saliency in the motif LRS construction stage, we propose a planning strategy to speed up the motif retrieval process. The experimentation has been carried out on a set of 20 proteins selected from the PDB. In detail, we tested five different cases: (i) searching for a motif within single proteins, (ii) searching for motifs in a set of proteins belonging to the same biological family, (iii) searching within single symmetric proteins, (iv) searching in a set of symmetric proteins from the same family, and finally (v) general motif retrieval from the entire protein dataset. The experimental results showed good motif recognition performance in each test category and, compared with a previous approach to SSE block geometrical retrieval based on the Generalized Hough Transform, revealed a significant decrease in time/space computational complexity obtained by exploiting the basic biological feature saliency in motif construction. It is worth pointing out that the computation time when the motif is absent is significantly lower than when it is present.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116886842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Manhattan Indexing","authors":"B. Zadeh, S. Handschuh","doi":"10.1109/DEXA.2014.51","DOIUrl":"https://doi.org/10.1109/DEXA.2014.51","url":null,"abstract":"Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors -- and hence the text units that they represent -- is computed by a distance formula. The high dimensionality of vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, procedure. In order to attain its goal, RMI employs the sparse Cauchy random projections.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114823369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Filter Correlation Method for Feature Selection","authors":"Hanen Hosni, F. Mhamdi","doi":"10.1109/DEXA.2014.28","DOIUrl":"https://doi.org/10.1109/DEXA.2014.28","url":null,"abstract":"Biological data is undergoing exponential growth in volume and complexity. Often, the selection of biological features is a crucial step that aims to defy the curse of dimensionality in order to improve prediction performance in classification systems and to facilitate viewing, understanding and analyzing data. In this paper we present an adaptation of the Fast Correlation Based Filter algorithm (FCBF) whose aim is to identify relevant, non-redundant features to improve the capacity of prediction and reduce the search space.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114965478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M2RML: Multidimensional to RDF Mapping Language","authors":"Saleh Ghasemi, W. Luk, Norah Alrayes","doi":"10.1109/DEXA.2014.61","DOIUrl":"https://doi.org/10.1109/DEXA.2014.61","url":null,"abstract":"This research is about design and implementation of a middleware between an RDF application (as a client) and an OLAP server which manages multidimensional data. At the heart of the middleware is a software layer which accepts a query from the client and returns as the answer an RDF dataset. This software is an implementation of a mapping language from multidimensional data to RDF, or M2RML. The RDF dataset is a 'slice' of the data cube as defined in the W3C's RDF Data Cube Vocabulary. The limitations of the Data Cube Vocabulary are discussed, and options to overcome these limitations are proposed.","PeriodicalId":291899,"journal":{"name":"2014 25th International Workshop on Database and Expert Systems Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116970556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}