2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM): Latest Papers

BioHCDP: A Hybrid Constituency-Dependency Parser for Biological NLP information extraction

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008151

K. Taha, M. Alzaabi

Abstract: One of the key goals of biological Natural Language Processing (NLP) is the automatic information extraction from biomedical publications. Most current constituency and dependency parsers overlook the semantic relationships between the constituents comprising a sentence and may not be well suited for capturing complex long-distance dependencies. We propose in this paper a hybrid constituency-dependency parser for biological NLP information extraction called BioHCDP. BioHCDP aims at enhancing the state of the art of biological text mining by applying novel linguistic computational techniques that overcome the limitations of current constituency and dependency parsers outlined above, as follows: (1) it determines the semantic relationship between each pair of constituents in a sentence using novel semantic rules, and (2) it applies semantic relationship extraction models that represent the relationships of different patterns of usage in different contexts. BioHCDP can be used to extract various classes of data from biological texts, including protein function assignments, genetic networks, and protein-protein interactions. We compared BioHCDP experimentally with three systems. Results showed marked improvement.

Citations: 1

Detecting and profiling sedentary young men using machine learning algorithms

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008681

Pekka Siirtola, Riitta Pyky, Riikka Ahola, Heli Koskimäki, T. Jämsä, R. Korpelainen, J. Röning

Abstract: Many governments and institutions have guidelines for health-enhancing physical activity. Additionally, according to recent studies, the amount of time spent sitting is a highly important determinant of health and wellbeing. In fact, a sedentary lifestyle can lead to many diseases and has even been found to be associated with increased mortality. In this study, a data set consisting of self-reported questionnaires, medical diagnoses and fitness tests was studied to detect sedentary young men in a large population and to create a profile of a sedentary person. The data set was collected from 595 young men and contained altogether 678 features. Most of these are answers to multiple-choice closed-ended questions. More precisely, the features were mostly integers on a scale from 1 to 5 or from 1 to 2, so there was only a little variability in the feature values. In order to detect and profile a sedentary young man, machine learning algorithms were applied to the data set. The performance of five algorithms (quadratic discriminant analysis (QDA), linear discriminant analysis (LDA), C4.5, random forests, and nearest neighbours (kNN)) is compared to find the most accurate one. The results show that when the aim is to detect a sedentary person based on medical records and fitness tests, LDA performs better than the other algorithms, but the accuracy is still not high. In the second part of the study, the differences between highly sedentary and non-sedentary young men are examined, and recognition can be obtained with high accuracy with each algorithm.

Citations: 7

Takagi-Sugeno-Kang type collaborative fuzzy rule based system

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008684

Kuang-Pen Chou, M. Prasad, Yang-Yin Lin, Sudhanshu Joshi, Chin-Teng Lin, J. Chang

Citations: 10

High-SNR model order selection using exponentially embedded family and its applications to curve fitting and clustering

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008708

Quan Ding, S. Kay, Xiaorong Zhang

Citations: 2

Weighted feature-based classification of time series data

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008671

Penugonda Ravikumar, V. Devi

Citations: 6

Learning energy consumption profiles from data

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008704

J. Andreoli

Citations: 2

Interpolation and extrapolation: Comparison of definitions and survey of algorithms for convex and concave hulls

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008683

Tobias Ebert, Julian Belz, O. Nelles

Abstract: Any data-based method is vulnerable to the problem of extrapolation; nonetheless, there exists no unified theory on handling it. The main topic of this publication is to point out the differences in definitions of extrapolation and related methods. There are many different interpretations of extrapolation and a multitude of methods and algorithms that address the problem of extrapolation detection in different fields of study. We examine popular definitions of extrapolation, compare them to each other and list related literature and methods. It becomes apparent that opinions on what extrapolation is and how to handle it differ greatly. We categorize existing literature and give guidelines for choosing an appropriate definition of extrapolation for a given problem. We also present hull algorithms, from classic approaches to recent advances. The presented guidelines and categorized literature enable the reader to categorize a given problem, inspect relevant literature and apply suitable methods and algorithms to solve a problem that is affected by extrapolation.
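
As an illustration of one of the definitions the abstract alludes to, the sketch below treats a query point as interpolation when it lies inside or on the convex hull of the training inputs, and as extrapolation otherwise. It is a minimal 2-D example only, not the authors' method: the hull is built with the standard monotone chain algorithm, and the function names (`convex_hull`, `is_extrapolation`) are hypothetical choices made for this example.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone chain convex hull; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def is_extrapolation(hull, query):
    """True if `query` lies strictly outside the counter-clockwise hull."""
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear training points")
    n = len(hull)
    # Inside or on the boundary means: never strictly to the right of an edge.
    inside = all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))
    return not inside


# Hypothetical training inputs (illustrative values only).
training_inputs = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(training_inputs)
print(is_extrapolation(hull, (2, 1)), is_extrapolation(hull, (5, 5)))
```

The convex hull is only one candidate definition; for data with holes or non-convex support it can label sparse regions as interpolation, which is one reason the abstract also mentions concave hulls.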

Citations: 19

Massively parallelized support vector machines based on GPU-accelerated multiplicative updates

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008700

C. Kou, Chao-Hui Huang

Citations: 0

Semi-supervised source extraction methodology for the nosological imaging of glioblastoma response to therapy

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008653

S. Ortega-Martorell, I. Olier, T. Delgado-Goñi, M. Ciezka, M. Julià-Sapé, P. Lisboa, C. Arús

Citations: 3

Accurate and interpretable regression trees using oracle coaching

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM) Pub Date : 2014-12-01 DOI: 10.1109/CIDM.2014.7008667

U. Johansson, Cecilia Sönströd, Rikard König

Abstract: In many real-world scenarios, predictive models need to be interpretable, thus ruling out many machine learning techniques known to produce very accurate models, e.g., neural networks, support vector machines and all ensemble schemes. Most often, tree models or rule sets are used instead, typically resulting in significantly lower predictive performance. The overall purpose of oracle coaching is to reduce this accuracy vs. comprehensibility trade-off by producing interpretable models optimized for the specific production set at hand. The method requires production set inputs to be present when generating the predictive model, a demand fulfilled in most, but not all, predictive modeling scenarios. In oracle coaching, a highly accurate, but opaque, model is first induced from the training data. This model (“the oracle”) is then used to label both the training instances and the production instances. Finally, interpretable models are trained using different combinations of the resulting data sets. In this paper, the oracle coaching produces regression trees, using neural networks and random forests as oracles. The experiments, using 32 publicly available data sets, show that the oracle coaching leads to significantly improved predictive performance, compared to standard induction. In addition, it is also shown that a highly accurate opaque model can be successfully used as a pre-processing step to reduce the noise typically present in data, even in situations where production inputs are not available. In fact, just augmenting or replacing training data with another copy of the training set, but with the predictions from the opaque model as targets, produced significantly more accurate and/or more compact regression trees.
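
The three steps described in the abstract (induce an oracle, let it label training and production inputs, then train an interpretable model on the oracle-labelled data) can be sketched as follows. This is a toy illustration under loud assumptions, not the paper's implementation: a k-nearest-neighbour averager stands in for the opaque oracle (the paper uses neural networks and random forests), a one-split regression tree on a single feature stands in for the interpretable model, and all function names and data values are hypothetical.

```python
from statistics import mean


def knn_predict(train_x, train_y, x, k=3):
    """Stand-in 'oracle' (assumption): mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def fit_stump(xs, ys):
    """Interpretable stand-in: a one-split regression tree on one feature."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct input values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors of the two leaf means.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def oracle_coach(train_x, train_y, production_x, k=3):
    # Step 1: the oracle is induced from the training data
    # (kNN is lazy, so "induction" here just means keeping the data).
    # Step 2: the oracle labels both training and production inputs.
    oracle_train_y = [knn_predict(train_x, train_y, x, k) for x in train_x]
    oracle_prod_y = [knn_predict(train_x, train_y, x, k) for x in production_x]
    # Step 3: train the interpretable model on one possible combination:
    # oracle-labelled training inputs plus oracle-labelled production inputs.
    return fit_stump(list(train_x) + list(production_x),
                     oracle_train_y + oracle_prod_y)


# Hypothetical toy data (illustrative values only).
train_x = [1, 2, 3, 4, 7, 8, 9, 10]
train_y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
production_x = [2.5, 8.5]
stump = oracle_coach(train_x, train_y, production_x)
print(stump, stump_predict(stump, 2.5))
```

The abstract mentions training on "different combinations of the resulting data sets"; the combination used in step 3 is just one of them, chosen here for brevity.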

Citations: 6