{"title":"Unsupervised Terminological Ontology Learning Based on Hierarchical Topic Modeling","authors":"Xiaofeng Zhu, D. Klabjan, Patrick N. Bless","doi":"10.1109/IRI.2017.18","DOIUrl":"https://doi.org/10.1109/IRI.2017.18","url":null,"abstract":"In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the settings of noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted from hrLDA are very competitive with the ontologies created by domain experts.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127906184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GFEL: Generalized Feature Embedding Learning Using Weighted Instance Matching","authors":"Eric Golinko, Xingquan Zhu","doi":"10.1109/IRI.2017.21","DOIUrl":"https://doi.org/10.1109/IRI.2017.21","url":null,"abstract":"Feature embedding is an emerging research area which intends to transform features from the original space into a new space to support effective learning. Many feature embedding algorithms exist, but they are often designed to handle a single type of feature, or users have to clearly separate features into different feature views and supply such information for feature embedding learning. In this paper, we propose a generalized feature embedding learning algorithm, GFEL, which learns feature embedding from any type of data or data with mixed feature types. GFEL is an eigendecomposition based approach, which calculates weighted instance matching in the original feature space, and then uses an eigenvector decomposition to convert the proximity matrix into a low-dimensional space. The learned numerical embedding features, which blend the original features, can be directly used to represent instances for effective learning. Our experiments and comparisons on 28 datasets, including categorical, numerical, and ordinal features, demonstrate that embedding features learned from GFEL can effectively represent the original instances for clustering and classification tasks.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"456 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124324759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Addressing Forest Management Challenges by Refining Tree Cover Type Classification with Machine Learning Models","authors":"Duncan Macmichael, Dong Si","doi":"10.1109/IRI.2017.89","DOIUrl":"https://doi.org/10.1109/IRI.2017.89","url":null,"abstract":"The goals of this paper were twofold: to continue and refine previous research in the topic of tree cover type classification by harnessing modern machine learning models, and to extend the conclusions of that work to demonstrate that results gained from such models can be used to assist U.S. land management agencies in current challenges they face. Using the same dataset as the past study, an artificial neural network was constructed and compared with three baseline traditional machine learning models: Naïve Bayes, Decision Tree, and K-Nearest Neighbor. The artificial neural network achieved 97.01% accuracy while the best-performing traditional classifier, K-Nearest Neighbor, managed 74.61%. This mirrored the earlier results, but with higher overall accuracy on both counts. Specifically, the neural network performed 26.43% better than before, showing not only advances in machine learning algorithms over the past 18 years, but also that accuracy is now high enough to apply practically to land management issues where natural resource inventory is time-consuming and expensive.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"444 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123381887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Documentation Reuse: Managing Similar Documents","authors":"S. Jarzabek, D. Dan","doi":"10.1109/IRI.2017.52","DOIUrl":"https://doi.org/10.1109/IRI.2017.52","url":null,"abstract":"Many engineering and business domains involve management of similar, but also different documents. Examples are user guides and other manuals for different versions of a product, contracts between vendors and clients, and legal documents. The usual practice is to capture similarities in templates that must be copied and manually customized to a new context – often a slow, tedious, and error-prone process. Our Document Management Environment (DME) exploits similarities among documents to simplify and automate routine tasks involved in creating and updating documents. Built as an MS Word add-in, DME can represent any group of recurring fragments as a template that is reused (after adaptations) to create a custom document. The DME concept is based on a proven method for adaptive reuse of software.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114606416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AAFA: Associative Affinity Factor Analysis for Bot Detection and Stance Classification in Twitter","authors":"Saad Sadiq, Yilin Yan, Asia Taylor, M. Shyu, Shu‐Ching Chen, D. Feaster","doi":"10.1109/IRI.2017.25","DOIUrl":"https://doi.org/10.1109/IRI.2017.25","url":null,"abstract":"The rise in popularity of social interacting websites such as Facebook, Twitter, and Snapchat has been challenged by the upsurge of unwelcome and troubling bodies on these systems. This includes spam senders, malware systems, and other content contaminators. It is noted that highly automated accounts with 450 tweets per day produced almost 18% of entire Twitter circulation in the 2016 U.S. Presidential election. It is also observed that those disruptive systems called bots are inclined more towards circulating negative news than positive information. This paper introduces a novel framework named Associative Affinity Factor Analysis (AAFA) designed for stance detection and bot identification. Using AAFA, the proposed framework distinguishes real people from bots and detects the stance in bipolar affinities. The 2016 U.S. Presidential election campaign was used as a test use case because of its significant and unique counter-factual properties. The results show that our proposed AAFA framework achieves high accuracy when compared to several existing state-of-the-art methods.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122168439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modernizing Analytics for Melanoma with a Large-Scale Research Dataset","authors":"Aaron N. Richter, T. Khoshgoftaar","doi":"10.1109/IRI.2017.45","DOIUrl":"https://doi.org/10.1109/IRI.2017.45","url":null,"abstract":"We present the Modernizing Analytics for MELanoma (MAMEL) dataset: a real-world, dermatology-specific research dataset specifically crafted to advance data mining and machine learning research in the field of melanoma diagnosis, analysis, and treatment. This dataset was collected and curated from Modernizing Medicine’s EMA Dermatology™ application, a cloud-based Electronic Health Record (EHR) platform. A big data processing architecture, built on Apache Hadoop and Apache Spark, was used to collect all patient data, identify patients for the MAMEL dataset, and create and document all data elements. This paper outlines the application and data processing architectures, provides an exploratory analysis of data elements available in MAMEL, and discusses avenues for using this dataset in clinical decision support applications for melanoma.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114992096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entropy in Design Phase: A Higraph-Based Model Approach","authors":"H. Aboutaleb, B. Monsuez","doi":"10.1109/IRI.2017.51","DOIUrl":"https://doi.org/10.1109/IRI.2017.51","url":null,"abstract":"The exponentially growing effort, cost, and time investment of complex systems in the modeling phase emphasize the need for a methodology, a framework, and an environment to handle the system model complexity. For that, it is necessary to be able to measure the system model entropy. This paper highlights the requirements a model needs to fulfill to match human user expectations. It suggests a hierarchical graph-based formalism for modeling complex systems and presents transformations to handle the underlying complexity. Finally, a way to measure system model structural complexity based on Shannon's theory of information is proposed.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115723023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Semantic Search for the Biogeochemical Literature","authors":"Joshua D. Eisenberg, Deya Banisakher, Maria E. Presa-Reyes, Kalli Unthank, Mark A. Finlayson, René Price, Shu‐Ching Chen","doi":"10.1109/IRI.2017.49","DOIUrl":"https://doi.org/10.1109/IRI.2017.49","url":null,"abstract":"Literature search is a vital step of every research project. Semantic literature search is an approach to article retrieval and ranking using concepts rather than keywords, in an attempt to address the well-known deficiencies of keyword-based search, namely, (1) retrieval of an overwhelming number of results, (2) rankings that do not precisely reflect true relevance, and (3) the omission of relevant results because they do not contain the idiosyncratic keywords of the query. The difficulty of semantic search, however, is that it requires significant knowledge engineering, often in the form of conceptual ontologies tailored to a particular scientific domain. It also requires non-trivial tuning, in the form of domain-specific term and concept weights. Here we present preliminary, work-in-progress results in the development of a semantic search system for the biogeochemical scientific literature. We report the following initial steps: first, one of the co-authors, a biogeochemistry expert, wrote a sample search query, and ranked the five most relevant articles that were returned for that query from a popular keyword-based search engine. We then hand annotated the five articles and the query with the Environmental Ontology (ENVO), an existing ontology for the domain. Critically, this pilot annotation revealed a number of missing concepts that we will add in future work. We then showed that a straightforward ontology distance metric between concepts in the search query and the five articles was sufficient to produce the expected ranking. We discuss the implications of these results, and outline next steps required to produce a full-fledged semantic search system for the biogeochemistry scientific literature.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122582065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning for Sentiment Analysis on Google Play Consumer Review","authors":"Min-Yuh Day, Yue-Da Lin","doi":"10.1109/IRI.2017.79","DOIUrl":"https://doi.org/10.1109/IRI.2017.79","url":null,"abstract":"In recent years, there has been an increasing interest in sentiment analysis on consumer reviews to understand the opinion polarity on social media. However, little attention has been paid to the development of deep learning for sentiment analysis on consumer reviews in Chinese. The research objective of this paper is to explore the impact of deep learning for sentiment analysis on Google Play consumer reviews in Chinese. A web mining technique was implemented for collecting 196,651 reviews on Google Play. We used a Long Short-Term Memory (LSTM) deep learning model, Naïve Bayes (NB), and support vector machine (SVM) approaches for sentiment analysis on consumer reviews and compared the experimental results. The experimental results suggest that the accuracy of deep learning for sentiment analysis on Google Play consumer reviews reaches 94% and that the deep learning approach outperforms Naïve Bayes (74.12%) and Support Vector Machine (76.46%) in the present study. Our finding confirmed that sentiment analysis on Google Play consumer reviews with deep learning is outstanding. The contributions of this paper are three-fold. First, the present study confirmed that sentiment analysis with deep learning on Google Play consumer reviews may improve the accuracy of prediction. Second, we create a sentiment dictionary named iSGoPaSD for Google Play reviews. Third, the study compared the results of average sampling data and non-average sampling data. We found that the deep learning method with non-average sampling data achieved better performance.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"276 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122690445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FA-MCADF: Feature Affinity Based Multiple Correspondence Analysis and Decision Fusion Framework for Disaster Information Management","authors":"Haiman Tian, Shu‐Ching Chen, S. Rubin, William K. Grefe","doi":"10.1109/IRI.2017.20","DOIUrl":"https://doi.org/10.1109/IRI.2017.20","url":null,"abstract":"Multimedia semantic concept detection is one of the major research topics in multimedia data analysis in recent years. Disaster information management needs the assistance of multimedia data analysis to better utilize the disaster-related information that has been widely shared by people through the Internet. In this paper, a Feature Affinity based Multiple Correspondence Analysis and Decision Fusion (FA-MCADF) framework is proposed to extract useful semantics from a disaster dataset. By utilizing the selected features and their affinities/ranks in each of the feature groups, the proposed framework is able to improve the concept detection results. Moreover, the decision fusion scheme further improves the accuracy performance. The experimental results demonstrate the effectiveness of the proposed framework and prove that the fusion of the decisions of the basic classifiers could make the framework outperform several existing approaches in the comparison.","PeriodicalId":254330,"journal":{"name":"2017 IEEE International Conference on Information Reuse and Integration (IRI)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123370016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}