Title: A Domain-Independent Hybrid Approach for Automatic Taxonomy Induction
Authors: Bushra Zafar, Usman Qamar, Ayesha Imran
Venue: 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Published: 2016-12-01
DOI: 10.1109/PDCAT.2016.085 (https://doi.org/10.1109/PDCAT.2016.085)
Citation count: 1
Abstract
Semantic taxonomies are a flexible way to organize, navigate, and retrieve information effectively, and many Natural Language Processing and Artificial Intelligence tasks rely heavily on them. This paper presents a taxonomy induction system that integrates two modules: word embedding and string inclusion. We implement a simple, semi-supervised, domain-independent system based on the Taxonomy Extraction Evaluation (TExEval2) task of SemEval 2016. The task is divided into two steps: first, identifying hyponym-hypernym relations, and second, constructing a taxonomy from a domain-specific term list. The system is trained on a large general corpus. It learns vectors for phrases and combines word vectors with cue phrases such as "known as" to generate candidate hypernyms and construct the taxonomy. Taxonomies are induced for three domains (environment, food, and science) and evaluated against gold-standard taxonomies. The proposed system achieved significant results for both hyponym-hypernym identification and taxonomy induction.
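The abstract does not spell out how the string inclusion module works, so the following is only a minimal sketch of one common reading of the idea: a multi-word term is treated as a hyponym of any shorter term in the list that forms its word-level suffix (for example, "dark chocolate" under "chocolate"). The function names, the suffix rule, and the sample terms are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def string_inclusion_edges(terms):
    """Return sorted (hyponym, hypernym) pairs.

    Assumed rule (not taken from the paper): the hypernym's words must be
    a proper suffix of the hyponym's words, and both must be in the list.
    """
    normalized = {t.lower().strip() for t in terms}
    edges = set()
    for term in normalized:
        words = term.split()
        # Try proper suffixes from longest to shortest and keep the
        # longest one that is itself a known term.
        for start in range(1, len(words)):
            candidate = " ".join(words[start:])
            if candidate in normalized:
                edges.add((term, candidate))
                break
    return sorted(edges)


def build_taxonomy(edges):
    """Group hyponyms under each hypernym to form a simple taxonomy."""
    children = defaultdict(list)
    for hyponym, hypernym in edges:
        children[hypernym].append(hyponym)
    return {hypernym: sorted(kids) for hypernym, kids in children.items()}


# Hypothetical food-domain term list, for illustration only.
terms = [
    "chocolate",
    "dark chocolate",
    "milk chocolate",
    "cheese",
    "blue cheese",
    "cheesecake",
]
edges = string_inclusion_edges(terms)
taxonomy = build_taxonomy(edges)
print(taxonomy)
```

Matching on whole words rather than raw substrings keeps "cheesecake" from being attached to "cheese". Terms that share no suffix with another term are left unattached in this sketch; in the system described above, the word-embedding module would be the place to propose hypernyms for them.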