{"title":"Knowledge extraction through etymological networks: Synonym discovery in Sino-Korean words","authors":"E. Pablo, Kyomin Jung","doi":"10.1109/ICKEA.2016.7803019","DOIUrl":null,"url":null,"abstract":"Extracting knowledge from a text is a very active area of research. Techniques such as word embedding and LSA have brought great breakthroughs and have been used in applications such as automatic translation. We propose a novel approach to extract knowledge from text that relies on a graph to express the complex etymological structures formed by the historical roots of words. Our approach is specially fit for the study of Sino-Korean vo- cabulary, where the etymological roots of words are clearly shown in their writing. We use our approach to build a bipartite graph based on the Chinese etymological roots of Sino-Korean words, and then use the network structure to extract features describing pairs of nodes. We used these features in a classification scheme to discover pairs of nodes that represent synonym characters. Our model is simpler than previous work on synonym discovery with Chinese characters, and obtains good results. The code and data for our work are made openly available.","PeriodicalId":241850,"journal":{"name":"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)","volume":"2004 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Knowledge Engineering and Applications (ICKEA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICKEA.2016.7803019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Extracting knowledge from a text is a very active area of research. Techniques such as word embedding and LSA have brought great breakthroughs and have been used in applications such as automatic translation. We propose a novel approach to extract knowledge from text that relies on a graph to express the complex etymological structures formed by the historical roots of words. Our approach is specially fit for the study of Sino-Korean vo- cabulary, where the etymological roots of words are clearly shown in their writing. We use our approach to build a bipartite graph based on the Chinese etymological roots of Sino-Korean words, and then use the network structure to extract features describing pairs of nodes. We used these features in a classification scheme to discover pairs of nodes that represent synonym characters. Our model is simpler than previous work on synonym discovery with Chinese characters, and obtains good results. The code and data for our work are made openly available.