{"title":"Expanding Science and Technology Thesauri from Bibliographic Datasets Using Word Embedding","authors":"Takahiro Kawamura, Kouji Kozaki, Tatsuya Kushida, Katsutaro Watanabe, Katsuji Matsumura","doi":"10.1109/ICTAI.2016.0133","journal":{"name":"2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI)"},"publicationDate":"2016-11-01","citationCount":"7","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICTAI.2016.0133"}
Expanding Science and Technology Thesauri from Bibliographic Datasets Using Word Embedding
The use of thesauri and taxonomies for science and technology information in scientometrics has been attracting attention. However, manual construction and maintenance of thesauri are expensive and time-consuming; thus, methods for semi-automatic construction and maintenance are being actively studied. We propose a method to expand an existing thesaurus using the abstracts of articles from state-of-the-art technological domains with limited structured information. Specifically, we consider a method for properly allocating new terms to the hierarchical structure of an existing thesaurus using word embedding, a rapidly evolving technique. In an experiment, 500-dimensional word vectors are constructed from 567,000 biomedical articles and are clustered after dimension reduction using principal component analysis. Then, semantic relations are estimated based on the spatial relations between a new term and the terms in the thesaurus. We then compared the results with those obtained from three experts. In the future, we will develop a recommendation system for new terms related to existing terms to support semi-automatic thesaurus maintenance.
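The allocation step described in the abstract, estimating relations from the position of a new term relative to existing thesaurus terms, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure is used, so cosine similarity is an assumption here, the 3-dimensional vectors and term names are invented toy data, and the embedding training, principal component analysis, and clustering steps are omitted. The function names `cosine` and `rank_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(new_vec, thesaurus_vecs, top_k=2):
    """Rank existing thesaurus terms by similarity to the new term's vector."""
    scored = [(term, cosine(new_vec, vec)) for term, vec in thesaurus_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (assumption): in the paper the vectors are 500-dimensional and
# learned from article abstracts; these values are invented for illustration.
thesaurus_vecs = {
    "neoplasm": [0.9, 0.1, 0.0],
    "carcinoma": [0.8, 0.2, 0.1],
    "antibiotic": [0.0, 0.9, 0.3],
}
new_term_vec = [0.85, 0.15, 0.05]  # hypothetical vector for a new term

# The closest existing terms are candidates for a related position in the hierarchy.
candidates = rank_candidates(new_term_vec, thesaurus_vecs)
for term, score in candidates:
    print(f"{term}: {score:.3f}")
```

In such a sketch, the top-ranked terms would only be suggestions; deciding whether the new term is a narrower, broader, or related term would still need the further analysis and expert review that the abstract describes.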