{"title":"面向同义词预测的分层多任务词嵌入学习","authors":"Hongliang Fei, Shulong Tan, Ping Li","doi":"10.1145/3292500.3330914","DOIUrl":null,"url":null,"abstract":"Automatic synonym recognition is of great importance for entity-centric text mining and interpretation. Due to the high language use variability in real-life, manual construction of semantic resources to cover all synonyms is prohibitively expensive and may also result in limited coverage. Although there are public knowledge bases, they only have limited coverage for languages other than English. In this paper, we focus on medical domain and propose an automatic way to accelerate the process of medical synonymy resource development for Chinese, including both formal entities from healthcare professionals and noisy descriptions from end-users. Motivated by the success of distributed word representations, we design a multi-task model with hierarchical task relationship to learn more representative entity/term embeddings and apply them to synonym prediction. In our model, we extend the classical skip-gram word embedding model by introducing an auxiliary task \"neighboring word semantic type prediction'' and hierarchically organize them based on the task complexity. Meanwhile, we incorporate existing medical term-term synonymous knowledge into our word embedding learning framework. We demonstrate that the embeddings trained from our proposed multi-task model yield significant improvement for entity semantic relatedness evaluation, neighboring word semantic type prediction and synonym prediction compared with baselines. Furthermore, we create a large medical text corpus in Chinese that includes annotations for entities, descriptions and synonymous pairs for future research in this direction.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":"{\"title\":\"Hierarchical Multi-Task Word Embedding Learning for Synonym Prediction\",\"authors\":\"Hongliang Fei, Shulong Tan, Ping Li\",\"doi\":\"10.1145/3292500.3330914\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic synonym recognition is of great importance for entity-centric text mining and interpretation. Due to the high language use variability in real-life, manual construction of semantic resources to cover all synonyms is prohibitively expensive and may also result in limited coverage. Although there are public knowledge bases, they only have limited coverage for languages other than English. In this paper, we focus on medical domain and propose an automatic way to accelerate the process of medical synonymy resource development for Chinese, including both formal entities from healthcare professionals and noisy descriptions from end-users. Motivated by the success of distributed word representations, we design a multi-task model with hierarchical task relationship to learn more representative entity/term embeddings and apply them to synonym prediction. In our model, we extend the classical skip-gram word embedding model by introducing an auxiliary task \\\"neighboring word semantic type prediction'' and hierarchically organize them based on the task complexity. Meanwhile, we incorporate existing medical term-term synonymous knowledge into our word embedding learning framework. 
We demonstrate that the embeddings trained from our proposed multi-task model yield significant improvement for entity semantic relatedness evaluation, neighboring word semantic type prediction and synonym prediction compared with baselines. Furthermore, we create a large medical text corpus in Chinese that includes annotations for entities, descriptions and synonymous pairs for future research in this direction.\",\"PeriodicalId\":186134,\"journal\":{\"name\":\"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"23\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3292500.3330914\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3292500.3330914","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Automatic synonym recognition is of great importance for entity-centric text mining and interpretation. Due to the high variability of language use in real life, manually constructing semantic resources to cover all synonyms is prohibitively expensive and may also result in limited coverage. Although public knowledge bases exist, they offer only limited coverage for languages other than English. In this paper, we focus on the medical domain and propose an automatic way to accelerate the development of medical synonymy resources for Chinese, covering both formal entities from healthcare professionals and noisy descriptions from end-users. Motivated by the success of distributed word representations, we design a multi-task model with a hierarchical task relationship to learn more representative entity/term embeddings and apply them to synonym prediction. In our model, we extend the classical skip-gram word embedding model by introducing an auxiliary task, "neighboring word semantic type prediction", and hierarchically organize the tasks based on their complexity. Meanwhile, we incorporate existing medical term-term synonymy knowledge into our word embedding learning framework. We demonstrate that the embeddings trained with our proposed multi-task model yield significant improvements over baselines for entity semantic relatedness evaluation, neighboring word semantic type prediction, and synonym prediction. Furthermore, we create a large Chinese medical text corpus that includes annotations for entities, descriptions, and synonymous pairs for future research in this direction.
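To make the idea of coupling skip-gram training with an auxiliary "neighboring word semantic type prediction" task concrete, the sketch below shows a minimal multi-task objective of this kind. It is an illustration under assumptions, not the authors' implementation: the hierarchical task organization and the incorporation of existing term-term synonym knowledge described in the abstract are omitted, and the model class, dimensions, negative-sampling setup, and the `aux_weight` mixing coefficient are all hypothetical choices.

```python
# Illustrative sketch only: a shared embedding feeds (1) skip-gram context
# prediction with negative sampling and (2) an auxiliary classifier that
# predicts the semantic type of the neighboring (context) word.
# Hyper-parameters and the joint-loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskSkipGram(nn.Module):
    def __init__(self, vocab_size, num_types, dim=128, aux_weight=0.5):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)   # shared "input" embeddings
        self.out_embed = nn.Embedding(vocab_size, dim)  # context ("output") embeddings
        self.type_head = nn.Linear(dim, num_types)      # auxiliary semantic-type classifier
        self.aux_weight = aux_weight                    # assumed task-mixing weight

    def forward(self, center, context, neg_context, context_type):
        v = self.in_embed(center)                       # (B, dim)

        # Task 1: skip-gram with negative sampling.
        pos_score = (v * self.out_embed(context)).sum(-1)             # (B,)
        neg_score = torch.bmm(self.out_embed(neg_context),            # (B, K, dim)
                              v.unsqueeze(-1)).squeeze(-1)            # (B, K)
        sg_loss = -(F.logsigmoid(pos_score).mean()
                    + F.logsigmoid(-neg_score).mean())

        # Task 2: predict the semantic type of the neighboring word.
        type_logits = self.type_head(self.out_embed(context))
        type_loss = F.cross_entropy(type_logits, context_type)

        return sg_loss + self.aux_weight * type_loss


def synonym_score(model, i, j):
    """Cosine similarity between learned embeddings, a simple proxy for
    scoring a candidate synonym pair (i, j)."""
    e = model.in_embed.weight
    return F.cosine_similarity(e[i].unsqueeze(0), e[j].unsqueeze(0)).item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = MultiTaskSkipGram(vocab_size=1000, num_types=10)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy batch: random ids standing in for (center, context, negatives, type label).
    center = torch.randint(0, 1000, (32,))
    context = torch.randint(0, 1000, (32,))
    negatives = torch.randint(0, 1000, (32, 5))
    ctx_type = torch.randint(0, 10, (32,))

    loss = model(center, context, negatives, ctx_type)
    loss.backward()
    opt.step()
    print(f"toy loss: {loss.item():.4f}, score(3, 7): {synonym_score(model, 3, 7):.3f}")
```

In this toy setup the auxiliary type loss acts as a regularizer that pushes words of the same semantic type toward similar regions of the embedding space; cosine similarity over the trained embeddings then serves as a simple stand-in for the paper's synonym prediction step.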