Authors: Rajeswaran Viswanathan, S. Priya
Published in: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, 2021-12-17
DOI: https://doi.org/10.1145/3508230.3508243
Examination of the quality of Conceptnet relations for PubMed abstracts
ConceptNet is a crowdsourced knowledge graph used to find relationships between words and concepts. PubMed is the largest source of documents for the biomedical domain. Stop words are removed from the PubMed abstracts, and the remaining words are used as seed words. For these seed words, nearest-neighbor words are identified as candidate words using three popular word vectors (WV): Word2Vec, GloVe, and FastText. Similarity is calculated for these word pairs within each stratum of relationship. A bootstrap estimator in a random effects model (REM) is used to study the relationships using the similarity scores. The analysis shows that there is heterogeneity among the relationships, independent of the WV used as the base.
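
The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the exact random effects estimator, or the bootstrap scheme, so the code below assumes cosine similarity, the DerSimonian-Laird estimate of between-stratum variance, and a percentile bootstrap that resamples within each relation stratum. The stop-word list, the toy word vectors, the simulated similarity scores, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "of", "in", "and", "a", "is", "for"}


def seed_words(abstract):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in abstract.lower().split() if w not in STOP_WORDS]


def nearest_neighbors(seed, vectors, k=2):
    """Return the k words with the highest cosine similarity to `seed`."""
    s = vectors[seed]
    scores = {}
    for word, v in vectors.items():
        if word == seed:
            continue
        scores[word] = float(s @ v / (np.linalg.norm(s) * np.linalg.norm(v)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def dl_tau2(strata):
    """DerSimonian-Laird estimate of between-stratum variance (tau^2).

    Each stratum is a 1-D array of similarity scores for one relation type.
    """
    means = np.array([np.mean(s) for s in strata])
    # Variance of each stratum mean; floored to avoid division by zero.
    var_of_mean = np.array([np.var(s, ddof=1) / len(s) for s in strata])
    w = 1.0 / np.maximum(var_of_mean, 1e-12)
    pooled = np.sum(w * means) / np.sum(w)
    q = np.sum(w * (means - pooled) ** 2)          # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (q - (len(strata) - 1)) / c)


def bootstrap_tau2(strata, n_boot=200, seed=0):
    """Percentile bootstrap interval for tau^2, resampling within strata."""
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        resampled = [rng.choice(s, size=len(s), replace=True) for s in strata]
        draws.append(dl_tau2(resampled))
    return np.percentile(draws, [2.5, 97.5])


# Toy 3-dimensional "embeddings" (assumed values, not real word vectors).
vectors = {
    "tumor": np.array([1.0, 0.1, 0.0]),
    "cancer": np.array([0.9, 0.2, 0.0]),
    "protein": np.array([0.0, 1.0, 0.3]),
}

# Simulated similarity scores for three hypothetical relation strata.
rng = np.random.default_rng(42)
strata = [rng.normal(loc=m, scale=0.05, size=40) for m in (0.2, 0.5, 0.8)]

seeds = seed_words("The role of protein in cancer")
top = nearest_neighbors("tumor", vectors, k=1)
tau2 = dl_tau2(strata)
lower, upper = bootstrap_tau2(strata)
```

A `tau2` well above zero, with a bootstrap interval that excludes zero, would indicate that mean similarity differs between relation strata, which is the kind of heterogeneity the abstract reports. In practice the toy vectors would be replaced by pretrained Word2Vec, GloVe, or FastText embeddings, and the strata by similarity scores grouped by ConceptNet relation type.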