Tree and word embedding based sentence similarity for evaluation of good answers in intelligent tutoring system
Emil Brajković, Daniel Vasić
2017 25th International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 2017-09-01
DOI: https://doi.org/10.23919/SOFTCOM.2017.8115592
This article presents an approach to measuring sentence similarity. In our approach, the Euler algorithm is used to generate a sequence of words from a tree, and the Sørensen-Dice coefficient is applied to determine the similarity between the compared trees. The emphasis is on quantifying the similarity between correct and incorrect answers from the Yahoo non-factual question-and-answer data set. The proposed algorithm was applied to two types of trees. The first is the constituency tree generated by Stanford CoreNLP; the second, called a knowledge tree, is produced by a custom-made algorithm and is derived from the parse tree. The Zhang-Shasha algorithm was also used in our comparison. A second approach to sentence comparison uses a Word2Vec model to obtain word embeddings and compute an average sentence vector, after which cosine distance is applied to determine the similarity between two sentences. The results of this method were compared with those of our knowledge-tree-based method. The approach described in this paper can be used to evaluate correct answers in our implementation of an Intelligent Tutoring System.
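The two similarity measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: the tree is a hypothetical `(label, children)` tuple, the "Euler algorithm" is read as an Euler tour of the tree, the Sørensen-Dice coefficient is computed over multisets of adjacent label pairs from that tour, and a toy dictionary stands in for a trained Word2Vec model. The abstract speaks of cosine distance; the sketch returns cosine similarity, and the distance is one minus that value.

```python
from collections import Counter
from math import sqrt

# Assumed tree representation: (label, [child, child, ...]).

def euler_tour(tree):
    """Emit a node's label on entry and again after returning from each child."""
    label, children = tree
    seq = [label]
    for child in children:
        seq.extend(euler_tour(child))
        seq.append(label)
    return seq

def dice(a, b):
    """Sørensen-Dice coefficient on multisets: 2 * |A ∩ B| / (|A| + |B|)."""
    ca, cb = Counter(a), Counter(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0  # two empty inputs are treated as identical
    overlap = sum((ca & cb).values())
    return 2 * overlap / total

def tree_similarity(t1, t2):
    """Compare two trees via adjacent label pairs of their Euler tours (an assumption)."""
    s1, s2 = euler_tour(t1), euler_tour(t2)
    return dice(list(zip(s1, s1[1:])), list(zip(s2, s2[1:])))

def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens found in the embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

# Toy constituency-style tree, for illustration only.
example_tree = ("S", [("NP", [("cats", [])]), ("VP", [("sleep", [])])])
```

Comparing label pairs rather than single labels lets the score reflect some of the parent-child structure recorded by the tour; comparing single labels would reduce the measure to a bag-of-labels overlap.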