Authors: P. Vigneshvaran, E. Jayabalan, K. Vijaya
Published in: 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering, 2013-04-15
DOI: 10.1109/ICPRIME.2013.6496721 (https://doi.org/10.1109/ICPRIME.2013.6496721)
A predominant statistical approach to identify semantic similarity of textual documents
Semantic similarity is the process of identifying words that are similar in meaning. It also applies to computing the similarity between documents that are not lexicographically similar. This paper proposes an empirical method for estimating semantic similarity using HBase. Specifically, word co-occurrences in each document are measured, and synonyms of those words are identified using WordNet. Similarity is then measured with statistical approaches such as MSE and MSD. The research focuses on evaluating the similarity between a key document and the source documents in a document corpus. The resulting tool was tested by checking the similarity of assignments submitted by students, as a way of checking each student's integrity. It may also be used to identify plagiarism and to eliminate duplicates in a text repository.
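
The abstract does not spell out how the statistics are computed, so the following is only a minimal sketch of the general idea, under stated assumptions: each document is reduced to relative term frequencies, synonyms are folded onto one representative word, and the key document is compared to a source document by the mean squared error of the two frequency vectors. The `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and the function names are illustrative; none of this is taken from the paper, and HBase storage is omitted entirely.

```python
from collections import Counter
import re

# Hypothetical synonym table standing in for a WordNet lookup (an assumption,
# not the paper's data): each word maps to one canonical representative.
SYNONYMS = {"automobile": "car", "vehicle": "car", "quick": "fast", "rapid": "fast"}


def normalized_counts(text):
    """Lowercase, tokenize, fold synonyms, and return relative term frequencies."""
    tokens = [SYNONYMS.get(t, t) for t in re.findall(r"[a-z]+", text.lower())]
    total = len(tokens)
    counts = Counter(tokens)
    return {term: n / total for term, n in counts.items()} if total else {}


def mean_squared_error(key_doc, source_doc):
    """MSE between the two frequency vectors over the union of their vocabularies."""
    a, b = normalized_counts(key_doc), normalized_counts(source_doc)
    vocab = set(a) | set(b)
    if not vocab:
        return 0.0
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in vocab) / len(vocab)
```

In this sketch a lower value means the documents are more alike. Two texts that differ only in words covered by the synonym table score 0.0, which is the sense in which documents can be similar without being lexicographically similar.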