Chinese semantic knowledge representation and overlap measure for Chinese documents
Authors: Xu Li, Xiaoqiang Yu, C. Yao, Xiuyan Zhao
Venue: 2012 Third International Conference on Intelligent Control and Information Processing
Publication date: 2012-07-15
DOI: 10.1109/ICICIP.2012.6391442 (https://doi.org/10.1109/ICICIP.2012.6391442)
Document copy detection judges whether a given query document plagiarizes content from other documents in a database. Plagiarism can take several forms, such as duplicating part or all of a document, or using different words or sentences to express the same meaning as an earlier text. Matching hashed chunks is relatively simple and suffices for reliably detecting exact overlaps. However, detecting paraphrased overlap is subtle. To address this problem, a frame-based Chinese semantic knowledge representation and an overlap measure for Chinese documents are proposed. The experimental results show that the method can identify complicated plagiarism patterns, such as single-word synonym substitution, voice changes, part-of-speech changes, and the splitting of long sentences.
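The hashed-chunk baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' frame-based method, which the abstract does not describe in detail; it only shows why chunk hashing catches exact overlap and misses paraphrase. The function names `chunk_hashes` and `containment`, the 5-character window, the use of SHA-1, and the sample sentences are all assumptions made for illustration.

```python
import hashlib


def chunk_hashes(text: str, size: int = 5) -> set[str]:
    """Hash every overlapping window of `size` characters in `text`.

    Character windows are an assumption here: they avoid the need for a
    Chinese word segmenter, which a real system would probably use.
    """
    cleaned = "".join(text.split())  # drop all whitespace
    if not cleaned:
        return set()
    if len(cleaned) < size:
        # Text shorter than one window is hashed as a single chunk.
        return {hashlib.sha1(cleaned.encode("utf-8")).hexdigest()}
    return {
        hashlib.sha1(cleaned[i:i + size].encode("utf-8")).hexdigest()
        for i in range(len(cleaned) - size + 1)
    }


def containment(query: str, source: str, size: int = 5) -> float:
    """Fraction of the query's chunk hashes that also occur in the source."""
    query_hashes = chunk_hashes(query, size)
    if not query_hashes:
        return 0.0
    return len(query_hashes & chunk_hashes(source, size)) / len(query_hashes)


# Hypothetical sample data, not taken from the paper.
source = "文档复制检测用于判断查询文档是否抄袭了数据库中的其他文档"
exact_copy = "判断查询文档是否抄袭"   # verbatim fragment of the source
paraphrase = "判定待查文章有无剽窃"   # same meaning, different wording

print(containment(exact_copy, source))  # 1.0: every chunk is found
print(containment(paraphrase, source))  # 0.0: no chunk survives rewording
```

The paraphrased query scores zero even though it expresses the same meaning, which is the gap a semantic representation is meant to close.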