Identifying text reuse using word net-based extended named entity recognition
Eunji Lee, Pankoo Kim
Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, 2018-10-09
DOI: 10.1145/3264746.3264811
Abstract
Text reuse is an unethical practice that has become more prominent as information content has been digitized and the internet and smartphones have spread. It can be difficult to detect when the word order is changed or when words are inserted, deleted, or replaced. In particular, words that are replaced with words of similar meaning are excluded from similarity measurement. To resolve this issue, this paper proposes a method of measuring similarity in which named entity recognition is performed on the words appearing in the target document and each word is annotated with a named entity tag. However, typical named entity recognition targets only proper nouns, so when common nouns are replaced with similar words, they are not classified as named entities belonging to the same class. To resolve this problem, we have expanded the range of WordNet-based named entity recognition.
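The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `TOY_CLASS` table is a hand-written, hypothetical stand-in for a WordNet lookup, the class labels are invented for illustration, and Jaccard overlap is assumed as the similarity measure because the abstract does not name one. The sketch shows only why annotating common nouns with a shared class tag lets replaced words still count toward similarity.

```python
# Hypothetical stand-in for a WordNet-based lookup: maps a common noun
# to a coarse entity class. Entries and labels are illustrative only.
TOY_CLASS = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "doctor": "PERSON",
    "physician": "PERSON",
    "town": "LOCATION",
    "city": "LOCATION",
}


def tag(tokens):
    """Replace each token with its class label when one is known."""
    return [TOY_CLASS.get(token, token) for token in tokens]


def jaccard(a, b):
    """Jaccard overlap of two token lists (an assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def similarity(text_a, text_b, use_tags=True):
    """Compare two texts on surface tokens or on class-tagged tokens."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    if use_tags:
        tokens_a, tokens_b = tag(tokens_a), tag(tokens_b)
    return jaccard(tokens_a, tokens_b)


original = "the doctor drove a car to the town"
reused = "the physician drove an automobile to the city"

print(similarity(original, reused, use_tags=False))  # surface tokens only
print(similarity(original, reused, use_tags=True))   # with class tags
```

On this pair the tagged comparison scores higher than the surface comparison, because "doctor"/"physician", "car"/"automobile", and "town"/"city" each collapse to one shared tag. A real system would derive the classes from WordNet rather than from a fixed table.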