{"title":"Identification of Key Sentences in the Task of Text Duplicate Detection","authors":"E. Sharapova","doi":"10.1109/WECONF48837.2020.9131465","DOIUrl":null,"url":null,"abstract":"The paper considers the problems of detecting duplicates of large text documents. To reduce the verification time, it is proposed to submit the document being verified with a set of key sentences. Key sentences are selected for parts of the text and are used to search for matches over the Internet. As a criterion for choosing key sentences, the largest sum of the weights of the words included in the sentence is calculated, taking into account the global frequency of words. Studies have shown that the use of key offers can significantly reduce the number of queries to search engines. At the same time, good duplicate text detection results are preserved.","PeriodicalId":303530,"journal":{"name":"2020 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WECONF48837.2020.9131465","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
The paper considers the problem of detecting duplicates of large text documents. To reduce verification time, it is proposed to represent the document being checked by a set of key sentences. Key sentences are selected for parts of the text and are used to search for matches on the Internet. The selection criterion is the largest sum of the weights of the words in a sentence, where the weights take into account the global frequency of words. Studies have shown that using key sentences can significantly reduce the number of queries to search engines while preserving good duplicate detection results.
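The selection criterion described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not give the weighting formula, so the code assumes a word's weight is the negative logarithm of its global relative frequency (rarer words weigh more), that each "part" is a fixed number of consecutive sentences, and that repeated words in a sentence are counted once. The names `global_freq`, `part_size`, and `default_freq` are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_weight(word, global_freq, default_freq=1e-6):
    """Assumed weighting: -log of the global relative frequency.

    Words missing from the table get a small default frequency,
    so unknown (presumably rare) words receive a high weight.
    """
    return -math.log(global_freq.get(word, default_freq))


def sentence_score(sentence, global_freq):
    """Sum of the weights of the distinct words in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return sum(word_weight(w, global_freq) for w in words)


def key_sentences(text, global_freq, part_size=3):
    """Pick one key sentence (highest score) from each part of the text.

    A part is assumed to be `part_size` consecutive sentences.
    """
    sentences = split_sentences(text)
    keys = []
    for start in range(0, len(sentences), part_size):
        part = sentences[start:start + part_size]
        keys.append(max(part, key=lambda s: sentence_score(s, global_freq)))
    return keys
```

Calling `key_sentences(text, global_freq, part_size=5)` on a document of 100 sentences would yield 20 key sentences, so only 20 search queries would be issued instead of one per sentence. The reduction factor depends entirely on the chosen part size, which is an assumption of this sketch.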