An Entity Event Deduplication Method Based on Connected Subgraph
Wei Ai, Jia Xu, Hongen Shao, Ze Wang, Tao Meng
2021 7th International Conference on Systems and Informatics (ICSAI), published 2021-11-13. DOI: 10.1109/icsai53574.2021.9664040
News data from different sources is highly repetitive, so accurately retrieving the most timely news is a primary task of text processing and analysis. To address the high repetition rate of multisource events, we propose an entity event deduplication method based on a connected subgraph. We extract relevant features from the correlation between events and from the factors that drive their development, which improves the accuracy of deduplication. Experimental evaluation shows that the proposed method is more accurate than current deduplication methods. It also significantly improves the duplicate detection rate for entity events within the effective time.
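The abstract does not specify which features are extracted or how the graph is built. The following Python sketch is therefore only an illustration of the general connected-subgraph idea, not the authors' method. Each event is a node. Two events are linked when they fall within an assumed time window and their entity sets overlap above an assumed threshold. Each connected component, found with union-find, is then reported as one group of duplicates. The `Event` record, the Jaccard entity-overlap feature, and the `threshold` and `window` parameters are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Event:
    # Hypothetical event record; the paper's actual feature set is not given here.
    event_id: str
    entities: frozenset[str]
    timestamp: datetime


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Entity-set overlap, used here as an assumed stand-in similarity feature."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find(parent: dict[str, str], x: str) -> str:
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(events: list[Event],
                threshold: float = 0.5,
                window: timedelta = timedelta(hours=24)) -> list[list[str]]:
    """Return event IDs grouped by connected component of the duplicate graph."""
    parent = {e.event_id: e.event_id for e in events}
    for a, b in combinations(events, 2):
        # Assumed edge rule: close in time AND similar enough in entities.
        close_in_time = abs(a.timestamp - b.timestamp) <= window
        if close_in_time and jaccard(a.entities, b.entities) >= threshold:
            ra, rb = find(parent, a.event_id), find(parent, b.event_id)
            if ra != rb:
                parent[ra] = rb  # merge the two components
    groups: dict[str, list[str]] = {}
    for e in events:
        groups.setdefault(find(parent, e.event_id), []).append(e.event_id)
    return list(groups.values())


if __name__ == "__main__":
    t0 = datetime(2021, 11, 13, 8, 0)
    events = [
        Event("a", frozenset({"CompanyX", "BankY", "merger"}), t0),
        Event("b", frozenset({"CompanyX", "BankY", "merger", "approval"}),
              t0 + timedelta(hours=2)),
        Event("c", frozenset({"storm", "coast"}), t0 + timedelta(hours=1)),
    ]
    print(deduplicate(events))  # [['a', 'b'], ['c']]
```

Using connected components rather than pairwise decisions groups duplicates transitively. If A matches B and B matches C, all three land in one group even when A and C are not directly similar. The all-pairs comparison is O(n²). A larger corpus would likely need blocking (for example by shared entity or time bucket) before edges are tested.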