Authors: Jing Wang, Shoubao Yang
Published in: 2006 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology Workshops (2006-12-18)
DOI: https://doi.org/10.1109/WI-IATW.2006.53
Content-Based Clustered P2P Search Model Depending on Set Distance
In content-based unstructured P2P search systems, two issues dominate query efficiency and search cost: the complexity of computing document similarity, caused by high dimensionality, and the large volume of redundant messages generated by flooding. This paper defines document similarity in terms of set distance, which keeps the cost of computing similarity linear in time. It also clusters peers by content according to their set distance, reducing both query time and redundant messages. Simulations show that the content-based search model built on set distance not only achieves higher recall but also reduces search cost and query time to 40% and 30%, respectively, of those of Gnutella.
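The abstract does not give the exact definition of set distance, so the following is only a minimal sketch under stated assumptions: each document or peer is represented as a set of terms, and the Jaccard distance stands in for the paper's measure. The names `set_distance` and `nearest_cluster` are hypothetical. With hash-based sets, the intersection is computed in time linear in the size of the sets on average, which illustrates the kind of linear-time comparison the abstract refers to.

```python
def set_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance between two term sets.

    Assumption: this is a stand-in for the paper's set distance,
    whose exact definition is not given in the abstract.
    """
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    common = len(a & b)  # hash-set intersection, linear on average
    union = len(a) + len(b) - common
    return 1.0 - common / union


def nearest_cluster(peer_terms: set[str], clusters: dict[str, set[str]]) -> str:
    """Return the name of the cluster whose term set is closest to the peer.

    Assumption: each cluster is summarised by a single representative
    term set; `clusters` must be non-empty.
    """
    return min(clusters, key=lambda name: set_distance(peer_terms, clusters[name]))


# Hypothetical example data, for illustration only.
clusters = {
    "networking": {"p2p", "routing", "flooding", "overlay"},
    "retrieval": {"query", "recall", "document", "similarity"},
}
peer = {"p2p", "flooding", "gnutella"}
chosen = nearest_cluster(peer, clusters)
```

In a sketch like this, a query would be forwarded to the cluster returned by `nearest_cluster` rather than flooded to every neighbour, which is the intuition behind the reduction in redundant messages.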