STED: a system for topic enumeration and distillation
G. Greco, S. Greco, E. Zumpano
Proceedings. International Conference on Information Technology: Coding and Computing, 2002-04-08
DOI: 10.1109/ITCC.2002.1000405 (https://doi.org/10.1109/ITCC.2002.1000405)
Search services on hyperlinked data are becoming popular among users because of the huge amount of data available and the consequent difficulty of retrieving and filtering relevant documents. Traditional term-based search engines are not very useful for this purpose, since the resulting ranking depends on the user's precision in expressing the query. Current research instead takes a different approach, called topic distillation, which consists of finding documents related to the query topic even if they do not contain the query string. Current algorithms for topic distillation first compute a base set containing all the relevant pages and then apply an iterative procedure to obtain the authoritative pages. In this paper we present STED, a system for topic distillation and enumeration (i.e., identification of different communities) of Web documents. The system is based on a technique that computes authoritative pages by analyzing the structure of the base set. More specifically, the system applies a statistical approach to the co-citation matrix associated with the base set to find the most co-cited pages, and it analyzes both the link structure and the content of pages. Several experiments have demonstrated the effectiveness and efficiency of the system.
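To make the notion of a co-citation matrix concrete, the following is a minimal sketch in Python. It only counts, for each pair of pages, how many pages in a toy base set link to both of them. It is not the statistical approach used by STED, which the abstract does not specify. The names `cocitation_counts`, `most_cocited`, and `base_set`, as well as the example link data, are hypothetical and introduced purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(links):
    """Count co-citations in a link graph.

    `links` maps each page to the pages it links to (assumed input format).
    Returns a Counter keyed by sorted page pairs (a, b); the value is the
    number of pages that link to both a and b.
    """
    counts = Counter()
    for source, targets in links.items():
        # Drop duplicate links and self-links before pairing targets.
        cited = sorted(set(targets) - {source})
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1
    return counts


def most_cocited(links, top=3):
    """Return the `top` page pairs with the highest co-citation count."""
    counts = cocitation_counts(links)
    # Sort by descending count, then by pair name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical base set: four pages citing three candidate pages.
base_set = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
    "p4": ["a", "b"],
}

if __name__ == "__main__":
    for pair, count in most_cocited(base_set, top=2):
        print(pair, count)
```

In this toy data, pages `a` and `b` are cited together by three pages, so they form the most co-cited pair. A real system would presumably apply further analysis to such counts, along with page content, before ranking pages or separating communities.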