{"title":"A combined approach of formal concept analysis and text mining for concept based document clustering","authors":"Nyeint Nyeint Myat, K. Hla","doi":"10.1109/WI.2005.1","DOIUrl":"https://doi.org/10.1109/WI.2005.1","url":null,"abstract":"Nowadays, the demand for conceptual document clustering is increasing as a way to manage the vast amount of information of various types published on the World Wide Web. In this paper, we use the formal concept analysis (FCA) method for clustering documents according to their formal contexts. A concept hierarchy of documents is built using the formal concepts of the documents in the document corpus. We use the tf.idf (term frequency × inverse document frequency) term weighting model to prune less useful concepts from these formal concepts, and association and correlation mining techniques to analyze the relationships among terms in the document corpus.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121506734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A middleware system for Web-based digital music libraries","authors":"A. S. Lampropoulos, P. S. Lampropoulou, G. Tsihrintzis","doi":"10.1109/WI.2005.8","DOIUrl":"https://doi.org/10.1109/WI.2005.8","url":null,"abstract":"We present a middleware system that facilitates Internet users' access to Web-based digital music libraries and allows them to manipulate audio meta-information taking into consideration content and semantic information of music data. Useful relations in the data are automatically extracted through semantic networks (constructed and maintained in the library). Our system is complemented with a query-by-example retrieval subsystem, user relevance feedback facilities, and a new approach for musical genre classification based on the features extracted from signals that correspond to distinct musical instrument sources, as these sources have been identified by a source separation process. The system operation is illustrated in detail.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121561172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Webpage importance analysis using conditional Markov random walk","authors":"Tie-Yan Liu, Wei-Ying Ma","doi":"10.1109/WI.2005.161","DOIUrl":"https://doi.org/10.1109/WI.2005.161","url":null,"abstract":"In this paper, we propose a novel method to calculate the Web page importance based on a conditional Markov random walk model. The main assumption in this model is that given the hyperlinks in a Web page, users are not really randomly clicking one of them. Instead, many factors may bias their behaviors, for example, the anchor text, the content relevance and the previous experiences when visiting the Web site that a destination page belongs to. As one of the results, the user might tend to visit those pages in high-quality Web sites with higher probability. To implement this idea, we reformulate the Web graph to be a two-layer structure, and the Web page importance is calculated by conditional random walk in this new Web graph. Experiments on the topic distillation task of TREC 2003 Web track showed that our new method can achieve about 18% improvement on mean average precision (MAP) and 16% on precision at 10 (P@10) over the PageRank algorithm.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134111102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Web clustering by cluster selection","authors":"Daniel Crabtree, Xiaoying Gao, Peter M. Andreae","doi":"10.1109/WI.2005.75","DOIUrl":"https://doi.org/10.1109/WI.2005.75","url":null,"abstract":"Web page clustering is a technology that puts semantically related Web pages into groups and is useful for categorizing, organizing, and refining search results. When clustering using only textual information, suffix tree clustering (STC) outperforms other clustering algorithms by making use of phrases and allowing clusters to overlap. One problem of STC and other similar algorithms is how to select a small set of clusters to display to the user from a very large set of generated clusters. The cluster selection method used in STC is flawed in that it does not handle overlapping clusters appropriately. This paper introduces a new cluster scoring function and a new cluster selection algorithm to overcome the problems with overlapping clusters, which are combined with STC to make a new clustering algorithm ESTC. This paper's experiments show that ESTC significantly outperforms STC and that even with less data ESTC performs similarly to a commercial clustering search engine.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134455095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web structure mining for usability analysis","authors":"Chun-hung Li, C. Chui","doi":"10.1109/WI.2005.160","DOIUrl":"https://doi.org/10.1109/WI.2005.160","url":null,"abstract":"The interaction between usability and how a Web site is structured is a complicated issue. In this paper, we discuss a Web structure mining algorithm which allows the automatic extraction of navigational structures in a Web site without performing hypertext analysis. We perform several usability experiments to correlate the usability of Web sites with their structural design. Experimental results show that the structure mining algorithm gives reasonable predictions about several design issues in Web structure. The analysis serves as a building block in the complex issue of Web usability and structure mining.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131191531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated metadata and instance extraction from news Web sites","authors":"Srinivas Vadrevu, S. Nagarajan, Fatih Gelgi, H. Davulcu","doi":"10.1109/WI.2005.38","DOIUrl":"https://doi.org/10.1109/WI.2005.38","url":null,"abstract":"Over the past few years, the World Wide Web has established itself as a vital resource for news. With the continuous growth in the number of available news Web sites and the diversity in their presentation of content, there is an increasing need to organize the news-related information on the Web and keep track of it. In this paper, we present automated techniques for extracting metadata instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. The tree-mining algorithms that we present identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances annotated with their labels whenever they are available. We report experimental evaluation for the news domain to demonstrate the efficacy of our algorithms.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117195432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clickstream log acquisition with Web farming","authors":"Jia Hu, N. Zhong","doi":"10.1109/WI.2005.47","DOIUrl":"https://doi.org/10.1109/WI.2005.47","url":null,"abstract":"Collecting customer interaction data on e-business Web sites and portals helps to understand customer behavior, build customer profiles, and then provide personalized services. Traditional Web server logs are hard to associate with specific customers and cannot record the complete actions and movements of customers across Web sites. Collecting clickstream logs at the application layer with Web farming technology helps to seamlessly integrate Web usage data with other customer-related data. This model can be developed as a common plugin for most existing e-business Web sites and portals.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124042429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Page-reRank: using trusted links to re-rank authority","authors":"P. Massa, Conor Hayes","doi":"10.1109/WI.2005.112","DOIUrl":"https://doi.org/10.1109/WI.2005.112","url":null,"abstract":"Search engines like Google.com use the link structure of the Web to determine whether Web pages are authoritative sources of information. However, the linking mechanism provided by HTML does not allow the Web author to express different types of links, such as positive or negative endorsements of page content. As a consequence, search engine algorithms cannot discriminate between sites that are highly linked and sites that are highly trusted. We demonstrate our claim by running PageRank on a real world data set containing positive and negative links. We conclude that simple semantic extensions to the link mechanism would provide a richer semantic network from which to mine more precise Web intelligence.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128309323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aligning class hierarchies with grass-roots class alignment","authors":"B. Yan","doi":"10.1109/WI.2005.23","DOIUrl":"https://doi.org/10.1109/WI.2005.23","url":null,"abstract":"The performance of an ontology alignment technique largely depends on the amount of information that can be leveraged for the alignment task. On the semantic Web, end-users may explicitly or implicitly generate ontology alignments during their use of the semantic data. This kind of end-user-generated ontology alignment, which we call grass-roots ontology alignment, is an important source of information that is yet to be taken into account by current ontology alignment techniques. Grass-roots ontology alignment, often generated as a side effect of other data manipulations, could be user-specific, task-specific, approximate, or even contradictory. This paper reports our work on reusing grass-roots class alignment for aligning class hierarchies. A grass-roots class alignment, though approximate, still reveals some facts about relationships between different classes. We formalize facts about class relationships that can be inferred from an alignment under different cases. We then apply forward-chaining inference to the facts knowledge base to infer more facts. The facts KB is then leveraged for ontology alignment purposes. To deal with uncertainty and inconsistency, each fact is associated with an evidence that tells how the fact is obtained. The evidences are used to select better-supported facts in case of inconsistency.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128752562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward the automatic compilation of multimedia encyclopedias: associating images with term descriptions on the Web","authors":"Atsushi Fujii, Tetsuya Ishikawa","doi":"10.1109/WI.2005.148","DOIUrl":"https://doi.org/10.1109/WI.2005.148","url":null,"abstract":"To generate content for multimedia encyclopedias, we propose a method for searching the Web, seeking images associated with a specific word sense. We use text in an HTML file that links to an image as a pseudo-caption for the image, enabling text-based indexing and retrieval. We use term descriptions in a Web search site called \"Cyclone\" as queries and match images and texts based on word senses. We show the effectiveness of our method experimentally.","PeriodicalId":213856,"journal":{"name":"The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127500704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}