Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics: Latest Papers

Implicit Links based Web Page Representation for Web Page Classification
Abdelbadie Belmouhcine, M. Benkhalifa
{"title":"Implicit Links based Web Page Representation for Web Page Classification","authors":"Abdelbadie Belmouhcine, M. Benkhalifa","doi":"10.1145/2797115.2797125","DOIUrl":"https://doi.org/10.1145/2797115.2797125","url":null,"abstract":"With the rapid growth of the web's size, web page classification becomes more prominent. The way a web page is represented and the contextual features used for this representation both have an impact on the classification's performance. Thus, finding an adequate representation of web pages is essential for a better web page classification. In this paper, we propose a web page representation based on the structure of the implicit graph built using implicit links extracted from the query-log. In this representation, we represent web pages using their textual contents along with their neighbors as features instead of using features of their neighbors. When two or more web pages in the implicit graph share the same direct neighbors and belong to the same class ci, it is most likely that every other web page, having the same immediate neighbors, will belong to the same class ci. We propose two kinds of web page representations: Boolean Neighbor Vector (BNV) and Weighted Neighbor Vector (WNV). In BNV, we supplement the feature vector, which represents the textual content of a web page, by a Boolean vector. This vector represents the target web page's neighbors and shows whether a web page is a direct neighbor of the target web page or not. In WNV, we supplement the feature vector, which represents the textual content of a web page, by a weighted vector. The latter represents the target web page's neighbors and shows the strengths of relations between the target web page and its neighbors. We conduct experiments using four classifiers: SVM (Support Vector Machine), NB (Naive Bayes), RF (Random Forest) and KNN (K-Nearest Neighbors) on two subsets of ODP (Open Directory Project). Results show that: (1) the proposed representation helps obtain better classification results when using SVM, NB, RF and KNN for both Bag of Words (BW) and 5-gram representations. (2) The performances based on BNV are better than those based on WNV.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126922826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
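The BNV and WNV representations described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the graph layout (a dictionary of edge weights), the function names, and the toy values are all assumptions made for illustration, and the abstract does not say how edge weights are derived from the query log.

```python
from typing import Dict, List

# Hypothetical implicit graph: page -> {neighbor: relation strength}.
# How strengths are computed from the query log is not given in the abstract.
Graph = Dict[str, Dict[str, float]]


def boolean_neighbor_vector(page: str, all_pages: List[str], graph: Graph) -> List[float]:
    """1.0 where a page is a direct neighbor of `page`, else 0.0."""
    neighbors = graph.get(page, {})
    return [1.0 if other in neighbors else 0.0 for other in all_pages]


def weighted_neighbor_vector(page: str, all_pages: List[str], graph: Graph) -> List[float]:
    """Relation strength to each page, 0.0 where there is no direct link."""
    neighbors = graph.get(page, {})
    return [float(neighbors.get(other, 0.0)) for other in all_pages]


def augment(text_features: List[float], neighbor_vector: List[float]) -> List[float]:
    """Supplement the textual feature vector with the neighbor vector."""
    return list(text_features) + list(neighbor_vector)


# Toy data (assumed): three pages and a two-term textual feature vector.
pages = ["a", "b", "c"]
graph = {"a": {"b": 3.0}, "b": {"a": 3.0, "c": 1.0}, "c": {"b": 1.0}}
text = [0.5, 0.0]

bnv = augment(text, boolean_neighbor_vector("b", pages, graph))
wnv = augment(text, weighted_neighbor_vector("b", pages, graph))
```

Either augmented vector could then be passed to any standard classifier in place of the text-only features.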
A LOD-based, query construction and refinement service for web search engines
I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou
{"title":"A LOD-based, query construction and refinement service for web search engines","authors":"I. Papadakis, Ioannis Apostolatos, Dimitris Apostolou","doi":"10.1145/2797115.2797122","DOIUrl":"https://doi.org/10.1145/2797115.2797122","url":null,"abstract":"Nowadays, search engines are the obvious way of finding information on the web. However, there are times when users are forced to engage themselves in long and tedious search sessions during which they have to process their initial query a number of times until they come up with results that satisfy their information needs. This paper proposes a query construction and refinement service that aids users during their engagement with a large scale web search engine. As a proof of concept, GContext is presented and accordingly evaluated as an implementation of the proposed service. GContext integrates various sources of the lod-cloud within the environment of a large scale web search engine.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123866714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Creating Semantic Fingerprints for Web Documents
K. Krieger, J. Schneider, Christian Nywelt, D. Rösner
{"title":"Creating Semantic Fingerprints for Web Documents","authors":"K. Krieger, J. Schneider, Christian Nywelt, D. Rösner","doi":"10.1145/2797115.2797132","DOIUrl":"https://doi.org/10.1145/2797115.2797132","url":null,"abstract":"With Semantic Web technologies and Linked Data datasets we are able to not only retrieve the textual content of a document but also to automatically create formal semantic descriptions of its content. In this paper we present a Linked Data-based approach to automatically generate semantic fingerprints for Web documents. Our approach exploits the structured information in Linked Data datasets to derive an explicit semantic description of a Web resource. A two-stage evaluation of the implementation of the presented approach shows its feasibility and robustness.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time
R. Meusel, Christian Bizer, Heiko Paulheim
{"title":"A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time","authors":"R. Meusel, Christian Bizer, Heiko Paulheim","doi":"10.1145/2797115.2797124","DOIUrl":"https://doi.org/10.1145/2797115.2797124","url":null,"abstract":"Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
Matching HTML Tables to DBpedia
Dominique Ritze, O. Lehmberg, Christian Bizer
{"title":"Matching HTML Tables to DBpedia","authors":"Dominique Ritze, O. Lehmberg, Christian Bizer","doi":"10.1145/2797115.2797118","DOIUrl":"https://doi.org/10.1145/2797115.2797118","url":null,"abstract":"Millions of HTML tables containing structured data can be found on the Web. With their wide coverage, these tables are potentially very useful for filling missing values and extending cross-domain knowledge bases such as DBpedia, YAGO, or the Google Knowledge Graph. As a prerequisite for being able to use table data for knowledge base extension, the HTML tables need to be matched with the knowledge base, meaning that correspondences between table rows/columns and entities/schema elements of the knowledge base need to be found. This paper presents the T2D gold standard for measuring and comparing the performance of HTML table to knowledge base matching systems. T2D consists of 8 700 schema-level and 26 100 entity-level correspondences between the WebDataCommons Web Tables Corpus and the DBpedia knowledge base. In contrast to related work on HTML table to knowledge base matching, the Web Tables Corpus (147 million tables), the knowledge base, as well as the gold standard are publicly available. The gold standard is used afterward to evaluate the performance of T2K Match, an iterative matching method which combines schema and instance matching. T2K Match is designed for the use case of matching large quantities of mostly small and narrow HTML tables against large cross-domain knowledge bases. The evaluation using the T2D gold standard shows that T2K Match discovers table-to-class correspondences with a precision of 94%, row-to-entity correspondences with a precision of 90%, and column-to-property correspondences with a precision of 77%.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133393499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 160
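The precision figures quoted in the abstract above come from comparing a matcher's output against a gold standard. A minimal sketch of that comparison follows, assuming correspondences are represented as (table element, knowledge base element) pairs. The pair format and the sample identifiers are hypothetical and are not taken from T2D or T2K Match.

```python
from typing import Set, Tuple

# Hypothetical representation: (table element, knowledge base element).
Correspondence = Tuple[str, str]


def precision_recall(predicted: Set[Correspondence],
                     gold: Set[Correspondence]) -> Tuple[float, float]:
    """Precision and recall of predicted correspondences against a gold standard."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data (assumed): two predictions, one of which is in the gold standard.
gold = {
    ("table1.row1", "dbpedia:Berlin"),
    ("table1.row2", "dbpedia:Paris"),
    ("table1.col2", "dbo:populationTotal"),
}
predicted = {
    ("table1.row1", "dbpedia:Berlin"),
    ("table1.row2", "dbpedia:Lyon"),
}

p, r = precision_recall(predicted, gold)
```

Precision counts how many of the proposed correspondences are correct, so a matcher can score high on it while still missing many gold-standard pairs; recall captures that second aspect.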
Recommending Customizable Products: A Multiple Choice Knapsack Solution
A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian
{"title":"Recommending Customizable Products: A Multiple Choice Knapsack Solution","authors":"A. Sivaramakrishnan, Madhusudhan Krishnamachari, Vidhya Balasubramanian","doi":"10.1145/2797115.2797116","DOIUrl":"https://doi.org/10.1145/2797115.2797116","url":null,"abstract":"Recommender systems have become very prominent over the past decade. Methods such as collaborative filtering and knowledge based recommender systems have been developed extensively for non-customizable products. However, as manufacturers today are moving towards customizable products to satisfy customers, the need of the hour is customizable product recommender systems. Such systems must be able to capture customer preferences and provide recommendations that are both diverse and novel. This paper proposes an approach to building a recommender system that can be adapted to customizable products such as desktop computers and home theater systems. The Customizable Product Recommendation problem is modeled as a special case of the Multiple Choice Knapsack Problem, and an algorithm is proposed to generate desirable product recommendations in real-time. The performance of the proposed system is then evaluated.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126746771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
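For readers unfamiliar with the Multiple Choice Knapsack Problem mentioned in the abstract above, here is a minimal sketch of the textbook dynamic-programming formulation: pick exactly one option from each component class so that total utility is maximal and total cost stays within a budget. This is a generic illustration, not the algorithm proposed in the paper. Integer costs, the utility scores, and the component names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical option format: (name, integer cost, utility score).
Option = Tuple[str, int, float]


def solve_mckp(classes: List[List[Option]],
               budget: int) -> Optional[Tuple[float, List[str]]]:
    """Choose one option per class, maximizing utility within the budget.

    Returns (total utility, chosen option names), or None if no
    combination fits the budget.
    """
    # best[spent] = (utility, picks) for the classes processed so far.
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for options in classes:
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for spent, (utility, picks) in best.items():
            for name, cost, value in options:
                total = spent + cost
                if total > budget:
                    continue
                candidate = (utility + value, picks + [name])
                if total not in nxt or candidate[0] > nxt[total][0]:
                    nxt[total] = candidate
        if not nxt:
            return None  # no option of this class fits the remaining budget
        best = nxt
    return max(best.values(), key=lambda entry: entry[0])


# Toy configuration (assumed): a desktop with two component classes.
components = [
    [("cpu_basic", 100, 3.0), ("cpu_fast", 200, 5.0)],
    [("ram_8gb", 50, 2.0), ("ram_16gb", 100, 4.5)],
]

result = solve_mckp(components, budget=250)
too_tight = solve_mckp(components, budget=100)
```

The table is keyed by exact money spent, so the running time grows with the budget times the total number of options; a real-time recommender would likely need something faster or approximate.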
What Makes Ontology Reasoning so Arduous?: Unveiling the key ontological features
N. Alaya, S. Yahia, M. Lamolle
{"title":"What Makes Ontology Reasoning so Arduous?: Unveiling the key ontological features","authors":"N. Alaya, S. Yahia, M. Lamolle","doi":"10.1145/2797115.2797117","DOIUrl":"https://doi.org/10.1145/2797115.2797117","url":null,"abstract":"Reasoning with ontologies is one of the core fields of research in Description Logics. A variety of efficient reasoners with highly optimized algorithms have been developed to allow inference tasks on expressive ontology languages such as OWL(DL). However, reported reasoner computing times have exceeded and sometimes fallen behind the expected theoretical values. From an empirical perspective, it is not yet well understood which particular aspects of the ontology are reasoner performance degrading factors. In this paper, we conducted an investigation of state-of-the-art works that attempted to portray potential correlation between reasoner empirical behaviour and particular ontological features. These works were analysed and then broken down into categories. Further, we proposed a set of ontology features covering a broad range of structural and syntactic ontology characteristics. We claim that these features are good indicators of the ontology hardness level against reasoning tasks. In order to assess the worthiness of our proposals, we adopted a supervised machine learning approach. Features served as the bases to learn predictive models of reasoners' robustness. These models were trained for 6 well known reasoners using their evaluation results during the ORE'2014 competition. Our prediction models showed a high accuracy level, which attests to the effectiveness of our set of features.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133773695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
User Modeling in Folksonomies: Relational Clustering and Tag Weighting
Takuya Kitazawa, M. Sugiyama
{"title":"User Modeling in Folksonomies: Relational Clustering and Tag Weighting","authors":"Takuya Kitazawa, M. Sugiyama","doi":"10.1145/2797115.2797129","DOIUrl":"https://doi.org/10.1145/2797115.2797129","url":null,"abstract":"This paper proposes a user-modeling method for folksonomic data. Since data mining of folksonomic data is difficult due to their complexity, significant amounts of preprocessing are usually required. To catch sketchy characteristics of such complex data, our method employs two steps: (1) using the infinite relational model (IRM) to perform relational clustering of a folksonomic data set, and (2) using tag-weighting to extract the characteristics of each user cluster. As an experimental evaluation, we applied our method to real-world data from one of the most popular social bookmarking services in Japan. Our user-modeling method successfully extracted semantically clustered user models, thus demonstrating that relational data analysis has promise for mining folksonomic data. In addition, we developed the user-model-based filtering algorithm (UMF), which evaluates the user models by their resource recommendations. The F-measure was higher than that of random recommendation, and the running time was much shorter than that of collaborative-filtering-based top-n recommendation.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134456684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
bacon: Linked Data Integration based on the RDF Data Cube Vocabulary
Sebastian P. Bayerl, M. Granitzer
{"title":"bacon: Linked Data Integration based on the RDF Data Cube Vocabulary","authors":"Sebastian P. Bayerl, M. Granitzer","doi":"10.1145/2797115.2797126","DOIUrl":"https://doi.org/10.1145/2797115.2797126","url":null,"abstract":"Discovering and integrating relevant real-life datasets are essential tasks when it comes to handling Linked Data. Similar to Data Warehousing approaches, Linked Data can be prepared to enable sophisticated data analysis. The developed open source framework bacon enables interactive and crowd-sourced Data Integration on Linked Data (Linked Data Integration), utilizing the RDF Data Cube Vocabulary and the semantic properties of Linked Open Data. Discovering suitable datasets on-the-fly in local or remote repositories sets up the ensuing integration process. Based on well-known Data Warehousing processes, the semantic nature of the data is taken into account to handle and merge RDF Data Cubes. To do so, structure and content of the cubes must be analyzed and processed. A similarity measure has been developed to find similarly structured cubes. The user is offered a graphical interface, where he can search for suitable cubes and modify their structure based on semantic properties. This process is fostered by a set of automated suggestions to support inexperienced users and also domain experts.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134609855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Modeling and predicting information search behavior
Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu
{"title":"Modeling and predicting information search behavior","authors":"Saraschandra Karanam, H. Oostendorp, M. Sanchiz, A. Chevalier, Jessie Chin, W. Fu","doi":"10.1145/2797115.2797123","DOIUrl":"https://doi.org/10.1145/2797115.2797123","url":null,"abstract":"This paper looks at two limitations of cognitive models of web-navigation: first, they do not account for the entire process of information search and second, they do not account for the differences in search behavior caused by aging. To address these limitations, data from an experiment in which two types of information search tasks (simple and difficult) were presented to both young and old participants was used. We found that in general difficult tasks demand significantly more time, significantly more clicks, significantly more reformulations and are answered significantly less accurately than simple tasks. Older persons inspect the search engine result pages significantly longer, produce significantly fewer reformulations with difficult tasks than younger persons, and are significantly more accurate than younger persons with simple tasks. We next used a cognitive model of web-navigation called CoLiDeS to predict which search engine result a user would choose to click. Old participants were found to click more often only on search engine results with high semantic similarity with the query. Search engine results generated by old participants were of higher semantic similarity value (computed w.r.t. the query) than those generated by young participants only in the second cycle. Match between model-predicted clicks and actual user clicks was found to be significantly higher for difficult tasks compared to simple tasks. Potential improvements in enhancing the modeling and its applications are discussed.","PeriodicalId":386229,"journal":{"name":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114173044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11