IEEE/WIC/ACM International Conference on Web Intelligence (WI'07): Latest Papers (Page 7)

Determining Bias to Search Engines from Robots.txt

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.45

Yang Sun, Ziming Zhuang, Isaac G. Councill, C. Lee Giles

Abstract: Search engines largely rely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Ethical robots will follow the rules specified in robots.txt. Websites can explicitly specify an access preference for each robot by name. Such biases may lead to a "rich get richer" situation, in which a few popular search engines ultimately dominate the Web because they have preferred access to resources that are inaccessible to others. This issue is seldom addressed, although the robots.txt convention has become a de facto standard for robot regulation and search engines have become an indispensable tool for information access. We propose a metric to evaluate the degree of bias to which specific robots are subjected. We have investigated 7,593 websites covering education, government, news, and business domains, and collected 2,925 distinct robots.txt files. Results of content and statistical analysis of the data confirm that the robots of popular search engines and information portals, such as Google, Yahoo, and MSN, are generally favored by most of the websites we have sampled. The results also show a strong correlation between the search engine market share and the bias toward particular search engine robots.

Citations: 28
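
The abstract above notes that a website can state an access preference for each robot by name in robots.txt. The following is a minimal sketch of how such a per-robot preference can be read with Python's standard `urllib.robotparser`. The robots.txt content and the robot name `examplebot` are invented for illustration, and the sketch does not reproduce the bias metric proposed in the paper, which the abstract does not define.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration: one named robot is
# allowed everywhere, while every other robot is excluded from the site.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed_robots(robots_txt, robot_names, path="/"):
    """Return {robot name: True/False} for whether each robot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, path) for name in robot_names}


access = allowed_robots(ROBOTS_TXT, ["googlebot", "examplebot"])
print(access)
```

Here the named robot is favored and the unnamed one falls through to the catch-all rule, which is the kind of asymmetry the abstract describes as bias.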

Concordance-Based Entity-Oriented Search

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.37

Mikhail Bautin, S. Skiena

Abstract: We consider the problem of finding the relevant named entities in response to a search query over a given text corpus. Entity search can readily be used to augment conventional web search engines for a variety of applications. To assess the significance of entity search, we analyzed the AOL dataset of 36 million web search queries with respect to two different sets of entities: namely (a) 2.3 million distinct entities extracted from a news text corpus and (b) 2.9 million Wikipedia article titles. The results clearly indicate that search engines should be aware of entities, for under various criteria of matching between 18-39% of all web search queries can be recognized as specifically searching for entities, while 73-87% of all queries contain entities. Our entity search engine creates a concordance document for each entity, consisting of all the sentences in the corpus containing that entity. We then index and search these documents using open-source search software. This gives a ranked list of entities as the result of search. Visit http://www.textmap.com for a demonstration of our entity search engine over a large news corpus. We evaluate our system by comparing the results of each query to the list of entities that have highest statistical juxtaposition scores with the queried entity. Juxtaposition score is a measure of how strongly two entities are related in terms of a probabilistic upper bound. The results show excellent performance, particularly over well-characterized classes of entities such as people.

Citations: 27
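
The abstract above describes building one concordance document per entity (all corpus sentences that contain it) and then searching those documents to obtain a ranked list of entities. The sketch below illustrates that idea only. The sentence splitter, the substring matching, and the term-count ranking are simplifying assumptions standing in for the authors' entity extraction and the open-source search software they mention, and the corpus and entity names are invented.

```python
import re
from collections import defaultdict


def build_concordances(corpus, entities):
    """Map each entity to the list of corpus sentences that mention it."""
    concordances = defaultdict(list)
    for text in corpus:
        # Naive sentence splitter (assumption): break after ., ! or ? plus whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            lowered = sentence.lower()
            for entity in entities:
                if entity.lower() in lowered:
                    concordances[entity].append(sentence)
    return dict(concordances)


def search_entities(concordances, query):
    """Rank entities by how often the query terms occur in their concordance."""
    terms = query.lower().split()
    scores = {}
    for entity, sentences in concordances.items():
        words = re.findall(r"\w+", " ".join(sentences).lower())
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scores[entity] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], e))


# Invented two-document corpus and entity list, for illustration only.
corpus = [
    "Alice Smith founded a robotics startup. The startup builds warehouse robots.",
    "Bob Jones reviewed the robotics startup. Bob Jones also writes about music.",
]
entities = ["Alice Smith", "Bob Jones"]

concordances = build_concordances(corpus, entities)
ranking = search_entities(concordances, "music")
print(ranking)
```

A production system would replace the term-count ranking with a real inverted index and relevance model; the point here is only that the search result is a list of entities rather than a list of documents.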

Extending Description Logic for Reasoning about Ontology Evolution

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.53

Chuming Chen, M. Matthews

Citations: 0

You Can't Always Get What You Want: Achieving Differentiated Service Levels with Pricing Agents in a Storage Grid

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.117

H. H. Huang, A. Grimshaw, John F. Karpovich

Abstract: We have designed a new storage grid called Storage@desk to harness unused storage available on desktop machines and turn it into a useful resource for clients. Given the complexity of managing client-specific QoS requirements, and the dynamism inherent in supply and demand for resources, even a highly experienced system administrator cannot effectively manage resource allocation. In this paper, we present a market-based resource allocation model where pricing agents help resource providers adjust prices as demand fluctuates. With derivative-following pricing, an agent requires no knowledge of competitors or consumers, which reduces communication overheads and avoids bottlenecks in the system. Individual clients need a variety of service levels and compete for scarce resources. Under budget constraints, the consumers can't always get what they want. The budgets serve as an incentive for the consumers to react to the price signals. We simulate our model using real-world trace data, and the results show that, using this model, the system allows the consumers to achieve QoS goals under sufficient budgets and degrade in accordance with relative budget amounts.

Citations: 4
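
The abstract above mentions derivative-following pricing, in which an agent needs no knowledge of competitors or consumers. A minimal sketch of the general technique follows: the agent keeps moving its price in the same direction while profit does not fall, and reverses direction when profit drops. The linear demand curve, the step size, and the function names are hypothetical and chosen for illustration; they are not taken from the Storage@desk model.

```python
def revenue(price):
    """Hypothetical linear demand curve: fewer units sell as the price rises."""
    demand = max(0.0, 100.0 - 10.0 * price)
    return price * demand


def derivative_following_step(price, step, last_profit, profit, min_price=0.5):
    """Return (new_price, new_step) using only the agent's own profit history."""
    if profit < last_profit:
        # The last move hurt profit, so reverse the direction of adjustment.
        step = -step
    return max(min_price, price + step), step


price, step, last_profit = 1.0, 0.5, 0.0
history = []
for _ in range(30):
    profit = revenue(price)  # the only signal the agent observes
    history.append(price)
    price, step = derivative_following_step(price, step, last_profit, profit)
    last_profit = profit

final_price = price
print(final_price)
```

With this demand curve the revenue peak is at a price of 5.0, and the agent climbs toward it and then oscillates within one step of it, without ever being told the shape of the curve.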

Using Novel IR Measures to Learn Optimal Cluster Structures for Web Information Retrieval

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.107

Martin Mehlitz, Jérôme Kunegis, S. Albayrak

Citations: 10

A Comparison of Dimensionality Reduction Techniques for Web Structure Mining

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.6

N. F. Chikhi, B. Rothenburger, Nathalie Aussenac-Gilles

Citations: 26

Perseus -- A Personalized Reputation System

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.144

P. Nurmi

Citations: 11

A Common Design-Features Ontology for Product Data Semantics Interoperability

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.5

Samer Abdul Ghafour, P. Ghodous, B. Shariat, E. Perna

Citations: 29

Correct your text with Google

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.41

Stéphanie Jacquemont, F. Jacquenet, M. Sebban

Citations: 18

A Didactic-based Model of Scenarios for Designing an Adaptive and Context-Aware Learning System

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07) Pub Date : 2007-11-02 DOI: 10.1109/WI.2007.118

Jean-Louis Tetchueng, Serge Garlatti, S. Laubé

Citations: 13