RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931438
Jie Lu, Jamie Callan
{"title":"Content-Based Peer-to-Peer Network Overlay for Full-Text Federated Search","authors":"Jie Lu, Jamie Callan","doi":"10.5555/1931390.1931438","DOIUrl":"https://doi.org/10.5555/1931390.1931438","url":null,"abstract":"Peer-to-peer network overlays have mostly been designed to support search over document names, identifiers, or keywords from a small or controlled vocabulary. In this paper we propose a content-based P2P network overlay for full-text federated search over heterogeneous, open-domain contents. Local algorithms are developed to dynamically construct a network overlay with content-based locality and content-based small-world properties. Experimental results using P2P testbeds of real documents demonstrate the effectiveness of our approach.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121518244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
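The abstract does not spell out the local construction algorithms, so what follows is only an illustrative sketch of the general idea of a content-based overlay, not the authors' method. It assumes each peer is summarised by a bag-of-words vector. Each peer links to its k most similar peers (content-based locality) plus a few random long-range links, which is the usual way to obtain small-world shortcuts. A query is then routed greedily toward the neighbour whose content best matches it. All names and parameters (build_overlay, route, k, long_links) are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_overlay(peers: Dict[str, Counter], k: int = 2,
                  long_links: int = 1, seed: int = 0) -> Dict[str, List[str]]:
    """Link each peer to its k most similar peers plus a few random other peers."""
    rng = random.Random(seed)
    overlay: Dict[str, List[str]] = {}
    for p, vec in peers.items():
        others = sorted((q for q in peers if q != p),
                        key=lambda q: cosine(vec, peers[q]), reverse=True)
        local = others[:k]  # content-based locality: most similar peers
        remaining = others[k:]
        # Random long-range links act as small-world shortcuts.
        far = rng.sample(remaining, min(long_links, len(remaining)))
        overlay[p] = local + far
    return overlay


def route(overlay: Dict[str, List[str]], peers: Dict[str, Counter],
          query: str, start: str, max_hops: int = 5) -> List[str]:
    """Greedily forward a query to the neighbour whose content best matches it."""
    q = Counter(query.lower().split())
    current, path = start, [start]
    for _ in range(max_hops):
        if not overlay[current]:
            break
        best = max(overlay[current], key=lambda n: cosine(q, peers[n]))
        if cosine(q, peers[best]) <= cosine(q, peers[current]):
            break  # no neighbour is closer to the query than the current peer
        current = best
        path.append(current)
    return path
```

For example, with four peers where p1 and p2 share music terms and p3 and p4 share biology terms, build_overlay(peers, k=1) links p1 to p2 first, and route(overlay, peers, "dna gene", start="p1") ends at p4 whichever random shortcut p1 received.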
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931396
G. Tzanetakis, M. Lagrange, P. Spong, H. Symonds
{"title":"ORCHIVE: Digitizing and Analyzing Orca Vocalizations","authors":"G. Tzanetakis, M. Lagrange, P. Spong, H. Symonds","doi":"10.5555/1931390.1931396","DOIUrl":"https://doi.org/10.5555/1931390.1931396","url":null,"abstract":"This paper describes the process of creating a large digital archive of killer whale or orca vocalizations. The goal of the project is to digitize approximately 20000 hours of existing analog recordings of these vocalizations in order to facilitate access to researchers internationally. We are also developing tools to assist content-based access and retrieval over this large digital audio archive. After describing the logistics of the digitization process we describe algorithms for denoising the vocalizations and for segmenting the recordings into regions of interest. It is our hope that the creation of this archive and the associated tools will lead to better understanding of the acoustic communications of Orca communities worldwide.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134438559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931412
Thierry Hamon, A. Nazarenko, T. Poibeau, S. Aubin, Julien Derivière
{"title":"A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis","authors":"Thierry Hamon, A. Nazarenko, T. Poibeau, S. Aubin, Julien Derivière","doi":"10.5555/1931390.1931412","DOIUrl":"https://doi.org/10.5555/1931390.1931412","url":null,"abstract":"Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and the development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled all together.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116714449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931432
Guillaume Wisniewski, P. Gallinari
{"title":"From Layout to Semantic: a Reranking Model for Mapping Web Documents to Mediated XML Representations","authors":"Guillaume Wisniewski, P. Gallinari","doi":"10.5555/1931390.1931432","DOIUrl":"https://doi.org/10.5555/1931390.1931432","url":null,"abstract":"Many documents on the Web are formatted in a weakly structured format. Because of their weak semantics and because of the heterogeneity of their formats, the information conveyed by their structure cannot be directly exploited. We consider here the conversion of such documents into a predefined mediated semi-structured format which will be more amenable to automatic processing of the document content. We develop a machine learning approach to this conversion problem where the transformation is learned automatically from a set of document examples manually transformed into the target structure. Our method proceeds in three steps. Given an input document, document elements are first annotated with labels of the target schema. Structured candidate documents are then generated using a generalized probabilistic context-free parsing algorithm. Finally, candidates are reranked using a perceptron-like ranking algorithm. Experiments performed on two different datasets show that the proposed method performs well in different contexts.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121490444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
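The final step, reranking candidates with a perceptron-like algorithm, can be illustrated with a generic structured-perceptron reranker. This is a minimal sketch under stated assumptions: each candidate document is reduced to a sparse feature dictionary, and each training example gives the index of the correct candidate. The feature names and the functions train_reranker and rerank are hypothetical; the abstract does not give the paper's features or its exact update rule.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]


def score(w: Features, f: Features) -> float:
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())


def train_reranker(data: Sequence[Tuple[List[Features], int]],
                   epochs: int = 10) -> Features:
    """Each example is (candidate feature vectors, index of the correct candidate)."""
    w: Features = {}
    for _ in range(epochs):
        for candidates, gold in data:
            best = max(range(len(candidates)),
                       key=lambda i: score(w, candidates[i]))
            if best != gold:
                # Perceptron update: reward the correct candidate's features
                # and penalise those of the wrongly top-ranked candidate.
                for k, v in candidates[gold].items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in candidates[best].items():
                    w[k] = w.get(k, 0.0) - v
    return w


def rerank(w: Features, candidates: List[Features]) -> List[int]:
    """Return candidate indices ordered from best to worst under w."""
    return sorted(range(len(candidates)),
                  key=lambda i: score(w, candidates[i]), reverse=True)
```

Averaging the weights over all updates (the averaged perceptron) is a common refinement that often generalises better; it is left out here to keep the sketch short.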
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931401
Davide Picca
{"title":"Semantic Domains and Supersense Tagging for Domain-Specific Ontology Learning","authors":"Davide Picca","doi":"10.5555/1931390.1931401","DOIUrl":"https://doi.org/10.5555/1931390.1931401","url":null,"abstract":"In this paper we propose a novel unsupervised approach to learning domain-specific ontologies from large open-domain text collections. The method is based on the joint exploitation of Semantic Domains and Super Sense Tagging for Information Retrieval tasks. Our approach is able to retrieve domain specific terms and concepts while associating them with a set of high level ontological types, named supersenses, providing flat ontologies characterized by very high accuracy and pertinence to the domain.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122304819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931392
T. Sakai, Tatsuya Uehara, Taishi Shimomori, M. Koyama, Mika Fukui
{"title":"Pic-A-Topic: Efficient Viewing of Informative TV Contents on Travel, Cooking, Food and More","authors":"T. Sakai, Tatsuya Uehara, Taishi Shimomori, M. Koyama, Mika Fukui","doi":"10.5555/1931390.1931392","DOIUrl":"https://doi.org/10.5555/1931390.1931392","url":null,"abstract":"Pic-A-Topic is a prototype system designed for enabling the user to view topical segments of recorded TV shows selectively. By analysing closed captions and electronic program guide texts, it performs topic segmentation and topic sentence selection, and presents a clickable table of contents to the user. Our previous work handled TV shows on travel, and included a user study which suggested that Pic-A-Topic's average segmentation accuracy at that point was possibly indistinguishable from that of manual segmentation. This paper shows that the latest version of Pic-A-Topic is capable of effectively segmenting several TV genres related to travel, cooking, food and talk/variety shows, by means of genre-specific strategies. According to an experiment using 26.5 hours of real Japanese TV shows (25 clips) which subsumes the travel test collection we used earlier (10 clips), Pic-A-Topic's topic segmentation results for non-travel genres are as accurate as those for travel. We adopt an evaluation method that is more demanding than the one we used in our previous work, but even in terms of this strict measurement, Pic-A-Topic's accuracy is around 82% of manual performance on average. Moreover, the fusion of cue phrase detection and vocabulary shift detection is very successful for all the genres that we have targeted.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132899175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
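Fusing vocabulary shift detection with cue phrase detection can be illustrated with a TextTiling-style segmenter. It compares the words in a window before and after each gap between caption sentences, and it boosts the gap score when the next sentence opens with a cue phrase. This is an illustration under assumptions, not Pic-A-Topic's implementation. It uses whitespace tokenisation (real Japanese captions would need a morphological analyser) and a simple additive fusion. The parameters window, cue_weight and threshold and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def boundary_scores(sentences: Sequence[str], window: int = 2,
                    cue_phrases: Sequence[str] = (),
                    cue_weight: float = 0.5) -> List[float]:
    """Score the gap before each sentence i (i = 1 .. n-1); higher means likelier boundary."""
    tokens = [s.lower().split() for s in sentences]
    scores = []
    for i in range(1, len(tokens)):
        left = Counter(t for s in tokens[max(0, i - window):i] for t in s)
        right = Counter(t for s in tokens[i:i + window] for t in s)
        shift = 1.0 - cosine(left, right)  # vocabulary shift across the gap
        # Cue phrase: sentence i begins with the phrase (matched on whole tokens).
        starts_with_cue = any(
            tokens[i][:len(c.split())] == c.lower().split()
            for c in cue_phrases if c.strip()
        )
        scores.append(shift + (cue_weight if starts_with_cue else 0.0))
    return scores


def segment(sentences: Sequence[str], threshold: float = 0.9, **kwargs) -> List[int]:
    """Return indices of sentences that start a new topical segment."""
    scores = boundary_scores(sentences, **kwargs)
    return [i + 1 for i, s in enumerate(scores) if s >= threshold]
```

On the toy captions "kyoto temple garden", "temple garden visit", "now cooking pasta sauce", "pasta sauce recipe", segment(captions, threshold=0.9, window=1) returns [2]. With the threshold raised to 1.2, the boundary is found only when "now" is supplied as a cue phrase, which is the kind of effect fusion is meant to have.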
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931447
Vishnu Challam, Susan Gauch, A. Chandramouli
{"title":"Contextual Search Using Ontology-Based User Profiles","authors":"Vishnu Challam, Susan Gauch, A. Chandramouli","doi":"10.5555/1931390.1931447","DOIUrl":"https://doi.org/10.5555/1931390.1931447","url":null,"abstract":"Search engines, generally, return results without any regard for the concepts in which the user is interested. In this paper, we present our approach to personalizing search engines using ontology based contextual profiles. In contrast to long-term user profiles, we construct contextual user profiles that capture what the user is working on at the time they conduct a search. These profiles are used to personalize the search results to suit the information needs of the user at a particular instant of time. We present the results of experiments evaluating the effect of the original versus conceptual ranking and the use of multiple sources of information to build the contextual profile. We were able to achieve a 15% improvement over Google in the average rank of the result clicked by a user when contextual information extracted from open Word documents and Web pages was used to re-rank the results.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123837340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
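Re-ranking with a contextual profile can be sketched as a linear blend of two scores. The first comes from the engine's original rank. The second is a conceptual score: the weighted overlap between the profile's concepts and the concepts a result is classified into. This illustration rests on assumptions and is not the authors' formula. The blending weight alpha, the rank-to-score mapping and the concept labels are hypothetical, and profile and result weights are assumed to lie in [0, 1].

```python
from typing import Dict, List, Tuple


def rerank_with_profile(results: List[Tuple[str, Dict[str, float]]],
                        profile: Dict[str, float],
                        alpha: float = 0.5) -> List[str]:
    """results: (url, concept weights) pairs in the engine's original order."""
    n = len(results)
    scored = []
    for rank, (url, concepts) in enumerate(results):
        original = 1.0 - rank / n  # original rank mapped to a score in (0, 1]
        # Weighted overlap between the user's profile and the result's concepts.
        conceptual = sum(profile.get(c, 0.0) * w for c, w in concepts.items())
        scored.append((alpha * original + (1 - alpha) * conceptual, rank, url))
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties keep the original order
    return [url for _, _, url in scored]
```

Setting alpha to 1.0 reproduces the original order. With alpha = 0.5 and a profile weighted toward "Cooking", a cooking page moves above a travel page. Varying alpha is one simple way to compare original and conceptual ranking.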
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931445
Idir Chibane, Bich-Liên Doan
{"title":"Relevance Propagation Model for Large Hypertext Documents Collections","authors":"Idir Chibane, Bich-Liên Doan","doi":"10.5555/1931390.1931445","DOIUrl":"https://doi.org/10.5555/1931390.1931445","url":null,"abstract":"Web search engines have become indispensable in our daily life to help us find the information we need. Several search tools, for instance Google, use links to select the matching documents against a query. In this paper, we propose a new ranking function that combines content and link rank based on propagation of scores over links. This function propagates scores from source pages to destination pages in relation to query terms. We assessed our ranking function with experiments over two test collections, WT10g and GOV. We conclude that propagating link scores according to query terms provides significant improvement for information retrieval.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128466923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
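Query-dependent score propagation can be sketched in one step. Each page first gets a content score for the query. It then passes that score, split evenly across its out-links, to the pages it links to, and the final score mixes the two. This is a generic illustration under assumptions, not the ranking function evaluated on WT10g and GOV. The toy content score, the single propagation step, the even split and the mixing weight lam are all hypothetical choices.

```python
from typing import Dict, List


def content_score(text: str, query: List[str]) -> float:
    """Toy content score: fraction of the page's words that are query terms."""
    words = text.lower().split()
    return sum(words.count(t) for t in query) / (len(words) or 1)


def propagate(pages: Dict[str, str], links: Dict[str, List[str]],
              query: List[str], lam: float = 0.3) -> Dict[str, float]:
    """Mix each page's own content score with scores propagated over in-links."""
    content = {p: content_score(text, query) for p, text in pages.items()}
    received = {p: 0.0 for p in pages}
    for src, dests in links.items():
        if src not in content:
            continue  # ignore links from pages outside the collection
        targets = [d for d in dests if d in pages]
        for d in targets:
            # Split the source's query-dependent score evenly across its out-links.
            received[d] += content[src] / len(targets)
    return {p: (1 - lam) * content[p] + lam * received[p] for p in pages}
```

Suppose pages A ("jaguar car review") and C ("jaguar speed") both link to B ("photo gallery"). The query ["jaguar"] then gives B a non-zero score of 0.25 even though B contains no query term, because the propagated scores come from pages that do.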
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931431
Y. Mass, D. Sheinwald, B. Sznajder, Sivan Yogev
{"title":"XML Fragments Extended with Database Operators","authors":"Y. Mass, D. Sheinwald, B. Sznajder, Sivan Yogev","doi":"10.5555/1931390.1931431","DOIUrl":"https://doi.org/10.5555/1931390.1931431","url":null,"abstract":"XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniques are used to support search on the \"unstructured\" end of this scale, while database techniques are used for the structured part. To date, most of the works on XML query and search have stemmed from the structured side and are strongly inspired by database techniques. In a previous work we described a new query approach via pieces of XML data called \"XML Fragments\" which are of the same nature as the queried XML documents and are specifically targeted to support the information needs of end-users in an intuitive way. In addition to its simplicity, XML Fragments represent a natural extension to traditional free text information retrieval queries where both documents and queries are represented as vectors of words and as such it enables a natural extension of IR ranking models to rank XML documents by context and structure. In this paper, we extend XML Fragments with database operators thus allowing both IR style approach together with database \"structured\" query capabilities.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126241552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
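One idea in the abstract is to treat a query fragment like a document and rank by vector similarity over structure-aware terms. This can be sketched by representing both the fragment and each document as counts of (element path, term) pairs. A database-style numeric filter is added on top. This is a deliberately simplified illustration, not the XML Fragments ranking model or query syntax. Paths are matched exactly from the root, similarity is plain cosine, and the helpers search and less_than are hypothetical. The sketch uses only Python's standard xml.etree.ElementTree.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Predicate = Callable[[Optional[str]], bool]


def path_terms(root: ET.Element) -> Counter:
    """Represent an XML tree as counts of (root-to-element path, term) pairs."""
    feats: Counter = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        for term in (elem.text or "").lower().split():
            feats[(path, term)] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def less_than(limit: float) -> Predicate:
    """Database-style numeric predicate; missing or non-numeric values fail."""
    def pred(value: Optional[str]) -> bool:
        try:
            return value is not None and float(value) < limit
        except ValueError:
            return False
    return pred


def search(docs: Dict[str, str], query_xml: str,
           filters: Sequence[Tuple[str, Predicate]] = ()) -> List[str]:
    """Rank documents by fragment similarity after applying structured filters."""
    q = path_terms(ET.fromstring(query_xml))
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # findtext returns None when the child element is missing.
        if not all(pred(root.findtext(path)) for path, pred in filters):
            continue  # structured constraint not satisfied
        s = cosine(q, path_terms(root))
        if s > 0:
            hits.append((-s, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]
```

For example, the fragment <book><title>xml</title></book> matches two books on title terms. Adding the filter ("price", less_than(20)) keeps only the one priced under 20.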
RIAO Conference | Pub Date: 2007-05-30 | DOI: 10.5555/1931390.1931455
G. Kumaran, James Allan
{"title":"Information Retrieval Techniques for Templated Queries","authors":"G. Kumaran, James Allan","doi":"10.5555/1931390.1931455","DOIUrl":"https://doi.org/10.5555/1931390.1931455","url":null,"abstract":"Queries in template form are gaining in popularity as a means of conveying specific information needs to search engines. We explore the utility of Information Retrieval (IR) techniques in the context of templated queries. Our investigations show that IR techniques known to be well-suited for ad hoc retrieval don't seamlessly extend to the case of templated queries. We show that what works is a combination of IR techniques and intuition-driven modifications to the templated queries, resulting in statistically significant improvements over the baseline.","PeriodicalId":120472,"journal":{"name":"RIAO Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116093780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}