A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

Thierry Hamon, A. Nazarenko, T. Poibeau, S. Aubin, Julien Derivière
RIAO Conference, published 2007-05-30. DOI: 10.5555/1931390.1931412 (https://doi.org/10.5555/1931390.1931412)
Citations: 19

Abstract

Semantic access to the Web in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required, either to identify the relevant semantic units to index and to weight them according to their linguistically specific statistical distribution, or to serve as the basis of an information extraction process. Recent developments have made Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and development of Ogmios, a text processing platform developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains, and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness, and performance can be handled together.