Shallow syntax analysis in Sanskrit guided by semantic nets constraints

G. Huet
{"title":"Shallow syntax analysis in Sanskrit guided by semantic nets constraints","authors":"G. Huet","doi":"10.1145/1364742.1364750","DOIUrl":null,"url":null,"abstract":"We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional framework.\n Some of the layers proceed in bottom-up synthesis mode - for instance, noun and verb morphological modules generate all inflected forms from stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatizing. These dictionaries are then compiled into transducers, implementing the analysis of external sandhi, the phonological process which merges words together by euphony. This provides a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon.\n The next layer is a syntax analyser, guided by semantic nets constraints expressing dependencies between the word forms. Finite verb forms demand semantic roles, according to valency patterns depending on the voice (active, passive) of the form and the governance (transitive, etc) of the root. Conversely, noun/adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words are mapped to transducers operating on tagged streams, allowing the modeling of linguistic phenomena such as coordination by abstract interpretation of actor streams. 
The parser ranks the various interpretations (matching actors with roles) with penalties, and returns to the user the minimum penalty analyses, for final validation of ambiguities. The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.","PeriodicalId":287514,"journal":{"name":"International Workshop On Research Issues in Digital Libraries","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop On Research Issues in Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1364742.1364750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional framework.

Some of the layers proceed in bottom-up synthesis mode: for instance, noun and verb morphological modules generate all inflected forms from the stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatizing. These dictionaries are then compiled into transducers implementing the analysis of external sandhi, the phonological process which merges words together by euphony. This provides a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon.

The next layer is a syntax analyser, guided by semantic-net constraints expressing dependencies between the word forms. Finite verb forms demand semantic roles, according to valency patterns depending on the voice (active, passive) of the form and the governance (transitive, etc.) of the root. Conversely, noun/adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words are mapped to transducers operating on tagged streams, allowing the modeling of linguistic phenomena such as coordination by abstract interpretation of actor streams. The parser ranks the various interpretations (matching actors with roles) with penalties, and returns to the user the minimum-penalty analyses, for final validation of ambiguities.
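As a toy illustration of this ranking step, penalty-based matching of actors to roles can be sketched as follows. All names, word forms, role/case preferences and penalty values here are invented for illustration; the actual platform is an OCaml system built on the Zen toolkit, not this Python caricature.

```python
from itertools import permutations

# Hypothetical toy valency data: in the active voice, the agent role
# prefers a nominative actor and the object role an accusative one.
ROLE_CASE = {"agent": "nominative", "object": "accusative"}
MISMATCH_PENALTY = 2  # illustrative cost of a non-preferred case filling a role

def rank_assignments(roles, actors):
    """Enumerate role/actor matchings and rank them by total penalty."""
    scored = []
    for perm in permutations(actors, len(roles)):
        penalty = sum(
            0 if actor["case"] == ROLE_CASE[role] else MISMATCH_PENALTY
            for role, actor in zip(roles, perm)
        )
        scored.append((penalty, dict(zip(roles, (a["form"] for a in perm)))))
    scored.sort(key=lambda s: s[0])  # minimum-penalty analyses first
    return scored

# Two nominal forms competing for the two roles of a transitive verb.
actors = [
    {"form": "ramah", "case": "nominative"},
    {"form": "phalam", "case": "accusative"},
]
best_penalty, best = rank_assignments(["agent", "object"], actors)[0]
# The minimum-penalty analysis pairs the nominative with the agent role.
```

The real parser scores far richer constraints (agreement, voice, tool words), but the shape is the same: enumerate candidate matchings, score them, and present the cheapest analyses for the user to disambiguate.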
The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.
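The tagging segmenter described above can be caricatured in a few lines: given a dictionary of inflected forms and a table of external-sandhi rules, segmentation searches for splits of the phoneme stream, undoing sandhi at word boundaries. The two-word lexicon, the single vowel-sandhi rule (a + i → e), and the simplified, diacritic-free forms below are invented for illustration; the real system compiles such rules and dictionaries into finite-state transducers rather than searching by backtracking.

```python
# Hypothetical miniature lexicon of inflected forms, as flat phoneme strings.
LEXICON = {"rama", "icchati"}
# Each sandhi rule: merged surface string -> possible (final of word 1,
# initial of word 2) pairs it may stand for at a word boundary.
SANDHI = {"e": [("a", "i")]}

def segment(s):
    """Return all segmentations of s into lexicon forms, undoing sandhi."""
    if not s:
        return [[]]
    results = []
    # Plain split: a lexicon form is a literal prefix of the input.
    for i in range(1, len(s) + 1):
        if s[:i] in LEXICON:
            results += [[s[:i]] + rest for rest in segment(s[i:])]
    # Sandhi split: a merged surface sequence straddles the word boundary.
    for surface, splits in SANDHI.items():
        k = len(surface)
        for i in range(1, len(s) - k + 1):
            if s[i:i + k] == surface:
                for final, initial in splits:
                    left, right = s[:i] + final, initial + s[i + k:]
                    if left in LEXICON:
                        results += [[left] + rest for rest in segment(right)]
    return results

print(segment("ramecchati"))  # -> [['rama', 'icchati']]
```

On real input the search returns many candidate segmentations; it is the subsequent tagging and penalty-ranked role matching that prunes them to plausible analyses.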