Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation: Latest Articles

Script-based classification of hand-written text documents in a multilingual environment
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249845
Vivek Singhal, N. Navin, D. Ghosh
Abstract: Script-based text document classification is an important field of research in the context of multilingual textual document processing. However, none of the script identification techniques available in the literature so far consider handwritten documents. Variations in writing style, character size, inter-line and inter-word spacing, etc. make the recognition process difficult and unreliable when these script identification algorithms, more specifically visual appearance based approaches, are applied directly to hand-written documents. Therefore, in this paper, we propose to preprocess the input document images so as to compensate for the variations due to writing style and thereby make them suitable for analysis on the basis of their visual appearances. Accordingly, we apply denoising, thinning, pruning, m-connectivity and text size normalization in sequence. Multi-channel Gabor filtering is used to extract texture features that characterize the visual appearances of the document images. Experimental results demonstrate the potential of our proposed method of script identification for hand-written text document classification.
Citations: 56
Event information extraction using link grammar
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249841
H. Madhyastha, N. Balakrishnan, K. Ramakrishnan
Abstract: In this paper, we present a scheme for identifying instances of events and extracting information about them. The scheme can handle all events with which an action can be associated, which covers most types of events. Our system tries to extract semantic information from the syntactic structure that the link grammar system described by D. Sleator and D. Temperley (1991) assigns to any English sentence. The instances of events are identified by finding all sentences in the text where the verb that best represents the action in the event, or one of its synonyms/hyponyms, occurs as a main verb. Then, information about that instance of the event is derived using a set of rules which we have developed to identify the subject and object as well as the modifiers of all verbs and nouns in any English sentence, making use of the structure given by the link parser. The scheme was tested on the Reuters corpus and achieved recall and precision of up to 100%.
Citations: 31
ABHIDHA: an extended WordNet for Indo Aryan languages
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249839
S. R. Annam, M. Choudhury, S. Sarkar, A. Basu
Abstract: A lexical knowledge base is an important component of any intelligent information processing system. The WordNet developed at the Cognitive Systems Laboratories at Princeton has served as a lexical reference system for natural language processing activities. The Indian language based activities at our institute, mainly in text-to-speech synthesis and natural language generation from iconic inputs, require the inclusion of additional features in the lexical reference system, such as phonology, word roots, and etymological information. Our initial efforts have been in Hindi and Bengali, but the commonality of Indo Aryan languages and the importance of these extra features lead us to believe that it is a worthwhile effort to build a WordNet for other Indo Aryan languages containing these features. In this paper, we discuss the issues relating to the structured design and development of a generalized extended WordNet for Indo Aryan languages, with special reference to Hindi and Bengali.
Citations: 3
Correlating summarization of a pair of multilingual documents
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249844
Xiang-Hua Ji, H. Zha
Abstract: With the emergence of an enormous number of documents in multiple languages, it is desirable to construct text mining methods that can compare them and highlight their similarities. In this paper, we explore the research issue of comparative summarization for a pair of multilingual documents. A bipartite graph based algorithm is proposed to correlate textual content against sources in various languages. The algorithm aligns the (sub)topics of a pair of multilingual documents and summarizes their correlation by sentence extraction. A pair of documents in different languages is modeled with a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of the weighted bipartite graph. Sentences corresponding to the subgraph are correlated well in textual content and convey the dominant shared topic of the pair of documents. As a further enhancement, a bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from the two documents. These clusters correspond to shared subtopics, and the above mutual reinforcement principle can be applied to extract topic sentences within each subtopic group.
Citations: 1
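The mutual reinforcement idea named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common formulation in which a sentence in one document is salient when it is strongly linked to salient sentences in the other, and it solves that by power iteration. The function name `mutual_reinforcement`, the `weights` matrix layout, and the toy similarity values are all hypothetical.

```python
import math


def mutual_reinforcement(weights, iterations=50):
    """Score sentences on both sides of a weighted bipartite graph.

    weights[i][j] is an assumed cross-lingual similarity between
    sentence i of document A and sentence j of document B.
    Returns unit-length saliency vectors (u for A, v for B).
    """
    n_a, n_b = len(weights), len(weights[0])
    u = [1.0] * n_a  # saliency of sentences in document A
    v = [1.0] * n_b  # saliency of sentences in document B
    for _ in range(iterations):
        # A sentence in A is salient if it links to salient sentences in B.
        u = [sum(weights[i][j] * v[j] for j in range(n_b)) for i in range(n_a)]
        # A sentence in B is salient if it links to salient sentences in A.
        v = [sum(weights[i][j] * u[i] for i in range(n_a)) for j in range(n_b)]
        # Normalize so the scores stay bounded across iterations.
        norm_u = math.sqrt(sum(x * x for x in u)) or 1.0
        norm_v = math.sqrt(sum(x * x for x in v)) or 1.0
        u = [x / norm_u for x in u]
        v = [x / norm_v for x in v]
    return u, v


# Toy example: the first two sentences of each document are strongly linked.
toy_weights = [
    [0.9, 0.8, 0.0],
    [0.7, 0.9, 0.1],
    [0.0, 0.1, 0.2],
]
scores_a, scores_b = mutual_reinforcement(toy_weights)
print(scores_a, scores_b)
```

Under these assumptions, the highest-scoring sentences on each side form the densely connected part of the graph and would be the candidates for extraction; how the edge weights are computed across languages is left open here.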
On database support for multilingual environments
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249842
A. Kumaran, J. Haritsa
Abstract: Global e-commerce and mass-outreach e-governance programs have brought into sharp focus the need for database systems to store and manipulate text data efficiently in a suite of natural languages. While some means of storing and querying multilingual data are provided by all current database systems, to the best of our knowledge, there has been no prior study of their functionality or efficiency in this regard. In this paper, we explore the multilingual support needed by the user community and what is currently provided by the popular database systems to satisfy these needs. Specifically, a comparison of multilingual features supported by the database systems is provided against a set of relevant parameters. Initial results from our performance study indicate that serious lacunae exist in the performance with respect to multilingual data. We propose a new data type and associated database system architecture components to make the performance of the database system language-independent. Results from our initial implementation of the proposed methodology are encouraging, indicating the value of such an approach.
Citations: 11
Creation of data resources and design of an evaluation test bed for Devanagari script recognition
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249846
S. Setlur, Suryaprakash Kompalli, V. Ramanaprasad, V. Govindaraju
Abstract: The Indian subcontinent has a large number of languages, dialects, and scripts, with the Devanagari script being the primary and most widely used of all the scripts. To date, much of the Devanagari optical character recognition (OCR) research has been restricted to a handful of groups. So, techniques have not yet been widely disseminated or evaluated independently, and automated evaluation tools are currently not available for lack of a standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources. Ground-truthed data for words and characters, on-line dictionaries, corpora of text documents, and reliable, standardized statistical analyses and evaluation tools are currently lacking. So, the creation of such data resources will undoubtedly provide a much-needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation sponsored project under the International Digital Libraries program to create data resources that will facilitate development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.
Citations: 19
Semi-automatic indexing of documents with a multilingual thesaurus
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249843
U. Schiel, Ianna M. S. F. de Sousa
Abstract: With the growing significance of digital libraries and the Internet, more and more electronic texts become accessible to a wide and geographically dispersed public. This requires adequate tools to facilitate indexing, storage, and retrieval of documents written in different languages. We present a method for semi-automatic indexing of electronic documents and construction of a multilingual thesaurus, which can be used for query formulation and information retrieval. We use special dictionaries and user interaction in order to resolve ambiguities and find adequate canonical terms in the language and an adequate abstract language-independent term. The abstract thesaurus is updated incrementally by new indexed documents and is used to search for documents using adequate terms.
Citations: 3
Exploiting multi-lingual text potentialities in EBMT systems
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249840
F. Mandreoli, R. Martoglia, P. Tiberio
Abstract: Translating documents from a source to a target language is a repetitive activity. The attempt to automate such a difficult task has been a long-term scientific dream. Among the several types of approaches in machine translation (MT), one of the most promising paradigms is example-based machine translation (EBMT). An EBMT system translates by analogy, using past translations to translate other similar source-language material into the target language. In this paper, we introduce EXTRA (EXample-based TRanslation Assistant), a complete EBMT system that exploits some innovative ideas in information retrieval and multilingual text management to effectively and efficiently extract useful suggestions from past translations and present them to the translator. This work was developed jointly with the LOGOS group, a worldwide leader in multilingual document translation.
Citations: 4
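The "translation by analogy" step described in the abstract above, retrieving past translations whose source text resembles the new segment, can be sketched as follows. This is only an illustration and not how EXTRA works: the abstract does not specify its similarity measure, so the sketch substitutes the character-level ratio from Python's standard `difflib` module. The function name `suggest_translations`, the threshold, and the toy translation memory are hypothetical.

```python
from difflib import SequenceMatcher


def suggest_translations(segment, memory, threshold=0.6, limit=3):
    """Return past translations whose source text resembles `segment`.

    memory is a list of (source_text, target_text) pairs, i.e. a toy
    translation memory. Results are (score, source, target) tuples,
    best match first.
    """
    scored = []
    for source, target in memory:
        # Stand-in similarity: difflib's ratio on lower-cased strings.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical English-to-Italian memory, for illustration only.
toy_memory = [
    ("Press the red button to start the engine.",
     "Premere il pulsante rosso per avviare il motore."),
    ("Close the door.", "Chiudere la porta."),
]
for score, source, target in suggest_translations(
        "Press the green button to start the engine.", toy_memory):
    print(f"{score:.2f}  {source}  ->  {target}")
```

A real system would need a similarity measure that scales to large memories and works at the word or phrase level; the sketch only shows the retrieve-and-suggest shape of the approach.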
An extensible approach to high-quality multilingual typesetting
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249847
J. Plaice, Y. Haralambous, C. Rowley
Abstract: We propose to create and study a new model for the micro-typography part of automated multilingual typesetting. This new model will support quality typesetting for a number of modern and ancient scripts. The major innovation in the proposal is that the process is refined into four phases, each dependent on a multidimensional tree-structured context summarizing the current linguistic and cultural environment. The four phases are: preparing the input stream for typesetting; segmenting the stream into clusters (words); typesetting these clusters; and then recombining the clusters into a typeset text stream. The context is pervasive throughout the process; the algorithms used in each phase are context-dependent, as are the meanings of fundamental entities such as language, script, font and character.
Citations: 5
Proceedings. Thirteenth International Workshop on Research Issues in Data Engineering: Multi-lingual Information Management. RIDE-MILM 2003 (IEEE Cat.No.03TH8687)
Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 1900-01-01 DOI: 10.1109/RIDE.2003.1249838
Abstract: The following topics are dealt with: NLP (natural language processing) technologies for MLIM (multi-lingual information management); system issues in MLIM; and multilingual text processing.
Citations: 0