Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation: Latest Articles

Script-based classification of hand-written text documents in a multilingual environment

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249845

Vivek Singhal, N. Navin, D. Ghosh

Abstract: Script-based text document classification is an important field of research in the context of multilingual textual document processing. But, all script identification techniques available in the literature so far do not consider handwritten documents. Variations in the writing style, character size, inter-line and inter-word spacings, etc. make the recognition process difficult and unreliable when these script identification algorithms, more specifically visual appearance based approaches, are applied directly on hand-written documents. Therefore, in this paper, we propose to preprocess the input document images so as to compensate for the variations due to writing style and thereby making them suitable for analysis on the basis of their visual appearances. Accordingly, we apply denoising, thinning, pruning, m-connectivity and text size normalization in sequence. Multi-channel Gabor filtering is used to extract texture features that characterize the visual appearances of the document images. Experimental result proves the potentiality of our proposed method of script identification for hand-written text document classification.
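The multi-channel Gabor filtering step named in the abstract above can be illustrated with a minimal sketch. The abstract gives no filter parameters, so everything here is an assumption made for illustration: the function names (`gabor_kernel`, `channel_energy`, `texture_features`), the real-valued cosine kernel, the choice of sigma as half the wavelength, and the use of mean absolute response as the per-channel texture feature. It is not the authors' implementation.

```python
import math

def gabor_kernel(wavelength, theta, sigma, half_size, gamma=1.0):
    """Real (cosine) Gabor kernel: a Gaussian envelope times a sinusoidal
    carrier whose intensity varies along direction theta."""
    kernel = []
    for y in range(-half_size, half_size + 1):
        row = []
        for x in range(-half_size, half_size + 1):
            # Rotate coordinates so the carrier runs along theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def channel_energy(image, kernel):
    """Mean absolute filter response over positions where the kernel fits
    entirely inside the image (no padding)."""
    size = len(kernel)
    half = size // 2
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for cy in range(half, height - half):
        for cx in range(half, width - half):
            response = sum(
                kernel[ky][kx] * image[cy + ky - half][cx + kx - half]
                for ky in range(size)
                for kx in range(size)
            )
            total += abs(response)
            count += 1
    return total / count if count else 0.0

def texture_features(image, wavelengths, thetas, half_size=4):
    """One energy value per (wavelength, orientation) channel.
    sigma = 0.5 * wavelength is an illustrative choice, not from the paper."""
    return [
        channel_energy(image, gabor_kernel(lam, th, 0.5 * lam, half_size))
        for lam in wavelengths
        for th in thetas
    ]
```

On an image of vertical stripes with a period of four pixels, the channel whose carrier runs horizontally (theta = 0) responds far more strongly than the channel at theta = pi/2, which is the kind of orientation selectivity a texture-based script classifier would rely on.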

Citations: 56

Event information extraction using link grammar

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249841

H. Madhyastha, N. Balakrishnan, K. Ramakrishnan

Citations: 31

ABHIDHA: an extended WordNet for Indo Aryan languages

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249839

S. R. Annam, M. Choudhury, S. Sarkar, A. Basu

Citations: 3

Correlating summarization of a pair of multilingual documents

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249844

Xiang-Hua Ji, H. Zha

Abstract: With the emergence of enormous amount of documents in multiple languages, it is desirable to construct text mining methods that can compare and highlight similarities of them. In this paper, we explore the research issue of comparative summarization for a pair of multilingual documents. A bipartite graph based algorithm is proposed to correlate textual content against sources in various languages. The algorithm aligns the (sub)topics of a pair of multilingual documents and summarizes their correlation by sentence extraction. A pair of documents in different languages is modeled with a weighted bipartite graph. A mutual reinforcement principle is applied to identify a dense subgraph of the weighted bipartite graph. Sentences corresponding to the subgraph are correlated well in textual content and convey the dominant shared topic of the pair of documents. As a further enhancement, a bi-clustering algorithm can first be used to partition the bipartite graph into several clusters, each containing sentences from the two documents. These clusters correspond to shared subtopics, and the above mutual reinforcement principle can be applied to extract topic sentences within each subtopic group.
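One common reading of the mutual reinforcement principle described in the abstract above is an alternating power iteration on the bipartite weight matrix: a sentence in one document scores highly when it is strongly linked to high-scoring sentences in the other document. The sketch below assumes that reading. The function names (`mutual_reinforcement`, `top_sentences`), the L2 normalization, and the stopping rule are hypothetical choices for illustration, and how the cross-lingual edge weights are computed is left out because the abstract does not specify it.

```python
import math

def mutual_reinforcement(weights, iterations=100, tol=1e-9):
    """Alternate u <- W v and v <- W^T u, normalizing each time.
    weights[i][j] is the similarity between sentence i of document A
    and sentence j of document B (assumed non-negative)."""
    m, n = len(weights), len(weights[0])
    u = [1.0 / math.sqrt(m)] * m
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Score A-sentences by their links to currently salient B-sentences.
        new_u = [sum(weights[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in new_u)) or 1.0
        new_u = [x / norm_u for x in new_u]
        # Score B-sentences by their links to the updated A-sentence scores.
        new_v = [sum(weights[i][j] * new_u[i] for i in range(m)) for j in range(n)]
        norm_v = math.sqrt(sum(x * x for x in new_v)) or 1.0
        new_v = [x / norm_v for x in new_v]
        delta = max(abs(a - b) for a, b in zip(u + v, new_u + new_v))
        u, v = new_u, new_v
        if delta < tol:
            break
    return u, v

def top_sentences(scores, k):
    """Indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Given a small weight matrix in which the first two sentences of each document are strongly linked and the third is nearly isolated, `top_sentences` applied to each score vector with `k=2` picks out the first two sentences on both sides, which corresponds to the dense subgraph carrying the shared topic. Under the bi-clustering enhancement, the same routine would be run on the weight submatrix of each cluster.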

Citations: 1

On database support for multilingual environments

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249842

A. Kumaran, J. Haritsa

Citations: 11

Creation of data resources and design of an evaluation test bed for Devanagari script recognition

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249846

S. Setlur, Suryaprakash Kompalli, V. Ramanaprasad, V. Govindaraju

Abstract: The Indian subcontinent has a large number of languages, dialects, and scripts with the Devanagari script being the primary and most widely used of all the scripts. To date, much of the Devanagari optical character recognition (OCR) research has been restricted to a handful of groups. So, techniques have not yet been widely disseminated or evaluated independently and automated evaluation tools are currently not available for lack of a standard representation of ground-truth and result data. A key reason for the absence of sustained research efforts in off-line Devanagari OCR appears to be the paucity of data resources. Ground truthed data for words and characters, on-line dictionaries, corpora of text documents and reliable, standardized statistical analyses and evaluation tools are currently lacking. So, the creation of such data resources will undoubtedly provide a much needed fillip to researchers working on Devanagari OCR. This paper describes a National Science Foundation sponsored project under the International Digital Libraries program to create data resources that will facilitate development of Devanagari OCR technology and provide a standardized test bed and evaluation tools for Devanagari script recognition.

Citations: 19

Semi-automatic indexing of documents with a multilingual thesaurus

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249843

U. Schiel, Ianna M. S. F. de Sousa

Citations: 3

Exploiting multi-lingual text potentialities in EBMT systems

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249840

F. Mandreoli, R. Martoglia, P. Tiberio

Citations: 4

An extensible approach to high-quality multilingual typesetting

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 2003-03-10 DOI: 10.1109/RIDE.2003.1249847

J. Plaice, Y. Haralambous, C. Rowley

Citations: 5

Proceedings. Thirteenth International Workshop on Research Issues in Data Engineering: Multi-lingual Information Management. RIDE-MILM 2003 (IEEE Cat.No.03TH8687)

Proceedings. Seventeenth Workshop on Parallel and Distributed Simulation Pub Date : 1900-01-01 DOI: 10.1109/RIDE.2003.1249838

Citations: 0