Cost of error correction quantification with Bengali text transcription

Soumalya Ghosh, D. Samanta, M. Sarma
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI). Published 2012-12-01. DOI: 10.1109/IHCI.2012.6481828 (https://doi.org/10.1109/IHCI.2012.6481828)
Citations: 3

Abstract

With the rapid growth of digital communication applications such as e-mail, messaging, chat, and blogging, text-based interaction in the user's mother language has become one of the most basic and usable interaction techniques. The efficiency of a text entry technology depends mainly on its text entry rate and error rate. The number of errors made during typing is generally measured with the Levenshtein minimum string distance (MSD) statistic. MSD counts the errors present in the transcribed text relative to the presented text, that is, the number of single edit operations required to transform the transcribed text into the presented text, and it charges one edit operation per error. However, MSD is not suitable for quantifying error-correction operations in Indian languages because of language-specific features such as complex ('juktakkhor') and matra characters: a single error in a complex character may require multiple single edit operations to correct. Errors in complex characters therefore limit the practical usefulness of MSD quantification for Indian languages. In this paper, we quantify the minimum number of single edit primitives required to correct the errors in Indian-language transcribed text typed with any single-stroke or tap text entry tool. First, we identify the mismatched character (error) positions in both the transcribed and the presented text using the longest common subsequence (LCS) algorithm. Next, we propose an algorithm to determine whether each error lies in a simple character or in a complex character. We then propose another algorithm to calculate the minimum number of operations needed to transform the transcribed text into the presented text, depending on the error positions and types (error in a simple or a complex character). Finally, we define a correction cost per error (CCPE) metric to calculate the average correction cost for an erroneous transcribed text.
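The abstract does not give the algorithms themselves, so the following Python sketch only illustrates the building blocks it names: code-point MSD, LCS-based mismatch positions, and a simple/complex distinction. The function names (`msd`, `lcs_mismatches`, `in_conjunct`, `ccpe`), the hasanta-adjacency heuristic for spotting a complex character, and the reading of CCPE as total operations divided by number of errors are all assumptions made for illustration, not the authors' definitions.

```python
HASANTA = "\u09CD"  # Bengali virama, which joins consonants into a conjunct


def msd(presented: str, transcribed: str) -> int:
    """Levenshtein minimum string distance over Unicode code points."""
    prev = list(range(len(presented) + 1))
    for i, t in enumerate(transcribed, start=1):
        cur = [i]
        for j, p in enumerate(presented, start=1):
            cur.append(min(prev[j] + 1,              # delete t
                           cur[j - 1] + 1,           # insert p
                           prev[j - 1] + (t != p)))  # substitute or match
        prev = cur
    return prev[-1]


def lcs_mismatches(presented: str, transcribed: str):
    """Return (presented, transcribed) indices that fall outside one LCS."""
    n, m = len(presented), len(transcribed)
    # table[i][j] = LCS length of presented[i:] and transcribed[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if presented[i] == transcribed[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    matched_p, matched_t = set(), set()
    i = j = 0
    while i < n and j < m:
        if presented[i] == transcribed[j]:
            matched_p.add(i)
            matched_t.add(j)
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return ([k for k in range(n) if k not in matched_p],
            [k for k in range(m) if k not in matched_t])


def in_conjunct(text: str, index: int) -> bool:
    """Assumed heuristic: a position is 'complex' if it is a hasanta or
    directly adjacent to one. This is not the paper's algorithm."""
    window = text[max(index - 1, 0):index + 2]
    return HASANTA in window


def ccpe(total_operations: int, error_count: int) -> float:
    """Assumed reading of CCPE: average correction operations per error."""
    return total_operations / error_count


# Presented text has the conjunct KA + HASANTA + TA; the typist omitted the hasanta.
presented = "\u09B6\u0995\u09CD\u09A4"
transcribed = "\u09B6\u0995\u09A4"
print(msd(presented, transcribed))             # code-point MSD
print(lcs_mismatches(presented, transcribed))  # mismatch positions
print(in_conjunct(presented, 2))               # does the error sit in a conjunct?
```

In this example MSD reports a single missing code point. Whether repairing it on a real single-stroke or tap keyboard takes one operation or several depends on how the tool enters and deletes conjuncts, which is the gap the paper's correction-cost measure is meant to capture.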