A comparison of morpheme and word based document retrieval for Asian languages
Authors: Van Be Hai Nguyen, Phil Vines, R. Wilkinson
Published in: Proceedings of 7th International Conference and Workshop on Database and Expert Systems Applications: DEXA 96
Publication date: 1996-09-09
DOI: 10.1109/DEXA.1996.558329 (https://doi.org/10.1109/DEXA.1996.558329)
Citation count: 3
Abstract
Most document retrieval systems are word-based. Words are convenient retrieval units in English, but less so in some Asian languages. Determining which morphemes constitute words in Vietnamese and Chinese is problematic, and this difficulty has been assumed to be the reason that word-based retrieval performs poorly for these languages. The paper examines a number of segmentation algorithms and then reports on experiments comparing morpheme-based and word-based retrieval. It shows that morpheme-based retrieval is hard to improve on.
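To make the distinction between the two kinds of retrieval unit concrete, the following is a minimal sketch, not one of the segmentation algorithms examined in the paper. It assumes Vietnamese-style text in which morphemes are separated by spaces, and it uses greedy longest-match against a word list as a stand-in segmenter. The function names, the `max_len` parameter, and the toy lexicon are all hypothetical and chosen only for illustration.

```python
def morpheme_terms(text):
    """Morpheme-based units: every whitespace-separated morpheme is a term."""
    return text.lower().split()


def word_terms(text, lexicon, max_len=4):
    """Word-based units via greedy longest-match segmentation.

    `lexicon` is a hypothetical set of known multi-morpheme words, each
    written as lowercase morphemes joined by single spaces.
    """
    morphemes = morpheme_terms(text)
    terms = []
    i = 0
    while i < len(morphemes):
        # Try the longest candidate first; a single morpheme always matches.
        for n in range(min(max_len, len(morphemes) - i), 0, -1):
            candidate = " ".join(morphemes[i:i + n])
            if n == 1 or candidate in lexicon:
                terms.append(candidate)
                i += n
                break
    return terms


# Toy lexicon and sample text, invented for illustration only.
LEXICON = {"việt nam", "ngôn ngữ"}
sample = "ngôn ngữ Việt Nam"

print(morpheme_terms(sample))       # four single-morpheme index terms
print(word_terms(sample, LEXICON))  # two two-morpheme index terms
```

A morpheme-based index needs no lexicon and cannot make a segmentation error, whereas the word-based index is only as good as the word list and the matching strategy behind it.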