Word segmentation by letter successor varieties

Margaret A. Hafer, Stephen F. Weiss
Journal: Information Storage and Retrieval, Volume 10, Issue 11, Pages 371-385
Published: 1974-11-01
DOI: 10.1016/0020-0271(74)90044-8
URL: https://www.sciencedirect.com/science/article/pii/0020027174900448
Citations: 238

Abstract

This paper describes a method for automatically segmenting words into their stems and affixes. The process uses certain statistical properties of a corpus (successor and predecessor letter variety counts) to indicate where words should be divided. Consequently, this process is less reliant on human intervention than are other methods for automated stemming.

The segmentation system is used to construct stem dictionaries for document classification. Information retrieval experiments are then performed using documents and queries so classified. Results show not only that this method is capable of high quality word segmentation, but also that its use in information retrieval produces results that are at least as good as those obtained using the more traditional stemming processes.
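
The successor and predecessor letter variety counts mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy corpus, the function names, and in particular the local-peak rule used to choose cut points, since the abstract does not state how the counts are turned into segment boundaries.

```python
from collections import defaultdict


def successor_table(words):
    """Map every proper prefix seen in the corpus to the set of letters that follow it."""
    table = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            table[word[:i]].add(word[i])
    return table


def successor_varieties(word, table):
    """Number of distinct letters following each proper prefix of `word` (prefix lengths 1..n-1)."""
    return [len(table.get(word[:i], ())) for i in range(1, len(word))]


def predecessor_varieties(word, words):
    """Same count computed on reversed words, indexed by suffix length 1..n-1."""
    reversed_table = successor_table(w[::-1] for w in words)
    return successor_varieties(word[::-1], reversed_table)


def segment(word, words):
    """Cut after each prefix whose successor variety is a strict local peak.

    The peak rule is an illustrative assumption, not the rule from the paper.
    """
    sv = successor_varieties(word, successor_table(words))
    # sv[i] belongs to the prefix of length i + 1, so the cut position is i + 1.
    cuts = [i + 1 for i in range(1, len(sv) - 1)
            if sv[i] > sv[i - 1] and sv[i] > sv[i + 1]]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


# Hypothetical toy corpus, chosen only to make the counts easy to check by hand.
CORPUS = ["able", "ape", "beatable", "fixable", "read", "readable",
          "reading", "reads", "red", "rope", "ripe"]

print(successor_varieties("readable", successor_table(CORPUS)))  # [3, 2, 1, 3, 1, 1, 1]
print(predecessor_varieties("readable", CORPUS))                 # [2, 1, 1, 3, 1, 1, 1]
print(segment("readable", CORPUS))                               # ['read', 'able']
```

In this toy corpus the prefix "read" is followed by three different letters (a, i, s) and the suffix "able" is preceded by three different letters (t, x, e), so both counts point to the same boundary. A real corpus would need a more careful decision rule than a bare local peak.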
