Information storage and retrieval

ACM '59 Pub Date : 1959-09-01 DOI:10.1145/612201.612221
Susan Brewer
Citations: 0

Abstract

The letter and/or sound combinations that make up a human language are limited by the human's ability to pronounce these sounds. Therefore, the standard library search, which as a rule looks for all possible combinations of letters to find a word, is wasteful. Certain letters simply cannot be followed by certain other letters, and a search for them is senseless. Following this same line of reasoning, letters very frequently occur in the combinations that are germane to the particular language. The growing amount of alphanumeric information presently being stored on magnetic tape presents increasingly difficult problems in both the number of tape reels used and the time necessary to search this mass of information in order to extract pertinent literature. At the present time most of this literature on tape utilizes the standard IBM 6-bit code to express alphanumeric symbols. It is entirely feasible to record standard English literature on tape, be it professional abstracts or novels, using only approximately two-thirds of the binary bits utilized to represent the same piece of written material in the conventional code. This can be accomplished by setting up, in a 9-bit code, the 400-odd letter combinations occurring most frequently. A 9-bit representation allows the programmer to set up as many as 512 symbols, thus leaving sufficient leeway to assign symbols to the most frequently used words, mathematical symbols, and professional expressions that are expected to be encountered in the literature to be recorded. In addition, these relatively short 9-bit symbols can be assigned to all key words that it may be necessary to look for later, thereby accelerating any future library search.
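The two-thirds figure can be checked with simple arithmetic. If every symbol is 9 bits wide and the conventional code spends 6 bits per character, the encoded text reaches two-thirds of the original size when each symbol covers on average 9 / (6 × 2/3) = 2.25 characters. The sketch below illustrates the idea with a greedy longest-match encoder. It is only an illustration under stated assumptions: the abstract does not list the actual 400-odd combinations or describe the matching procedure, so the table contents, the greedy strategy, and all function names here are hypothetical.

```python
# Toy sketch of a fixed-width 9-bit code for frequent letter combinations.
# The combination table is hypothetical; the abstract does not give the
# real 400-odd combinations or say how matches were chosen.

SYMBOL_BITS = 9                 # width of each code symbol (from the abstract)
BASELINE_BITS = 6               # bits per character in the conventional code
MAX_SYMBOLS = 2 ** SYMBOL_BITS  # 512 distinct symbols fit in 9 bits


def build_table(combinations, alphabet):
    """Assign one code number to every single character and combination."""
    entries = list(dict.fromkeys(list(alphabet) + list(combinations)))
    if len(entries) > MAX_SYMBOLS:
        raise ValueError("table does not fit in a 9-bit code")
    return {entry: code for code, entry in enumerate(entries)}


def encode(text, table):
    """Greedy longest-match encoding (an assumed strategy) into code numbers."""
    longest = max(len(entry) for entry in table)
    codes, pos = [], 0
    while pos < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + length]
            if chunk in table:
                codes.append(table[chunk])
                pos += length
                break
        else:
            raise KeyError(f"no symbol for {text[pos]!r}")
    return codes


def decode(codes, table):
    """Map code numbers back to their characters or combinations."""
    reverse = {code: entry for entry, code in table.items()}
    return "".join(reverse[c] for c in codes)


def bit_ratio(text, codes):
    """Bits used by the 9-bit encoding relative to 6 bits per character."""
    return (len(codes) * SYMBOL_BITS) / (len(text) * BASELINE_BITS)


# Hypothetical alphabet and a handful of common English combinations.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
COMBINATIONS = ["THE ", "ING", "TION", "AND ", "RE", "ST", "ER", "IN", "ON"]

table = build_table(COMBINATIONS, ALPHABET)
text = "THE STORING AND INDEXING OF INFORMATION"
codes = encode(text, table)
assert decode(codes, table) == text
print(len(text), len(codes), round(bit_ratio(text, codes), 2))
```

With so small a table the ratio depends heavily on the sample text; a table closer to the 400-odd combinations the abstract describes would be needed to judge the two-thirds claim on real literature. A key word given its own 9-bit symbol also becomes a single fixed-width value to compare during a search, which is the speed-up the abstract alludes to.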