The catalog: a flexible data structure for magnetic tape

AFIPS '65 (Fall, part I). Pub Date: 1965-11-30. DOI: 10.1145/1463891.1463925

M. Kay, Theodore W. Ziehe


Citation count: 4

Abstract

The files of data used in linguistic research differ from those found in other research applications in at least three important ways: (1) they are larger, (2) they have more structure, and (3) they contain more kinds of information. These are, of course, all simplifications, but not gross ones. It is true that the files that must be maintained by a large insurance company or by the patent office are so large as to pose very special problems, but the uses to which the files are to be put are fairly well understood, and their format and organization are not usually subject to drastic and unexpected change. It is also true that the data from a bubble chamber is interesting only if collected in vast quantities, but this is not the only respect in which a bubble chamber is a special kind of tool. A typical linguistic job will bring together a number of files, each very large by the standards of everyday computing: a body of text, a dictionary, and a grammar, for example. The grammar, if it is anything but a very simple one, will contain a large number of elementary items of information of different kinds, each related to others in a number of different ways. This is what it means to say that the file has a lot of structure. The dictionary may also contain grammatical codes, which may consist of characters from one of the languages represented in the dictionary or may be something altogether different. If the dictionary contains alternatives to which probabilities are assigned, then these will presumably be in the form of floating-point numbers. This is what it is like for a file to contain different kinds of information.

