The construction of an empirically based mathematically derived classification system

H. Borko
AIEE-IRE '62 (Spring). DOI: 10.1145/1460833.1460865 (https://doi.org/10.1145/1460833.1460865)
Citations: 38

Abstract

This study describes a method for developing an empirically based, computer-derived classification system. A total of 618 psychological abstracts were coded in machine language for computer processing. The total text consisted of approximately 50,000 words, of which nearly 6,800 were unique. The computer program arranged these words in order of frequency of occurrence. From the list of words that occurred 20 or more times, excluding syntactical terms such as "and", "but", and "of", the investigator selected 90 words for use as index terms. These were arranged in a data matrix with the terms on the horizontal axis and the document numbers on the vertical axis. Each cell contained the number of times the term was used in the document. From these data, a 90 x 90 correlation matrix was computed, showing the relationship of each term to every other term. The matrix was factor-analyzed, and the first 10 eigenvectors were selected as factors. These were rotated for meaning and interpreted as major categories in a classification system. The factors were then compared with the classification system used by the American Psychological Association and shown to be compatible with, but not identical to, it. The results demonstrate the feasibility of an empirically derived classification system and establish the value of factor analysis as a technique in language data processing.
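The pipeline the abstract outlines (term counts per document, a term-by-term correlation matrix, then leading eigenvectors taken as factors) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not a reconstruction of the original program: the six toy documents, the six index terms, and the function names `term_document_matrix` and `principal_factors` are all hypothetical, the factors are extracted by a plain eigendecomposition of the correlation matrix, and the rotation step mentioned in the abstract is omitted.

```python
import numpy as np

def term_document_matrix(docs, index_terms):
    """Rows are documents, columns are index terms; each cell is a term count."""
    counts = np.zeros((len(docs), len(index_terms)), dtype=float)
    for i, doc in enumerate(docs):
        tokens = doc.lower().split()
        for j, term in enumerate(index_terms):
            counts[i, j] = tokens.count(term)
    return counts

def principal_factors(counts, n_factors):
    """Correlate terms across documents and keep the leading eigenvectors."""
    # rowvar=False treats each column (term) as a variable: result is terms x terms.
    corr = np.corrcoef(counts, rowvar=False)
    # eigh is appropriate for a symmetric matrix; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Scale eigenvectors by sqrt(eigenvalue) to obtain unrotated factor loadings.
    loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))
    return corr, eigvals[order], loadings

# Hypothetical toy corpus standing in for the 618 abstracts and 90 index terms.
docs = [
    "learning memory test learning",
    "memory test recall memory",
    "therapy patient anxiety therapy",
    "patient anxiety therapy",
    "learning recall test",
    "anxiety patient memory",
]
index_terms = ["learning", "memory", "test", "therapy", "patient", "anxiety"]

counts = term_document_matrix(docs, index_terms)
corr, eigvals, loadings = principal_factors(counts, n_factors=2)

for term, row in zip(index_terms, loadings):
    print(f"{term:10s}", np.round(row, 2))
```

Terms that load heavily on the same factor tend to co-occur in the same documents, which is the basis on which a factor can be read as a candidate category. The sign of each eigenvector is arbitrary, so only the relative pattern of loadings within a factor is meaningful.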