An evaluation of type-10 homograph discrimination at the semi-colon level in Roget's International Thesaurus
J. Talburt, D. Mooney
Symposium on Small Systems, 1990-02-01
DOI: 10.1145/99412.99453 (https://doi.org/10.1145/99412.99453)
Citations: 5
Abstract
This paper reports the results of evaluating a large sample of the 23,858 type-10 homographs found in Roget's International Thesaurus (3rd Ed.), as defined by the Bryan model of abstract thesauri, of which Roget's is an instantiation. According to the Bryan model, two different entries in a thesaurus that have the same spelling are homographs (semantically unrelated) if and only if they cannot be the endpoints of a sequence of entries called a type-10 chain. Until recently, the Bryan definition of a type-10 homograph had not been tested thoroughly because of the combinatorial complexity of applying the definition directly to a large instantiation such as Roget's. In 1989, however, the authors were able to decompose Roget's into its type-10 components and, as a result, generate all 23,858 type-10 homographs at the semi-colon category level.
The principal result is that Bryan's definition of homographs by type-10 semantic disjunction does not appear to work uniformly over a broad range of entries in Roget's when the selected semantic category is the semi-colon group. Although there are many cases where type-10 homographs agree with conventional classifications, in general, type-10 discrimination at the semi-colon level "over-discriminates": it generates many more homographs than are found in standard English-language dictionaries.
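The decomposition idea described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the abstract does not state the conditions that make a sequence of entries a type-10 chain, so the `links` relation below is a hypothetical placeholder for "these two entries may be adjacent in a chain", and the entries, category numbers, and function names are invented for illustration. Under that assumption, entries fall into connected components, and two same-spelled entries count as homographs exactly when they land in different components.

```python
from collections import defaultdict
from itertools import combinations


def find(parent, x):
    """Return the component representative of x, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def homograph_pairs(entries, links):
    """Pairs of same-spelled entries that no chain of links connects.

    entries: list of (word, category) tuples (category = semi-colon group id).
    links:   pairs of entries assumed to be adjacent in a chain; the real
             type-10 adjacency rule is NOT given in the abstract.
    """
    # Union-find: merge every linked pair into one component.
    parent = {e: e for e in entries}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb

    # Group entries by spelling, then compare components within each group.
    by_word = defaultdict(list)
    for e in entries:
        by_word[e[0]].append(e)

    pairs = []
    for group in by_word.values():
        for a, b in combinations(group, 2):
            if find(parent, a) != find(parent, b):
                pairs.append((a, b))
    return pairs


# Toy data (hypothetical): three "bank" entries in three semi-colon groups.
entries = [("bank", 1), ("shore", 1), ("shore", 2),
           ("bank", 2), ("bank", 3), ("vault", 3)]
links = [
    (("bank", 1), ("shore", 1)),
    (("shore", 1), ("shore", 2)),
    (("shore", 2), ("bank", 2)),
    (("bank", 3), ("vault", 3)),
]
result = homograph_pairs(entries, links)
```

In this toy data, the first two "bank" entries are connected through "shore" and so are not homographs of each other, while the third "bank" entry sits in a separate component and is a homograph of both. Working with components avoids enumerating chains between every pair of entries, which is the combinatorial cost the abstract refers to.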