The use of an association measure based on character structure to identify semantically related pairs of words and document titles
George W. Adamson, Jillian Boreham
Information Storage and Retrieval, Vol. 10, No. 7, pp. 253-260 (July 1974)
DOI: 10.1016/0020-0271(74)90020-5
Citation count: 168
Abstract
An automatic classification technique has been developed, based on the character structure of words. Dice's Similarity Coefficient is computed from the number of matching digrams in pairs of character strings, and used to cluster sets of character strings. A sample of words from a chemical data base was chosen to contain certain stems derived from the names of chemical elements. They were successfully clustered into groups of semantically related words. Each cluster is characterised by the root word from which all its members are derived. A second sample of titles from Mathematical Reviews was clustered into well-defined classes, which compare favourably with the subject groupings of Mathematical Reviews.
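As an illustration only, Dice's coefficient over digrams can be written as S = 2C / (A + B). Here A and B are the digram counts of the two strings and C is the number of digrams they share.

The sketch below rests on several assumptions that the abstract does not settle:

- Digrams are treated as a set of distinct adjacent character pairs.
- Strings are lower-cased and not padded.
- The clustering step is a hypothetical single-link rule with an arbitrary similarity threshold.

The function names `digrams`, `dice` and `cluster` are likewise hypothetical.

```python
from itertools import combinations


def digrams(word: str) -> set[str]:
    """Distinct adjacent character pairs (digrams) of a lower-cased string.

    Assumption: duplicates are ignored and no boundary padding is added.
    """
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def dice(a: str, b: str) -> float:
    """Dice's coefficient S = 2C / (A + B) over the two digram sets."""
    da, db = digrams(a), digrams(b)
    if not da and not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))


def cluster(strings: list[str], threshold: float) -> list[list[str]]:
    """Hypothetical single-link clustering: two strings end up in the same
    cluster if a chain of pairs with Dice >= threshold connects them."""
    parent = list(range(len(strings)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(strings)), 2):
        if dice(strings[i], strings[j]) >= threshold:
            parent[find(i)] = find(j)

    # Group strings by root, preserving first-appearance order.
    groups: dict[int, list[str]] = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())
```

Two example calls show the behaviour:

- `dice("night", "nacht")` gives 0.25, because the words share only the digram "ht" out of four digrams each.
- `cluster(["sulphate", "sulphide", "sulphur", "nitrate", "nitride", "nitrogen"], 0.4)` returns two groups: `["sulphate", "sulphide", "sulphur"]` and `["nitrate", "nitride", "nitrogen"]`. The groups form because words sharing an element-derived stem share most of their digrams.

The same functions accept whole titles as character strings.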