A New Conceptual Approach to Document Indexing
S. Barresi, S. Nefti-Meziani, Y. Rezgui
2009 Mexican International Conference on Computer Science, 2009-09-21
DOI: 10.1109/ENC.2009.50 (https://doi.org/10.1109/ENC.2009.50)

This paper presents a new conceptual indexing technique intended to overcome the major problems that arise from Term Frequency (TF) based approaches. To resolve the semantic problems associated with TF approaches, the proposed technique disambiguates the words contained in a document and creates a list of superordinates (hypernyms) from an external knowledge source. To reduce the dimension of the document vector, the final set of index values is created by extracting from the list of hypernyms a set of common concepts, each shared by multiple related words. A weight is then assigned to each concept index by considering its position in the knowledge source's hierarchical tree (i.e. its distance from the words it substitutes) and its number of occurrences. By applying the proposed technique, we were able to disambiguate words within different contexts, extrapolate concepts from documents, assign appropriate normalised weights, and significantly reduce the vector dimension.
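
The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula or name the knowledge source, so everything specific here is an assumption. The `HYPERNYM` table is a hypothetical toy hierarchy standing in for the external knowledge source, the words are assumed to be already disambiguated, `min_words` is an illustrative threshold for "shared by multiple related words", and the weight `count / (1 + distance)` is just one plausible way to combine occurrences with distance in the hierarchy.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge source (child -> parent hypernym links).
# A real system would query an external hierarchy instead.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "car": "vehicle",
    "vehicle": "artifact",
}


def hypernym_chain(word):
    """Return [(ancestor, distance), ...] walking up from `word`."""
    chain = []
    distance = 0
    while word in HYPERNYM:
        word = HYPERNYM[word]
        distance += 1
        chain.append((word, distance))
    return chain


def concept_index(tokens, min_words=2):
    """Map tokens to shared concepts and return normalised weights.

    Assumed scheme (not taken from the paper): each word is replaced by
    its nearest hypernym shared by at least `min_words` distinct words in
    the document; words without a shared hypernym index themselves.
    """
    counts = Counter(tokens)

    # Which distinct document words sit below each candidate concept?
    support = defaultdict(set)
    for word in counts:
        for concept, _ in hypernym_chain(word):
            support[concept].add(word)

    raw = defaultdict(float)
    for word, count in counts.items():
        target, distance = word, 0
        for concept, dist in hypernym_chain(word):
            if len(support[concept]) >= min_words:
                target, distance = concept, dist
                break
        # More occurrences raise the weight; a larger distance lowers it.
        raw[target] += count / (1 + distance)

    total = sum(raw.values())
    if total == 0:
        return {}
    # Normalise so the weights of one document sum to 1.
    return {concept: weight / total for concept, weight in raw.items()}


if __name__ == "__main__":
    print(concept_index(["dog", "wolf", "dog", "cat", "car"]))
```

In this toy run the four distinct terms collapse to three index entries (`canine`, `mammal`, `car`), which mirrors the dimension reduction the abstract reports, though only on invented data.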