A New Term-Term Similarity Measure for Selecting Expansion Features in Big Data
Ilyes Khennak, H. Drias
2014 International Conference on Advanced Networking Distributed Systems and Applications, 2014-06-17
DOI: 10.1109/INDS.2014.23 (https://doi.org/10.1109/INDS.2014.23)
The massive growth of information and the exponential increase in the number of documents published and uploaded online each day have led to the appearance of new words on the Internet. These new terms play a central role in retrieving the desired information, yet their meanings are difficult to determine. It therefore becomes necessary to give more importance to the sites and topics where these new words appear or, more precisely, to give weight to the words that frequently occur with them. For this purpose, we propose a new term-term similarity measure based on the co-occurrence and closeness of words. For each query feature, the method finds the locations where that feature appears, selects from those locations the words that often neighbor and co-occur with the query features, and finally uses the selected words in the retrieval process. Our experiments were performed on the OHSUMED test collection and show a significant performance enhancement over the state of the art.
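To make the idea concrete, the following is a minimal Python sketch of a co-occurrence and closeness score. It is not the authors' formula, which the abstract does not state. The function names, the whitespace tokenization, the `window` parameter, and the 1/distance weighting are all assumptions chosen for illustration.

```python
from collections import defaultdict


def expansion_scores(documents, query_terms, window=3):
    """Score candidate terms by co-occurrence with, and closeness to, query terms.

    Assumption (not from the paper): every occurrence of a candidate within
    `window` positions of a query term contributes 1 / distance to its score.
    """
    query = {t.lower() for t in query_terms}
    scores = defaultdict(float)
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization
        for i, tok in enumerate(tokens):
            if tok not in query:
                continue
            # Look at the neighbors on both sides of this query-term occurrence.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                cand = tokens[j]
                if j == i or cand in query:
                    continue
                scores[cand] += 1.0 / abs(i - j)
    return dict(scores)


def select_expansion_terms(documents, query_terms, k=2, window=3):
    """Return the k highest-scoring candidates (ties broken alphabetically)."""
    scores = expansion_scores(documents, query_terms, window)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [
        "heart attack symptoms include chest pain",
        "chest pain after heart attack",
        "stock market news",
    ]
    print(select_expansion_terms(docs, ["heart"], k=2, window=2))
```

In this sketch, a word that appears next to a query term in many documents accumulates a higher score than one that appears rarely or farther away. The selected terms would then be appended to the original query before retrieval.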