Jeong-Joon Kim, Yong-Soo Lee, Jin-Yong Moon, Jeongmin Park
Published in International Journal of Grid and Distributed Computing, vol. 11, pp. 97-112, 2018-04-30. DOI: 10.14257/IJGDC.2018.11.4.09 (https://doi.org/10.14257/IJGDC.2018.11.4.09)
Document Classification Method based on Latent Semantic Indexing
Latent Semantic Indexing (LSI) and Non-negative Matrix Factorization (NMF) are algorithms that classify documents by meaning, and both approach the problem by converting each document to a vector. However, these algorithms have two problems: their results differ depending on the training documents, and they have difficulty analyzing multiple representations of the same term. WordNet, meanwhile, is a lexical dictionary that describes the relationships between words based on cognitive science, and it is widely used in tasks such as query term expansion in search engines. However, it adapts poorly to neologisms, slang, and the shifts in word meaning that come with rapidly changing times. In this paper, we therefore address the problem of multiple word representations by partially applying WordNet's word relationships to Latent Semantic Indexing, using genetic algorithms, to cluster documents more efficiently while accounting for the strengths and weaknesses of both Latent Semantic Indexing and WordNet. With this approach we aim to improve precision and increase the efficiency of the overall clusters.
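To make the vector-space idea in the abstract concrete, the following is a minimal sketch of Latent Semantic Indexing using a truncated SVD in NumPy. It is not the authors' implementation: the toy corpus, the function names, and the hand-written `synonyms` dictionary (a stand-in for a real WordNet lookup) are all assumptions made for illustration, and the genetic-algorithm step is omitted because the abstract does not describe it.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
docs = [
    "car engine repair",
    "automobile engine service",
    "fruit apple banana",
    "banana apple juice",
]


def term_document_matrix(docs, synonyms=None):
    """Build a raw-count term-document matrix (terms x documents).

    `synonyms` is a hand-written stand-in for a WordNet lookup: it maps a
    word to a canonical form so that synonyms share a single row.
    """
    synonyms = synonyms or {}
    tokenized = [[synonyms.get(w, w) for w in d.split()] for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            matrix[index[w], j] += 1.0
    return vocab, matrix


def lsi_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each column of diag(s_k) @ Vt_k is one document in the latent space;
    # transpose so that each row is a document vector.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Merge "automobile" into "car" before factorizing, mimicking a synonym link.
vocab, matrix = term_document_matrix(docs, synonyms={"automobile": "car"})
vectors = lsi_document_vectors(matrix, k=2)

sim_cars = cosine(vectors[0], vectors[1])   # two vehicle documents
sim_cross = cosine(vectors[0], vectors[2])  # vehicle vs. fruit document
print(f"car/car: {sim_cars:.3f}  car/fruit: {sim_cross:.3f}")
```

In this sketch the two vehicle documents land closer to each other in the latent space than either does to the fruit documents. Merging synonyms before the factorization is only one simple way to inject word relationships; the paper's actual combination of WordNet, LSI, and genetic algorithms may differ.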
About the journal:
IJGDC aims to facilitate and support research related to control and automation technology and its applications. The journal gives academic and industry professionals a venue to discuss recent progress in the area of control and automation. To serve readers who lack access to major databases that charge for every downloaded article, this online publication platform is open to all readers as part of our commitment to the global scientific community.

Journal Topics:
- Architectures and Fabrics
- Autonomic and Adaptive Systems
- Cluster and Grid Integration
- Creation and Management of Virtual Enterprises and Organizations
- Dependable and Survivable Distributed Systems
- Distributed and Large-Scale Data Access and Management
- Distributed Multimedia Systems
- Distributed Trust Management
- eScience and eBusiness Applications
- Fuzzy Algorithm
- Grid Economy and Business Models
- Histogram Methodology
- Image or Speech Filtering
- Image or Speech Recognition
- Information Services
- Large-Scale Group Communication
- Metadata, Ontologies, and Provenance
- Middleware and Toolkits
- Monitoring, Management and Organization Tools
- Networking and Security
- Novel Distributed Applications
- Performance Measurement and Modeling
- Pervasive Computing
- Problem Solving Environments
- Programming Models, Tools and Environments
- QoS and Resource Management
- Real-time and Embedded Systems
- Security and Trust in Grid and Distributed Systems
- Sensor Networks
- Utility Computing on Global Grids
- Web Services and Service-Oriented Architecture
- Wireless and Mobile Ad Hoc Networks
- Workflow and Multi-agent Systems