Classification of text to subject using LDA
Douglas A. Smith, Charles McManis
Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), 2015-03-02
DOI: 10.1109/ICOSC.2015.7050791 (https://doi.org/10.1109/ICOSC.2015.7050791)
Citations: 3
Abstract
Blekko Inc., an Internet search company, has divided web sites into subjects that we call slash tags. Text from these web sites can be processed with Latent Dirichlet Allocation (LDA) to determine a set of topics for each subject. These topics can then be used to classify arbitrary text by subject. We discuss the methods used; the corpus used for training and testing; and results showing how well the system classifies text whose subject is known a priori.
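The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea: fit one LDA model per subject, then assign a new text to the subject whose topics give it the highest likelihood. Everything specific here is an assumption made for illustration: the collapsed Gibbs sampler, the likelihood-based scoring rule, the hyperparameters, the toy corpus, and the names `train_lda`, `log_likelihood`, and `classify`. None of these come from the paper.

```python
import math
import random


def train_lda(docs, word_id, num_topics=2, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Fit LDA to one subject's documents with collapsed Gibbs sampling.

    Returns (topic_weights, topic_word): the subject-wide topic proportions
    and the smoothed word distribution of each topic.
    """
    rng = random.Random(seed)
    vocab_size = len(word_id)
    encoded = [[word_id[w] for w in doc.split()] for doc in docs]
    doc_topic = [[0] * num_topics for _ in encoded]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(encoded):
        row = []
        for w in doc:
            k = int(rng.random() * num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word_counts[k][w] += 1
            topic_totals[k] += 1
        assignments.append(row)

    for _ in range(iters):
        for d, doc in enumerate(encoded):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word_counts[k][w] -= 1
                topic_totals[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word_counts[t][w] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                r = rng.random() * sum(weights)
                k = 0
                while k < num_topics - 1 and r >= weights[k]:
                    r -= weights[k]
                    k += 1
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word_counts[k][w] += 1
                topic_totals[k] += 1

    total = sum(topic_totals)
    topic_weights = [(n + alpha) / (total + num_topics * alpha) for n in topic_totals]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta) for c in topic_word_counts[k]]
        for k in range(num_topics)
    ]
    return topic_weights, topic_word


def log_likelihood(tokens, model, word_id):
    """Log-probability of the known tokens under a subject's topic mixture."""
    topic_weights, topic_word = model
    score = 0.0
    for w in tokens:
        if w in word_id:  # words outside the training vocabulary are skipped
            i = word_id[w]
            score += math.log(sum(p * dist[i] for p, dist in zip(topic_weights, topic_word)))
    return score


def classify(text, models, word_id):
    """Return the subject whose LDA topics best explain the text."""
    tokens = text.lower().split()
    return max(models, key=lambda s: log_likelihood(tokens, models[s], word_id))


# Hypothetical toy corpus: a few short "pages" per subject.
corpus = {
    "cooking": ["bake bread flour oven yeast", "simmer soup onion garlic salt",
                "roast chicken oven garlic butter"],
    "astronomy": ["telescope star galaxy orbit", "planet orbit moon gravity",
                  "star nebula galaxy light telescope"],
}
vocab = sorted({w for docs in corpus.values() for doc in docs for w in doc.split()})
word_id = {w: i for i, w in enumerate(vocab)}
models = {subject: train_lda(docs, word_id) for subject, docs in corpus.items()}

print(classify("roast garlic in the oven", models, word_id))
```

All subjects share one vocabulary in this sketch, so a word that never appears in a subject's training text still receives a small smoothed probability under that subject instead of being ignored. A text containing no known words scores zero for every subject, and a real system would need an explicit "unknown" outcome for that case.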