Mining Unstructured Text at Gigabyte per Second Speeds
A. Ratner
2008 IEEE International Conference on Data Mining Workshops, published 2008-12-15
DOI: 10.1109/ICDMW.2008.9 (https://doi.org/10.1109/ICDMW.2008.9)
Citations: 0
Abstract
Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, and on millions of topics. Both government and commercial applications need to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing, and prioritization. This paper deals with the implementation of real-time mining of unstructured text on high-speed hardware capable of processing network data streams at gigabyte-per-second speeds.