Mining Unstructured Text at Gigabyte per Second Speeds

2008 IEEE International Conference on Data Mining Workshops. Pub Date: 2008-12-15. DOI: 10.1109/ICDMW.2008.9

A. Ratner


Citations: 0

Abstract

Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, on millions of topics. There is a need, for both government and commercial applications, to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing and prioritization. This paper deals with the implementation of real-time mining of unstructured text on high-speed hardware capable of processing network data streams at gigabyte per second speeds.
