Hardware-Accelerated Parser for Extraction of Metadata in Semantic Network Content

J. Moscola, Young-Hee Cho, J. Lockwood
Published in: 2007 IEEE Aerospace Conference, pp. 1-8 (2007-03-03). DOI: 10.1109/AERO.2007.352793
Cited by: 8

Abstract

We have implemented a new network information processing system that uses reconfigurable hardware to scan large volumes of data in real time. One of the key functions of the system is to extract semantic information, but before we can determine the meaning of a text, we must identify its language. In a previous project, we implemented an N-gram-based language identifier that can process data at up to 1 Gbps. However, a large percentage of computer network traffic, such as email and Web data, consists of markup information such as tags and protocol-specific options. This additional data interferes with the language identification process and decreases its accuracy. We therefore developed a hardware architecture for configurable application-level processing. Our Application Level Processing System (ALPS) is a custom processor that is generated automatically from the syntactic structure of the content. The resulting circuit is mapped onto a reconfigurable device to efficiently extract only the data relevant to the language identifier. To illustrate the effectiveness of the architecture, we have implemented a system that processes electronic mail. Our experiments show that ALPS can improve the accuracy of the hardware language identifier by up to a factor of 200 compared to a system that does not decode the application-level protocol data.
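The abstract describes two stages: an application-level parser that removes protocol and markup data from email, and an N-gram language identifier that consumes only the remaining text. The following is a minimal software sketch of that data flow, not the authors' hardware design. It assumes Python's standard-library `email` and `html.parser` modules as a stand-in for ALPS, and a simple character-trigram match count as a stand-in for the hardware identifier, whose scoring method the abstract does not specify. The helper names (`extract_body_text`, `identify`), the toy language profiles and the sample message are all hypothetical.

```python
from collections import Counter
from email import message_from_string
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects the character data of an HTML document, discarding tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def extract_body_text(raw: str) -> str:
    """Keep only readable body text: headers, MIME boundaries and HTML tags are dropped."""
    texts = []
    for part in message_from_string(raw).walk():
        if part.is_multipart():
            continue  # container parts hold sub-parts, not text
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue  # skip attachments and other non-text parts
        payload = part.get_payload(decode=True)  # bytes after undoing the transfer encoding
        text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
        if ctype == "text/html":
            stripper = _TextOnly()
            stripper.feed(text)
            stripper.close()  # flush any buffered character data
            text = "".join(stripper.chunks)
        texts.append(text)  # for simplicity, every text alternative is kept
    return "\n".join(texts)


def trigrams(text: str) -> Counter:
    """Count overlapping character 3-grams after lowercasing and collapsing whitespace."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


# Hypothetical toy training text; a real identifier would use large per-language corpora.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun"),
    "es": trigrams("el rapido zorro marron salta sobre el perro perezoso y luego el perro duerme en el sol"),
}


def identify(text: str) -> str:
    """Return the language whose profile contains the most of the input's trigram occurrences.

    Assumption: simple match-count scoring, not the paper's hardware scoring scheme.
    """
    grams = trigrams(text)
    scores = {
        lang: sum(n for g, n in grams.items() if g in profile)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)


# Hypothetical sample message with headers, MIME boundaries and an HTML alternative.
RAW = (
    "From: alice@example.com\n"
    "Subject: lunch\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "the dog and the fox\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<html><body><p>the dog and the fox</p></body></html>\n"
    "--XYZ--\n"
)

if __name__ == "__main__":
    body = extract_body_text(RAW)
    print(identify(body))  # prints "en"
```

Passing `RAW` directly to `identify` would instead count trigrams from header names, MIME boundaries and HTML tags as if they were language text, which is the kind of interference the abstract attributes to traffic whose application-level protocol data is not decoded.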