Semantic Representations in Text Data

Triveni Lal Pa, Madhu Kumari, Tajinder Singh, Mohammad Ahsan
Journal: International Journal of Grid and Distributed Computing
Publication date: 2018-09-30
Publication type: Journal Article
DOI: 10.14257/ijgdc.2018.11.9.06 (https://doi.org/10.14257/ijgdc.2018.11.9.06)
Citations: 2

Abstract

Automatic text mining processes and other sophisticated natural language processing constructs need realistic representations of text and documents that embed semantics efficiently. All such representations rest on the notion that every piece of data contains different explanatory factors (attributes). In this article, we exploit these explanatory factors to study and compare various semantic representation methods for text documents. The article critically reviews recent trends in semi-supervised semantic representations, covering cutting-edge methods in distributed representations such as embeddings. It gives a broad, synthesized description of the various forms of text representation, presented in chronological order from BoW models to the most recent embedding learning methods. Taken together, the findings provide valuable pointers for researchers looking to work in the field of semantic representations. In addition, the article shows that there is a need for a model that learns universal embeddings in unsupervised or semi-supervised settings, incorporates contextual as well as word-order information, uses language-independent features, and remains feasible for large datasets.
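As a minimal illustration of why the abstract asks for representations that carry word-order information, the sketch below builds bag-of-words (BoW) count vectors for two sentences that use the same words in a different order. The example sentences, the function name `bag_of_words`, and the whitespace tokenization are assumptions made for illustration; they are not taken from the article.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector for `text` over a fixed, ordered vocabulary."""
    # Naive whitespace tokenization; real pipelines would do more.
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Two hypothetical sentences with opposite meanings but identical word counts.
docs = ["the dog chased the cat", "the cat chased the dog"]

# Shared vocabulary, sorted so that vector positions are stable.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})

vectors = [bag_of_words(doc, vocabulary) for doc in docs]

for doc, vec in zip(docs, vectors):
    print(doc, "->", vec)
```

Both sentences map to the same vector because BoW only counts occurrences, so who chased whom cannot be recovered from the representation.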
Source journal
International Journal of Grid and Distributed Computing
Category: Computer Science, Software Engineering
Self-citation rate: 0.00%
Articles published: 0
About the journal: IJGDC aims to facilitate and support research related to control and automation technology and its applications. The journal gives academic and industry professionals a place to discuss recent progress in the area of control and automation. To serve readers who lack access to major databases that charge for every downloaded article, this online publication platform is open to all readers as part of the journal's commitment to the global scientific community.
Journal topics:
- Architectures and Fabrics
- Autonomic and Adaptive Systems
- Cluster and Grid Integration
- Creation and Management of Virtual Enterprises and Organizations
- Dependable and Survivable Distributed Systems
- Distributed and Large-Scale Data Access and Management
- Distributed Multimedia Systems
- Distributed Trust Management
- eScience and eBusiness Applications
- Fuzzy Algorithm
- Grid Economy and Business Models
- Histogram Methodology
- Image or Speech Filtering
- Image or Speech Recognition
- Information Services
- Large-Scale Group Communication
- Metadata, Ontologies, and Provenance
- Middleware and Toolkits
- Monitoring, Management and Organization Tools
- Networking and Security
- Novel Distributed Applications
- Performance Measurement and Modeling
- Pervasive Computing
- Problem Solving Environments
- Programming Models, Tools and Environments
- QoS and resource management
- Real-time and Embedded Systems
- Security and Trust in Grid and Distributed Systems
- Sensor Networks
- Utility Computing on Global Grids
- Web Services and Service-Oriented Architecture
- Wireless and Mobile Ad Hoc Networks
- Workflow and Multi-agent Systems