Automatic Naming of Clusters: A Novel Approach using Information Extracted from Instant Text Messages

Gopa Kumar, N. Vaishnav
Published in: 2018 3rd International Conference on Contemporary Computing and Informatics (IC3I), 2018-10-01
DOI: 10.1109/IC3I44769.2018.9007275 (https://doi.org/10.1109/IC3I44769.2018.9007275)

Abstract

Instant messages are abundant in today’s world: around 55 billion text messages are exchanged over instant messaging platforms every day. They represent a huge source of information from which useful knowledge can be mined. Instant messages closely reflect each user’s characteristics and interests. From waking up in the morning to going to bed at night, people share everything with those close to them via an instant messaging platform. These messages therefore give profound insight into a person’s interests and their preferences for certain entities over others. In addition to being unstructured, instant messages present new challenges in the form of shortenings, contractions, and letter/number homophones, all of which require dedicated pre-processing steps. First, the characteristics of instant messages are discussed and the approaches for dealing with these challenges are reviewed. Second, the data is used to cluster people with similar interests. Such clusters normally have to be named by a domain expert in order to gain insights, and naming becomes challenging as the dimensionality of the data points increases. This problem is then addressed by extracting useful features from the clusters in order to output a unique, legitimate name for each cluster. In addition, these steps are elaborated using an unknown dataset, and the working of the algorithm is demonstrated on a known dataset for validation.
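
The abstract does not specify how messages are normalized or how cluster names are derived, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tiny hand-written lookup table (`SLANG`) for shortenings and letter/number homophones, a small stopword list, and a simple frequency-ratio score for picking naming terms; the table entries, the function names `normalize` and `name_cluster`, and the scoring rule are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical lookup table for shortenings and letter/number homophones.
# The paper's actual normalization resources are not given in the abstract.
SLANG = {
    "u": "you",
    "c": "see",
    "gr8": "great",
    "2nite": "tonight",
    "b4": "before",
    "luv": "love",
}

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "the", "you", "see", "what"}


def normalize(message):
    """Lowercase, tokenize, and expand known shortenings/homophones."""
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    expanded = []
    for tok in tokens:
        # An expansion may contain several words, so split it back into tokens.
        expanded.extend(SLANG.get(tok, tok).split())
    return expanded


def name_cluster(cluster_msgs, other_msgs, top_k=2):
    """Name a cluster by the terms most frequent inside it relative to outside.

    This frequency-ratio score is a stand-in for whatever feature
    extraction the paper actually uses.
    """
    inside = Counter(
        t for m in cluster_msgs for t in normalize(m) if t not in STOPWORDS
    )
    outside = Counter(
        t for m in other_msgs for t in normalize(m) if t not in STOPWORDS
    )
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(inside, key=lambda t: (-(inside[t] / (1 + outside[t])), t))
    return " / ".join(ranked[:top_k])


# Made-up example messages for two clusters of users.
sports = ["gr8 football match 2nite", "luv football, what a match"]
food = ["new pizza place b4 the movie", "luv pizza"]

print(normalize("C u 2nite!"))
print(name_cluster(sports, food))
```

In this sketch a term shared by both clusters, such as "love", scores lower than terms that appear only in one cluster, which is the property a cluster name needs. A realistic system would replace the lookup table and the ratio score with the pre-processing and feature extraction steps described in the paper itself.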