Automatic Naming of Clusters: A Novel Approach using Information Extracted from Instant Text Messages
Gopa Kumar, N. Vaishnav
2018 3rd International Conference on Contemporary Computing and Informatics (IC3I), 2018-10-01
DOI: 10.1109/IC3I44769.2018.9007275 (https://doi.org/10.1109/IC3I44769.2018.9007275)
Instant messages are abundant in today’s world: around 55 billion text messages are exchanged over instant messaging platforms every day. They represent a huge source of information from which useful knowledge can be mined. Instant messages accurately describe each user’s characteristics and interests. From waking up in the morning to going to bed at night, people share everything with those close to them via an instant messaging platform. These messages therefore give profound insights into a person’s different interests and their preferences for certain entities over others. In addition to being unstructured, instant messages present new challenges in the form of shortenings, contractions, and letter/number homophones, all of which require dedicated pre-processing steps. First, the characteristics of these instant messages are discussed, and the approaches to deal with these challenges are reviewed. Second, this data is used to cluster people with similar interests together. These clusters have to be named by a domain expert in order to gain insights, and the naming becomes challenging as the dimensionality of the data points increases. This problem is dealt with next, where useful features are extracted from the clusters in order to output a unique, legitimate name for each cluster. In addition, these steps are elaborated using an unknown dataset, and the working of the algorithm is demonstrated on a known dataset for validation.
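The pre-processing challenges named above (shortenings, contractions, letter/number homophones) can be illustrated with a small dictionary-based normalizer. This is only a minimal sketch, not the authors' pipeline: the lookup tables, their entries, and the function name `normalize` are hypothetical, and a practical system would need much larger, curated dictionaries.

```python
import re

# Hypothetical lookup tables, for illustration only (not from the paper).
SHORTENINGS = {"tmrw": "tomorrow", "pls": "please", "msg": "message"}
CONTRACTIONS = {"dont": "do not", "cant": "cannot", "im": "i am"}
HOMOPHONES = {"gr8": "great", "b4": "before", "2day": "today", "u": "you", "c": "see"}

LOOKUP = {**SHORTENINGS, **CONTRACTIONS, **HOMOPHONES}


def normalize(message: str) -> str:
    """Lowercase a message, split it into tokens, and expand known informal tokens."""
    # Keep letters, digits, and apostrophes so "don't" and "gr8" survive as single tokens.
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    # Look each token up with apostrophes removed; unknown tokens pass through unchanged.
    expanded = [LOOKUP.get(tok.replace("'", ""), tok) for tok in tokens]
    return " ".join(expanded)


example = normalize("C u 2day, don't b l8!")
```

Tokens missing from the tables (here `b` and `l8`) are left as they are, which shows why the coverage of the dictionaries matters for this kind of approach.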
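The abstract does not specify how cluster names are derived from the extracted features, so the following is only a hedged sketch of one plausible heuristic: name each cluster after the features whose mean within the cluster most exceeds the mean over all users. The function `name_clusters`, the `top_k` parameter, the collision rule, and the toy interest scores are all assumptions introduced for illustration.

```python
from collections import defaultdict


def name_clusters(vectors, labels, feature_names, top_k=2):
    """Name each cluster after the features whose cluster mean most exceeds the global mean."""
    n_features = len(feature_names)
    global_mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(n_features)]

    # Group the user vectors by their (precomputed) cluster label.
    members = defaultdict(list)
    for vec, label in zip(vectors, labels):
        members[label].append(vec)

    names, used = {}, set()
    for label in sorted(members):
        rows = members[label]
        # "Lift" of a feature: cluster mean minus global mean.
        lift = [sum(r[j] for r in rows) / len(rows) - global_mean[j]
                for j in range(n_features)]
        ranked = sorted(range(n_features), key=lambda j: lift[j], reverse=True)
        name = "+".join(feature_names[j] for j in ranked[:top_k])
        # Assumed rule for uniqueness: append the cluster label if the name is taken.
        if name in used:
            name = f"{name}#{label}"
        used.add(name)
        names[label] = name
    return names


# Toy data (hypothetical): per-user interest scores and cluster assignments.
feature_names = ["sports", "music", "travel", "food"]
vectors = [[5, 1, 0, 1], [4, 0, 1, 2], [0, 5, 4, 1], [1, 4, 5, 0]]
labels = [0, 0, 1, 1]
cluster_names = name_clusters(vectors, labels, feature_names)
```

The cluster assignments are taken as given here; any clustering method could supply them. Ranking by the difference from the global mean, rather than by the raw cluster mean, keeps features that are popular with everyone from dominating every name.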