Ksh. Nareshkumar Singh, H. Devi, K. Robindro, A. Mahanta
2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), published 2019-10-01. DOI: 10.1109/ICCCIS48478.2019.8974506 (https://doi.org/10.1109/ICCCIS48478.2019.8974506)
A Systematic Study on Textual Data Processing in Text Mining
Advances in digital technology have led to an exponential increase in text data. The field of ‘text mining’ turns this massive amount of text data into high-quality, actionable knowledge, which helps in making optimal decisions and reduces the time and human effort needed to analyze the data. Several tasks can be performed on text data, including part-of-speech tagging, parsing, information extraction, classification, and clustering. Text representation is a necessary step for all of these tasks, and its effect on the end results, especially for text classification and clustering, is considerable. The aim of this paper is to highlight the prerequisite procedures for representing text data, different text representation methods, the role of dimensionality reduction, different proximity measures, and the evaluation methods used to assess the results of text clustering or classification.
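
To make the idea of text representation and proximity measures concrete, here is a minimal sketch that is not taken from the paper. It assumes a simple bag-of-words (raw term frequency) representation and cosine similarity as the proximity measure; the function names `tokenize`, `term_frequencies`, and `cosine_similarity` and the sample documents are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens (an assumed, very basic preprocessing step)."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """Represent a document as a bag of words: token -> raw count."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    docs = [
        "Text mining turns text data into knowledge.",
        "Clustering groups similar text documents.",
        "Digital technology keeps advancing.",
    ]
    vectors = [term_frequencies(d) for d in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(i, j, round(cosine_similarity(vectors[i], vectors[j]), 3))
```

Raw counts are the simplest choice here; a weighting scheme such as TF-IDF or a dimensionality reduction step would replace `term_frequencies` without changing how the proximity measure is applied.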