Text Extraction and Mining Methods Used in Data Science
K. Deepa, P. Perumal, B. Mathivanan
2023 2nd International Conference on Edge Computing and Applications (ICECAA), 2023-07-19
DOI: 10.1109/ICECAA58104.2023.10212101 (https://doi.org/10.1109/ICECAA58104.2023.10212101)
Citations: 0
Abstract
Online Customer Reviews (OCRs) are difficult for firms to examine because of their number, diversity, pace, and uncertain validity. This big data analytics study predicts how widely OCRs are read and how useful readers find them. Titles with positive emotion and sentimental reviews with neutral polarity attract more readers. Online merchants may use this work to build scalable automated processes for sorting and categorizing large volumes of OCR data, benefiting both vendors and consumers. Current OCR sorting approaches may bias readership and usefulness. Data were crawled, processed, and visualized in Python using Natural Language Processing (NLP) techniques. The crawl collected literature through a PubMed Application Programming Interface (API) module, and the Natural Language Toolkit (NLTK) was used to process the text. Tokens were then grouped into n-grams, specifically bigrams and trigrams. According to the collected study abstracts, West Java is the region with the most stunting research. Text mining and NLP may also enhance oral history and historical archaeology. Text mining algorithms were designed for very large datasets and public texts, which makes them ill-suited to historical and archaeological interpretation as such. Even so, text analysis can efficiently handle and evaluate vast amounts of data, which may substantially enrich historical archaeology research, especially when dealing with digital data banks or extensive texts.
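The n-gram step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' code, so everything here is an assumption: the sample abstracts are invented for illustration, and the helpers `tokenize` and `ngrams` are hypothetical names written in plain Python rather than the NLTK calls the study may have used (NLTK offers comparable functionality through `nltk.word_tokenize` and `nltk.util.ngrams`).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample abstracts, invented for illustration only.
abstracts = [
    "Stunting prevalence among children in West Java",
    "Risk factors for stunting among children in West Java",
]

bigram_counts = Counter()
trigram_counts = Counter()
for abstract in abstracts:
    tokens = tokenize(abstract)
    bigram_counts.update(ngrams(tokens, 2))   # pairs of adjacent tokens
    trigram_counts.update(ngrams(tokens, 3))  # triples of adjacent tokens

# Show the most frequent bigrams and trigrams across the sample.
print(bigram_counts.most_common(3))
print(trigram_counts.most_common(3))
```

Counting n-grams across abstracts in this way is one plausible route to a finding such as a region name recurring in the literature, though the abstract does not state how the authors reached that conclusion.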