Hevelyn Sthefany Lima de Carvalho, Vinícius R. P. Borges
Title: A Comparative Study of Text Document Representation Approaches Using Point Placement-based Visualizations
Venue: Anais Estendidos da XXXIV Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2021)
Published: 2021-10-18
DOI: https://doi.org/10.5753/sibgrapi.est.2021.20035
Abstract
In natural language processing, the choice of text representation affects the performance of language models and machine learning algorithms. Basic vector space models, such as term frequency-inverse document frequency (TF-IDF), became popular approaches for representing text documents. In recent years, approaches based on word embeddings have been proposed to preserve the meaning and semantic relations of words, phrases and texts. In this paper, we study how different text representations influence the quality of the 2D visual spaces (layouts) generated by state-of-the-art visualizations based on point placement. For that purpose, a visualization-assisted approach is proposed to support users when exploring such representations in classification tasks. Experiments on two public labeled corpora were conducted to assess the quality of the layouts and to discuss possible relations to classification performance. The results are promising, indicating that the proposed approach can guide users in understanding the relevant patterns of a corpus under each representation.
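
To make the vector space model mentioned in the abstract concrete, the following is a minimal sketch of a TF-IDF representation in plain Python. It is illustrative only and is not the authors' pipeline: the toy documents, the whitespace tokenizer, the smoothed IDF variant, and the helper names `tfidf_vectors` and `cosine_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF; one common variant, assumed here for illustration.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: two related documents and one unrelated document.
docs = [
    "graphics rendering pipeline shader",
    "shader rendering graphics texture",
    "election vote parliament policy",
]
vocab, vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

In this sketch, documents that share terms end up closer in the TF-IDF space than documents that share none. A point placement technique would then take such vectors (or the pairwise dissimilarities derived from them) and map each document to a 2D position that tries to preserve those relationships.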