Authors: Nagothi Vaibhav Anjani Kumar, S. Mehrotra
Published in: 2022 5th International Conference on Contemporary Computing and Informatics (IC3I), 2022-12-14
DOI: 10.1109/IC3I56241.2022.10072927
A Comparative Analysis of Word Embedding Techniques and Text Similarity Measures
Digital text data grows daily across many uses, such as clinical notes, lab test reports, and research articles. Most of this data is unstructured. When searching for information, a lot of unrelated information is returned for a query. This paper presents a comparative analysis of word embedding techniques and text similarity measures for determining how similar two pieces of text are in their lexical and semantic characteristics and closeness. The principal aim of the paper is to pre-process patients' medical history notes and then apply word embedding techniques such as Word2Vec, FastText, and Doc2Vec.
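To make the idea of a text similarity measure concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the measures that were compared, so Jaccard and cosine similarity are assumed here as two common choices. The function names and the two sample notes are hypothetical, and plain term-count vectors stand in for learned embeddings; with Word2Vec, FastText, or Doc2Vec, the same cosine formula would typically be applied to the learned vectors instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Lexical overlap: shared unique tokens divided by all unique tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no tokens on one side, so no measurable similarity
    return dot / (norm_a * norm_b)


# Hypothetical sample notes, invented for illustration only.
note_1 = "Patient reports chest pain and shortness of breath"
note_2 = "Patient reports chest pain and mild fever"

print(jaccard_similarity(note_1, note_2))
print(cosine_similarity(note_1, note_2))
```

Both measures here are purely lexical: they reward shared tokens and cannot tell that two different words mean the same thing. That limitation is the usual motivation for computing similarity over embedding vectors instead.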