Authors: Prakhyath Rai, B. Shamantha Rai
Venue: 2022 International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER)
Published: 2022-10-14
DOI: https://doi.org/10.1109/DISCOVER55800.2022.9974739
Visualized Document Similarity Framework with the aid of Knowledge Graph
Document processing rests on the precise and efficient computation of document similarity. As information resources grow exponentially, the volume of digital documents explodes, creating a constant need for tools and frameworks that help capture relevant and useful patterns from this free flow of content. This paper presents a text refinement framework that computes the similarity of documents and visualizes the similarity analysis. The proposed method employs a knowledge graph technique to aid in visualizing the similarity scores of documents. The visualization is built on top of an information-rich corpus extracted from the input documents in the form of triplets. This triplet corpus then facilitates computing the similarity score and visualizing the analysis. Prior to triplet generation, the input documents are pre-processed to eliminate noise, reduce randomness, and lemmatize the text. Together, the pre-processing and the triplet corpus help in handling long documents by improving similarity computation and visual analysis.
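The pipeline described in the abstract (triplets extracted per document, a similarity score computed over them, and a graph built for visualization) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or graph representation the authors use, so everything here is an assumption: the sample triplets are hypothetical, Jaccard overlap stands in for the unspecified similarity score, and a plain adjacency map stands in for the knowledge graph.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triplets. In the framework these
# would be extracted from the pre-processed, lemmatized input documents.
doc_a = {
    ("framework", "compute", "similarity"),
    ("graph", "visualize", "score"),
    ("document", "contain", "triplet"),
}
doc_b = {
    ("framework", "compute", "similarity"),
    ("document", "contain", "noise"),
}


def jaccard(a, b):
    """Jaccard overlap of two triplet sets (an assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(triplets):
    """Adjacency map: subject -> list of (relation, object) edges."""
    graph = defaultdict(list)
    for subj, rel, obj in sorted(triplets):  # sorted for a stable edge order
        graph[subj].append((rel, obj))
    return dict(graph)


score = jaccard(doc_a, doc_b)
shared_graph = build_graph(doc_a & doc_b)  # edges common to both documents
```

Here the two documents share one of four distinct triplets, so `score` is 0.25, and `shared_graph` holds the single common edge that a visualization could highlight. Because the score is computed over compact triplet sets rather than raw text, its cost depends on the number of extracted facts, which is one plausible reading of how a triplet corpus helps with long documents.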