String similarity computing based on position and cosine
Authors: Na Cheng, Zhongqing Yu, Kaixi Wang
Published in: 2017 7th IEEE International Conference on Electronics Information and Emergency Communication (ICEIEC), 2017-07-01
DOI: https://doi.org/10.1109/ICEIEC.2017.8076557
E-business platforms need product selection functionality based on product features and cost performance, and data must be cleaned throughout the production and sales process, so calculating the similarity between products is important. This paper proposes a new method for computing string similarity: each string is segmented into words, the corresponding word positions are numbered, and the string is converted into a vector. The similarity between two strings is then computed as the cosine of the angle between their vectors. Experiments show that the method avoids the maximum or minimum values that LCS and GST produce. In addition, the proposed method improves the accuracy of similarity calculation.
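The abstract does not give the exact segmentation or position-numbering scheme, so the following is only a minimal sketch of the general idea under stated assumptions: words are split on whitespace, each word is assigned the 1-based position of its first occurrence in the string (0 if the word is absent), and the two resulting vectors are compared by cosine. The function names `position_vector` and `position_cosine_similarity` are hypothetical and not taken from the paper.

```python
import math


def position_vector(words, vocabulary):
    """Map each vocabulary word to the 1-based position of its first
    occurrence in `words`, or 0 if the word does not occur.
    (Assumed numbering scheme; the paper's own scheme may differ.)"""
    first_pos = {}
    for index, word in enumerate(words, start=1):
        first_pos.setdefault(word, index)  # keep only the first occurrence
    return [first_pos.get(word, 0) for word in vocabulary]


def position_cosine_similarity(a, b):
    """Cosine similarity between the position vectors of two strings."""
    # Assumption: whitespace segmentation and case folding.
    words_a = a.lower().split()
    words_b = b.lower().split()

    # Shared vocabulary fixes the dimension order of both vectors.
    vocabulary = sorted(set(words_a) | set(words_b))
    vec_a = position_vector(words_a, vocabulary)
    vec_b = position_vector(words_b, vocabulary)

    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(position_cosine_similarity("red apple", "red apple"))
    print(position_cosine_similarity("red apple", "apple red"))
    print(position_cosine_similarity("red apple", "blue pear"))
```

Under these assumptions, identical strings score about 1, strings with no shared words score 0, and two strings with the same words in a different order score below 1 (about 0.8 for "red apple" versus "apple red"). This is the position sensitivity that a plain bag-of-words cosine would not capture.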