J. Delgado, Fernando Galarraga, Walter Fuertes, T. Toulkeridis, César Villacís, Fidel Castro
2016 Third International Conference on eDemocracy & eGovernment (ICEDEG), published 2016-03-01. DOI: 10.1109/ICEDEG.2016.7461472 (https://doi.org/10.1109/ICEDEG.2016.7461472)
A proposal of an entity name recognition algorithm to integrate governmental databases
Based on an analysis of existing name recognition techniques, this paper proposes a new algorithm that improves the efficiency of matching citizens' registers. To this end, the case study starts by drawing two representative random samples that reflect real-life conditions. The first sample contains a wide variety of tourists' names from all over the world, collected from visitors to the Galápagos Islands over the past three years, around the time of the last population census. The second sample contains the names of the islands' residents taken from the last census. The algorithm matches the sampled names against those of citizens held in the database of the National Registration Identity Card Number Department of Ecuador in two steps. The first step separates the exact matches and identifies the top approximate name matches through a phonetic code comparison. The second step refines the results of the first using an edit distance technique. To provide evidence of the effectiveness of these steps, the accuracy and viability quality factors of the algorithm were evaluated. The results were finally confirmed with a paired t-test, which determines whether these factors differ significantly before and after the execution of the new algorithm.
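The two-step matching described in the abstract can be sketched roughly as follows. The abstract does not say which phonetic code or which distance measure the authors use, so this sketch assumes American Soundex and Levenshtein distance as stand-ins. The function names, the sample registry, and the `max_distance` threshold are all hypothetical, and each name is treated as a single token.

```python
import unicodedata

# Soundex digit groups (assumption: the paper's phonetic code may differ).
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def normalize(name):
    """Uppercase, strip accents, and keep only the letters A-Z."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed.upper() if "A" <= c <= "Z")


def soundex(name):
    """Four-character American Soundex code of a name ('' if no letters)."""
    letters = normalize(name)
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_name(query, registry, max_distance=2):
    """Return ('exact' | 'approximate' | 'none', list of matching names)."""
    q = normalize(query)
    exact = [r for r in registry if normalize(r) == q]
    if exact:
        return "exact", exact
    # Step 1: keep registry names that share the query's phonetic code.
    key = soundex(query)
    candidates = [r for r in registry if soundex(r) == key]
    # Step 2: rank those candidates by edit distance and apply the threshold.
    scored = sorted((edit_distance(q, normalize(r)), r) for r in candidates)
    matches = [r for d, r in scored if d <= max_distance]
    return ("approximate", matches) if matches else ("none", [])


# Hypothetical registry, for illustration only.
registry = ["Gonzalez", "Gonzales", "Rodriguez", "Ramirez"]
print(match_name("González", registry))
print(match_name("Gonsales", registry))
```

Filtering by phonetic code first keeps the more expensive edit distance computation restricted to a small candidate set, which is one plausible reason for ordering the two steps this way.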