Lei Wang, Jiahui Hu, Qian Wang, Yusheng Yang, Pei Lou, An Fang
Published in: 2022 3rd Information Communication Technologies Conference (ICTC), 2022-05-06
DOI: 10.1109/ictc55111.2022.9778332 (https://doi.org/10.1109/ictc55111.2022.9778332)
Big Open Data Aided Institutions’ Name Normalization and Attribute Enrichment
Institutions are a critical component of scientific resources. Practitioners in the library field have explored a variety of approaches to constructing institution authority files, but this work still faces a series of challenges. This paper proposes a method for institution name normalization and attribute enrichment that links the affiliation abbreviations found in articles with the corresponding attribute values in open data. The method comprises several parts: data cleaning and normalization, big open data selection, data linking, big open data management, and results export. Open data is used both to aid name normalization and as the source of institution attributes. Compared with other practices, the method also addresses the management of big data, in order to reduce data storage costs and support future data updates. The method has been applied in practice and has proven feasible.
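To make the cleaning, linking, and enrichment steps concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not name the open data sources, the matching rules, or the attribute schema, so the records in `OPEN_DATA`, the field names `aliases`, `country`, and `type`, and the functions `normalize`, `build_index`, and `enrich` are all hypothetical, and exact matching on a cleaned string stands in for whatever linking technique the paper actually uses.

```python
import re
import unicodedata

# Hypothetical open-data records. The institutions and attribute fields
# are invented for illustration and do not come from the paper.
OPEN_DATA = [
    {"name": "Example Institute of Technology", "aliases": ["EIT"],
     "country": "XX", "type": "Education"},
    {"name": "Sample Medical University", "aliases": ["SMU"],
     "country": "YY", "type": "Healthcare"},
]


def normalize(name: str) -> str:
    """Clean a raw name: strip accents, lowercase, drop punctuation,
    and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def build_index(records):
    """Map every cleaned name and alias to its open-data record."""
    index = {}
    for rec in records:
        for label in [rec["name"], *rec["aliases"]]:
            index[normalize(label)] = rec
    return index


def enrich(affiliation: str, index):
    """Link an affiliation string to a record and return its attributes,
    or None when no exact match on the cleaned string exists."""
    rec = index.get(normalize(affiliation))
    if rec is None:
        return None
    return {
        "normalized_name": rec["name"],
        "country": rec["country"],
        "type": rec["type"],
    }


if __name__ == "__main__":
    index = build_index(OPEN_DATA)
    for raw in ["EIT", "  sample medical university, ", "Unknown Lab"]:
        print(repr(raw), "->", enrich(raw, index))
```

Building the index once and looking names up by their cleaned form keeps each lookup cheap. Unmatched strings return `None` so they can be routed to manual review or to a fuzzier matching stage, neither of which is shown here.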