Felipe Penhorate Carvalho da Fonseca, Luciano Antonio Digiampietri
Inference of Researchers' Area of Expertise
2018 7th Brazilian Conference on Intelligent Systems (BRACIS), published 2018-10-01. DOI: https://doi.org/10.1109/bracis.2018.00020
A wide range of academic data is now available on the web. This information supports tasks such as discovering specialists in a given area, identifying potential scholarship holders, and suggesting collaborators. However, the success of these tasks depends on the quality of the data used, since incorrect or incomplete data tend to impair the performance of the applied algorithms. The present work uses machine learning techniques to infer researchers' areas of expertise from the data registered in the Lattes Platform, using subareas as a case study. Subareas pose a more challenging variant of the original problem because the number of classes is larger. The goal of this paper is to analyze how factors such as social network metrics, the language of the titles, and the hierarchical structure of the areas contribute to the performance of the algorithms, and to propose a new approach that combines different characteristics. The proposed approach can be applied to other academic data, but data from the Lattes Platform were used to test and validate the proposed solution. As a result, we found that social network metrics and numerical representations of the data improved inference accuracy compared to state-of-the-art techniques, and that using the hierarchical structure information achieved even better results.