A Content-Based Dataset Recommendation System for Biomedical Datasets
Zitong Zhang, Ashraf Yaseen
2023 6th International Conference on Information and Computer Technologies (ICICT), published 2023-03-01
DOI: https://doi.org/10.1109/icict58900.2023.00040
With the rapid development of cloud data and online collaboration platforms, researchers increasingly make their data publicly available to support experimental reproducibility and data reuse. On one hand, sharing data with collaborators increases the visibility of the work. On the other hand, the abundance of data spread across multiple platforms makes it hard for researchers to find data relevant to their own research. A dataset recommendation system capable of finding relevant datasets across multiple resources would help overcome this challenge. In the past two decades, only a few dataset recommendation methods have been implemented, and most are domain-specific or simply recommend datasets based on keywords. We believe that a general dataset recommender system, one that recommends datasets using information either extracted from another dataset or supplied by researchers, can make searching for relevant data more efficient and thereby significantly improve research efficiency. This work adopts an information retrieval (IR) paradigm for dataset recommendation. By extracting summary information from each dataset and generating a profile for each, we use and compare multiple content-based recommendation methods to recommend the most relevant datasets in GEO, SRA, and several other repositories. Our results and evaluations demonstrate the usefulness of, and the need for, such a system.
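The abstract does not name the specific content-based methods that were compared. As an illustration only, the sketch below shows one common baseline of this kind: each dataset profile is turned into a TF-IDF vector, and candidates are ranked by cosine similarity to a query. The choice of TF-IDF, the function names, and the toy profiles and dataset IDs are all assumptions made for this example and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weight(tokens, idf):
    """Build a sparse TF-IDF vector; tokens unknown to the corpus are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_tfidf(profiles):
    """Return (vectors, idf) for a dict of {dataset_id: profile_text}."""
    tokens = {ds_id: tokenize(text) for ds_id, text in profiles.items()}
    n_docs = len(tokens)
    doc_freq = Counter()
    for toks in tokens.values():
        doc_freq.update(set(toks))
    # Smoothed IDF, so a term found in every profile keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {ds_id: weight(toks, idf) for ds_id, toks in tokens.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_text, vectors, idf, top_k=3, exclude=None):
    """Rank dataset profiles by similarity to free text or to another profile."""
    query = weight(tokenize(query_text), idf)
    scored = [
        (ds_id, cosine(query, vec))
        for ds_id, vec in vectors.items()
        if ds_id != exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy profiles; the IDs and descriptions are invented for illustration.
profiles = {
    "DS-A": "RNA-seq of human breast cancer tumor tissue",
    "DS-B": "Breast cancer gene expression microarray in human cell lines",
    "DS-C": "16S rRNA sequencing of mouse gut microbiome",
    "DS-D": "Whole genome sequencing of soil bacteria",
}
vectors, idf = build_tfidf(profiles)

# Query supplied by a researcher.
by_text = recommend("gene expression in breast cancer", vectors, idf)

# Query extracted from another dataset's profile, excluding that dataset itself.
by_dataset = recommend(profiles["DS-A"], vectors, idf, exclude="DS-A")
```

Both query modes mentioned in the abstract map onto the same call: free text supplied by a researcher, or the profile of an existing dataset. With these toy profiles, `DS-B` ranks first in both cases because it shares the most weighted terms with the query.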