A Chinese Knowledge Graph Dataset in the Field of Scientific Fitness
Authors: Shutong Du, Zhitong Liu, Bingyu Pan
Journal: Scientific Data, vol. 12, no. 1, article 205 (published 2025-02-04)
Journal metrics: Impact Factor 5.8; JCR Q1 (Multidisciplinary Sciences)
DOI: 10.1038/s41597-025-04519-6
PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794866/pdf/
Citations: 0
Abstract
To promote the development of scientific fitness research and practice, we propose the Chinese Knowledge Graph Dataset in the Field of Scientific Fitness (FitKG-CN). This knowledge graph contains over 10,000 fitness-related terms, categorized into eight main groups: body parts, exercise items, fitness movements, equipment and tools, exercise goals, anatomical structures, nutrients, and technical terms. FitKG-CN is built from authoritative data sources that undergo rigorous preprocessing, including noise removal, format standardization, and normalization of entities and relationships. The data are manually annotated on a professional annotation platform and stored in a Neo4j graph database for visualization. Additionally, we trained a Chinese SpERT model on the manually annotated data to further automate data processing. Experimental results show that the model achieves an F1 score of 94.05% on entity recognition and 82.00% on relation extraction, validating its effectiveness and improving the scalability of the dataset.
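The F1 scores above are the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below is a minimal illustration, not the authors' evaluation code. It assumes the common strict-match convention: an entity counts as correct only if its character span and type both match the annotation, and a relation only if its type and both argument entities (span and type) match. The sentence, the English type labels (loosely named after three of the eight categories), and the relation names `targets` and `uses` are all hypothetical; the paper's exact evaluation protocol may differ.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable

# Hypothetical entity-type labels loosely named after FitKG-CN categories.
FM, AS, BP, EQ = "FitnessMovement", "AnatomicalStructure", "BodyPart", "EquipmentAndTools"


def prf1(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> tuple[float, float, float]:
    """Micro precision, recall and F1 where an item counts only on an exact match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions that exactly match an annotation
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy sentence 0 (hypothetical): "Squats mainly train the quadriceps; a barbell can be used."
SENTENCE = "深蹲主要锻炼股四头肌，可以使用杠铃。"

# Entities are (sentence_id, start, end, type) with end-exclusive character offsets.
GOLD_ENTITIES = {(0, 0, 2, FM), (0, 6, 10, AS), (0, 15, 17, EQ)}
PRED_ENTITIES = {(0, 0, 2, FM), (0, 6, 10, BP), (0, 15, 17, EQ)}  # 股四头肌 mistyped

# Relations are (sentence_id, head, tail, relation); head and tail carry span and
# type, so a relation with a mistyped argument is also counted as wrong.
GOLD_RELATIONS = {
    (0, (0, 2, FM), (6, 10, AS), "targets"),
    (0, (0, 2, FM), (15, 17, EQ), "uses"),
}
PRED_RELATIONS = {
    (0, (0, 2, FM), (6, 10, BP), "targets"),
    (0, (0, 2, FM), (15, 17, EQ), "uses"),
}

if __name__ == "__main__":
    _, _, ent_f1 = prf1(GOLD_ENTITIES, PRED_ENTITIES)
    _, _, rel_f1 = prf1(GOLD_RELATIONS, PRED_RELATIONS)
    print(f"entity F1 = {ent_f1:.3f}, relation F1 = {rel_f1:.3f}")
    # prints "entity F1 = 0.667, relation F1 = 0.500"
```

The single mistyped entity costs one entity match and, because relation arguments include the entity type, also one relation match. This is one reason strict relation-extraction scores tend to sit below entity-recognition scores.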
About the journal:
Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data.
The journal primarily publishes Data Descriptors, which describe research datasets in detail, including the data collection methods and the technical analyses that validate data quality. Data Descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.