Alison M Veintimilla, Chintan K Acharya, Connie J Mulligan, Ruogu Fang, Erika Moore
{"title":"TRACE:应用人工智能语言模型从精心整理的生物医学文献中提取祖先信息。","authors":"Alison M Veintimilla, Chintan K Acharya, Connie J Mulligan, Ruogu Fang, Erika Moore","doi":"10.3389/fdgth.2025.1608370","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Ancestry reporting is essential to ensure transparency and proper representation in biomedical studies. However, manually extracting this information from study texts is time-consuming and inefficient. In this paper, we present TRACE (Tool for Researching Ancestry and Cell Extraction), powered by GPT-4 and web-crawling, to automate ancestry identification by detecting cell lines or cultures in texts and tracing their ancestry.</p><p><strong>Methods: </strong>TRACE extracts cell lines and primary cultures from research articles and follows web sources to determine their ancestry. We compared TRACE's outputs to a manually generated database to confirm its performance in identifying and verifying ancestry information.</p><p><strong>Results: </strong>The results reveal an overrepresentation of European/White samples and significant underreporting. TRACE enables large-scale, systematic ancestry analysis-a valuable resource for researchers and agencies assessing biases in sample selection.</p><p><strong>Conclusions: </strong>As an open-source tool, TRACE it facilitates broader use to evaluate and improve ancestry representation in biomedical research.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1608370"},"PeriodicalIF":3.2000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12491185/pdf/","citationCount":"0","resultStr":"{\"title\":\"TRACE: applying AI language models to extract ancestry information from curated biomedical literature.\",\"authors\":\"Alison M Veintimilla, Chintan K Acharya, Connie J Mulligan, Ruogu Fang, Erika Moore\",\"doi\":\"10.3389/fdgth.2025.1608370\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>Ancestry reporting is essential to ensure transparency and proper representation in biomedical studies. However, manually extracting this information from study texts is time-consuming and inefficient. In this paper, we present TRACE (Tool for Researching Ancestry and Cell Extraction), powered by GPT-4 and web-crawling, to automate ancestry identification by detecting cell lines or cultures in texts and tracing their ancestry.</p><p><strong>Methods: </strong>TRACE extracts cell lines and primary cultures from research articles and follows web sources to determine their ancestry. We compared TRACE's outputs to a manually generated database to confirm its performance in identifying and verifying ancestry information.</p><p><strong>Results: </strong>The results reveal an overrepresentation of European/White samples and significant underreporting. TRACE enables large-scale, systematic ancestry analysis-a valuable resource for researchers and agencies assessing biases in sample selection.</p><p><strong>Conclusions: </strong>As an open-source tool, TRACE it facilitates broader use to evaluate and improve ancestry representation in biomedical research.</p>\",\"PeriodicalId\":73078,\"journal\":{\"name\":\"Frontiers in digital health\",\"volume\":\"7 \",\"pages\":\"1608370\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12491185/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fdgth.2025.1608370\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fdgth.2025.1608370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
TRACE: applying AI language models to extract ancestry information from curated biomedical literature.
Introduction: Ancestry reporting is essential to ensure transparency and proper representation in biomedical studies. However, manually extracting this information from study texts is time-consuming and inefficient. In this paper, we present TRACE (Tool for Researching Ancestry and Cell Extraction), powered by GPT-4 and web-crawling, to automate ancestry identification by detecting cell lines or cultures in texts and tracing their ancestry.
Methods: TRACE extracts cell lines and primary cultures from research articles and follows web sources to determine their ancestry. We compared TRACE's outputs to a manually generated database to confirm its performance in identifying and verifying ancestry information.
Results: The results reveal an overrepresentation of European/White samples and significant underreporting. TRACE enables large-scale, systematic ancestry analysis-a valuable resource for researchers and agencies assessing biases in sample selection.
Conclusions: As an open-source tool, TRACE it facilitates broader use to evaluate and improve ancestry representation in biomedical research.