Text phrase-mining in identifying and classifying maternal proteins and genes across preeclampsia and similar pathologies
Jacqueline G Urdang, Stephanie Masters, Nneoma Edokobi, Chitra Mukherjee, Arnib Quazi, David A Liem, Monica Ahrens, Xuan Wang, Megan Whitham
Physiological Reports, vol. 13, no. 6, e70262. Published 2025-03-01. DOI: 10.14814/phy2.70262 (https://doi.org/10.14814/phy2.70262). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11919630/pdf/
Impact factor: 2.2. JCR: Q3 (Physiology).
Citations: 0
Abstract
This study aims to demonstrate that text phrase-mining and natural language processing (NLP) can annotate large quantities of obstetric text for the discovery and evaluation of maternal protein/gene (MPG)-disease interactions involved in the preeclampsia pathway. We employed a phrase-mining/NLP pipeline to evaluate unique MPGs involved in six cardiovascular derangements with overlapping presentations during pregnancy. The diseases were matched with Medical Subject Headings (MeSH), and a textual corpus was developed from PubMed abstracts matched to these terms. The pipeline produced a unique score for each MPG-disease pair; the components of the score were calculated and weighted for distinctness, integrity, and popularity. Statistical analyses were conducted to examine protein-disease relationships. Forty-four MPGs with known associations to cardiovascular disease and preeclampsia pathways were identified among the six diseases. MPGs shared across the greatest number of disease states were implicated in (1) angiogenesis and vasoconstriction, (2) hemodynamic regulation, (3) hormonal regulation of metabolism, and (4) inflammation. NLP and text phrase-mining were successfully applied to obstetrics abstracts with accuracy and speed. This approach holds promise for synthesizing large volumes of data to present trends in the obstetric literature and for identifying promising biomarkers.
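The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract states only that each MPG-disease pair receives a score whose components are weighted for distinctness, integrity, and popularity; it does not give the formula, the weights, or the component definitions. Everything below is therefore an assumption made for illustration: the equal weights, the weighted-mean combination, the [0, 1] component range, the function names `pair_score` and `diseases_per_mpg`, and the placeholder identifiers and values, none of which come from the study.

```python
from collections import defaultdict

# Hypothetical weights: the abstract says the components are weighted
# but does not report the values, so equal weights are assumed here.
WEIGHTS = {"distinctness": 1 / 3, "integrity": 1 / 3, "popularity": 1 / 3}


def pair_score(components, weights=WEIGHTS):
    """Weighted mean of the component values (each assumed to lie in [0, 1])."""
    total = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total


def diseases_per_mpg(scores, threshold=0.0):
    """Map each MPG to the set of diseases where its score exceeds the threshold."""
    shared = defaultdict(set)
    for (mpg, disease), score in scores.items():
        if score > threshold:
            shared[mpg].add(disease)
    return dict(shared)


# Made-up component values for illustration only; not results from the study.
toy_components = {
    ("MPG_A", "disease_1"): {"distinctness": 0.5, "integrity": 1.0, "popularity": 0.0},
    ("MPG_A", "disease_2"): {"distinctness": 0.25, "integrity": 0.5, "popularity": 0.75},
    ("MPG_B", "disease_1"): {"distinctness": 0.0, "integrity": 0.0, "popularity": 0.0},
}

# Score every pair, then rank MPGs by how many diseases they are linked to.
toy_scores = {pair: pair_score(c) for pair, c in toy_components.items()}
ranked = sorted(
    diseases_per_mpg(toy_scores).items(),
    key=lambda item: len(item[1]),
    reverse=True,
)
```

Ranking MPGs by the number of diseases they are linked to mirrors the abstract's focus on MPGs shared across the greatest number of disease states. A real pipeline would derive the component values from phrase statistics in the PubMed corpus instead of hard-coding them.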
About the journal:
Physiological Reports is an online-only, open access journal that publishes peer-reviewed research across all areas of basic, translational, and clinical physiology and allied disciplines. Physiological Reports is a collaboration between The Physiological Society and the American Physiological Society, and is therefore in a unique position to serve the international physiology community through quick time to publication while upholding a quality standard of sound research that constitutes a useful contribution to the field.