Data mining polycystic ovary morphology in electronic medical record ultrasound reports.

Fertility Research and Practice, Volume 5, Pages 13. Pub Date: 2019-12-01. eCollection Date: 2019-01-01. DOI: 10.1186/s40738-019-0067-7
Jay Jojo Cheng, Shruthi Mahalingaiah
Citations: 10

Abstract

Background: Polycystic ovary syndrome (PCOS) is characterized by hyperandrogenemia, oligo-anovulation, and numerous ovarian cysts. Hospital electronic medical records provide an avenue for investigating, at large scale, the polycystic ovary morphology (PCOM) commonly seen in PCOS. The purpose of this study was to develop and evaluate the performance of two machine learning text algorithms for classifying PCOM in pelvic ultrasound reports.

Methods: Pelvic ultrasound reports from patients at Boston Medical Center between October 1, 2003 and December 12, 2016 were included for analysis, yielding 39,093 ultrasound reports from 25,535 unique women. Following the 2003 Rotterdam Consensus Criteria for polycystic ovary syndrome, 2000 randomly selected ultrasound reports were expert-labeled for PCOM status as present, absent, or unidentifiable (not able to be determined from text alone). An ovary was marked as having PCOM if there was mention of numerous peripheral follicles or if the volume was greater than 10 ml in the absence of a dominant follicle or other confounding pathology. Half of the labeled data were used to develop and refine the algorithms, and the other half were used as a test set for evaluating their accuracy.
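The labeling rule and the half-and-half split described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the boolean inputs, and the fixed random seed are hypothetical, and the sketch assumes that the "absence of a dominant follicle or other confounding pathology" condition qualifies only the volume criterion. It also covers only the PCOM-present rule; how a report is judged "unidentifiable" is not specified in the abstract and is left out.

```python
import random
from typing import Optional, Sequence, Tuple, List


def meets_pcom_rule(
    numerous_peripheral_follicles: bool,
    volume_ml: Optional[float],
    dominant_follicle: bool = False,
    confounding_pathology: bool = False,
) -> bool:
    """Return True if an ovary satisfies the PCOM-present rule.

    Assumption: the exclusion for a dominant follicle or confounding
    pathology applies to the volume criterion only.
    """
    if numerous_peripheral_follicles:
        return True
    # Volume criterion: strictly greater than 10 ml, and volume must be reported.
    enlarged = volume_ml is not None and volume_ml > 10.0
    return enlarged and not (dominant_follicle or confounding_pathology)


def split_half(report_ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle labeled report IDs and split them into development and test halves."""
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (hypothetical choice)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


# Example: 2000 labeled reports split into 1000 development and 1000 test reports.
dev_ids, test_ids = split_half(range(2000))
```

Keeping the test half untouched while the algorithms are refined on the development half is what allows the reported accuracy to be read as an estimate on unseen reports.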

Results: On the evaluation set of 1000 randomly selected ultrasound reports, the accuracies of the two classifiers were 97.6% (95% CI: 96.5%, 98.5%) and 96.1% (95% CI: 94.7%, 97.2%). Both models were more adept at identifying PCOM-absent ultrasounds than either PCOM-unidentifiable or PCOM-present ultrasounds. Applied to the whole set of 39,093 ultrasounds, the two classifiers estimated 44% to be PCOM-absent, 32% PCOM-unidentifiable, and 24% PCOM-present.
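To show how a confidence interval of this kind can be obtained from a test-set accuracy, here is a small Python sketch using the Wilson score interval. The abstract does not say which interval method the authors used, so the choice of Wilson is an assumption, and the counts 976 and 961 are inferred from the reported percentages on 1000 reports rather than stated in the source. The function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts inferred from 97.6% and 96.1% of 1000 test reports (an assumption).
for correct in (976, 961):
    low, high = wilson_interval(correct, 1000)
    print(f"{correct}/1000 correct: {low:.3f} to {high:.3f}")
```

An exact (Clopper-Pearson) interval would give slightly different bounds, so small discrepancies from the published limits should not be read as errors in either calculation.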

Conclusions: Although the accuracy measured on the test set and the inter-rater agreement between the two classifiers (Cohen's kappa = 0.988) were high, a major limitation of our approach is that it uses the ultrasound report text as a proxy and does not directly count follicles from the ultrasound images themselves.
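For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from their label frequencies. The following Python sketch is a generic illustration, not the authors' implementation; the function name and the toy label lists are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters always use one label
    return (p_o - p_e) / (1 - p_e)


# Toy example with the three PCOM classes (made-up labels, not study data).
clf_1 = ["absent", "absent", "present", "unidentifiable", "present"]
clf_2 = ["absent", "absent", "present", "unidentifiable", "absent"]
kappa = cohens_kappa(clf_1, clf_2)
```

A kappa near 1 indicates that the two classifiers assign nearly identical labels; it does not by itself show that either one agrees with the expert labels.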
