Clustering polycystic ovary syndrome laboratory results extracted from a large internet forum with machine learning

Rebecca H.K. Emanuel, Paul D. Docherty, Helen Lunt, Rua Murray, Rebecca E. Campbell
Intelligence-based medicine, volume 9, Article 100135. Published 2024-01-01.
DOI: 10.1016/j.ibmed.2024.100135
Full text: https://www.sciencedirect.com/science/article/pii/S2666521224000024
Citation count: 0

Abstract

Background

Polycystic Ovary Syndrome (PCOS) is reported to affect between 4% and 21% of reproductive-aged people with ovaries. It is a heterogeneous condition that lacks established phenotypes addressing the full range of its reproductive and metabolic features. These features may result in patients undergoing a variety of relevant laboratory tests. Previous work gathered laboratory test results from a PCOS-specific forum hosted on the website Reddit.

Objectives

In this paper, laboratory results and body mass index (BMI) values posted on the PCOS Reddit forum were clustered to show the usefulness of the forum for PCOS research and to validate existing PCOS phenotypes or discover other appropriate phenotypes.

Methods and results

Over 1500 sets of PCOS-related Reddit laboratory test results and BMIs were clustered using nearest neighbour imputation and K-means clustering. However, only non-imputed data were included in the final clusters. Kernel Density Estimation plots were used to display the distinct clusters. The clustered test results suggested the existence of distinct metabolic and reproductive phenotypes, as well as a group displaying mild features of both types of dysregulation and a group skewed towards normal results. It was also possible to subdivide the groups further, separating distinct hypothyroid groups within the mixed dysregulation group and insulin-resistant and diabetes-like groups within the metabolic group.
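
The pipeline described above (nearest neighbour imputation, K-means clustering, then density estimates built from non-imputed values only) could be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The feature names, the synthetic data, the choice of four clusters, the five neighbours, the standardisation step, and the use of scikit-learn and SciPy are all assumptions, since the abstract specifies none of them.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the abstract does not list the tests used.
FEATURES = ["bmi", "fasting_insulin", "total_testosterone", "tsh"]


def cluster_lab_results(X, n_clusters=4, n_neighbors=5, seed=0):
    """Cluster rows of X (NaN = value not reported).

    Returns (labels, observed), where observed marks non-imputed entries.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)  # True where a value was actually posted
    # StandardScaler ignores NaNs when fitting and leaves them in place.
    X_scaled = StandardScaler().fit_transform(X)
    # Fill each gap from the nearest rows (NaN-aware Euclidean distance).
    X_filled = KNNImputer(n_neighbors=n_neighbors).fit_transform(X_scaled)
    # Imputed values are used only to assign each row to a cluster.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(X_filled)
    return labels, observed


def cluster_density(X, labels, observed, cluster, feature_idx, grid):
    """Kernel density of one feature in one cluster, reported values only."""
    X = np.asarray(X, dtype=float)
    rows = (labels == cluster) & observed[:, feature_idx]
    return gaussian_kde(X[rows, feature_idx])(grid)


# Synthetic stand-in data (assumed): 200 posts, roughly 20% of values missing.
rng = np.random.default_rng(0)
X = rng.normal(loc=[27.0, 12.0, 50.0, 2.0],
               scale=[5.0, 6.0, 20.0, 1.0],
               size=(200, len(FEATURES)))
X[rng.random(X.shape) < 0.2] = np.nan

labels, observed = cluster_lab_results(X)
largest = int(np.bincount(labels).argmax())
bmi_grid = np.linspace(10.0, 45.0, 50)
bmi_density = cluster_density(X, labels, observed, largest, 0, bmi_grid)
```

Keeping the `observed` mask separate from the filled matrix is what allows imputed values to guide cluster assignment while being excluded from the per-cluster summaries, in the spirit of the statement that only non-imputed data were included in the final clusters.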

Conclusions

This research further validates the usefulness of exploring alternative data sources in the age of the internet and machine learning. The Reddit clusters reinforced the existing notion that people with PCOS can be separated into a primarily metabolic pathology group, a primarily reproductive pathology group, and an in-between group with pathology in both domains.

Source journal: Intelligence-based medicine (Health Informatics)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 187 days