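Discrimination between healthy participants and people with panic disorder based on polygenic scores for psychiatric disorders and for intermediate phenotypes using machine learning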

Australian & New Zealand Journal of Psychiatry, Pub Date: 2024-04-06, DOI: 10.1177/00048674241242936

Kazutaka Ohi, Yuta Tanaka, Takeshi Otowa, Mihoko Shimada, Hisanobu Kaiya, Fumichika Nishimura, Tsukasa Sasaki, Hisashi Tanii, Toshiki Shioiri, Takeshi Hara


Citations: 0

Abstract


Discrimination between healthy participants and people with panic disorder based on polygenic scores for psychiatric disorders and for intermediate phenotypes using machine learning

Objective: Panic disorder is a modestly heritable condition. Currently, diagnosis is based only on clinical symptoms; identifying objective biomarkers and a more reliable diagnostic procedure is desirable. We investigated whether people with panic disorder can be reliably diagnosed utilizing combinations of multiple polygenic scores for psychiatric disorders and their intermediate phenotypes, compared with single polygenic score approaches, by applying specific machine learning techniques.

Methods: Polygenic scores for 48 psychiatric disorders and intermediate phenotypes based on large-scale genome-wide association studies (n = 7556–1,131,881) were calculated for people with panic disorder (n = 718) and healthy controls (n = 1717). Discrimination between people with panic disorder and healthy controls was based on the 48 polygenic scores using five methods for classification: logistic regression, neural networks, quadratic discriminant analysis, random forests and a support vector machine. Differences in discrimination accuracy (area under the curve) due to an increased number of polygenic score combinations and differences in the accuracy across five classifiers were investigated.

Results: All five classifiers performed relatively well for distinguishing people with panic disorder from healthy controls by increasing the number of polygenic scores. Of the 48 polygenic scores, the polygenic score for anxiety UK Biobank was the most useful for discrimination by the classifiers. In combinations of two or three polygenic scores, the polygenic score for anxiety UK Biobank was included as one of polygenic scores in all classifiers. When all 48 polygenic scores were used in combination, the greatest areas under the curve significantly differed among the five classifiers. Support vector machine and logistic regression had higher accuracy than quadratic discriminant analysis and random forests. For each classifier, the greatest area under the curve was 0.600 ± 0.030 for logistic regression (polygenic score combinations N = 14), 0.591 ± 0.039 for neural networks (N = 9), 0.603 ± 0.033 for quadratic discriminant analysis (N = 10), 0.572 ± 0.039 for random forests (N = 25) and 0.617 ± 0.041 for support vector machine (N = 11). The greatest areas under the curve at the best polygenic score combination significantly differed among the five classifiers. Random forests had the lowest accuracy among classifiers. Support vector machine had higher accuracy than neural networks.

Conclusions: These findings suggest that increasing the number of polygenic score combinations up to approximately 10 effectively improved the discrimination accuracy and that support vector machine exhibited greater accuracy among classifiers. However, the discrimination accuracy for panic disorder, when based solely on polygenic score combinations, was found to be modest.
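The abstract reports discrimination accuracy as the area under the curve (presumably the receiver operating characteristic curve). As a minimal sketch of what that number measures, the snippet below computes an AUC from case/control labels and classifier scores using the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they are not the authors' code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case (label 1) gets a
    higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (not from the study):
# 1 = person with panic disorder, 0 = healthy control.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2, 0.6]

toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

Under this definition, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the reported values of roughly 0.57 to 0.62 are described as modest.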
