Methods of Information in Medicine: Latest Articles

Cross-lingual Natural Language Processing on Limited Annotated Case/Radiology Reports in English and Japanese: Insights from the Real-MedNLP Workshop.
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-10-29 DOI: 10.1055/a-2405-2489
Shuntaro Yada, Yuta Nakamura, Shoko Wakamiya, Eiji Aramaki
Abstract
Background: Textual datasets (corpora) are crucial for the application of natural language processing (NLP) models. However, corpus creation in the medical field is challenging, primarily because of privacy issues with raw clinical data such as health records. Thus, the existing clinical corpora are generally small and scarce. Medical NLP (MedNLP) methodologies perform well with limited data availability.
Objectives: We present the outcomes of the Real-MedNLP workshop, which was conducted using limited and parallel medical corpora. Real-MedNLP exhibits three distinct characteristics: (1) limited annotated documents: the training data comprise only a small set (∼100) of case reports (CRs) and radiology reports (RRs) that have been annotated. (2) Bilingually parallel: the constructed corpora are parallel in Japanese and English. (3) Practical tasks: the workshop addresses fundamental tasks, such as named entity recognition (NER) and applied practical tasks.
Methods: We propose three tasks: NER of ∼100 available documents (Task 1), NER based only on annotation guidelines for humans (Task 2), and clinical applications (Task 3) consisting of adverse drug effect (ADE) detection for CRs and identical case identification (CI) for RRs.
Results: Nine teams participated in this study. The best systems achieved 0.65 and 0.89 F1-scores for CRs and RRs in Task 1, whereas the top scores in Task 2 decreased by 50 to 70%. In Task 3, ADE reports were detected with an F1-score of up to 0.64, and CI scored up to 0.96 binary accuracy.
Conclusion: Most systems adopt medical-domain-specific pretrained language models using data augmentation methods. Despite the challenge of limited corpus size in Tasks 1 and 2, recent approaches are promising because the partial match scores reached ∼0.8-0.9 F1-scores. Task 3 applications revealed that the different availabilities of external language resources affected the performance per language.
Journal Article
Citations: 0
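The abstract above reports both exact and partial-match F1-scores for NER. The following is a minimal sketch of how such scores could be computed over character spans. The span format `(start, end, label)`, the overlap-based definition of a partial match, and the toy spans are all assumptions for illustration; the workshop's actual scoring scheme is not described in this listing.

```python
def f1_score(gold, pred, match):
    """Entity-level F1; `match(g, p)` decides whether a prediction counts as correct."""
    # Precision side: predicted spans that match at least one gold span.
    tp_pred = sum(any(match(g, p) for g in gold) for p in pred)
    # Recall side: gold spans that are matched by at least one prediction.
    tp_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact(g, p):
    # Exact match: identical boundaries and label.
    return g == p


def partial(g, p):
    # Assumed partial match: same label and overlapping ranges (end exclusive).
    return g[2] == p[2] and g[0] < p[1] and p[0] < g[1]


# Hypothetical spans, not taken from the Real-MedNLP corpora.
gold = [(0, 5, "disease"), (10, 18, "drug"), (25, 30, "disease")]
pred = [(0, 5, "disease"), (12, 18, "drug"), (40, 44, "drug")]

exact_f1 = f1_score(gold, pred, exact)
partial_f1 = f1_score(gold, pred, partial)
print(f"exact F1 = {exact_f1:.3f}, partial F1 = {partial_f1:.3f}")
```

In this toy case the second prediction has the right label but a shifted start offset, so it counts only under the partial criterion, which is why the partial score is higher than the exact one.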
Deep Learning for Predicting Progression of Patellofemoral Osteoarthritis Based on Lateral Knee Radiographs, Demographic Data, and Symptomatic Assessments.
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-05-01 Epub Date: 2024-04-11 DOI: 10.1055/a-2305-2115
Neslihan Bayramoglu, Martin Englund, Ida K Haugen, Muneaki Ishijima, Simo Saarakkala
Abstract
Objective: In this study, we propose a novel framework that utilizes deep learning and attention mechanisms to predict the radiographic progression of patellofemoral osteoarthritis (PFOA) over a period of 7 years.
Material and methods: This study included subjects (1,832 subjects, 3,276 knees) from the baseline of the Multicenter Osteoarthritis Study (MOST). Patellofemoral joint regions of interest were identified using an automated landmark detection tool (BoneFinder) on lateral knee X-rays. An end-to-end deep learning method was developed for predicting PFOA progression based on imaging data in a five-fold cross-validation setting. To evaluate the performance of the models, a set of baselines based on known risk factors were developed and analyzed using gradient boosting machine (GBM). Risk factors included age, sex, body mass index, Western Ontario and McMaster Universities Arthritis Index score, and the radiographic osteoarthritis stage of the tibiofemoral joint (Kellgren and Lawrence [KL] score). Finally, to increase predictive power, we trained an ensemble model using both imaging and clinical data.
Results: Among the individual models, our deep convolutional neural network attention model achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.856 and average precision (AP) of 0.431, slightly outperforming the deep learning approach without attention (AUC = 0.832, AP = 0.4) and the best performing reference GBM model (AUC = 0.767, AP = 0.334). The inclusion of imaging data and clinical variables in an ensemble model allowed statistically more powerful prediction of PFOA progression (AUC = 0.865, AP = 0.447), although the clinical significance of this minor performance gain remains unknown. The spatial attention module improved the predictive performance of the backbone model, and the visual interpretation of attention maps focused on the joint space and the regions where osteophytes typically occur.
Conclusion: This study demonstrated the potential of machine learning models to predict the progression of PFOA using imaging and clinical variables. These models could be used to identify patients who are at high risk of progression and prioritize them for new treatments. However, even though the accuracy of the models was excellent in this study using the MOST dataset, they should still be validated using external patient cohorts in the future.
Journal Article · Pages 1-10 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495941/pdf/
Citations: 0
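The results above are expressed as AUC and average precision (AP). As a rough illustration of what these two numbers measure, here is a small pure-Python sketch. The labels and scores are hypothetical and unrelated to the MOST data, and the AP function assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Hypothetical progression labels (1 = progressed) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUC = {auc:.3f}, AP = {ap:.3f}")
```

AP depends on the share of positive cases as well as on the ranking, so an AP well below the AUC, as in the abstract, is expected when progression is comparatively rare.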
Development and Validation of a Natural Language Processing Algorithm to Pseudonymize Documents in the Context of a Clinical Data Warehouse.
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-05-01 Epub Date: 2024-03-05 DOI: 10.1055/s-0044-1778693
Xavier Tannier, Perceval Wajsbürt, Alice Calliger, Basile Dura, Alexandre Mouchet, Martin Hilka, Romain Bey
Abstract
Objective: The objective of this study is to address the critical issue of deidentification of clinical reports to allow access to data for research purposes, while ensuring patient privacy. The study highlights the difficulties faced in sharing tools and resources in this domain and presents the experience of the Greater Paris University Hospitals (AP-HP for Assistance Publique-Hôpitaux de Paris) in implementing a systematic pseudonymization of text documents from its Clinical Data Warehouse.
Methods: We annotated a corpus of clinical documents according to 12 types of identifying entities and built a hybrid system, merging the results of a deep learning model and manual rules.
Results and discussion: Our results show an overall F1-score of 0.99. We discuss implementation choices and present experiments to better understand the effort involved in such a task, including dataset size, document types, language models, or rule addition. We share guidelines and code under a 3-Clause BSD license.
Journal Article · Pages 21-34 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495938/pdf/
Citations: 0
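The hybrid approach mentioned above, which merges model predictions with manual rules, can be sketched roughly as follows. This is not the AP-HP implementation: the date regex, the hard-coded `model_spans` standing in for a neural model's output, the "longer span wins" merge policy, and the placeholder tags are all assumptions made for illustration.

```python
import re

# Hypothetical manual rule: dates written as DD/MM/YYYY.
DATE_RULE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def rule_spans(text):
    """Return (start, end, label) spans found by the manual rule."""
    return [(m.start(), m.end(), "DATE") for m in DATE_RULE.finditer(text)]


def merge_spans(model_spans, rule_based):
    """Union of both sources; on overlap, keep the longer span (assumed policy)."""
    merged = []
    candidates = sorted(model_spans + rule_based, key=lambda s: (s[0], -(s[1] - s[0])))
    for span in candidates:
        if merged and span[0] < merged[-1][1]:
            # Overlaps the last kept span: replace it only if this one is longer.
            if span[1] - span[0] > merged[-1][1] - merged[-1][0]:
                merged[-1] = span
            continue
        merged.append(span)
    return merged


def mask(text, spans):
    """Replace each span with a placeholder tag, working right to left."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


text = "Mr Martin was seen on 05/03/2021."
model_spans = [(3, 9, "NAME")]  # stand-in for a deep learning model's output
result = mask(text, merge_spans(model_spans, rule_spans(text)))
print(result)
```

A real pseudonymization step would substitute plausible surrogate values rather than tags; the tags are used here only to keep the example short.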
Does Differentially Private Synthetic Data Lead to Synthetic Discoveries?
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-05-01 Epub Date: 2024-08-13 DOI: 10.1055/a-2385-1355
Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala
Abstract
Background: Synthetic data have been proposed as a solution for sharing anonymized versions of sensitive biomedical datasets. Ideally, synthetic data should preserve the structure and statistical properties of the original data, while protecting the privacy of the individual subjects. Differential Privacy (DP) is currently considered the gold standard approach for balancing this trade-off.
Objectives: The aim of this study is to investigate how trustworthy group differences discovered by independent sample tests from DP-synthetic data are. The evaluation is carried out in terms of the tests' Type I and Type II errors. With the former, we can quantify the tests' validity, i.e., whether the probability of false discoveries is indeed below the significance level, and the latter indicates the tests' power in making real discoveries.
Methods: We evaluate the Mann-Whitney U test, Student's t-test, chi-squared test, and median test on DP-synthetic data. The private synthetic datasets are generated from real-world data, including a prostate cancer dataset (n = 500) and a cardiovascular dataset (n = 70,000), as well as from bivariate and multivariate simulated data. Five different DP-synthetic data generation methods are evaluated, including two basic DP histogram release methods and MWEM, Private-PGM, and DP GAN algorithms.
Conclusion: A large portion of the evaluation results expressed dramatically inflated Type I errors, especially at levels of ϵ ≤ 1. This result calls for caution when releasing and analyzing DP-synthetic data: low p-values may be obtained in statistical tests simply as a byproduct of the noise added to protect privacy. A DP Smoothed Histogram-based synthetic data generation method was shown to produce valid Type I error for all privacy levels tested but required a large original dataset size and a modest privacy budget (ϵ ≥ 5) in order to have reasonable Type II error levels.
Journal Article · Pages 35-51 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495942/pdf/
Citations: 0
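To make the role of the privacy budget ϵ more concrete, here is a minimal sketch of a Laplace-noise histogram release, one common way to build a basic DP histogram. It is not necessarily the method evaluated in the paper. It assumes add/remove-one neighbouring datasets (so each count has sensitivity 1), and the counts, seed, and ϵ values are hypothetical.

```python
import random


def dp_histogram(counts, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to each bin, then clip and round."""
    noisy = []
    for c in counts:
        # The difference of two i.i.d. exponential draws with rate epsilon
        # follows a Laplace distribution with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        # Post-processing (clipping, rounding) does not weaken the DP guarantee.
        noisy.append(max(0, round(c + noise)))
    return noisy


rng = random.Random(0)         # fixed seed only to make the sketch repeatable
true_counts = [120, 80, 45, 5]  # hypothetical bin counts

strict = dp_histogram(true_counts, epsilon=0.1, rng=rng)  # strong privacy, heavy noise
loose = dp_histogram(true_counts, epsilon=5.0, rng=rng)   # weaker privacy, light noise
print("epsilon=0.1:", strict)
print("epsilon=5.0:", loose)
```

With a small ϵ the noise scale (1/ϵ = 10 here) can be comparable to the smaller bin counts, which gives some intuition for how two groups with identical true distributions could appear different in synthetic data.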
Artificial Intelligence-Based Prediction of Contrast Medium Doses for Computed Tomography Angiography Using Optimized Clinical Parameter Sets.
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-05-01 Epub Date: 2024-01-23 DOI: 10.1055/s-0044-1778694
Marja Fleitmann, Hristina Uzunova, René Pallenberg, Andreas M Stroth, Jan Gerlach, Alexander Fürschke, Jörg Barkhausen, Arpad Bischof, Heinz Handels
Abstract
Objectives: In this paper, an artificial intelligence-based algorithm for predicting the optimal contrast medium dose for computed tomography (CT) angiography of the aorta is presented and evaluated in a clinical study. The prediction of the contrast dose reduction is modelled as a classification problem using the image contrast as the main feature.
Methods: This classification is performed by random decision forests (RDF) and k-nearest-neighbor methods (KNN). For the selection of optimal parameter subsets, all possible combinations of the 22 clinical parameters (age, blood pressure, etc.) are considered using the classification accuracy and precision of the KNN classifier and RDF as quality criteria. Subsequently, the results of the evaluation were optimized by means of feature transformation using regression neural networks (RNN). These were used for a direct classification based on regressed Hounsfield units as well as preprocessing for a subsequent KNN classification.
Results: For feature selection, an RDF model achieved the highest accuracy of 84.42% and a KNN model achieved the best precision of 86.21%. The most important parameters include age, height, and hemoglobin. The feature transformation using an RNN considerably exceeded these values with an accuracy of 90.00% and a precision of 97.62% using all 22 parameters as input. However, the feasibility of the parameter sets in routine clinical practice must also be considered, because some of the 22 parameters are not measured in routine clinical practice and additional measurement time of 15 to 20 minutes per patient is needed. Using the standard feature set available in clinical routine, the best accuracy of 86.67% and precision of 93.18% were achieved by the RNN.
Conclusion: We developed a reliable hybrid system that helps radiologists determine the optimal contrast dose for CT angiography based on patient-specific parameters.
Journal Article · Pages 11-20 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495943/pdf/
Citations: 0
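The exhaustive search over parameter subsets described in the Methods can be illustrated with a toy example. The sketch below scores every non-empty feature subset with a leave-one-out nearest-neighbour classifier. The data, feature names, the choice of k = 1, and the use of unscaled squared Euclidean distance are assumptions for illustration, not the study's setup.

```python
from itertools import combinations


def knn_predict(train, query, features, k=1):
    """Majority vote among the k nearest rows (squared Euclidean distance)."""
    nearest = sorted(
        (sum((row[f] - query[f]) ** 2 for f in features), label)
        for row, label in train
    )[:k]
    votes = [label for _, label in nearest]
    # Break voting ties by the smallest label so the result is deterministic.
    return max(sorted(set(votes)), key=votes.count)


def loo_accuracy(data, features, k=1):
    """Leave-one-out accuracy of the KNN classifier on the given feature subset."""
    hits = 0
    for i, (row, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, row, features, k) == label
    return hits / len(data)


def best_subset(data, all_features):
    """Evaluate every non-empty subset; keep the first one with the top accuracy."""
    best = (0.0, ())
    for size in range(1, len(all_features) + 1):
        for subset in combinations(all_features, size):
            acc = loo_accuracy(data, subset)
            if acc > best[0]:
                best = (acc, subset)
    return best


# Hypothetical patients: the label depends on age only.
data = [
    ({"age": 30, "height": 181, "noise": 7}, 0),
    ({"age": 35, "height": 165, "noise": 1}, 0),
    ({"age": 40, "height": 174, "noise": 9}, 0),
    ({"age": 60, "height": 168, "noise": 2}, 1),
    ({"age": 65, "height": 183, "noise": 8}, 1),
    ({"age": 70, "height": 171, "noise": 3}, 1),
]

best = best_subset(data, ("age", "height", "noise"))
print(best)
```

With 22 parameters there are 2^22 − 1 = 4,194,303 non-empty subsets, so each evaluation has to be cheap for an exhaustive search of this kind to remain practical.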
Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations.
IF 1.3 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2024-05-01 Epub Date: 2024-05-13 DOI: 10.1055/s-0044-1786839
Sarah Riepenhausen, Max Blumenstock, Christian Niklas, Stefan Hegselmann, Philipp Neuhaus, Alexandra Meidt, Cornelia Püttmann, Michael Storck, Matthias Ganzinger, Julian Varghese, Martin Dugas
Abstract
Background: Structural metadata from the majority of clinical studies and routine health care systems is not yet available to the scientific community.
Objective: To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal).
Methods: The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models.
Results: The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes.
Conclusion: To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
Journal Article · Pages 52-61 · Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495939/pdf/
Citations: 0
Performance Characteristics of a Rule-Based Electronic Health Record Algorithm to Identify Patients with Gross and Microscopic Hematuria.
IF 1.7 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2023-12-01 Epub Date: 2023-09-04 DOI: 10.1055/a-2165-5552
Jasmine Kashkoush, Mudit Gupta, Matthew A Meissner, Matthew E Nielsen, H Lester Kirchner, Tullika Garg
Abstract
Background: Two million patients per year are referred to urologists for hematuria, or blood in the urine. The American Urological Association recently adopted a risk-stratified hematuria evaluation guideline to limit multi-phase computed tomography to individuals at highest risk of occult malignancy.
Objectives: To understand population-level hematuria evaluations, we developed an algorithm to accurately identify hematuria cases from electronic health records (EHRs).
Methods: We used International Classification of Diseases (ICD)-9/ICD-10 diagnosis codes, urine color, and urine microscopy values to identify hematuria cases and to differentiate between gross and microscopic hematuria. Using an iterative process, we refined the ICD-9 algorithm on a gold standard, chart-reviewed cohort of 3,094 hematuria cases, and the ICD-10 algorithm on a 300-patient cohort. We applied the algorithm to Geisinger patients ≥35 years (n = 539,516) and determined performance by conducting chart review (n = 500).
Results: After applying the hematuria algorithm, we identified 51,500 hematuria cases and 488,016 clean controls. Of the hematuria cases, 11,435 were categorized as gross, 26,658 as microscopic, 12,562 as indeterminate, and 845 were uncategorized. The positive predictive value (PPV) of identifying hematuria cases using the algorithm was 100% and the negative predictive value (NPV) was 99%. The gross hematuria algorithm had a PPV of 100% and NPV of 99%. The microscopic hematuria algorithm had a lower PPV of 78% and NPV of 100%.
Conclusion: We developed an algorithm utilizing diagnosis codes and urine laboratory values to accurately identify hematuria and categorize it as gross or microscopic in EHRs. Applying the algorithm will help researchers to understand patterns of care for this common condition.
Journal Article · Pages 183-192
Citations: 0
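PPV and NPV, the two metrics reported above, follow directly from chart-review counts. The sketch below shows the arithmetic; the counts are invented for illustration and are not the study's chart-review results, whose per-cell breakdown is not given in this listing.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    # PPV: share of algorithm-flagged cases confirmed by chart review.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: share of algorithm-cleared controls confirmed as true negatives.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical counts: 50 flagged charts and 50 control charts reviewed.
ppv, npv = predictive_values(tp=45, fp=5, tn=48, fn=2)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Both values depend on how common the condition is in the reviewed sample, so they describe performance in that cohort rather than an intrinsic property of the algorithm.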
Current Trends and New Approaches in Participatory Health Informatics.
IF 1.7 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2023-12-01 Epub Date: 2023-12-29 DOI: 10.1055/s-0043-1777732
Kerstin Denecke, Elia Gabarron, Carolyn Petersen
Abstract: none provided.
Journal Article · Pages 151-153
Citations: 0
Use of Natural Language Processing to Identify Sexual and Reproductive Health Information in Clinical Text.
IF 1.7 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2023-12-01 Epub Date: 2023-12-20 DOI: 10.1055/a-2233-2736
Elizabeth I Harrison, Laura A Kirkpatrick, Patrick W Harrison, Traci M Kazmerski, Yoshimi Sogawa, Harry S Hochheiser
Abstract
Objectives: This study aimed to enable clinical researchers without expertise in natural language processing (NLP) to extract and analyze information about sexual and reproductive health (SRH), or other sensitive health topics, from large sets of clinical notes.
Methods: (1) We retrieved text from the electronic health record as individual notes. (2) We segmented notes into sentences using one of scispaCy's NLP toolkits. (3) We exported sentences to the labeling application Watchful and annotated subsets of these as relevant or irrelevant to various SRH categories by applying a combination of regular expressions and manual annotation. (4) The labeled sentences served as training data to create machine learning models for classifying text; specifically, we used spaCy's default text classification ensemble, comprising a bag-of-words model and a neural network with attention. (5) We applied each model to unlabeled sentences to identify additional references to SRH with novel relevant vocabulary. We used this information and repeated steps 3 to 5 iteratively until the models identified no new relevant sentences for each topic. Finally, we aggregated the labeled data for analysis.
Results: This methodology was applied to 3,663 Child Neurology notes for 971 female patients. Our search focused on six SRH categories. We validated the approach using two subject matter experts, who independently labeled a sample of 400 sentences. Cohen's kappa values were calculated for each category between the reviewers (menstruation: 1, sexual activity: 0.9499, contraception: 0.9887, folic acid: 1, teratogens: 0.8864, pregnancy: 0.9499). After removing the sentences on which reviewers did not agree, we compared the reviewers' labels to those produced via our methodology, again using Cohen's kappa (menstruation: 1, sexual activity: 1, contraception: 0.9885, folic acid: 1, teratogens: 0.9841, pregnancy: 0.9871).
Conclusion: Our methodology is reproducible, enables analysis of large amounts of text, and has produced results that are highly comparable to subject matter expert manual review.
Journal Article · Pages 193-201
Citations: 0
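Agreement in the study above is summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the calculation follows; the two label sequences are hypothetical and far smaller than the 400-sentence validation sample.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical sentence labels from two reviewers for one SRH category.
reviewer_1 = ["relevant"] * 4 + ["irrelevant"] * 6
reviewer_2 = ["relevant"] * 3 + ["irrelevant"] * 7

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.4f}")
```

Here the reviewers agree on 9 of 10 sentences, yet kappa is about 0.78 rather than 0.9, because roughly half of that agreement would be expected by chance given how often each reviewer chose each label.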
Report from the 68th GMDS Annual Meeting: Science. Close to People.
IF 1.7 · Zone 4 · Medicine
Methods of Information in Medicine Pub Date : 2023-12-01 Epub Date: 2024-02-20 DOI: 10.1055/s-0043-1777733
Jonas Bienzeisler, Ariadna Perez-Garriga, Lea C Brandl, Ann-Kristin Kock-Schoppenhauer, Yasmin Hollenbenders, Maximilian Kurscheidt, Christina Schüttler
Abstract: none provided.
Journal Article · Volume 62 5-06 · Pages 202-205
Citations: 0