Ricardo Ahumada, Jocelyn Dunstan, Matías Rojas, Sergio Peñafiel, Inti Paredes, Pablo Báez
JCO Clinical Cancer Informatics, vol. 8, e2300130 (2024). DOI: 10.1200/CCI.23.00130 (https://doi.org/10.1200/CCI.23.00130)
Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish.
Purpose: A critical task in oncology is extracting information related to cancer metastasis from electronic health records. Metastasis-related information is crucial for treatment planning, prognostic evaluation, and cancer research. However, because findings of distant metastasis are often written as unstructured free text in radiology reports, this information is difficult to extract automatically. The main aim of this study was to extract distant metastasis findings from free-text imaging and nuclear medicine reports and to classify patient status by the presence or absence of distant metastasis.
Materials and methods: We created a corpus annotated for distant metastasis from positron emission tomography-computed tomography (PET-CT) and computed tomography (CT) reports of patients with prostate, colorectal, and breast cancer. Entities were labeled M1 or M0 according to whether the description affirmed or negated metastasis. To identify these entities, we used a named entity recognition (NER) model based on a bidirectional long short-term memory (BiLSTM) network and conditional random fields (CRF). The detected mentions were then used to classify whole reports as M1 or M0.
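The abstract does not spell out how mentions are encoded or how they are aggregated into a report label. The Python sketch below assumes a BIO tagging scheme over the two entity types M1 and M0 (the kind of tag sequence a BiLSTM-CRF tagger emits) and a hypothetical report-level rule: a report is M1 if at least one M1 mention is found, otherwise M0. The helper names `bio_to_spans` and `classify_report`, the tag scheme, and the aggregation rule are illustrative assumptions, not the authors' released code.

```python
from typing import List, Optional, Tuple

# ASSUMPTION: mentions are tagged with a BIO scheme over two entity types,
# "M1" (affirmed distant metastasis) and "M0" (negated), plus "O" for other tokens.
Span = Tuple[str, int, int]  # (label, first token index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group a BIO tag sequence (e.g. a tagger's output) into labelled spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final open span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if prefix == "B" or (prefix == "I" and not continues):
            # A stray I- tag with no matching open span starts a new span.
            label, start = name, i
    return spans


def classify_report(spans: List[Span]) -> str:
    """HYPOTHETICAL report-level rule: M1 if any affirmed mention, else M0."""
    return "M1" if any(label == "M1" for label, _, _ in spans) else "M0"


# Made-up example: one affirmed and one negated mention in a single report.
tokens = ["nódulos", "pulmonares", "secundarios", ",", "sin", "lesiones", "óseas"]
tags = ["B-M1", "I-M1", "I-M1", "O", "O", "B-M0", "I-M0"]
spans = bio_to_spans(tags)
for label, start, end in spans:
    print(label, " ".join(tokens[start:end]))
print(classify_report(spans))
```

Under this rule a report with no detected mentions falls through to M0; a real pipeline might instead flag such reports as undetermined or apply a more nuanced aggregation.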
Results: The model detected distant metastasis mentions with a weighted-average F1 score of 0.84. At the report level, classification achieved an F1 score of 0.92 for M0 documents and 0.90 for M1 documents.
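Per-label and weighted-average F1 are standard metrics, but the abstract does not state how mentions were matched. The sketch below assumes exact-match evaluation (a predicted item counts only if its label and key both equal a gold item) and weights each label's F1 by its number of gold items, the same convention as scikit-learn's `average="weighted"`. The function names `f1_by_label` and `weighted_f1` are hypothetical. Treating each whole report as a single item carrying its M0/M1 label gives document-level per-class F1 scores of the kind reported for M0 and M1 documents.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple

# Each gold or predicted item is (label, key). For mentions the key could be
# (report_id, start, end); for whole reports it is just the report ID.
Item = Tuple[str, Hashable]


def f1_by_label(gold: Set[Item], pred: Set[Item]) -> Dict[str, float]:
    """Exact-match F1 per label (ASSUMED matching criterion)."""
    scores: Dict[str, float] = {}
    for label in {lab for lab, _ in gold | pred}:
        g = {item for item in gold if item[0] == label}
        p = {item for item in pred if item[0] == label}
        tp = len(g & p)  # predicted items whose label and key both match gold
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def weighted_f1(gold: Set[Item], pred: Set[Item]) -> float:
    """Average of per-label F1, weighting each label by its gold support."""
    support = Counter(lab for lab, _ in gold)
    n = sum(support.values())
    if not n:
        return 0.0
    scores = f1_by_label(gold, pred)
    return sum(scores[lab] * count / n for lab, count in support.items())


# Made-up mention-level example: key = (report_id, start, end).
gold = {("M1", ("r1", 1, 3)), ("M1", ("r1", 7, 9)), ("M0", ("r1", 5, 7))}
pred = {("M1", ("r1", 1, 3)), ("M1", ("r1", 7, 9)), ("M0", ("r1", 10, 12))}
print(f1_by_label(gold, pred))
print(weighted_f1(gold, pred))

# Made-up document-level example: key = report ID.
gold_docs = {("M1", "r1"), ("M0", "r2"), ("M0", "r3")}
pred_docs = {("M1", "r1"), ("M1", "r2"), ("M0", "r3")}
print(f1_by_label(gold_docs, pred_docs))
```

Weighting by gold support lets frequent labels dominate the average; a macro average (unweighted mean of per-label F1) would instead treat M1 and M0 equally regardless of how often each occurs.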
Conclusion: These results show that the model is useful for detecting distant metastasis findings in three different types of cancer and for the subsequent classification of whole reports. The main contribution of this study is generating structured distant metastasis information from free-text imaging reports in Spanish. In addition, the manually annotated corpus, annotation guidelines, and code are freely available to the research community.