预处理对分布式表示的影响:以日本放射学报告为例

Taro Tada, Kazuhide Yamamoto
{"title":"预处理对分布式表示的影响:以日本放射学报告为例","authors":"Taro Tada, Kazuhide Yamamoto","doi":"10.1109/IALP48816.2019.9037678","DOIUrl":null,"url":null,"abstract":"A radiology report is a medical document based on an examination image in a hospital. However, the preparation of this report is a burden on busy physicians. To support them, a retrieval system of past documents to prepare radiology reports is required. In recent years, distributed representation has been used in various NLP tasks and its usefulness has been demonstrated. However, there is not much research about Japanese medical documents that use distributed representations. In this study, we investigate preprocessing on a retrieval system with a distributed representation of the radiology report, as a first step. As a result, we confirmed that in word segmentation using Morphological analyzer and dictionaries, medical terms in radiology reports are not handled as long nouns, but are more effective as shorter nouns like subwords. We also confirmed that text segmentation by SentencePiece to obtain sentence distributed representation reflects more sentence characteristics. Furthermore, by removing some phrases from the radiology report based on frequency, we were able to reflect the characteristics of the document and avoid unnecessary high similarity between documents. It was confirmed that preprocessing was effective in this task.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Effect of Preprocessing for Distributed Representations: Case Study of Japanese Radiology Reports\",\"authors\":\"Taro Tada, Kazuhide Yamamoto\",\"doi\":\"10.1109/IALP48816.2019.9037678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A radiology report is a medical document based on an examination image in a hospital. However, the preparation of this report is a burden on busy physicians. To support them, a retrieval system of past documents to prepare radiology reports is required. In recent years, distributed representation has been used in various NLP tasks and its usefulness has been demonstrated. However, there is not much research about Japanese medical documents that use distributed representations. In this study, we investigate preprocessing on a retrieval system with a distributed representation of the radiology report, as a first step. As a result, we confirmed that in word segmentation using Morphological analyzer and dictionaries, medical terms in radiology reports are not handled as long nouns, but are more effective as shorter nouns like subwords. We also confirmed that text segmentation by SentencePiece to obtain sentence distributed representation reflects more sentence characteristics. Furthermore, by removing some phrases from the radiology report based on frequency, we were able to reflect the characteristics of the document and avoid unnecessary high similarity between documents. It was confirmed that preprocessing was effective in this task.\",\"PeriodicalId\":208066,\"journal\":{\"name\":\"2019 International Conference on Asian Language Processing (IALP)\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Asian Language Processing (IALP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP48816.2019.9037678\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP48816.2019.9037678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

放射学报告是基于医院检查图像的医学文件。然而,这份报告的编写对忙碌的医生来说是一种负担。为了支持他们,需要一个过去文件的检索系统来准备放射学报告。近年来,分布式表示已被应用于各种NLP任务中,其实用性已得到证明。然而,关于使用分布式表示的日本医疗文件的研究并不多。在这项研究中,我们研究了一个检索系统的预处理与一个分布式表示的放射学报告,作为第一步。结果,我们证实了在使用形态学分析器和字典进行分词时,放射学报告中的医学术语不是作为长名词处理,而是作为子词等较短的名词更有效。我们还证实了通过sensenepece进行文本分割得到的句子分布式表示能够反映更多的句子特征。此外,通过基于频率从放射学报告中删除一些短语,我们能够反映文档的特征,并避免文档之间不必要的高相似性。验证了预处理在该任务中的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Effect of Preprocessing for Distributed Representations: Case Study of Japanese Radiology Reports
A radiology report is a medical document based on an examination image in a hospital. However, the preparation of this report is a burden on busy physicians. To support them, a retrieval system of past documents to prepare radiology reports is required. In recent years, distributed representation has been used in various NLP tasks and its usefulness has been demonstrated. However, there is not much research about Japanese medical documents that use distributed representations. In this study, we investigate preprocessing on a retrieval system with a distributed representation of the radiology report, as a first step. As a result, we confirmed that in word segmentation using Morphological analyzer and dictionaries, medical terms in radiology reports are not handled as long nouns, but are more effective as shorter nouns like subwords. We also confirmed that text segmentation by SentencePiece to obtain sentence distributed representation reflects more sentence characteristics. Furthermore, by removing some phrases from the radiology report based on frequency, we were able to reflect the characteristics of the document and avoid unnecessary high similarity between documents. It was confirmed that preprocessing was effective in this task.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信