Wang Zhao, Dongxiao Gu, Xuejie Yang, Meihuizi Jia, Changyong Liang, Xiaoyu Wang, Oleg Zolotarev
Journal: Future Generation Computer Systems-The International Journal of Escience. Published: 2024-07-23 (Journal Article). DOI: 10.1016/j.future.2024.07.030. URL: https://www.sciencedirect.com/science/article/pii/S0167739X24003923. Impact factor: 6.2; JCR: Q1 (Computer Science, Theory & Methods).
MedT2T: An adaptive pointer constrain generating method for a new medical text-to-table task
Medical information extraction, which aims to extract vital information from existing content, is a crucial task in the governance of healthcare data within medical information systems in the medical internet network. However, structuring this key information into a table remains a challenge and hinders the development of AI-driven smart health. In this study, we approach the medical text-to-table task from a new generative perspective. To address ineffective numerical embedding, flexible table formats, and dense medical terminology and numerical entities, we present MedT2T, an end-to-end medical text-to-table model. Built on the BART backbone, the model comprises three modules: Encoder, Decoder, and Adapter. The Encoder uses an adaptive medical numerical constraint to facilitate precise embedding and generation of medical numerical data. The Decoder's output adheres to relational constraints and table formats, ensuring the desired structure and organization. The Adapter incorporates an adaptive pointer generation mechanism, so that medical terminology and numerical information can be either referenced dynamically from the source text or generated from the Decoder's vocabulary distribution. Our method outperforms existing baselines in terms of exact match, character-level match, and BERTScore. We also show that MedT2T can serve as a table extraction tool that brings informative gains for downstream medical classifiers and predictors. The study thus achieves accurate entity generation for tables from lengthy medical texts, which improves physician efficiency in accessing critical information for decision-making, and provides large-scale structured training table data for downstream tasks such as AI-driven smart healthcare.
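The abstract does not give the Adapter's equations, so the following is only a minimal sketch of the general pointer-generator idea it alludes to: a final token distribution formed by mixing the decoder's vocabulary distribution with a copy distribution over source tokens. The function name `mix_pointer_and_vocab`, the gate `p_gen`, the toy tokens, and all probabilities are assumptions made for illustration and are not taken from MedT2T.

```python
from typing import Dict, List


def mix_pointer_and_vocab(
    vocab_dist: Dict[str, float],
    source_tokens: List[str],
    attention: List[float],
    p_gen: float,
) -> Dict[str, float]:
    """Blend a vocabulary distribution with a copy distribution over the source.

    Hypothetical helper: illustrates a generic pointer-generator mixture,
    not the published MedT2T Adapter.
    """
    if len(source_tokens) != len(attention):
        raise ValueError("attention must have one weight per source token")
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")

    # Generation path: scale every vocabulary probability by p_gen.
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}

    # Copy path: add (1 - p_gen) * attention mass to each source token, so
    # tokens missing from the vocabulary (e.g. "7.8") can still be produced.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy example (all values are made up): the numeric value "7.8" is not in
# the vocabulary, but the attention weights point at it in the source text.
vocab_dist = {"glucose": 0.6, "mmol/L": 0.3, "<unk>": 0.1}
source_tokens = ["glucose", "7.8", "mmol/L"]
attention = [0.1, 0.8, 0.1]

final = mix_pointer_and_vocab(vocab_dist, source_tokens, attention, p_gen=0.25)
best = max(final, key=final.get)
print(best, round(final[best], 3))
```

With a low `p_gen`, most probability mass follows the attention weights, so the out-of-vocabulary number is copied from the source. In a trained model the gate would be predicted at each decoding step rather than fixed by hand.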
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.