Xiaomei Zhou, Tao Zeng, Yibo Zhang, Yingying Liao, Jaime Smith, Lin Zhang, Chao Wang, Qinghai Li, Dongbo Wu, Yutian Chong, Xinhua Li
New Microbes and New Infections, volume 62, Article 101469. Published 2024-08-28. DOI: 10.1016/j.nmni.2024.101469. https://www.sciencedirect.com/science/article/pii/S2052297524002531
Automated data collection tool for real-world cohort studies of chronic hepatitis B: Leveraging OCR and NLP technologies for improved efficiency
Background
Collecting and standardizing clinical research data is tedious. This study aimed to develop CHB-EDC, an intelligent data collection tool for real-world cohort studies of chronic hepatitis B (CHB) that supports standardized and efficient data collection.
Methods
CHB-EDC automatically processes data in various formats, including raw data in image format, using internationally recognized data standards together with OCR and NLP models. It automatically populates the extracted data into eCRFs designed in the REDCap system and supports the integration of patient data from electronic medical record systems through commonly used web application interfaces. The tool enables intelligent extraction and aggregation of data, as well as secure and anonymous data sharing.
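The flow described above (OCR text in, structured eCRF fields out, then a REDCap import) can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the `FIELD_MAP` labels, the eCRF field names, and both helper functions are hypothetical, and the OCR step itself is assumed to have already produced plain text. The request parameters (`token`, `content`, `format`, `type`, `data`) follow the REDCap record-import API as commonly documented and should be checked against the documentation of the REDCap instance in use. The sketch only builds the request body and does not send it.

```python
import json
import re

# Hypothetical pattern for one OCR'd report line: "<label> [:|=] <number> [unit]".
LAB_LINE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z \-]*?)\s*[:=]?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)?$"
)

# Hypothetical mapping from report labels to eCRF field names.
FIELD_MAP = {
    "alt": "alt_u_l",
    "ast": "ast_u_l",
    "hbv dna": "hbv_dna_iu_ml",
}


def parse_ocr_text(text):
    """Extract known lab values from OCR text; unknown lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = LAB_LINE.match(line.strip())
        if not match:
            continue
        key = FIELD_MAP.get(match.group("name").strip().lower())
        if key:
            fields[key] = match.group("value")
    return fields


def build_redcap_import(token, record_id, fields):
    """Build the form body for a REDCap record import (assumed parameter set)."""
    record = {"record_id": record_id, **fields}
    return {
        "token": token,
        "content": "record",
        "format": "json",
        "type": "flat",
        "data": json.dumps([record]),
    }


ocr_text = "ALT 35 U/L\nAST 28 U/L\nHBV DNA: 2000 IU/mL\nnoise line"
fields = parse_ocr_text(ocr_text)
payload = build_redcap_import("PLACEHOLDER_TOKEN", "1", fields)
print(fields)
```

In a real deployment the payload would be sent as an HTTP POST to the instance's API endpoint, and a trained NLP model would likely replace the regular expression, which cannot handle values such as scientific notation or free-text findings.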
Results
For non-electronic data collection, manual collection had an average accuracy of 98.65% and took an average of 63.64 min per patient. CHB-EDC had an average accuracy of 98.66% and took an average of 3.57 min per patient. On the same data collection task, CHB-EDC therefore achieved accuracy comparable to manual collection while being significantly faster (p < 0.05). The tool significantly reduced the required collection time and lowered the cost of data collection while maintaining accuracy.
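To put the reported averages in perspective, the short calculation below derives the implied speedup and time saving per patient. It uses only the four figures stated in the Results; the derived quantities are simple arithmetic on those averages and are not statistics reported by the study.

```python
# Averages reported in the Results.
manual_min = 63.64   # minutes per patient, manual collection
tool_min = 3.57      # minutes per patient, CHB-EDC
manual_acc = 98.65   # percent
tool_acc = 98.66     # percent

speedup = manual_min / tool_min                   # how many times faster
reduction_pct = (1 - tool_min / manual_min) * 100 # share of time saved
saved_min = manual_min - tool_min                 # minutes saved per patient
acc_diff = tool_acc - manual_acc                  # percentage points

print(f"speedup: {speedup:.1f}x")
print(f"time reduction: {reduction_pct:.1f}%")
print(f"saved per patient: {saved_min:.2f} min")
print(f"accuracy difference: {acc_diff:.2f} percentage points")
```

On these averages the tool is roughly 17.8 times faster, saving about 60 minutes (about 94% of the time) per patient, with a 0.01 percentage point difference in accuracy.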
Conclusion
The tool has significantly improved the efficiency of data collection while ensuring accuracy, enabling standardized collection of real-world data.