Using n-Grams for Syndromic Surveillance in a Turkish Emergency Department Without English Translation: A Feasibility Study.

Biomedical informatics insights | Pub Date: 2013-04-25 | Print Date: 2013-01-01 | DOI: 10.4137/BII.S11334

Sylvia Halász, Philip Brown, Cem Oktay, Arif Alper Cevik, Isa Kılıçaslan, Colin Goodall, Dennis G Cochrane, Thomas R Fowler, Guy Jacobson, Simon Tse, John R Allegra

Volume 6, pages 29-33

Citations: 0

Abstract

Introduction: Syndromic surveillance is designed for the early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which are generally recorded in the local language. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories. An n-gram classifier is built by using text fragments (n-grams) to measure associations between CCs and a syndromic grouping of ICD codes.
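The abstract does not spell out the AT&T-derived algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: character trigrams serve as the text fragments, and each n-gram's association with the syndrome is estimated as the fraction of training visits containing it whose ICD-10 codes fall in the RESP grouping. The function names, the trigram length, the `min_count` cutoff, the 0.5 threshold and the Turkish toy complaints are all hypothetical. Character n-grams need no dictionary or translation, which is what makes the approach language independent, although Python's `str.lower()` is not Turkish-locale aware (İ/ı), so a production pipeline would need locale-specific case folding.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Set of overlapping character n-grams of a space-padded, lower-cased complaint.
    Note: str.lower() is not Turkish-locale aware (e.g. 'I' -> 'i', not 'ı')."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def train_ngram_weights(complaints, is_resp, n=3, min_count=2):
    """Hypothetical association measure: for each n-gram seen in at least
    `min_count` training complaints, the fraction of those visits whose
    ICD-10 codes fall in the RESP grouping."""
    seen = defaultdict(int)
    positive = defaultdict(int)
    for text, label in zip(complaints, is_resp):
        for gram in char_ngrams(text, n):  # each n-gram counted once per complaint
            seen[gram] += 1
            if label:
                positive[gram] += 1
    return {g: positive[g] / c for g, c in seen.items() if c >= min_count}


def score(text, weights, n=3):
    """Score a complaint by its most RESP-associated known n-gram (0.0 if none is known)."""
    return max((weights[g] for g in char_ngrams(text, n) if g in weights), default=0.0)


def classify(text, weights, threshold=0.5, n=3):
    """Flag a complaint as RESP when its score reaches an assumed threshold."""
    return score(text, weights, n) >= threshold


if __name__ == "__main__":
    # Toy training data (hypothetical): Turkish complaints with RESP labels derived from ICD-10 codes.
    complaints = ["öksürük", "öksürük ateş", "nefes darlığı",
                  "karın ağrısı", "baş ağrısı", "karın ağrısı kusma"]
    is_resp = [True, True, True, False, False, False]
    weights = train_ngram_weights(complaints, is_resp)
    print(classify("öksürük", weights), classify("karın ağrısı", weights))
```

Run as a script, the toy example prints `True False`: every trigram of "öksürük" (cough) occurs only in RESP-labelled training visits, while every trigram of "karın ağrısı" (abdominal pain) occurs only in non-RESP ones.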

Objectives: The objective was to create a Turkish n-gram CC classifier for the respiratory syndrome and then to compare, on a test data set, the daily volumes identified by the n-gram CC classifier and by a respiratory ICD-10 code grouping.

Methods: The design was a feasibility study based on retrospective cohort data. The setting was a university hospital emergency department (ED) in Turkey, and all ED visits in this hospital's 2002 database were included. By consensus, two of the authors created a respiratory grouping of International Classification of Diseases, 10th Revision (ICD-10-CM) codes, chosen to be similar to a standard respiratory (RESP) grouping of ICD codes created by the Electronic Surveillance System for Early Notification of Community-based Epidemics (ESSENCE), a project of the Centers for Disease Control and Prevention. An n-gram method adapted from AT&T Labs' technologies was applied to the first 10 months of data, used as a training set, to create a Turkish CC RESP classifier. The classifier was then tested on the subsequent 2 months of visits to generate a time series graph and to determine the correlation between the daily volumes measured by the CC classifier and by the RESP ICD-10 grouping.
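The evaluation design described in the Methods (train on the first 10 months, test on the remaining 2, then compare daily counts) can be sketched as follows. This is a minimal illustration, not the authors' code: each visit is assumed to be a hypothetical `(visit_date, cc_resp, icd_resp)` tuple holding the visit date and the two RESP flags, and the cutoff of 1 November 2002 is inferred from "first 10 months" of the 2002 data.

```python
from collections import Counter
from datetime import date

# Assumed cutoff: January-October 2002 for training, November-December 2002 for testing.
CUTOFF = date(2002, 11, 1)


def split_by_date(visits, cutoff=CUTOFF):
    """Split (visit_date, cc_resp, icd_resp) tuples into training and test sets by date."""
    train = [v for v in visits if v[0] < cutoff]
    test = [v for v in visits if v[0] >= cutoff]
    return train, test


def daily_volumes(visits):
    """Per-day counts of visits flagged RESP by the CC classifier and by the ICD-10 grouping.
    Only days with at least one visit appear in the returned series."""
    cc_counts = Counter(day for day, cc_resp, _ in visits if cc_resp)
    icd_counts = Counter(day for day, _, icd_resp in visits if icd_resp)
    days = sorted({day for day, _, _ in visits})
    return days, [cc_counts[d] for d in days], [icd_counts[d] for d in days]
```

Applied to the test set, the two count lists are the paired daily series that would be plotted as a time series and then correlated.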

Results: The Turkish ED database contained 30,157 visits. For the test set, the correlation (R²) between the daily volumes from the n-gram classifier and from the ICD-10 grouping was 0.78.
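The R² reported here is presumably the squared Pearson correlation between the two daily count series over the test period; for a simple linear regression of one series on the other, this equals the coefficient of determination. A minimal sketch, with a hypothetical function name and toy inputs rather than the study's data:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length daily count series.
    For a simple linear regression (with intercept) of one series on the other,
    this equals the regression's coefficient of determination."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has undefined correlation")
    return sxy * sxy / (sxx * syy)
```

Because correlation ignores constant offsets and scaling, a high R² shows that the two methods' daily volumes rise and fall together, not that they flag the same number of visits.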

Conclusion: The n-gram method automatically created a RESP classifier for Turkish CCs that performed similarly to the ICD-10 RESP grouping. The n-gram technique has the advantages of systematic, consistent, and rapid deployment, as well as language independence.