Topic categorisation of statements in suicide notes with integrated rules and machine learning.

Biomedical informatics insights Pub Date : 2012-01-01 Epub Date: 2012-01-30 DOI:10.4137/BII.S8978
Aleksandar Kovačević, Azad Dehghan, John A Keane, Goran Nenadic
{"title":"Topic categorisation of statements in suicide notes with integrated rules and machine learning.","authors":"Aleksandar Kovačević, Azad Dehghan, John A Keane, Goran Nenadic","doi":"10.4137/BII.S8978","DOIUrl":null,"url":null,"abstract":"<p><p>We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"5 Suppl. 1","pages":"115-24"},"PeriodicalIF":0.0000,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409492/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical informatics insights","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4137/BII.S8978","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2012/1/30 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.

Abstract Image

利用综合规则和机器学习对自杀笔记中的陈述进行主题分类。
我们描述并评估了一种自动方法,该方法是 i2b2 2011 挑战赛的一部分,用于识别遗书中的语句并将其归类为 15 个主题之一,包括爱、内疚、感恩、无望和指示。该方法结合了一套词法-句法规则和一套从训练数据集中通过机器学习得出的模型。机器学习模型依赖于命名实体、词法、词义和表述特征,以及适用于特定语句的规则。在由 300 份自杀笔记组成的测试集上,该方法显示了高达 53.36% 的整体最佳微观 F-measure。在只使用规则的情况下,精确度达到了 67.17%,而在综合使用规则和机器学习的情况下,召回率达到了 50.57%。虽然某些主题(如悲伤、愤怒、自责)具有挑战性,但相对频繁的类别(如爱)和范围较广的类别(如感恩)的性能相对较高(精确度在 68% 到 79% 之间),这表明自动文本挖掘方法可以有效地对自杀笔记进行主题分类。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信