Three hybrid classifiers for the detection of emotions in suicide notes.

Biomedical informatics insights. Pub Date: 2012-01-01. Epub Date: 2012-01-30. DOI: 10.4137/BII.S8967

Maria Liakata, Jee-Hyub Kim, Shyamasree Saha, Janna Hastings, Dietrich Rebholz-Schuhmann

Biomedical informatics insights, volume 5 Suppl. 1, pages 175-84. Publication type: Journal Article.

Citations: 25

Abstract

We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach treats the task as single-label, multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers' predictions as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a precision of 60.1%, whereas our best recall (43.6%) was obtained using the third system.
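The union step of the second approach can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword-based predictors below are toy stand-ins for trained per-category SVM and CRF binary classifiers and for the manual rules, and the category names and the helpers `keyword` and `union_of_classifiers` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Set

# A predictor answers "does this sentence belong to my category?"
Predictor = Callable[[str], bool]


def keyword(*words: str) -> Predictor:
    """Toy stand-in (assumption) for a trained binary classifier or a manual rule."""
    return lambda sentence: any(w in sentence.lower() for w in words)


def union_of_classifiers(sentence: str, *models: Dict[str, Predictor]) -> Set[str]:
    """Return every category that at least one predictor accepts."""
    return {
        category
        for model in models
        for category, predict in model.items()
        if predict(sentence)
    }


# Hypothetical per-category binary predictors; the real systems were trained models.
svm = {"love": keyword("love"), "guilt": keyword("sorry")}
crf = {"love": keyword("dear"), "hopelessness": keyword("no hope")}
# Hypothetical manual rule aimed at a very sparse category.
rules = {"forgiveness": keyword("forgive")}

labels = union_of_classifiers("I am sorry, please forgive me.", svm, crf, rules)
print(sorted(labels))
```

Because a sentence receives a label as soon as any single predictor fires, a union of this kind allows a sentence to carry several labels at once, unlike the single-label setup of the first approach.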


