Aleksandar Kovačević, Azad Dehghan, John A Keane, Goran Nenadic
{"title":"Topic categorisation of statements in suicide notes with integrated rules and machine learning.","authors":"Aleksandar Kovačević, Azad Dehghan, John A Keane, Goran Nenadic","doi":"10.4137/BII.S8978","DOIUrl":"10.4137/BII.S8978","url":null,"abstract":"<p><p>We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"5 Suppl. 1","pages":"115-24"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409492/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30824241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
James A McCart, Dezon K Finch, Jay Jarman, Edward Hickling, Jason D Lind, Matthew R Richardson, Donald J Berndt, Stephen L Luther
{"title":"Using ensemble models to classify the sentiment expressed in suicide notes.","authors":"James A McCart, Dezon K Finch, Jay Jarman, Edward Hickling, Jason D Lind, Matthew R Richardson, Donald J Berndt, Stephen L Luther","doi":"10.4137/BII.S8931","DOIUrl":"https://doi.org/10.4137/BII.S8931","url":null,"abstract":"<p><p>In 2007, suicide was the tenth leading cause of death in the U.S. Given the significance of this problem, suicide was the focus of the 2011 Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing (NLP) shared task competition (track two). Specifically, the challenge concentrated on sentiment analysis, predicting the presence or absence of 15 emotions (labels) simultaneously in a collection of suicide notes spanning over 70 years. Our team explored multiple approaches combining regular expression-based rules, statistical text mining (STM), and an approach that applies weights to text while accounting for multiple labels. Our best submission used an ensemble of both rules and STM models to achieve a micro-averaged F(1) score of 0.5023, slightly above the mean from the 26 teams that competed (0.4875).</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"5 Suppl. 1","pages":"77-85"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/BII.S8931","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30824237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Irena Spasić, Pete Burnap, Mark Greenwood, Michael Arribas-Ayllon
{"title":"A naïve bayes approach to classifying topics in suicide notes.","authors":"Irena Spasić, Pete Burnap, Mark Greenwood, Michael Arribas-Ayllon","doi":"10.4137/BII.S8945","DOIUrl":"https://doi.org/10.4137/BII.S8945","url":null,"abstract":"<p><p>The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico-semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern-matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved the F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"5 Suppl. 1","pages":"87-97"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/BII.S8945","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30824238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binary classifiers and latent sequence models for emotion detection in suicide notes.","authors":"Colin Cherry, Saif M Mohammad, Berry de Bruijn","doi":"10.4137/BII.S8933","DOIUrl":"10.4137/BII.S8933","url":null,"abstract":"<p><p>This paper describes the National Research Council of Canada's submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"5 Suppl. 1","pages":"147-54"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3409480/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30824823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of LDA and SPRT on Clinical Dataset Classifications.","authors":"Chih Lee, Brittany Nkounkou, Chun-Hsi Huang","doi":"10.4137/BII.S6935","DOIUrl":"https://doi.org/10.4137/BII.S6935","url":null,"abstract":"<p><p>In this work, we investigate the well-known classification algorithm LDA as well as its close relative SPRT. SPRT affords many theoretical advantages over LDA. It allows specification of desired classification error rates α and β and is expected to be faster in predicting the class label of a new instance. However, SPRT is not as widely used as LDA in the pattern recognition and machine learning community. For this reason, we investigate LDA, SPRT and a modified SPRT (MSPRT) empirically using clinical datasets from Parkinson's disease, colon cancer, and breast cancer. We assume the same normality assumption as LDA and propose variants of the two SPRT algorithms based on the order in which the components of an instance are sampled. Leave-one-out cross-validation is used to assess and compare the performance of the methods. The results indicate that two variants, SPRT-ordered and MSPRT-ordered, are superior to LDA in terms of prediction accuracy. Moreover, on average SPRT-ordered and MSPRT-ordered examine less components than LDA before arriving at a decision. These advantages imply that SPRT-ordered and MSPRT-ordered are the preferred algorithms over LDA when the normality assumption can be justified for a dataset.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"4 ","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2011-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/BII.S6935","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30167881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differential Transcriptional Changes in Mice Exposed to Chemically Distinct Diesel Samples.","authors":"Tina Stevens, Susan Hester, M Ian Gilmour","doi":"10.4137/bii.s5363","DOIUrl":"https://doi.org/10.4137/bii.s5363","url":null,"abstract":"<p><p>Epidemiological studies have linked exposure to ambient particulate matter (PM) with increased asthmatic symptoms. Diesel exhaust particles (DEP) are a predominant source of vehicle derived ambient PM, and experimental studies have demonstrated that they may have adjuvant potential when given with an antigen. We previously compared 3 DEP samples: N-DEP, A-DEP, and C-DEP in a murine ovalbumin (OVA) mucosal sensitization model and reported the adjuvant activity to be: C-DEP ≈ A-DEP > N-DEP. The present study analyzed gene expression changes from the lungs of these mice. Transcription profiling demonstrated that all the DEP samples altered cytokine and toll-like receptor pathways regardless of type, with or without antigen sensitization. Further analysis of DEP exposure with OVA showed that all DEP treatments altered networks involved in immune and inflammatory responses. The A- and C-DEP/OVA treatments induced differential expression of apoptosis pathways in association with stronger adjuvant responses, while expression of cell cycle control and DNA damage pathways were also altered in the C-DEP/OVA treatment. This comprehensive approach using gene expression analysis to examine changes at a pathway level provides detailed information on events occurring in the lung after DEP exposure, and confirms that the most bioactive sample induced many more individual genes and changes in immunoregulatory and homeostatic pathways.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"3 ","pages":"29-52"},"PeriodicalIF":0.0,"publicationDate":"2010-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/bii.s5363","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34605801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Pestian, Henry Nasrallah, Pawel Matykiewicz, Aurora Bennett, Antoon Leenaars
{"title":"Suicide Note Classification Using Natural Language Processing: A Content Analysis.","authors":"John Pestian, Henry Nasrallah, Pawel Matykiewicz, Aurora Bennett, Antoon Leenaars","doi":"10.4137/bii.s4706","DOIUrl":"10.4137/bii.s4706","url":null,"abstract":"<p><p>Suicide is the second leading cause of death among 25-34 year olds and the third leading cause of death among 15-25 year olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient's thoughts, as represented by suicide notes. We focus on developing methods of natural language processing that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data used are comprised of suicide notes from 33 suicide completers and matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide if a note was genuine or elicited. Their decisions were compared to nine different machine-learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals accurately classified notes 63% of the time, and the best machine learning algorithm accurately classified the notes 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicidal notes.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"2010 3","pages":"19-28"},"PeriodicalIF":0.0,"publicationDate":"2010-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/bii.s4706","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30218837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk Judgment by General Dental practitioners: Rational but Uninformed.","authors":"Eva Ellervall, Berndt Brehmer, Kerstin Knutsson","doi":"10.4137/bii.s4067","DOIUrl":"https://doi.org/10.4137/bii.s4067","url":null,"abstract":"<p><strong>Background: </strong>Decisions by dentists to administer antibiotic prophylaxis to prevent infectious complications in patients involves professional risk assessment. While recommendations for rational use have been published, several studies have shown that dentists have low adherence to these recommendations.</p><p><strong>Objective: </strong>To examine general dental practitioners' (GDPs') assessments of the risk of complications if not administering antibiotic prophylaxis in connection with dental procedures in patients with specific medical conditions.</p><p><strong>Methods: </strong>Postal questionnaires in combination with telephone interviews. Risk assessments were made using visual analogue scales (VAS), where zero represented \"insignificant risk\" and 100 represented a \"very significant risk\".</p><p><strong>Results: </strong>Response rate: 51%. The mean risk assessments were higher for GDPs who administered antibiotics (mean = 54, SD = 23, range 26-72 mm on the VAS) than those who did not (mean = 14, SD = 12, range 7-31 mm) (P < 0.05). Generally, GDPs made higher risk assessments for patients with medical conditions that are included in recommendations than those with conditions that are not included. Overall, risk assessments were higher for tooth removal than for scaling or root canal treatment, even though the risk assessments should be considered equal for these interventions.</p><p><strong>Conclusions: </strong>GDPs' risk assessments were rational but uninformed. They administered antibiotics in a manner that was consistent with their risk assessments. Their risk assessments, however, were overestimated. Inaccurate judgments of risk should not be expected to disappear in the presence of new information. To achieve change, clinicians must be motivated to improve behaviour and an evidence-based implementation strategy is required.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"3 ","pages":"11-7"},"PeriodicalIF":0.0,"publicationDate":"2010-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/bii.s4067","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34605800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Chen, Rebekah Wu, James Felton, David M Rocke, Anu Chakicherla
{"title":"A Method to Detect Differential Gene expression in Cross-Species Hybridization Experiments at Gene and Probe Level.","authors":"Ying Chen, Rebekah Wu, James Felton, David M Rocke, Anu Chakicherla","doi":"10.4137/BII.S3846","DOIUrl":"https://doi.org/10.4137/BII.S3846","url":null,"abstract":"<p><strong>Motivation: </strong>Whole genome microarrays are increasingly becoming the method of choice to study responses in model organisms to disease, stressors or other stimuli. However, whole genome sequences are available for only some model organisms, and there are still many species whose genome sequences are not yet available. Cross-species studies, where arrays developed for one species are used to study gene expression in a closely related species, have been used to address this gap, with some promising results. Current analytical methods have included filtration of some probes or genes that showed low hybridization activities. But consensus filtration schemes are still not available.</p><p><strong>Results: </strong>A novel masking procedure is proposed based on currently available target species sequences to filter out probes and study a cross-species data set using this masking procedure and gene-set analysis. Gene-set analysis evaluates the association of some priori defined gene groups with a phenotype of interest. Two methods, Gene Set Enrichment Analysis (GSEA) and Test of Test Statistics (ToTS) were investigated. The results showed that masking procedure together with ToTS method worked well in our data set. The results from an alternative way to study cross-species hybridization experiments without masking are also presented. We hypothesize that the multi-probes structure of Affymetrix microarrays makes it possible to aggregate the effects of both well-hybridized and poorly-hybridized probes to study a group of genes. The principles of gene-set analysis were applied to the probe-level data instead of gene-level data. The results showed that ToTS can give valuable information and thus can be used as a powerful technique for analyzing cross-species hybridization experiments.</p><p><strong>Availability: </strong>Software in the form of R code is available at http://anson.ucdavis.edu/~ychen/cross-species.html.</p><p><strong>Supplementary data: </strong>Supplementary data are available at http://anson.ucdavis.edu/~ychen/cross-species.html.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"3 ","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2010-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.4137/BII.S3846","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"29267556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nathaniel R Tabernero, Wayne A Loschen, Joel Jorgensen, Joshua Suereth, Jacqueline S Coberly, Rekha S Holtry, Marvin L Sikes, Steven M Babin, Sheryl L Happel Lewis
{"title":"Enhancing Disease Surveillance Event Communication Among Jurisdictions.","authors":"Nathaniel R Tabernero, Wayne A Loschen, Joel Jorgensen, Joshua Suereth, Jacqueline S Coberly, Rekha S Holtry, Marvin L Sikes, Steven M Babin, Sheryl L Happel Lewis","doi":"10.4137/bii.s3523","DOIUrl":"10.4137/bii.s3523","url":null,"abstract":"<p><p>Automated disease surveillance systems are becoming widely used by the public health community. However, communication among non-collocated and widely dispersed users still needs improvement. A web-based software tool for enhancing user communications was completely integrated into an existing automated disease surveillance system and was tested during two simulated exercises and operational use involving multiple jurisdictions. Evaluation of this tool was conducted by user meetings, anonymous surveys, and web logs. Public health officials found this tool to be useful, and the tool has been modified further to incorporate features suggested by user responses. Features of the automated disease surveillance system, such as alerts and time series plots, can be specifically referenced by user comments. The user may also indicate the alert response being considered by adding a color indicator to their comment. The web-based event communication tool described in this article provides a common ground for collaboration and communication among public health officials at different locations.</p>","PeriodicalId":88397,"journal":{"name":"Biomedical informatics insights","volume":"2 ","pages":"31-41"},"PeriodicalIF":0.0,"publicationDate":"2010-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4909157/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34596443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}