A Rule-based Approach to Determining Pregnancy Timeframe from Contextual Social Media Postings
Masoud Rouhizadeh, A. Magge, A. Klein, A. Sarker, Graciela Gonzalez
Proceedings of the 2018 International Conference on Digital Health, 2018-04-23
DOI: 10.1145/3194658.3194679 (https://doi.org/10.1145/3194658.3194679)
Citations: 10
Abstract
Recent advances in social media mining have opened the door to observational studies that are limited only by the capacity of the systems deployed to collect and analyze the data. This capability is especially valuable when studying cohorts not typically found in clinical trials or other health-related research, such as pregnant women, who are generally excluded from certain studies because of safety concerns. A major challenge of pregnancy studies in social media is determining the pregnancy timeframe, given that the significance of some events (e.g., medication exposure) may depend on the trimester in which they occurred. Existing systems that mine pregnancy data from social media have limited coverage and generalizability and have not addressed the problem of automatically estimating the beginning and end of pregnancy, and general-purpose temporal taggers deployed on this dataset generate ambiguous results. We present here a rule-based system that automatically identifies the pregnancy timeframe based on linguistic clues about the progress of pregnancy in users' tweets. In addition, we demonstrate that this system can also be used to find and filter bots and other accounts that repost or quote such expressions.
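To make the idea of a linguistic clue concrete, the following is a minimal sketch of how a single rule might turn a tweet and its posting date into an estimated pregnancy timeframe. It is not the authors' rule set: the regular expression, the function name `estimate_timeframe`, the 1 to 42 week plausibility bound, and the 40-week full-term length are all assumptions chosen for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical pattern (not taken from the paper): matches phrases such as
# "20 weeks pregnant" or "8 wks pregnant".
WEEKS_PREGNANT = re.compile(
    r"\b(\d{1,2})\s*(?:weeks?|wks?)\s+pregnant\b", re.IGNORECASE
)

# Assumed full-term length in weeks; a common convention, not a value
# stated in the abstract.
FULL_TERM_WEEKS = 40


def estimate_timeframe(tweet_text, posted_on):
    """Return (estimated_start, estimated_end), or None if no clue is found."""
    match = WEEKS_PREGNANT.search(tweet_text)
    if match is None:
        return None
    weeks = int(match.group(1))
    # Assumed plausibility bound: discard implausible gestational ages.
    if not 1 <= weeks <= 42:
        return None
    # Count back from the posting date to the estimated start, then
    # forward by the assumed full-term length to the estimated end.
    start = posted_on - timedelta(weeks=weeks)
    end = start + timedelta(weeks=FULL_TERM_WEEKS)
    return start, end


if __name__ == "__main__":
    print(estimate_timeframe("I'm 20 weeks pregnant today!", date(2018, 1, 15)))
```

A single pattern like this covers only one phrasing. A rule-based system of the kind described would presumably need many such rules, along with a way to reconcile conflicting estimates across a user's tweets, and the abstract does not specify either.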