Khaled Dawoud, Shang Gao, Ala Qabaja, P. Karampelas, R. Alhajj
{"title":"结合信息提取和文本挖掘的癌症生物标志物检测","authors":"Khaled Dawoud, Shang Gao, Ala Qabaja, P. Karampelas, R. Alhajj","doi":"10.1145/2492517.2500281","DOIUrl":null,"url":null,"abstract":"Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form by far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is always a need for new sophisticated techniques that could extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology is one of the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncovering many of the still unknown information related to various diseases threatening the human body. Biomarker detection is one of the areas which have received considerable attention in the research community. There are two sources of data that could be analyzed for biomarker detection, namely gene expression data and the rich literature related to the domain. Our research group has reported achievements analyzing both domains. In this paper, we concentrate on the latter domain by describing a powerful tool which is capable of extracting from the content of a repository (like PubMed) the parts related to a given specific domain like cancer, analyze the retrieved text to extract the key terms with high frequency, present the extracted terms to domain experts for selecting those most relevant to the investigated domain, retrieve from the analyzed text molecules related to the domain by considering the relevant terms, derive the network which will be analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.","PeriodicalId":442230,"journal":{"name":"2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Combining information extraction and text mining for cancer biomarker detection\",\"authors\":\"Khaled Dawoud, Shang Gao, Ala Qabaja, P. Karampelas, R. Alhajj\",\"doi\":\"10.1145/2492517.2500281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form by far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is always a need for new sophisticated techniques that could extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology is one of the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncovering many of the still unknown information related to various diseases threatening the human body. Biomarker detection is one of the areas which have received considerable attention in the research community. There are two sources of data that could be analyzed for biomarker detection, namely gene expression data and the rich literature related to the domain. Our research group has reported achievements analyzing both domains. In this paper, we concentrate on the latter domain by describing a powerful tool which is capable of extracting from the content of a repository (like PubMed) the parts related to a given specific domain like cancer, analyze the retrieved text to extract the key terms with high frequency, present the extracted terms to domain experts for selecting those most relevant to the investigated domain, retrieve from the analyzed text molecules related to the domain by considering the relevant terms, derive the network which will be analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.\",\"PeriodicalId\":442230,\"journal\":{\"name\":\"2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-08-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2492517.2500281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2492517.2500281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Combining information extraction and text mining for cancer biomarker detection
Information technology is advancing faster than anticipated. The amount of data captured and stored in electronic form by far exceeds the capabilities available for comprehensive analysis and effective knowledge discovery. There is always a need for new sophisticated techniques that could extract more of the knowledge hidden in the raw data collected continuously in huge repositories. Biomedicine and computational biology is one of the domains overwhelmed with huge amounts of data that should be carefully analyzed for valuable knowledge that may help uncovering many of the still unknown information related to various diseases threatening the human body. Biomarker detection is one of the areas which have received considerable attention in the research community. There are two sources of data that could be analyzed for biomarker detection, namely gene expression data and the rich literature related to the domain. Our research group has reported achievements analyzing both domains. In this paper, we concentrate on the latter domain by describing a powerful tool which is capable of extracting from the content of a repository (like PubMed) the parts related to a given specific domain like cancer, analyze the retrieved text to extract the key terms with high frequency, present the extracted terms to domain experts for selecting those most relevant to the investigated domain, retrieve from the analyzed text molecules related to the domain by considering the relevant terms, derive the network which will be analyzed to identify potential biomarkers. For the work described in this paper, we considered PubMed and extracted abstracts related to prostate and breast cancer. The reported results are promising; they demonstrate the effectiveness and applicability of the proposed approach.