MiningBreastCancer: Selection of Candidate Gene Associated with Breast Cancer via Comparison between Data Mining of TCGA and Text Mining of PubMed

Chou-Cheng Chen, Y. Kuo, Chi-Hui Chiang
{"title":"MiningBreastCancer: Selection of Candidate Gene Associated with Breast Cancer via Comparison between Data Mining of TCGA and Text Mining of PubMed","authors":"Chou-Cheng Chen, Y. Kuo, Chi-Hui Chiang","doi":"10.1145/3440943.3444718","DOIUrl":null,"url":null,"abstract":"In 2016, 12,676 new cases of breast cancer were diagnosed among Taiwan women. In 2018 the standardized death rate of breast cancer was 12.5 per 100,000 persons. Previous studies have integrated data and text mining to yield fusion genes, identify genetic factors for breast cancer and select single-gene feature sets for colon cancer discrimination. However, our study is the first to select significantly different expression between breast normal tissue and cancer using TCGA data and biostatistics, excluding know genes using abstracts from PubMed and natural language processing. The top twenty genes for research potential from the selection of Mining-BreastCancer are EML3, ABCB9, GRASP, KANK3, GPR146, ZNF623, CCDC9, ADCY4, DLL1, ADAM33, GRRP1, LRRN4CL, C14orf180, ABCD4, ABCC6P1, PEAR1, FAM43A, C20orf160, KIF21A and PP-FIA3. Few studies for these genes exist, but they hold significantly different expressions between breast cancer and normal tissue, each pathologic tumor and lymph node, or between each pathologic metastasis. These results show that MiningBreastCancer can help scientists select genes for research potential. MiningBreastCancer is available through http://bio.yungyun.com.tw/MiningBreastCancer.aspx.","PeriodicalId":310247,"journal":{"name":"Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM International Conference on Intelligent Computing and its Emerging Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3440943.3444718","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In 2016, 12,676 new cases of breast cancer were diagnosed among Taiwan women. In 2018 the standardized death rate of breast cancer was 12.5 per 100,000 persons. Previous studies have integrated data and text mining to yield fusion genes, identify genetic factors for breast cancer and select single-gene feature sets for colon cancer discrimination. However, our study is the first to select significantly different expression between breast normal tissue and cancer using TCGA data and biostatistics, excluding know genes using abstracts from PubMed and natural language processing. The top twenty genes for research potential from the selection of Mining-BreastCancer are EML3, ABCB9, GRASP, KANK3, GPR146, ZNF623, CCDC9, ADCY4, DLL1, ADAM33, GRRP1, LRRN4CL, C14orf180, ABCD4, ABCC6P1, PEAR1, FAM43A, C20orf160, KIF21A and PP-FIA3. Few studies for these genes exist, but they hold significantly different expressions between breast cancer and normal tissue, each pathologic tumor and lymph node, or between each pathologic metastasis. These results show that MiningBreastCancer can help scientists select genes for research potential. MiningBreastCancer is available through http://bio.yungyun.com.tw/MiningBreastCancer.aspx.
挖掘乳腺癌:通过TCGA数据挖掘和PubMed文本挖掘的比较选择乳腺癌相关候选基因
2016年,台湾女性确诊乳腺癌新病例12676例。2018年,乳腺癌标准化死亡率为12.5 / 10万人。先前的研究将数据和文本挖掘结合起来,以产生融合基因,确定乳腺癌的遗传因素,并选择单基因特征集用于结肠癌的区分。然而,我们的研究是第一个使用TCGA数据和生物统计学来选择乳腺正常组织和癌症之间显著不同的表达,排除了使用PubMed摘要和自然语言处理的已知基因。在采矿型乳腺癌的筛选中,具有研究潜力的前20个基因分别是EML3、ABCB9、GRASP、KANK3、GPR146、ZNF623、CCDC9、ADCY4、DLL1、ADAM33、GRRP1、LRRN4CL、C14orf180、ABCD4、ABCC6P1、PEAR1、FAM43A、C20orf160、KIF21A和PP-FIA3。对这些基因的研究很少,但它们在乳腺癌与正常组织、各病理性肿瘤与淋巴结、各病理性转移之间的表达存在显著差异。这些结果表明,挖掘乳腺癌可以帮助科学家选择具有研究潜力的基因。MiningBreastCancer可通过http://bio.yungyun.com.tw/MiningBreastCancer.aspx获得。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信