Title: A comprehensive tool for text categorization and text summarization in bioinformatics
Authors: Mustofa Kamal, Kazi Zakia Sultana
Venue: 2012 15th International Conference on Computer and Information Technology (ICCIT)
Publication date: 2012-12-01
DOI: 10.1109/ICCITECHN.2012.6509764
URL: https://doi.org/10.1109/ICCITECHN.2012.6509764
Citations: 1
Abstract
This work integrates text categorization and text summarization, building on existing algorithms. We apply the method primarily to bioinformatics literature: documents are first categorized into relevant domains of bioinformatics, and a summarized overview is then produced for each document in a domain. For categorization we chose three core domains of bioinformatics: Protein-Protein Interaction, Disease-Drug Relevance, and Pathway-Process Involvement. The method uses a TF-IDF-based technique for categorization and then summarizes the key content of each document using existing features. The system plays an important role in automatically reducing the review space for researchers, who no longer need to select relevant texts manually. It also saves time by providing ranked, highly relevant lines from the documents. Our method outperforms other existing summarization tools in the sense that it optimizes summarization by first categorizing the documents using TF-IDF and then avoids redundant information by ranking the sentences with an existing score.
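
The abstract does not specify the exact features or scoring used, so the following is only a minimal sketch of the general categorize-then-summarize idea, not the authors' implementation. It assumes a document is assigned to the category whose seed text has the highest TF-IDF cosine similarity to it, and that sentences are ranked by their summed TF-IDF weight. The seed vocabularies, the function names (`categorize`, `summarize`), and the parameter `k` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical seed vocabularies for the three domains named in the abstract.
# The paper's actual training data and features are not given here.
CATEGORY_SEEDS = {
    "Protein-Protein Interaction": "protein binds interacts complex binding interaction",
    "Disease-Drug Relevance": "drug treatment disease patients therapy inhibitor",
    "Pathway-Process Involvement": "pathway signaling regulates process cascade involved",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per token list, using smoothed IDF."""
    n = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(document):
    """Assign the document to the category whose seed text is most similar."""
    names = list(CATEGORY_SEEDS)
    token_lists = [tokenize(CATEGORY_SEEDS[n]) for n in names] + [tokenize(document)]
    vectors = tfidf_vectors(token_lists)
    scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
    return names[scores.index(max(scores))]


def summarize(document, k=2):
    """Return the k highest-scoring sentences, kept in their original order.

    Assumption: a sentence's score is the sum of its TF-IDF weights,
    with each sentence treated as a "document" when computing IDF.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    vectors = tfidf_vectors([tokenize(s) for s in sentences])
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(vectors[i].values()),
                    reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    text = ("Protein A binds protein B. The weather was nice. "
            "The complex activates the pathway.")
    print(categorize(text))
    print(summarize(text, k=2))
```

Categorizing first means the summarizer only runs on documents already judged relevant to a domain, which is the ordering the abstract describes. A real system would likely replace the seed strings with labeled training documents and add an explicit redundancy check between selected sentences.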