{"title":"Automatic Summarization and Keyword Extraction from Web Page or Text File","authors":"Xiangdong You","doi":"10.1109/CCET48361.2019.8989315","DOIUrl":null,"url":null,"abstract":"In this paper, we study the automatic summarization and keyword extraction techniques for web page and text file. First, we use the Readability algorithm to extract the text of the web page, and study the PageRank algorithm and TextRank algorithm, and then use the TextRank algorithm to extract keywords, key sentences and abstracts. We also develop the web application that processes web page and text file. The application can input URL, text file, or text paragraph, then application can complete the extraction of main content, abstract, keywords and key sentences.","PeriodicalId":231425,"journal":{"name":"2019 IEEE 2nd International Conference on Computer and Communication Engineering Technology (CCET)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 2nd International Conference on Computer and Communication Engineering Technology (CCET)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCET48361.2019.8989315","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
In this paper, we study the automatic summarization and keyword extraction techniques for web page and text file. First, we use the Readability algorithm to extract the text of the web page, and study the PageRank algorithm and TextRank algorithm, and then use the TextRank algorithm to extract keywords, key sentences and abstracts. We also develop the web application that processes web page and text file. The application can input URL, text file, or text paragraph, then application can complete the extraction of main content, abstract, keywords and key sentences.