ThEconSum: an Economics-domained Dataset for Thai Text Summarization and Baseline Models
Authors: Sawittree Jumpathong, Akkharawoot Takhom, P. Boonkwan, Vipas Sutantayawalee, Peerachet Porkaew, Sitthaa Phaholphinyo, Charun Phrombut, T. Supnithi, Khemarath Choke-Mangmi, Saran Yamasathien, Nattachai Tretasayuth, Kasidis Kanwatchara, Atiwat Aiemleuk
Venue: 2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP)
Publication date: 2022-11-05
DOI: 10.1109/iSAI-NLP56921.2022.9960271 (https://doi.org/10.1109/iSAI-NLP56921.2022.9960271)
Citation count: 0
Abstract
Datasets are an essential language resource for developing an effective automatic text summarization (ATS) system. Public datasets for some languages are scarce compared with those for widely studied languages, because complex language preprocessing makes annotation by linguists labor-intensive. ATS techniques condense a text into a shorter output and reduce the time needed to find information in large volumes of textual data. This paper presents the construction of a Thai ATS dataset built from economics-domain data, called ThEconSum, which exhibits several linguistic challenges for Thai summarization. Existing public datasets were employed to develop an ATS system for Thai economic news articles.