Economy Watchers Survey provides Datasets and Tasks for Japanese Financial Domain
Masahiro Suzuki, Hiroki Sakaji
arXiv - CS - Computational Engineering, Finance, and Science, 2024-07-20 (arXiv:2407.14727)
Many natural language processing (NLP) tasks in English or general domains
are widely available and are often used to evaluate pre-trained language
models. In contrast, there are fewer tasks available for languages other than
English and for the financial domain. In particular, tasks that are both in
Japanese and in the financial domain are limited. We construct two large datasets using materials
published by a Japanese central government agency. The datasets provide three
Japanese financial NLP tasks: 3-class and 12-class classification tasks
for categorizing sentences, and a 5-class classification
task for sentiment analysis. Our datasets are designed to be comprehensive and
up-to-date, leveraging an automatic update framework that keeps the latest
task datasets publicly available at any time.