B-TTDb: A Database of Turkish Tweets for Predicting the Top One Hundred Emojis
Y. Bitirim
ACM Transactions on the Web (journal article), published 2024-07-24
DOI: 10.1145/3681783 (https://doi.org/10.1145/3681783)
Journal metrics: impact factor 2.6; JCR Q2 (Computer Science, Information Systems)
Citations: 0
Abstract
Emoji prediction is an important research task that focuses on finding the most appropriate emoji(s) for a given text quickly and effortlessly. Because Turkish is among the 20 most spoken languages in the world and has a considerable number of social media users, emoji prediction in Turkish is a valuable research topic. In this study, a Turkish tweets database named Bitirim's Turkish Tweets Database (B-TTDb) was constructed to support academic and industrial work on predicting the top 100 emojis. B-TTDb consists of four datasets: the first contains the raw tweets, the second is an organized version of the first, the third is a pre-processed version of the second, and the fourth is an organized version of the third. This fourth dataset is the final version and is named Bitirim's Dataset (B-D). It contains a total of 158,201 unique tweets belonging to the top 100 emoji classes. To validate the database, experiments were conducted on B-D with popular machine learning algorithms for the top 10, 20, 50, and 100 emojis. This study can be considered the first to contribute to the literature a validated, large database of Turkish tweets that covers such a large number of emojis. B-TTDb could also serve as a basis and a motivation for various further studies.
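The abstract does not say how the top 10, 20, 50, and 100 emoji subsets were derived from B-D. As a rough illustration only, the sketch below assumes that each tweet carries a single emoji label, that "unique tweets" means exact-text deduplication, and that emoji classes are ranked by frequency. The function names `deduplicate` and `top_k_subset` and the toy tweets are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Assumption: each record is a (tweet_text, emoji_label) pair with one label per tweet.

def deduplicate(records):
    """Keep the first occurrence of each tweet text (hypothetical 'unique tweets' step)."""
    seen = set()
    unique = []
    for text, emoji in records:
        if text not in seen:
            seen.add(text)
            unique.append((text, emoji))
    return unique

def top_k_subset(records, k):
    """Return the k most frequent emoji labels and the records whose label is among them."""
    counts = Counter(emoji for _, emoji in records)
    top = [emoji for emoji, _ in counts.most_common(k)]
    keep = set(top)
    return top, [(text, emoji) for text, emoji in records if emoji in keep]

if __name__ == "__main__":
    # Toy, made-up examples; not drawn from B-D.
    data = [
        ("harika bir gün", "😂"),
        ("harika bir gün", "😂"),  # exact duplicate, removed by deduplicate()
        ("maç ne zaman", "😂"),
        ("çok komik", "😂"),
        ("çok güzel", "❤️"),
        ("seni seviyorum", "❤️"),
        ("bugün yorgunum", "😢"),
    ]
    unique = deduplicate(data)
    # On the real dataset one would iterate over k in (10, 20, 50, 100).
    for k in (1, 2, 3):
        labels, subset = top_k_subset(unique, k)
        print(k, labels, len(subset))
```

Ranking by frequency means each smaller subset is contained in the larger ones, which matches the nested top 10/20/50/100 setting described above, though the paper may have defined its classes differently.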
About the journal:
ACM Transactions on the Web (TWEB) is a journal that publishes refereed articles reporting the results of research on Web content, applications, use, and related enabling technologies. Topics within the scope of TWEB include, but are not limited to: Browsers and Web Interfaces; Electronic Commerce; Electronic Publishing; Hypertext and Hypermedia; Semantic Web; Web Engineering; Web Services and Service-Oriented Computing; and XML.
Papers addressing the intersection of the Web with the following broader technologies are also in scope: Accessibility; Business Services; Education; Knowledge Management and Representation; Mobility and Pervasive Computing; Performance and Scalability; Recommender Systems; Searching, Indexing, Classification, Retrieval and Querying; Data Mining and Analysis; Security and Privacy; and User Interfaces.
Papers discussing specific Web technologies, applications, content generation and management, and Web use are within scope, as are papers describing novel applications of the Web and papers on its underlying technologies.