International Journal of Information Management Data Insights: Latest Articles

Blockchain technology to improve traceability in the coffee supply chain: A systematic literature review
International Journal of Information Management Data Insights Pub Date : 2025-07-24 DOI: 10.1016/j.jjimei.2025.100359
Christian Gómez, Benoit Garbinato
Abstract: Coffee is consumed worldwide, yet the growers at the start of its supply chain benefit least from it. Across production, distribution, and commercialization, there are risks and issues that could damage the safety and authenticity of the product. The coffee industry is therefore looking for innovative technologies that enable traceability in the coffee supply chain. In this context, blockchain technology offers a promising solution: it supports traceability via a decentralized system that allows immutable records and transparent access, and it promotes collaborative work and removes intermediaries by generating trust between participants. This systematic literature review describes the state of the art in research and development on the use of blockchain technology to improve traceability in the coffee supply chain, and outlines the open challenges that remain in this field. We use the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) methodology to achieve this goal. Our findings suggest that the developments are mainly conceptual designs and prototypes, focusing on tracing products and verifying their authenticity using the Ethereum or Hyperledger blockchains. Our results also show various challenges on the technology side, such as efficiency improvements, integration with other technologies, infrastructure, and a lack of standards. There are also challenges at the management level, such as the need for agreements on traceability processes, data governance, willingness to invest and pay, education, and support to deploy the technology on farms. Once these open challenges are overcome, blockchain technology can improve traceability and increase value for stakeholders in the coffee supply chain.
Volume 5 (2), Article 100359
Citations: 0
Sentiment analysis for depression detection: A stacking ensemble-based deep learning approach
International Journal of Information Management Data Insights Pub Date : 2025-07-21 DOI: 10.1016/j.jjimei.2025.100358
Kinza Noor, Mariam Rehman, Maria Anjum, Afzaal Hussain, Rabia Saleem
Abstract: Depression is one of the most common mental health issues and seriously affects people's quality of life. The World Health Organization has reported that depression affects about 300 million people across the globe. Given how widespread the disorder is, novel and efficient methods must be developed for effective detection and treatment. In the modern era of social media, individuals often reveal their emotional states through daily posts on platforms like X (previously Twitter) and Facebook. This content can serve as an essential input for determining whether a person has depression based on their writing. The emergence of transformer-based deep learning models provides an opportunity to use pre-trained models to capture complex patterns and nuances in textual data. This study proposes a novel depression detection method through sentiment analysis by developing a Stacking ENSemble-based Deep learning (SENSDeep) model. The proposed model integrates six pre-trained cutting-edge models (BERT, RoBERTa, ALBERT, DistilBERT, XLNet, and BART) through a stacking ensemble to enhance predictive performance. The SENSDeep model is evaluated by precision, recall, F1-score, and accuracy. Compared with other models, SENSDeep excels with 96.93% precision, 97.50% recall, 97.22% F1-score, and 97.21% accuracy. To our knowledge, SENSDeep is the first deep-learning ensemble model that leverages cutting-edge pre-trained transformer models via stacking specifically for detecting depression from textual data.
Volume 5 (2), Article 100358
Citations: 0
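The stacking idea in the abstract above can be illustrated with a minimal sketch. It is not the SENSDeep implementation: the three base-model probability columns, the toy labels, and the logistic-regression meta-learner are all assumptions made for illustration. In a stacking ensemble, the base models' predictions on held-out data become the input features of a second-level model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner on base-model probabilities.

    base_probs: one row per example, one column per base model.
    labels: 0/1 targets for the same held-out examples.
    """
    n_models = len(base_probs[0])
    n_rows = len(base_probs)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, label in zip(base_probs, labels):
            pred = sigmoid(bias + sum(w * p for w, p in zip(weights, row)))
            error = pred - label
            for j, p in enumerate(row):
                grad_w[j] += error * p
            grad_b += error
        # Plain batch gradient descent on the mean log-loss.
        weights = [w - lr * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_rows
    return weights, bias


def stacked_predict(weights, bias, row):
    """Combine base-model probabilities into one final 0/1 prediction."""
    score = bias + sum(w * p for w, p in zip(weights, row))
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical held-out probabilities from three base models (not real data).
held_out_probs = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.2, 0.4],
]
held_out_labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit_meta_learner(held_out_probs, held_out_labels)
```

In practice the meta-learner is usually trained on out-of-fold predictions so that it does not see base-model outputs for examples those models were fitted on.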
Securing the metaverse: Machine learning–based perspectives on risk, trust, and governance
International Journal of Information Management Data Insights Pub Date : 2025-07-21 DOI: 10.1016/j.jjimei.2025.100356
Krishnashree Achuthan, Sasangan Ramanathan, Raghu Raman
Abstract: The rapid expansion of the metaverse presents significant cybersecurity and privacy challenges, requiring structured, data-driven analysis. This study applies the ADO-TCM framework and BERTopic modeling to examine drivers of cybersecurity risk, theoretical responses, and interdisciplinary research gaps. Using PRISMA guidelines, 86 peer-reviewed studies were analyzed to identify key antecedents (technological vulnerabilities, user behavior, regulatory fragmentation, economic incentives, and cultural factors) that shape decisions in compliance, deployment, and education. These, in turn, influence outcomes such as trust, threat mitigation, and scalability. The review identifies five latent themes: secure identity, privacy, trust, governance, and AI's role in shaping risk. The study maps the diverse theoretical lenses (cognitive, behavioral, strategic, and technological) used to interpret immersive threats and decision-making in metaverse contexts. Contributing a novel, empirically grounded synthesis, this research advances the information management literature and proposes a forward-looking agenda focused on adaptive security, ethical AI, interoperability, regulatory convergence, and intelligent, user-centric architecture for immersive ecosystems.
Volume 5 (2), Article 100356
Citations: 0
ABERT: Adapting BERT model for efficient detection of human and AI-generated fake news
International Journal of Information Management Data Insights Pub Date : 2025-07-14 DOI: 10.1016/j.jjimei.2025.100353
Jawaher Alghamdi, Yuqing Lin, Suhuai Luo
Abstract: The proliferation of fake news in digital media poses a significant challenge to the dissemination of accurate information. Transfer learning, particularly with pre-trained language models (PLMs) like BERT, has demonstrated exceptional performance in natural language processing (NLP) tasks. However, the computational expense of fine-tuning the entire model for domain-specific tasks remains a limitation. In this study, we propose a novel approach, Adapt-BERT (ABERT), for the detection of both human and artificial intelligence (AI)-generated fake news. ABERT includes a parameter-efficient adapter that enables efficient detection. By freezing the pre-trained BERT network and incorporating a lightweight adapter, ABERT achieves performance comparable to fully fine-tuned BERT while reducing the number of trainable parameters by approximately 67.7%. ABERT strikes a balance between performance and computational efficiency, offering a scalable solution to combat the dissemination of fake news in digital media. Experimental evaluations on diverse datasets show that the proposed parameter-efficient approach achieves performance comparable to state-of-the-art (SOTA) methods in the task of fake news detection (FND).
Volume 5 (2), Article 100353
Citations: 0
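To make the parameter-efficiency claim above concrete, the sketch below counts the weights of a generic bottleneck adapter (a down-projection followed by an up-projection, each with a bias) and expresses a reduction in trainable parameters as a fraction. The abstract does not describe ABERT's adapter design, so the bottleneck layout and the sizes used here are assumptions, not the authors' configuration.

```python
def adapter_param_count(hidden_size, bottleneck_size):
    """Parameters in one bottleneck adapter (assumed layout, with biases)."""
    down_projection = hidden_size * bottleneck_size + bottleneck_size
    up_projection = bottleneck_size * hidden_size + hidden_size
    return down_projection + up_projection


def trainable_reduction(full_trainable, adapted_trainable):
    """Fractional reduction in trainable parameters relative to full tuning."""
    return 1.0 - adapted_trainable / full_trainable


# Hypothetical sizes: a 768-wide hidden layer with a 64-unit bottleneck.
example_adapter_params = adapter_param_count(768, 64)

# A reduction of about 67.7% means roughly 32.3% of the parameters stay trainable.
example_reduction = trainable_reduction(1000, 323)
```

Because the frozen backbone contributes no trainable weights, the trainable total is driven by the adapter and any task head that is left unfrozen.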
A Novel Named Entity Recognition approach of Indonesian fake news using part of speech and BERT model on presidential election
International Journal of Information Management Data Insights Pub Date : 2025-07-14 DOI: 10.1016/j.jjimei.2025.100354
Puji Winar Cahyo, Ulfi Saidata Aesyi, Widodo Agus Setianto, Tatang Sulaiman
Abstract: Fake news often spreads rapidly and can mislead readers, which makes it important to approach such information with caution. In text-based information, content extraction can be used to determine the meaning and intent of the message. This research therefore aims to develop a novel approach for entity detection in Indonesian-language fake news texts by applying BiLSTM-CRF, BiGRU, and BERT models. The novelty of this study lies in the integration of Part-of-Speech (PoS) tagging before processing words for entity detection. Words tagged as Noun (NN) and Proper Noun (NNP) are transformed into entity labels such as ORG for organizations, PER for people, and LOC for locations. Meanwhile, words labeled as Verb (VB) are converted into the ACT entity to represent actions. Evaluations were conducted by integrating PoS tagging with entity detection. The BiLSTM-CRF model achieved an F1-score of 81.26%, the BiGRU-based model achieved 79.46%, and the BERT-based model achieved the highest F1-score of 87.38%. These results demonstrate that the BERT model, when combined with PoS tagging, provides the best performance and can effectively be used to detect entities in fake news. The entity detection process was further applied to identify fake news during the 2024 Indonesian presidential and vice-presidential election period. By counting the number of mentions of each candidate and their running mate labeled as PER entities, the study found that the Prabowo Subianto–Gibran Rakabuming Raka pair appeared in 49 fake news articles, followed by the Ganjar Pranowo–Mahfud MD pair with 14 articles and the Anies Baswedan–Muhaimin Iskandar pair with 13 articles. All identified data have been filtered to retain only unique entries.
Volume 5 (2), Article 100354
Citations: 0
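The two steps described in the abstract above, restricting entity labels by PoS tag and then counting unique articles that mention a candidate pair as PER entities, can be sketched as follows. This is an illustration under assumptions: the `"O"` fallback label, the function names, and the placeholder candidate names and articles are hypothetical, and the actual labelling in the paper is done by trained models rather than a lookup table.

```python
from collections import Counter

# PoS tags mapped to the entity labels they may receive, as listed in the abstract.
POS_TO_CANDIDATE_LABELS = {
    "NN": ("ORG", "PER", "LOC"),
    "NNP": ("ORG", "PER", "LOC"),
    "VB": ("ACT",),
}


def candidate_labels(pos_tag):
    """Entity labels allowed for a PoS tag; "O" (outside) is an assumed default."""
    return POS_TO_CANDIDATE_LABELS.get(pos_tag, ("O",))


def count_articles_per_pair(articles, pairs):
    """Count unique articles in which any member of a pair appears as PER."""
    counts = Counter()
    seen_ids = set()
    for article_id, entities in articles:
        if article_id in seen_ids:
            continue  # keep only unique entries
        seen_ids.add(article_id)
        people = {text for text, label in entities if label == "PER"}
        for pair_name, members in pairs.items():
            if people & members:
                counts[pair_name] += 1
    return counts


# Placeholder data, not taken from the study.
pairs = {
    "pair_1": {"Candidate A", "Running Mate A"},
    "pair_2": {"Candidate B", "Running Mate B"},
}
articles = [
    ("a1", [("Candidate A", "PER"), ("Jakarta", "LOC")]),
    ("a2", [("Running Mate A", "PER"), ("Candidate B", "PER")]),
    ("a2", [("Running Mate A", "PER"), ("Candidate B", "PER")]),  # duplicate
    ("a3", [("Candidate B", "ORG")]),  # not labelled PER, so not counted
]
pair_counts = count_articles_per_pair(articles, pairs)
```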
Development and integration of human-AI interactions in service applications: Conceptual framework and review
International Journal of Information Management Data Insights Pub Date : 2025-07-14 DOI: 10.1016/j.jjimei.2025.100357
Nick Tugarin, Christian van Husen
Abstract: The rapid advancement of AI technologies has created opportunities and challenges across industries, particularly in service sectors where human interaction is crucial. This study systematically reviews the literature on human-AI interaction in service applications, aiming to understand how these interactions can be effectively designed, economically viable, and human-centered. A systematic literature review covering publications from 2015 to 2024 was conducted, following a structured search protocol that resulted in 90 selected articles. The review identifies key dimensions and interconnected elements of human-AI interaction, which are synthesized into a conceptual framework. This framework outlines the fundamental relationships between these dimensions and provides practical implications for the service industry and directions for future research. Furthermore, real-world application examples illustrate how these findings can be translated into practice. These examples demonstrate adaptability across different service domains and their potential to inspire innovative, scalable, and human-centric AI solutions.
Volume 5 (2), Article 100357
Citations: 0
A multimodal framework for enhancing E-commerce information management using vision transformers and large language models
International Journal of Information Management Data Insights Pub Date : 2025-07-09 DOI: 10.1016/j.jjimei.2025.100355
Anitha Balachandran, Mohammad Masum
Abstract: In the rapidly advancing field of visual search technology, traditional methods that rely only on visual features often struggle with accuracy and relevance. This challenge is particularly evident in e-commerce, where precise product recommendations are critical, and is further complicated by keyword stuffing in product descriptions. To address these limitations, this study introduces BiLens, a multimodal recommendation framework that integrates both visual and textual information. BiLens leverages large language models (LLMs) to generate descriptive captions from image queries, which are transformed into word embeddings, and extracts visual features using Vision Transformers (ViT). The visual and textual representations are integrated using an early fusion strategy and compared using cosine similarity, enabling deeper contextual understanding and enhancing the accuracy and relevance of product recommendations in capturing customer intent. A comprehensive evaluation was conducted using Amazon product data across five categories, testing various image captioning models and embedding methods, including BLIP-2, ViT-GPT2, BLIP-Image-Captioning-Large, Florence-2-large, GIT (microsoft/git-base-coco), Word2Vec, GloVe, BERT, and ELMo. The combination of Florence-2-large and BERT emerged as the most effective, achieving a precision of 0.81 ± 0.14 and an F1 score of 0.49 ± 0.16. This setup was further validated on the Myntra dataset, showing generalizability with a precision of 0.59 ± 0.27, a recall of 0.47 ± 0.25, and an F1 score of 0.52 ± 0.24. Comparisons with image-only and text-only baselines confirmed the superiority of the fusion-based approach, with statistically significant improvements in F1 scores, underscoring BiLens's ability to deliver more accurate, context-aware product recommendations.
Volume 5 (2), Article 100355
Citations: 0
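The fusion-and-compare step in the abstract above can be sketched in a few lines. The abstract says only that an early fusion strategy and cosine similarity are used, so treating early fusion as concatenation of L2-normalised visual and textual vectors is an assumption, and the two-dimensional toy embeddings and product names are hypothetical.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def early_fusion(visual, textual):
    """Fuse modalities before comparison by concatenating normalised vectors."""
    return l2_normalize(visual) + l2_normalize(textual)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_catalog(query, catalog):
    """Return product names ordered from most to least similar to the query."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )


# Toy embeddings: the first vector stands in for ViT features, the second for
# caption embeddings. Real embeddings would have hundreds of dimensions.
query = early_fusion([1.0, 0.0], [0.0, 1.0])
catalog = {
    "blue_shirt": early_fusion([0.0, 1.0], [1.0, 0.0]),
    "red_shirt": early_fusion([1.0, 0.0], [1.0, 0.0]),
    "red_dress": early_fusion([1.0, 0.0], [0.0, 1.0]),
}
ranking = rank_catalog(query, catalog)
```

Normalising each modality before concatenation keeps one modality from dominating the similarity purely because its vectors have a larger magnitude.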
A systematic review and future agenda on continuance intentions in mobile apps
International Journal of Information Management Data Insights Pub Date : 2025-07-03 DOI: 10.1016/j.jjimei.2025.100352
Suryati Veronika, Michael S.W. Lee, Bodo Lang, Pragea Putra
Abstract: Technology changes at ever-increasing speeds. It is therefore crucial for practitioners and academics to understand why users intend to continue or discontinue their usage. This paper presents a current and comprehensive systematic literature review on continuance intentions for mobile applications. The review analyzes 119 studies from the Scopus database (January 2019–December 2023) using the PRISMA, SPAR, and TCCM frameworks. It identifies key theoretical models, determinants of mobile app continuance intention, research methods, existing gaps, and future research directions. Findings reveal that several well-recognised theoretical models are frequently applied in the literature on continuance intention. Consequently, the variables derived from these models are among the most commonly measured by researchers. Additionally, the majority of studies in this area employ quantitative methods, with structural equation modelling being the most widely used. This review categorises the literature based on mobile application classifications and six distinct sets of factors influencing continuance intention: psychological, technical, social, behavioural, contextual, and barriers. Furthermore, it explores the outcomes associated with continuance intention. The paper identifies two primary areas for future research: the development of a conceptual framework and research design. It also highlights research opportunities related to emerging technologies and the gap between intentions and actual behaviours.
Volume 5 (2), Article 100352
Citations: 0
Multi-variate LSTM with attention mechanism for the Indian stock market
International Journal of Information Management Data Insights Pub Date : 2025-06-21 DOI: 10.1016/j.jjimei.2025.100350
Ashy Sebastian, Dr. Veerta Tantia
Abstract: The advent of attention mechanisms has enabled models to surpass numerous benchmarks and has driven widespread progress in natural language processing (NLP). Nevertheless, attention mechanisms have not been adequately leveraged in a time-series context. This paper addresses that gap by proposing a hybrid deep-learning model that integrates attention mechanisms and a multi-variate long short-term memory (LSTM) network for financial forecasting in the Indian stock market. Our model yields superior results compared to baseline and state-of-the-art models, evaluated using MAE and RMSE. Moreover, we apply the Diebold–Mariano (DM) test, an evaluation criterion based on statistical hypothesis testing, to determine whether the differences in forecasting accuracy between the LSTM with attention and the other models are significant. According to the DM test, the differences between the forecasting performances of the models are significant, and the attention mechanism can enhance accuracy in predicting stock prices by allowing the model to prioritize and concentrate on the most important features and patterns in the data while avoiding overfitting and noise.
Volume 5 (2), Article 100350
Citations: 0
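The Diebold–Mariano comparison mentioned in the abstract above can be sketched for the simplest case. The version below assumes one-step-ahead forecasts and squared-error loss, so the long-run variance reduces to the variance of the loss differentials; the paper's exact loss function and horizon are not stated in the abstract, and the error series here are made-up numbers.

```python
import math
from statistics import NormalDist


def diebold_mariano(errors_a, errors_b):
    """DM statistic and two-sided p-value for two forecast error series.

    Assumes one-step-ahead forecasts and squared-error loss. A positive
    statistic means model A has the larger average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    diffs = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / n
    if variance == 0:
        raise ValueError("loss differentials are constant")
    statistic = mean_diff / math.sqrt(variance / n)
    # Under the null of equal accuracy the statistic is asymptotically N(0, 1).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Hypothetical forecast errors for a baseline (A) and an attention model (B).
baseline_errors = [1.0, -1.0, 2.0, -2.0]
attention_errors = [0.5, -0.5, 1.0, -1.0]
dm_stat, dm_p = diebold_mariano(baseline_errors, attention_errors)
```

For multi-step forecasts the denominator would also need autocovariance terms of the loss differentials, which this sketch omits.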
Transformer-based model for Moroccan Arabizi-to-Arabic transliteration using a semi-automatic annotated dataset
International Journal of Information Management Data Insights Pub Date : 2025-06-19 DOI: 10.1016/j.jjimei.2025.100351
Soufiane Hajbi, Omayma Amezian, Mouhssine Ziyad, Issame El Kaime, Redouan Korchyine, Younes Chihab
Abstract: Language models have recently achieved state-of-the-art results in tasks such as translation, sentiment analysis, and text classification for high-resource languages. However, dedicated models for low-resource languages remain scarce, largely due to a lack of annotated data and linguistic resources. Most efforts focus on fine-tuning models trained on high-resource languages using limited data, resulting in a substantial performance gap. Moroccan Darija (MD), widely spoken in Morocco, lacks language resources and dedicated models. Additionally, MD texts often employ the Arabizi writing form, which combines Latin characters and numbers with Arabic script, further complicating Natural Language Processing (NLP) tasks. This work presents the first transformer-based model designed specifically for transliterating Moroccan Arabizi to Arabic. The approach leverages a character-level modeling architecture and a semi-automatically generated dataset containing over 33k word pairs, capturing significant linguistic diversity. The model achieves a state-of-the-art word transliteration accuracy (WTA) of 93% and a character error rate (CER) of 4.73% on unseen Moroccan Arabizi data, highlighting the potential of transformer models to improve transliteration accuracy for low-resource languages, particularly MD.
Volume 5 (2), Article 100351
Citations: 0
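The two metrics quoted in the abstract above can be computed as in the sketch below. The abstract does not spell out the formulas, so this assumes the common definitions: CER as total character edit distance divided by total reference length, and WTA as the share of words transliterated exactly. The Latin-script word pairs are placeholders, not data from the paper.

```python
def levenshtein(source, target):
    """Minimum number of single-character edits turning source into target."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length."""
    total_edits = sum(levenshtein(h, r) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def word_transliteration_accuracy(references, hypotheses):
    """Fraction of words whose transliteration matches the reference exactly."""
    matches = sum(1 for r, h in zip(references, hypotheses) if r == h)
    return matches / len(references)


# Placeholder word pairs for illustration.
references = ["salam", "kitab"]
hypotheses = ["salam", "kitap"]
cer = character_error_rate(references, hypotheses)
wta = word_transliteration_accuracy(references, hypotheses)
```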