Nikit Srivastava, Denis Kuchelev, Tatiana Moteu, Kshitij Shetty, Michael Roeder, Diego Moussallem, Hamada Zahera, Axel-Cyrille Ngonga Ngomo
{"title":"LOLA -- An Open-Source Massively Multilingual Large Language Model","authors":"Nikit Srivastava, Denis Kuchelev, Tatiana Moteu, Kshitij Shetty, Michael Roeder, Diego Moussallem, Hamada Zahera, Axel-Cyrille Ngonga Ngomo","doi":"arxiv-2409.11272","DOIUrl":null,"url":null,"abstract":"This paper presents LOLA, a massively multilingual large language model\ntrained on more than 160 languages using a sparse Mixture-of-Experts\nTransformer architecture. Our architectural and implementation choices address\nthe challenge of harnessing linguistic diversity while maintaining efficiency\nand avoiding the common pitfalls of multilinguality. Our analysis of the\nevaluation results shows competitive performance in natural language generation\nand understanding tasks. Additionally, we demonstrate how the learned\nexpert-routing mechanism exploits implicit phylogenetic linguistic patterns to\npotentially alleviate the curse of multilinguality. We provide an in-depth look\nat the training process, an analysis of the datasets, and a balanced\nexploration of the model's strengths and limitations. As an open-source model,\nLOLA promotes reproducibility and serves as a robust foundation for future\nresearch. Our findings enable the development of compute-efficient multilingual\nmodels with strong, scalable performance across languages.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11272","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
This paper presents LOLA, a massively multilingual large language model
trained on more than 160 languages using a sparse Mixture-of-Experts
Transformer architecture. Our architectural and implementation choices address
the challenge of harnessing linguistic diversity while maintaining efficiency
and avoiding the common pitfalls of multilinguality. Our analysis of the
evaluation results shows competitive performance in natural language generation
and understanding tasks. Additionally, we demonstrate how the learned
expert-routing mechanism exploits implicit phylogenetic linguistic patterns to
potentially alleviate the curse of multilinguality. We provide an in-depth look
at the training process, an analysis of the datasets, and a balanced
exploration of the model's strengths and limitations. As an open-source model,
LOLA promotes reproducibility and serves as a robust foundation for future
research. Our findings enable the development of compute-efficient multilingual
models with strong, scalable performance across languages.