Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face

Adekunle Ajibode, Abdul Ali Bangash, Filipe Roseiro Cogo, Bram Adams, Ahmed E. Hassan

arXiv - CS - Software Engineering, 2024-09-16, arXiv:2409.10472
Abstract
The proliferation of open Pre-trained Language Models (PTLMs) on model registry platforms like Hugging Face (HF) presents both opportunities and challenges for companies building products around them. Similar to traditional software dependencies, PTLMs continue to evolve after a release. However, the current state of PTLM release practices on model registry platforms is plagued by a variety of inconsistencies, such as ambiguous naming conventions and inaccessible model training documentation. Given the knowledge gap about current PTLM release practices, our empirical study uses a mixed-methods approach to analyze the releases of 52,227 PTLMs on the most well-known model registry, HF. Our results reveal 148 different naming practices for PTLM releases, with 40.87% of changes to model weight files not reflected in the adopted name-based versioning practice or in the models' documentation. In addition, we identified that the 52,227 PTLMs are derived from only 299 different base models (the original models that were modified to create the 52,227 PTLMs), with Fine-tuning and Quantization being the most prevalent modification methods applied to these base models. Significant gaps in release transparency, in terms of training dataset specifications and model card availability, still exist, highlighting the need for standardized documentation. While we identified a model naming practice that explicitly differentiates between major and minor PTLM releases, we did not find any significant difference in the types of changes that went into either type of release, suggesting that major/minor version numbers for PTLMs are often chosen arbitrarily. Our findings provide valuable insights to improve PTLM release practices, nudging the field towards more formal semantic versioning practices.
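
The two measurements summarized above (name-based version markers and post-release weight-file changes) can be approximated programmatically. The following minimal sketch uses the public huggingface_hub client; the version-extraction regex and the commit-title heuristic are illustrative assumptions, not the study's actual mining protocol, and the single pattern below necessarily undercovers the 148 naming practices the paper reports.

```python
# Illustrative sketch only: approximates the abstract's two measurements
# (explicit version markers in model names, and post-release changes to
# weight files) with the public huggingface_hub client. The regex and the
# commit-title heuristic are assumptions, not the study's methodology.
import re

from huggingface_hub import HfApi

api = HfApi()

# One naive pattern for explicit version markers such as "llama-2",
# "gemma-1.1", or "model-v2.0"; no single regex covers all 148 observed
# naming practices.
VERSION_PATTERN = re.compile(r"[-_.]v?(\d+)(?:\.(\d+))?(?![\w.])")

def extract_version(model_id: str):
    """Return (major, minor) parsed from the model name, or None."""
    match = VERSION_PATTERN.search(model_id.split("/")[-1])
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    return major, minor

WEIGHT_EXTENSIONS = (".bin", ".safetensors", ".gguf", ".h5")

def weight_commit_count(repo_id: str) -> int:
    """Count commits whose titles mention a weight file: a rough proxy
    for post-release weight changes that a static name cannot reflect."""
    commits = api.list_repo_commits(repo_id)
    return sum(
        1
        for commit in commits
        if any(ext in (commit.title or "") for ext in WEIGHT_EXTENSIONS)
    )

for model in api.list_models(search="llama", limit=10):
    version = extract_version(model.id)
    label = f"v{version[0]}.{version[1]}" if version else "no explicit version"
    print(f"{model.id}: {label}, weight-file commits: {weight_commit_count(model.id)}")
```

A faithful replication would need to handle the long tail of naming conventions (date stamps, ordinal suffixes, size markers like "7b") and compare actual file hashes across commits rather than match commit titles, which is closer in spirit to how undocumented weight changes would have to be measured.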