IP-MIML: A multi-instance multi-label learning framework for predicting protein subcellular localization from biological images
Authors: Xinyue Chen, Hang Shi, Shi-bing Guan, Wei Shao
Journal: Displays, volume 91, Article 103220 (impact factor 3.4, JCR Q1, Computer Science, Hardware & Architecture)
Publication date: 2025-09-18
DOI: 10.1016/j.displa.2025.103220
URL: https://www.sciencedirect.com/science/article/pii/S0141938225002574
Citation count: 0
Abstract
Recent studies indicate that the localization of proteins within a cell is essential for determining their functions and gaining insights into various cellular processes. With advances in microscopic imaging, accurate classification of bioimage-based protein subcellular localization patterns continues to attract considerable attention. However, most bioimage-based predictors of protein subcellular location assign each protein image to a single location, overlooking proteins that colocalize in several cellular compartments and therefore deserve special attention. Moreover, although a protein can be observed in multiple biological images derived from different tissues, summarizing its localization patterns across all related images remains a challenge. Based on these considerations, we propose IP-MIML, a multi-instance multi-label learning framework for determining the subcellular localization of proteins from biological images. Specifically, we first treat each protein as a bag and all images belonging to it as instances, and introduce a self-attention mechanism to learn instance-level representations that account for correlations among instances. Then, a bag-concept layer is developed to discover the latent relation between the inputs and the output semantic labels. In addition, we incorporate an optimal transport (OT) based formulation to learn the label distribution and exploit label correlations simultaneously. Finally, a dynamic threshold method is used to adjust the multi-label prediction results. We evaluated our method on normal and cancer protein bioimages; the experimental results indicate that IP-MIML not only achieves higher accuracy in predicting the cellular compartments of proteins with multiple localizations, but also detects potential cancer biomarker proteins whose localization differs significantly between normal and cancer tissues.
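The bag-and-instance setup described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the random weights, the max pooling over instances (a simple stand-in for the bag-concept layer), and the fixed per-label thresholds (a stand-in for the dynamic threshold method) are all assumptions made for illustration, and the optimal transport component is omitted entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(instances, w_q, w_k, w_v):
    # Scaled dot-product self-attention over the instances of one bag,
    # so each image representation is informed by the other images.
    q, k, v = instances @ w_q, instances @ w_k, instances @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    return softmax(scores, axis=-1) @ v

def predict_bag(instances, params, thresholds):
    # instances: (n_images, n_features) for a single protein (one bag).
    h = self_attention(instances, params["w_q"], params["w_k"], params["w_v"])
    logits = h @ params["w_out"]              # (n_images, n_labels)
    # Assumption: max pooling over instances replaces the bag-concept layer.
    bag_scores = 1.0 / (1.0 + np.exp(-logits.max(axis=0)))
    # Assumption: fixed per-label thresholds replace the dynamic threshold.
    return bag_scores, bag_scores >= thresholds

# Hypothetical sizes: 5 images per protein, 8 features, 4 compartments.
rng = np.random.default_rng(0)
n_images, n_features, n_labels = 5, 8, 4
params = {
    "w_q": 0.1 * rng.standard_normal((n_features, n_features)),
    "w_k": 0.1 * rng.standard_normal((n_features, n_features)),
    "w_v": 0.1 * rng.standard_normal((n_features, n_features)),
    "w_out": 0.1 * rng.standard_normal((n_features, n_labels)),
}
bag = rng.standard_normal((n_images, n_features))
thresholds = np.full(n_labels, 0.5)
scores, labels = predict_bag(bag, params, thresholds)
```

Here `scores` holds one probability-like value per compartment for the whole protein, and `labels` can contain several `True` entries at once, which is what distinguishes the multi-label setting from single-location predictors.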
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.