IOI: In-network Optical Inference
Zhizhen Zhong, Weiyang Wang, M. Ghobadi, Alexander Sludds, R. Hamerly, Liane Bernstein, D. Englund
{"title":"IOI:网络内光推断","authors":"Zhizhen Zhong, Weiyang Wang, M. Ghobadi, Alexander Sludds, R. Hamerly, Liane Bernstein, D. Englund","doi":"10.1145/3473938.3474508","DOIUrl":null,"url":null,"abstract":"We present In-network Optical Inference (IOI), a system providing low-latency machine learning inference by leveraging programmable switches and optical matrix multiplication. IOI consists of a novel transceiver module designed specifically to perform linear operations such as matrix multiplication in the optical domain. IOI's transceivers are plugged into programmable switches to perform non-linear activation and respond to inference queries. We demonstrate how to process inference queries inside the network, without the need to send the queries to cloud or edge inference servers, thus significantly reducing end-to-end inference latency experienced by users. We believe IOI is the next frontier for exploring real-time machine learning systems and opens up exciting new opportunities for low-latency in-network inference.","PeriodicalId":302760,"journal":{"name":"Proceedings of the ACM SIGCOMM 2021 Workshop on Optical Systems","volume":"216 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"IOI: In-network Optical Inference\",\"authors\":\"Zhizhen Zhong, Weiyang Wang, M. Ghobadi, Alexander Sludds, R. Hamerly, Liane Bernstein, D. Englund\",\"doi\":\"10.1145/3473938.3474508\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present In-network Optical Inference (IOI), a system providing low-latency machine learning inference by leveraging programmable switches and optical matrix multiplication. IOI consists of a novel transceiver module designed specifically to perform linear operations such as matrix multiplication in the optical domain. IOI's transceivers are plugged into programmable switches to perform non-linear activation and respond to inference queries. We demonstrate how to process inference queries inside the network, without the need to send the queries to cloud or edge inference servers, thus significantly reducing end-to-end inference latency experienced by users. 
We believe IOI is the next frontier for exploring real-time machine learning systems and opens up exciting new opportunities for low-latency in-network inference.\",\"PeriodicalId\":302760,\"journal\":{\"name\":\"Proceedings of the ACM SIGCOMM 2021 Workshop on Optical Systems\",\"volume\":\"216 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM SIGCOMM 2021 Workshop on Optical Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3473938.3474508\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM SIGCOMM 2021 Workshop on Optical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3473938.3474508","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We present In-network Optical Inference (IOI), a system that provides low-latency machine learning inference by leveraging programmable switches and optical matrix multiplication. IOI consists of a novel transceiver module designed specifically to perform linear operations, such as matrix multiplication, in the optical domain. IOI's transceivers plug into programmable switches, which perform the non-linear activations and respond to inference queries. We demonstrate how to process inference queries inside the network, without sending them to cloud or edge inference servers, thus significantly reducing the end-to-end inference latency experienced by users. We believe IOI is the next frontier for exploring real-time machine learning systems and opens up exciting new opportunities for low-latency in-network inference.
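To make the dataflow concrete, here is a minimal Python sketch of the split the abstract describes: each hop computes a linear layer "optically" in the transceiver and applies the non-linear activation at the programmable switch. The function names, Gaussian noise model, ReLU choice, and layer sizes are illustrative assumptions, not details taken from the paper.

```python
# Toy model of IOI's split between optical linear operations and
# switch-side nonlinear activation. All specifics here are assumptions
# for illustration, not the authors' actual design.
import numpy as np

rng = np.random.default_rng(0)

def optical_matmul(weights, x, noise_std=0.01):
    """Model the transceiver's optical matrix-vector product.

    Analog photonic computation is approximated as an exact matmul
    plus additive Gaussian noise (noise_std is a made-up parameter).
    """
    y = weights @ x
    return y + rng.normal(scale=noise_std, size=y.shape)

def switch_activation(y):
    """Nonlinear activation applied electronically at the programmable
    switch (ReLU chosen purely for illustration)."""
    return np.maximum(y, 0.0)

def in_network_inference(layers, query):
    """Process an inference query hop by hop: each hop performs an
    optical linear layer followed by a switch-side activation, so the
    query never leaves the network for an inference server."""
    x = query
    for w in layers:
        x = switch_activation(optical_matmul(w, x))
    return x

# A toy two-layer network answering a single query in-network.
layers = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
query = rng.standard_normal(8)
print(in_network_inference(layers, query))
```

Under this (assumed) model, end-to-end latency is dominated by the hops the query already traverses, which is the intuition behind avoiding a round trip to a cloud or edge inference server.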