Jonas Fuchtmann, Thomas Riedel, Maximilian Berlet, Alissa Jell, Luca Wegener, Lars Wagner, Simone Graf, Dirk Wilhelm, Daniel Ostler-Mildner
{"title":"基于音频的手术室事件检测","authors":"Jonas Fuchtmann, Thomas Riedel, Maximilian Berlet, Alissa Jell, Luca Wegener, Lars Wagner, Simone Graf, Dirk Wilhelm, Daniel Ostler-Mildner","doi":"10.1007/s11548-024-03211-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Even though workflow analysis in the operating room has come a long way, current systems are still limited to research. In the quest for a robust, universal setup, hardly any attention has been given to the dimension of audio despite its numerous advantages, such as low costs, location, and sight independence, or little required processing power.</p><p><strong>Methodology: </strong>We present an approach for audio-based event detection that solely relies on two microphones capturing the sound in the operating room. Therefore, a new data set was created with over 63 h of audio recorded and annotated at the University Hospital rechts der Isar. Sound files were labeled, preprocessed, augmented, and subsequently converted to log-mel-spectrograms that served as a visual input for an event classification using pretrained convolutional neural networks.</p><p><strong>Results: </strong>Comparing multiple architectures, we were able to show that even lightweight models, such as MobileNet, can already provide promising results. Data augmentation additionally improved the classification of 11 defined classes, including inter alia different types of coagulation, operating table movements as well as an idle class. With the newly created audio data set, an overall accuracy of 90%, a precision of 91% and a F1-score of 91% were achieved, demonstrating the feasibility of an audio-based event recognition in the operating room.</p><p><strong>Conclusion: </strong>With this first proof of concept, we demonstrated that audio events can serve as a meaningful source of information that goes beyond spoken language and can easily be integrated into future workflow recognition pipelines using computational inexpensive architectures.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"2381-2387"},"PeriodicalIF":2.3000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607025/pdf/","citationCount":"0","resultStr":"{\"title\":\"Audio-based event detection in the operating room.\",\"authors\":\"Jonas Fuchtmann, Thomas Riedel, Maximilian Berlet, Alissa Jell, Luca Wegener, Lars Wagner, Simone Graf, Dirk Wilhelm, Daniel Ostler-Mildner\",\"doi\":\"10.1007/s11548-024-03211-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>Even though workflow analysis in the operating room has come a long way, current systems are still limited to research. In the quest for a robust, universal setup, hardly any attention has been given to the dimension of audio despite its numerous advantages, such as low costs, location, and sight independence, or little required processing power.</p><p><strong>Methodology: </strong>We present an approach for audio-based event detection that solely relies on two microphones capturing the sound in the operating room. Therefore, a new data set was created with over 63 h of audio recorded and annotated at the University Hospital rechts der Isar. Sound files were labeled, preprocessed, augmented, and subsequently converted to log-mel-spectrograms that served as a visual input for an event classification using pretrained convolutional neural networks.</p><p><strong>Results: </strong>Comparing multiple architectures, we were able to show that even lightweight models, such as MobileNet, can already provide promising results. Data augmentation additionally improved the classification of 11 defined classes, including inter alia different types of coagulation, operating table movements as well as an idle class. With the newly created audio data set, an overall accuracy of 90%, a precision of 91% and a F1-score of 91% were achieved, demonstrating the feasibility of an audio-based event recognition in the operating room.</p><p><strong>Conclusion: </strong>With this first proof of concept, we demonstrated that audio events can serve as a meaningful source of information that goes beyond spoken language and can easily be integrated into future workflow recognition pipelines using computational inexpensive architectures.</p>\",\"PeriodicalId\":51251,\"journal\":{\"name\":\"International Journal of Computer Assisted Radiology and Surgery\",\"volume\":\" \",\"pages\":\"2381-2387\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2024-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607025/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Assisted Radiology and Surgery\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1007/s11548-024-03211-1\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/6/11 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-024-03211-1","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/6/11 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Audio-based event detection in the operating room.
Purpose: Even though workflow analysis in the operating room has come a long way, current systems are still limited to research. In the quest for a robust, universal setup, hardly any attention has been given to the dimension of audio despite its numerous advantages, such as low costs, location, and sight independence, or little required processing power.
Methodology: We present an approach for audio-based event detection that solely relies on two microphones capturing the sound in the operating room. Therefore, a new data set was created with over 63 h of audio recorded and annotated at the University Hospital rechts der Isar. Sound files were labeled, preprocessed, augmented, and subsequently converted to log-mel-spectrograms that served as a visual input for an event classification using pretrained convolutional neural networks.
Results: Comparing multiple architectures, we were able to show that even lightweight models, such as MobileNet, can already provide promising results. Data augmentation additionally improved the classification of 11 defined classes, including inter alia different types of coagulation, operating table movements as well as an idle class. With the newly created audio data set, an overall accuracy of 90%, a precision of 91% and a F1-score of 91% were achieved, demonstrating the feasibility of an audio-based event recognition in the operating room.
Conclusion: With this first proof of concept, we demonstrated that audio events can serve as a meaningful source of information that goes beyond spoken language and can easily be integrated into future workflow recognition pipelines using computational inexpensive architectures.
期刊介绍:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.