Machine Anomalous Sound Detection Using Spectral-temporal Modulation Representations Derived from Machine-specific Filterbanks
Kai Li, Khalid Zaman, Xingfeng Li, Masato Akagi, Masashi Unoki
arXiv - CS - Sound, 2024-09-09 (arxiv-2409.05319)
Abstract
Early detection of factory machinery malfunctions is crucial in industrial
applications. In machine anomalous sound detection (ASD), different machines
exhibit distinct vibration-frequency ranges determined by their physical
properties. Meanwhile, the human auditory system is adept at tracking both the
temporal and spectral dynamics of machine sounds. Integrating computational
models of the human auditory system with machine-specific properties can
therefore be an effective approach to machine ASD. We first quantified the
frequency importance for four types of machines using the Fisher ratio
(F-ratio). The quantified frequency importance was then used to design
machine-specific non-uniform filterbanks (NUFBs), which extract the log
non-uniform spectrum (LNS) feature. The designed NUFBs have narrower
bandwidths and a higher filter density in frequency regions with relatively
high F-ratios. Finally, we propose spectral and temporal modulation
representations derived from the LNS feature. The LNS feature and the
modulation representations are fed into an autoencoder-based neural-network
detector for ASD. Quantification results on the training set of the
Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset
at a signal-to-noise ratio (SNR) of 6 dB reveal that, for each machine type,
the information distinguishing normal from anomalous sounds is encoded
non-uniformly in the frequency domain. By emphasizing these important
frequency regions with NUFBs, the LNS feature can significantly improve
performance, measured by the area under the receiver operating characteristic
curve (AUC), under various SNR conditions. The modulation representations can
improve performance further: temporal modulation is effective for fans, pumps,
and sliders, while spectral modulation is particularly effective for valves.
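The abstract names the feature pipeline (F-ratio frequency importance, an NUFB that is denser and narrower where the F-ratio is high, the LNS feature, and the modulation representations) but does not give formulas. The Python/NumPy sketch below illustrates one plausible reading under stated assumptions; it is not the authors' implementation. Assumptions: the F-ratio is taken per frequency bin as the variance of the class means divided by the mean within-class variance; filter edges are placed so that each band spans an equal share of cumulative importance, blended with a uniform density through a hypothetical `alpha` parameter; filters are triangular; and the temporal and spectral modulation representations are taken as magnitude FFTs of the LNS along time and across filter channels, respectively. All function names are hypothetical, and the data is synthetic.

```python
import numpy as np


def f_ratio(spectra, labels):
    """Per-bin Fisher ratio: variance of the class means over mean within-class variance.

    spectra: (n_frames, n_bins) log-power frames; labels: (n_frames,) class ids
    (here 0 = normal, 1 = anomalous, an assumed two-class setup).
    """
    classes = np.unique(labels)
    means = np.stack([spectra[labels == c].mean(axis=0) for c in classes])
    within = np.stack([spectra[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / (within.mean(axis=0) + 1e-12)


def nufb_edges(importance, n_filters, alpha=0.5):
    """Filter edge positions (in bins) so each band covers an equal share of importance.

    alpha in (0, 1] (hypothetical) blends in a uniform density so low-importance
    regions still get filters and the cumulative curve stays strictly increasing.
    """
    w = np.asarray(importance, dtype=float)
    w = (1.0 - alpha) * w / w.sum() + alpha / len(w)
    cdf = np.concatenate([[0.0], np.cumsum(w)])
    axis = np.linspace(0.0, len(w) - 1.0, len(w) + 1)  # continuous bin-index axis
    targets = np.linspace(0.0, 1.0, n_filters + 2)
    return np.interp(targets, cdf, axis)  # invert the cumulative importance curve


def triangular_filterbank(edges, n_bins):
    """Triangular filters: filter m rises from edges[m] to edges[m+1], falls to edges[m+2]."""
    bins = np.arange(n_bins)
    fb = np.zeros((len(edges) - 2, n_bins))
    for m in range(len(edges) - 2):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rise = (bins - lo) / (mid - lo)
        fall = (hi - bins) / (hi - mid)
        fb[m] = np.clip(np.minimum(rise, fall), 0.0, None)
    return fb


def lns_feature(power_spec, fb):
    """Log non-uniform spectrum: log filterbank energies, shape (n_frames, n_filters)."""
    return np.log(power_spec @ fb.T + 1e-10)


def modulation_representations(lns):
    """Temporal: |FFT| along time per channel. Spectral: |FFT| across channels per frame."""
    temporal = np.abs(np.fft.rfft(lns, axis=0))
    spectral = np.abs(np.fft.rfft(lns, axis=1))
    return temporal, spectral


# Synthetic demo: anomalous frames differ from normal ones only in bins 80-119.
rng = np.random.default_rng(0)
n_bins = 256
normal = rng.normal(0.0, 1.0, size=(200, n_bins))
anomalous = rng.normal(0.0, 1.0, size=(200, n_bins))
anomalous[:, 80:120] += 3.0
spectra = np.vstack([normal, anomalous])  # treated as log-power frames
labels = np.array([0] * 200 + [1] * 200)

fr = f_ratio(spectra, labels)
edges = nufb_edges(fr, n_filters=32)
fb = triangular_filterbank(edges, n_bins)
lns = lns_feature(np.exp(spectra), fb)  # back to linear power before filtering
temporal, spectral = modulation_representations(lns)
print(int(np.argmax(fr)), fb.shape, lns.shape, temporal.shape, spectral.shape)
```

On this synthetic data the F-ratio peaks inside bins 80 to 119, and the filter edges crowd into that region, giving narrower bands there than elsewhere. This is the qualitative NUFB behavior the abstract describes. Without the uniform blend (`alpha=0`), nearly all filters would collapse into the high-importance region.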