A speech compression method without utilizing signal prediction.
Ikuo Matsuo, Kazuo Ueda, Yoshitaka Nakajima
I-Perception, 16(3), 20416695251340236. Journal Article. Published 2025-05-21.
DOI: https://doi.org/10.1177/20416695251340236
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12120533/pdf/
Previous practical speech compression methods have been based on signal prediction, taking auditory functions into account but overlooking features specific to speech signals. A new method was developed in which amplitude envelopes in four frequency bands, corresponding to spectral factors common to different languages, were used to modulate infinitely peak-clipped signals, which had also been shown to contain useful linguistic information. In a pilot experiment, intelligibility reached ~80% at only 2,400 bits per second (bps), whereas the bit rate of the original signal was 256,000 bps. The algorithm preserves the naturalness of speech and is easy to grasp intuitively.
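
The two signal operations named in the abstract, infinite peak clipping and amplitude-envelope modulation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It handles a single, already band-limited signal, whereas the method uses four frequency bands whose edges the abstract does not give. The moving-average envelope estimator, the window length, the sampling rate, and all function names are assumptions made for illustration. Quantization and bit allocation are not modeled. The last function only reproduces the arithmetic behind the reported bit rates: 256,000 bps against 2,400 bps is a reduction of roughly 107 to 1.

```python
import math


def infinite_peak_clip(signal):
    """Keep only the sign of each sample: +1.0 or -1.0 (zero maps to +1.0)."""
    return [1.0 if x >= 0.0 else -1.0 for x in signal]


def amplitude_envelope(signal, window):
    """Rectify the signal and smooth it with a trailing moving average.

    Assumption: this simple estimator stands in for whatever envelope
    extraction the paper actually uses.
    """
    envelope = []
    running = 0.0
    for i, x in enumerate(signal):
        running += abs(x)
        if i >= window:
            # Drop the sample that just left the averaging window.
            running -= abs(signal[i - window])
        envelope.append(running / min(i + 1, window))
    return envelope


def modulate(carrier, envelope):
    """Multiply the clipped carrier by the envelope, sample by sample."""
    return [c * e for c, e in zip(carrier, envelope)]


def compression_ratio(original_bps, compressed_bps):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_bps / compressed_bps


if __name__ == "__main__":
    rate = 8000  # hypothetical sampling rate in Hz
    # Hypothetical band-limited input: a 500 Hz tone with decaying level.
    tone = [
        math.exp(-3.0 * n / rate) * math.sin(2 * math.pi * 500 * n / rate)
        for n in range(rate // 10)
    ]
    carrier = infinite_peak_clip(tone)
    envelope = amplitude_envelope(tone, window=80)  # 10 ms window, assumed
    resynthesized = modulate(carrier, envelope)
    print(len(resynthesized), "samples resynthesized")
    print("compression ratio: %.1f" % compression_ratio(256000, 2400))
```

In a four-band version, one would presumably apply these steps to each band after band-pass filtering and sum the results, but the abstract does not say whether clipping is applied per band or to the wideband signal.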