Purrai: A Deep Neural Network based Approach to Interpret Domestic Cat Language

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 622-627. Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00104

Weilin Sun, V. Lu, Aaron Truong, Hermione Bossolina, Yuan Lu



Abstract

Understanding and communicating with domestic cats has always fascinated humans, although it is considered a difficult task even for phonetics experts. In this paper, we present our approach to this problem: Purrai, a neural-network-based machine learning platform for interpreting cat language. Our framework consists of two parts. First, we build a comprehensive cat voice dataset that is 3.7x larger than any existing publicly available dataset [1]. To improve accuracy, we also use several techniques to ensure labeling quality, including rule-based labeling, cross-validation, cosine distance, and outlier detection. Second, we design a two-stage neural network structure to interpret what cats express through sequences of multiple sounds, which we call sentences. The first stage is a modification of Google’s VGGish architecture [2] [3], a Convolutional Neural Network (CNN) architecture, and focuses on the classification of nine primary cat sounds. The second stage takes the probability outputs of a sequence of sound classifications from the first stage and determines the emotional meaning of a cat sentence. Our first-stage architecture achieves top-1 and top-2 accuracies of 74.1% and 92.1%, better than those of the state-of-the-art approach: 64.9% and 83.4% [4]. Our sentence-based AI model achieves an accuracy of 81.1% for emotion prediction.
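The top-1 and top-2 figures quoted in the abstract follow the standard top-k definition: a prediction counts as correct when the true class is among the k classes with the highest predicted probability. The sketch below illustrates that definition only; it is not the authors' evaluation code. The function name `top_k_accuracy` and the toy data (three classes rather than the paper's nine) are assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    probs: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k most probable classes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(probs, labels):
        # Rank class indices by descending probability and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy first-stage outputs: 4 sound clips, 3 classes (made-up numbers).
probs = [
    [0.7, 0.2, 0.1],  # true class 0: correct at k=1
    [0.3, 0.5, 0.2],  # true class 0: correct only at k=2
    [0.1, 0.3, 0.6],  # true class 2: correct at k=1
    [0.5, 0.4, 0.1],  # true class 2: missed at k=1 and k=2
]
labels = [0, 0, 2, 2]

print(top_k_accuracy(probs, labels, 1))  # 0.5
print(top_k_accuracy(probs, labels, 2))  # 0.75
```

Top-2 accuracy is always at least as high as top-1, which is consistent with the gap between the 74.1% and 92.1% figures reported above.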

