Goal!! Event detection in sports video

Grigorios Tsagkatakis, M. Jaber, P. Tsakalides
{"title":"目标! !体育视频中的事件检测","authors":"Grigorios Tsagkatakis, M. Jaber, P. Tsakalides","doi":"10.2352/ISSN.2470-1173.2017.16.CVAS-344","DOIUrl":null,"url":null,"abstract":"Understanding complex events from unstructured video, like scoring a goal in a football game, is an extremely challenging task due to the dynamics, complexity and variation of video sequences. In this work, we attack this problem exploiting the capabilities of the recently developed framework of deep learning. We consider independently encoding spatial and temporal information via convolutional neural networks and fusion of features via regularized Autoencoders. To demonstrate the capacities of the proposed scheme, a new dataset is compiled, composed of goal and no-goal sequences. Experimental results demonstrate that extremely high classification accuracy can be achieved, from a dramatically limited number of examples, by leveraging pretrained models with fine-tuned fusion of spatio-temporal features. Introduction Analyzing unstructured video streams is a challenging task for multiple reasons [10]. A first challenge is associated with the complexity of real world dynamics that are manifested in such video streams, including changes in viewpoint, illumination and quality. In addition, while annotated image datasets are prevalent, a smaller number of labeled datasets are available for video analytics. Last, the analysis of massive, high dimensional video streams is extremely demanding, requiring significantly higher computational resources compared to still imagery [11]. In this work, we focus on the analysis of a particular type of videos showing multi-person sport activities and more specifically football (soccer) games. Sport videos in general are acquired from different vantage points and the decision of selecting a single stream for broadcasting is taken by the director. As a result, the broadcasted video stream is characterized by varying acquisition conditions like zooming-in near the goalpost during a goal and zooming-out to cover the full field. In this complex situation, we consider the high level objective of detecting specific and semantically meaningful events like an opponent team scoring a goal. Succeeding in this task will allow the automatic transcription of games, video summarization and automatic statistical analysis. Despite the many challenges associated with video analytics, the human brain is able to extract meaning and provide contextual information in a limited amount of time and from a limited set of training examples. From a computational perspective, the process of event detection in a video sequence amounts to two foundamental steps, namely (i) spatio-temporal feature extraction and (ii) example classification. Typically, feature extraction approaches rely on highly engineered handcrafted features like the SIFT, which however are not able to generalize to more challenging cases. To achieve this objective, we consider the state-of-theart framework of deep learning [18] and more specifically the case of Convolutional Neural Networks (CNNs) [16], which has taken by storm almost all problems related to computer vision, ranging from image classification [15, 16], to object detection [17], and multi-modal learning [6]. At the same time, the concept of Autoencoders, a type of neural network which tries to appropriate the input at the output via regularization with various constrains, is also attracting attention due to its learning capacity in cases of unsupervised learning [21]. 
While significant effort has been applied in designing and evaluating deep learning architectures for image analysis, leading to highly optimized architectures, the problem of video analysis is at the forefront of research, where multiple avenues are explored. The urgent need for video analytics is driven by both the wealth of unstructured videos available online, as well as the complexities associated with adding the temporal dimension. In this work, we consider the problem of goal detection in broadcasted low quality football videos. The problem is formulated as a binary classification of short video sequences which are encoded though a spatiotemporal deep feature learning network. The key novelties of this work are to: • Develop a novel dataset for event detection in sports video and more specifically, for goal detection is football games; • Investigate deep learning architectures, such as CNN and Autoencoders, for achieving efficient event detection; • Demonstrate that learning, and thus accurate event detection, can be achieved by leveraging information from a few labeled examples, exploiting pre-trained models. State-of-the-art For video analytics, two major lines of research have been proposed, namely frame-based and motion-based, where in the former case, features are extracted from individual frames, while in the latter case, additional information regarding the inter-frame motion, like optical flow [3], is also introduced. In terms of single frame spatial feature extraction, CNNs have had a profound impact in image recognition, scene classification, and object detection, among others [16]. To account for the dynamic nature of video, a recently proposed concept involves extenting the two-dimensional convolution to three dimensions, leading to 3D CNNs, where temporal information is included as a distinct input [12, 13]. An alternative approach for encoding the temporal informarion is through the use of Long-Short Term Memory (LSTM) networks [1, 13], while another concept involves the generation of dynamic images through the collapse of multiple video frames and the use of 2D deep feature exaction on such representations [7]. In [2], temporal information is encoded through average pooling of frame-based descriptors and Figure 1: Block diagram of the proposed Goal detection framework. A 20-frame moving window initially selects part of the sequence of interest, and the selected frames undergo motion estimation. Raw pixel values and optical flows are first independently encoded using the pre-trained deep CNN for extracting spatial and temporal features. The extracted features can either be introduced into a higher level network for fusion which is fine-tuned for the classification problem, or concatenated and used as extended input features for the classification. the subsequent encoding in Fisher and VLAD vectors. In [4], the authors investigated deep video representation for action recognition, where temporal information was introduced in the frame-diff layer of the deep network architecture, through different temporal pooling strategies applied in patch-level, frame-level, and temporal window-level. One of the most successful frameworks for encoding both spatial and temploral information is the two-stream CNN [8]. Two-stream networks consider two sources of information, raw frames and optical flow, which are independently encoded by a CNN and fused into an SVM classifier. 
Further studies on this framework demonstrated that using pre-trained models can have a dramatic impact on training time, for the spatial and temporal features [22], while convolutional two-stream network fusion was recently applied in video action recognition [23]. The combination of 3D convolutions and the two-stream approach was also recently reported for video classification, achieving state-of-theart performance at significantly lower processing times [24]. The performance demonstrated by the two-streams approach for video analysis led to the choice of this paradigm in this work. Event Detection Network The proposed temporal event detection network is modeled as a two-stream deep network, coupled with a sparsity regularized Autoencoder for fusion of spatial and temporal data. We investigate Convolutional and Autoencoder Neural Networks for the extraction of spatial, temporal and fused spatio-temporal features and the subsequent application of kernel based Support Vector Machines for the binary detection of goal events. A high level overview of the processing pipeline is shown in Figure 1. While in fully-connected networks each hidden activation is computed by multiplying the entire input by the corresponding weights in that layer, in CNNs each hidden activation is computed by multiplying a small local input against the weights. The typical structure of a CNN consists of a number of convolution and pooling/subsampling layers, optionally followed by fully connected layers. At each convolution layer, the outputs of the previous layer are convolved with learnable kernels and passed through the activation function to form this layer’s output feature map. Let n× n be a square region extracted from a training input image X ∈ RN×M , and w be a filter of kernel size (m×m). The output of the convolutional layer h ∈ R(n−m+1)×(n−m+1) is given by: hi j = σ (m−1 ∑ a=0 m−1 ∑ b=0 wabx (i+a)( j+b)+b ` i j ) , (1) where b is the additive bias term, and σ(·) stands for the neuron’s activation unit. Specifically, the activation function σ , is a standard way to model a neuron’s output, as a function of its input. Convenient choices for the activation function include the logistic sigmoid, the hyperbolic tangent, and the Rectified Linear Unit. Taking into consideration the training time required for the gradient descent process, the saturating (i.e tanh, and logistic sigmoid) non-linearities are much slower than the non-saturating ReLU function. The output of the convolutional layer is directly utilized as input to a sub-sampling layer that produces downsampled versions of the input maps. There are several types of pooling, two common types of which are max-pooling and average-pooling, which partition the input image into a set of non-overlapping or overlapping patches and output the maximum or average value for each such sub-region. For the 2D feature extraction networks, we consider the VGG-16 CNN architecture, which is composed of 13 convolutional layers, with five of them being followed by a max-pooling layer, leading to three fully connected layers [9]. Unlike image detection problems, feature extraction in video must address the challenges associat","PeriodicalId":261646,"journal":{"name":"Computer Vision Applications in Sports","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Goal!! 
Event detection in sports video\",\"authors\":\"Grigorios Tsagkatakis, M. Jaber, P. Tsakalides\",\"doi\":\"10.2352/ISSN.2470-1173.2017.16.CVAS-344\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding complex events from unstructured video, like scoring a goal in a football game, is an extremely challenging task due to the dynamics, complexity and variation of video sequences. In this work, we attack this problem exploiting the capabilities of the recently developed framework of deep learning. We consider independently encoding spatial and temporal information via convolutional neural networks and fusion of features via regularized Autoencoders. To demonstrate the capacities of the proposed scheme, a new dataset is compiled, composed of goal and no-goal sequences. Experimental results demonstrate that extremely high classification accuracy can be achieved, from a dramatically limited number of examples, by leveraging pretrained models with fine-tuned fusion of spatio-temporal features. Introduction Analyzing unstructured video streams is a challenging task for multiple reasons [10]. A first challenge is associated with the complexity of real world dynamics that are manifested in such video streams, including changes in viewpoint, illumination and quality. In addition, while annotated image datasets are prevalent, a smaller number of labeled datasets are available for video analytics. Last, the analysis of massive, high dimensional video streams is extremely demanding, requiring significantly higher computational resources compared to still imagery [11]. In this work, we focus on the analysis of a particular type of videos showing multi-person sport activities and more specifically football (soccer) games. Sport videos in general are acquired from different vantage points and the decision of selecting a single stream for broadcasting is taken by the director. As a result, the broadcasted video stream is characterized by varying acquisition conditions like zooming-in near the goalpost during a goal and zooming-out to cover the full field. In this complex situation, we consider the high level objective of detecting specific and semantically meaningful events like an opponent team scoring a goal. Succeeding in this task will allow the automatic transcription of games, video summarization and automatic statistical analysis. Despite the many challenges associated with video analytics, the human brain is able to extract meaning and provide contextual information in a limited amount of time and from a limited set of training examples. From a computational perspective, the process of event detection in a video sequence amounts to two foundamental steps, namely (i) spatio-temporal feature extraction and (ii) example classification. Typically, feature extraction approaches rely on highly engineered handcrafted features like the SIFT, which however are not able to generalize to more challenging cases. To achieve this objective, we consider the state-of-theart framework of deep learning [18] and more specifically the case of Convolutional Neural Networks (CNNs) [16], which has taken by storm almost all problems related to computer vision, ranging from image classification [15, 16], to object detection [17], and multi-modal learning [6]. At the same time, the concept of Autoencoders, a type of neural network which tries to appropriate the input at the output via regularization with various constrains, is also attracting attention due to its learning capacity in cases of unsupervised learning [21]. 
While significant effort has been applied in designing and evaluating deep learning architectures for image analysis, leading to highly optimized architectures, the problem of video analysis is at the forefront of research, where multiple avenues are explored. The urgent need for video analytics is driven by both the wealth of unstructured videos available online, as well as the complexities associated with adding the temporal dimension. In this work, we consider the problem of goal detection in broadcasted low quality football videos. The problem is formulated as a binary classification of short video sequences which are encoded though a spatiotemporal deep feature learning network. The key novelties of this work are to: • Develop a novel dataset for event detection in sports video and more specifically, for goal detection is football games; • Investigate deep learning architectures, such as CNN and Autoencoders, for achieving efficient event detection; • Demonstrate that learning, and thus accurate event detection, can be achieved by leveraging information from a few labeled examples, exploiting pre-trained models. State-of-the-art For video analytics, two major lines of research have been proposed, namely frame-based and motion-based, where in the former case, features are extracted from individual frames, while in the latter case, additional information regarding the inter-frame motion, like optical flow [3], is also introduced. In terms of single frame spatial feature extraction, CNNs have had a profound impact in image recognition, scene classification, and object detection, among others [16]. To account for the dynamic nature of video, a recently proposed concept involves extenting the two-dimensional convolution to three dimensions, leading to 3D CNNs, where temporal information is included as a distinct input [12, 13]. An alternative approach for encoding the temporal informarion is through the use of Long-Short Term Memory (LSTM) networks [1, 13], while another concept involves the generation of dynamic images through the collapse of multiple video frames and the use of 2D deep feature exaction on such representations [7]. In [2], temporal information is encoded through average pooling of frame-based descriptors and Figure 1: Block diagram of the proposed Goal detection framework. A 20-frame moving window initially selects part of the sequence of interest, and the selected frames undergo motion estimation. Raw pixel values and optical flows are first independently encoded using the pre-trained deep CNN for extracting spatial and temporal features. The extracted features can either be introduced into a higher level network for fusion which is fine-tuned for the classification problem, or concatenated and used as extended input features for the classification. the subsequent encoding in Fisher and VLAD vectors. In [4], the authors investigated deep video representation for action recognition, where temporal information was introduced in the frame-diff layer of the deep network architecture, through different temporal pooling strategies applied in patch-level, frame-level, and temporal window-level. One of the most successful frameworks for encoding both spatial and temploral information is the two-stream CNN [8]. Two-stream networks consider two sources of information, raw frames and optical flow, which are independently encoded by a CNN and fused into an SVM classifier. 
Further studies on this framework demonstrated that using pre-trained models can have a dramatic impact on training time, for the spatial and temporal features [22], while convolutional two-stream network fusion was recently applied in video action recognition [23]. The combination of 3D convolutions and the two-stream approach was also recently reported for video classification, achieving state-of-theart performance at significantly lower processing times [24]. The performance demonstrated by the two-streams approach for video analysis led to the choice of this paradigm in this work. Event Detection Network The proposed temporal event detection network is modeled as a two-stream deep network, coupled with a sparsity regularized Autoencoder for fusion of spatial and temporal data. We investigate Convolutional and Autoencoder Neural Networks for the extraction of spatial, temporal and fused spatio-temporal features and the subsequent application of kernel based Support Vector Machines for the binary detection of goal events. A high level overview of the processing pipeline is shown in Figure 1. While in fully-connected networks each hidden activation is computed by multiplying the entire input by the corresponding weights in that layer, in CNNs each hidden activation is computed by multiplying a small local input against the weights. The typical structure of a CNN consists of a number of convolution and pooling/subsampling layers, optionally followed by fully connected layers. At each convolution layer, the outputs of the previous layer are convolved with learnable kernels and passed through the activation function to form this layer’s output feature map. Let n× n be a square region extracted from a training input image X ∈ RN×M , and w be a filter of kernel size (m×m). The output of the convolutional layer h ∈ R(n−m+1)×(n−m+1) is given by: hi j = σ (m−1 ∑ a=0 m−1 ∑ b=0 wabx (i+a)( j+b)+b ` i j ) , (1) where b is the additive bias term, and σ(·) stands for the neuron’s activation unit. Specifically, the activation function σ , is a standard way to model a neuron’s output, as a function of its input. Convenient choices for the activation function include the logistic sigmoid, the hyperbolic tangent, and the Rectified Linear Unit. Taking into consideration the training time required for the gradient descent process, the saturating (i.e tanh, and logistic sigmoid) non-linearities are much slower than the non-saturating ReLU function. The output of the convolutional layer is directly utilized as input to a sub-sampling layer that produces downsampled versions of the input maps. There are several types of pooling, two common types of which are max-pooling and average-pooling, which partition the input image into a set of non-overlapping or overlapping patches and output the maximum or average value for each such sub-region. For the 2D feature extraction networks, we consider the VGG-16 CNN architecture, which is composed of 13 convolutional layers, with five of them being followed by a max-pooling layer, leading to three fully connected layers [9]. 
Unlike image detection problems, feature extraction in video must address the challenges associat\",\"PeriodicalId\":261646,\"journal\":{\"name\":\"Computer Vision Applications in Sports\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-01-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision Applications in Sports\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2352/ISSN.2470-1173.2017.16.CVAS-344\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision Applications in Sports","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2352/ISSN.2470-1173.2017.16.CVAS-344","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15

Abstract

Understanding complex events in unstructured video, like scoring a goal in a football game, is an extremely challenging task due to the dynamics, complexity and variation of video sequences. In this work, we attack this problem by exploiting the capabilities of the recently developed framework of deep learning. We consider independently encoding spatial and temporal information via convolutional neural networks, and fusing the resulting features via regularized Autoencoders. To demonstrate the capacity of the proposed scheme, a new dataset is compiled, composed of goal and no-goal sequences. Experimental results demonstrate that extremely high classification accuracy can be achieved from a dramatically limited number of examples, by leveraging pre-trained models with fine-tuned fusion of spatio-temporal features.

Introduction

Analyzing unstructured video streams is a challenging task for multiple reasons [10]. A first challenge is associated with the complexity of real-world dynamics that are manifested in such video streams, including changes in viewpoint, illumination and quality. In addition, while annotated image datasets are prevalent, far fewer labeled datasets are available for video analytics. Finally, the analysis of massive, high-dimensional video streams is extremely demanding, requiring significantly more computational resources than still imagery [11].

In this work, we focus on the analysis of a particular type of video showing multi-person sport activities, and more specifically football (soccer) games. Sport videos are in general acquired from different vantage points, and the decision of which single stream to broadcast is taken by the director. As a result, the broadcasted video stream is characterized by varying acquisition conditions, like zooming in near the goalpost during a goal and zooming out to cover the full field. In this complex situation, we consider the high-level objective of detecting specific and semantically meaningful events, like an opponent team scoring a goal. Succeeding in this task will allow the automatic transcription of games, video summarization and automatic statistical analysis.

Despite the many challenges associated with video analytics, the human brain is able to extract meaning and provide contextual information in a limited amount of time and from a limited set of training examples. From a computational perspective, the process of event detection in a video sequence amounts to two fundamental steps, namely (i) spatio-temporal feature extraction and (ii) example classification. Typically, feature extraction approaches rely on highly engineered handcrafted features like SIFT, which however are not able to generalize to more challenging cases. To achieve this objective, we consider the state-of-the-art framework of deep learning [18] and more specifically Convolutional Neural Networks (CNNs) [16], which have taken almost all problems related to computer vision by storm, ranging from image classification [15, 16] to object detection [17] and multi-modal learning [6]. At the same time, the concept of Autoencoders, a type of neural network that tries to approximate its input at its output via regularization with various constraints, is also attracting attention due to its learning capacity in unsupervised settings [21].
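To make the Autoencoder concept above concrete, the following is a minimal sketch, assuming a Keras/TensorFlow setup, of a sparsity-regularized autoencoder: an L1 activity penalty on the hidden code plays the role of the regularization constraint, and the network is trained to reproduce its own input. The layer sizes and penalty weight are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a sparsity-regularized autoencoder (illustrative only;
# layer sizes and the L1 penalty weight are assumptions, not the paper's values).
import numpy as np
from tensorflow.keras import layers, models, regularizers

input_dim = 4096   # e.g., length of a CNN feature vector (assumed)
code_dim = 256     # bottleneck size (assumed)

inputs = layers.Input(shape=(input_dim,))
# Sparsity constraint: L1 penalty on the hidden activations.
code = layers.Dense(code_dim, activation="relu",
                    activity_regularizer=regularizers.l1(1e-5))(inputs)
outputs = layers.Dense(input_dim, activation="linear")(code)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train to reconstruct the (unlabeled) feature vectors themselves.
X = np.random.rand(100, input_dim).astype("float32")  # placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=16, verbose=0)

# The encoder half provides the learned, compressed representation.
encoder = models.Model(inputs, code)
codes = encoder.predict(X)
```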
While significant effort has been applied to designing and evaluating deep learning architectures for image analysis, leading to highly optimized architectures, the problem of video analysis is at the forefront of research, where multiple avenues are being explored. The urgent need for video analytics is driven both by the wealth of unstructured videos available online and by the complexities associated with adding the temporal dimension. In this work, we consider the problem of goal detection in broadcasted low-quality football videos. The problem is formulated as a binary classification of short video sequences, which are encoded through a spatio-temporal deep feature learning network. The key novelties of this work are to:

• Develop a novel dataset for event detection in sports video and, more specifically, for goal detection in football games;
• Investigate deep learning architectures, such as CNNs and Autoencoders, for achieving efficient event detection;
• Demonstrate that learning, and thus accurate event detection, can be achieved by leveraging information from a few labeled examples, exploiting pre-trained models.

State-of-the-art

For video analytics, two major lines of research have been proposed, namely frame-based and motion-based: in the former, features are extracted from individual frames, while in the latter, additional information regarding the inter-frame motion, like optical flow [3], is also introduced. In terms of single-frame spatial feature extraction, CNNs have had a profound impact on image recognition, scene classification, and object detection, among others [16]. To account for the dynamic nature of video, a recently proposed concept involves extending the two-dimensional convolution to three dimensions, leading to 3D CNNs, where temporal information is included as a distinct input [12, 13]. An alternative approach for encoding the temporal information is the use of Long Short-Term Memory (LSTM) networks [1, 13], while another concept involves the generation of dynamic images through the collapse of multiple video frames and the use of 2D deep feature extraction on such representations [7]. In [2], temporal information is encoded through average pooling of frame-based descriptors and the subsequent encoding in Fisher and VLAD vectors. In [4], the authors investigated deep video representations for action recognition, where temporal information was introduced in the frame-diff layer of the deep network architecture, through different temporal pooling strategies applied at the patch, frame, and temporal-window levels. One of the most successful frameworks for encoding both spatial and temporal information is the two-stream CNN [8]. Two-stream networks consider two sources of information, raw frames and optical flow, which are independently encoded by a CNN and fused into an SVM classifier.
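To illustrate how the two input streams could be prepared, the sketch below, assuming OpenCV and the 20-frame moving window described for the proposed framework (Figure 1 in the next section), reads a window of frames and computes dense Farnebäck optical flow between consecutive frames. The flow algorithm, frame size and file name are illustrative choices, not necessarily those used by the authors.

```python
# Illustrative sketch: build the two input streams (raw frames and optical flow)
# for a 20-frame window of a video. Window length and flow algorithm are
# assumptions based on the description of the framework.
import cv2
import numpy as np

def two_stream_inputs(video_path, start_frame=0, window=20, size=(224, 224)):
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)

    frames, flows = [], []
    prev_gray = None
    for _ in range(window):
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(frame)                      # spatial stream: raw pixels
        if prev_gray is not None:
            # temporal stream: dense optical flow between consecutive frames
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            flows.append(flow)                    # shape: (H, W, 2)
        prev_gray = gray
    cap.release()
    return np.array(frames), np.array(flows)

# frames, flows = two_stream_inputs("match_clip.mp4")  # hypothetical file
```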
Further studies on this framework demonstrated that using pre-trained models can have a dramatic impact on training time for the spatial and temporal features [22], while convolutional two-stream network fusion was recently applied to video action recognition [23]. The combination of 3D convolutions and the two-stream approach was also recently reported for video classification, achieving state-of-the-art performance at significantly lower processing times [24]. The performance demonstrated by the two-stream approach for video analysis led to the choice of this paradigm in this work.

Event Detection Network

The proposed temporal event detection network is modeled as a two-stream deep network, coupled with a sparsity-regularized Autoencoder for the fusion of spatial and temporal data. We investigate Convolutional and Autoencoder Neural Networks for the extraction of spatial, temporal and fused spatio-temporal features, and the subsequent application of kernel-based Support Vector Machines for the binary detection of goal events. A high-level overview of the processing pipeline is shown in Figure 1.

Figure 1: Block diagram of the proposed goal detection framework. A 20-frame moving window initially selects the part of the sequence of interest, and the selected frames undergo motion estimation. Raw pixel values and optical flows are first independently encoded using pre-trained deep CNNs for extracting spatial and temporal features. The extracted features can either be introduced into a higher-level fusion network, which is fine-tuned for the classification problem, or concatenated and used as extended input features for the classification.

While in fully-connected networks each hidden activation is computed by multiplying the entire input by the corresponding weights in that layer, in CNNs each hidden activation is computed by multiplying a small local input against the weights. The typical structure of a CNN consists of a number of convolution and pooling/subsampling layers, optionally followed by fully connected layers. At each convolution layer, the outputs of the previous layer are convolved with learnable kernels and passed through the activation function to form this layer's output feature map. Let an $n \times n$ square region be extracted from a training input image $X \in \mathbb{R}^{N \times M}$, and let $w$ be a filter of kernel size $m \times m$. The output of the convolutional layer, $h \in \mathbb{R}^{(n-m+1) \times (n-m+1)}$, is given by

$$h_{ij} = \sigma\left( \sum_{a=0}^{m-1} \sum_{b=0}^{m-1} w_{ab}\, x_{(i+a)(j+b)} + b \right), \qquad (1)$$

where $b$ is the additive bias term and $\sigma(\cdot)$ stands for the neuron's activation function. Specifically, the activation function $\sigma$ is a standard way to model a neuron's output as a function of its input. Convenient choices for the activation function include the logistic sigmoid, the hyperbolic tangent, and the Rectified Linear Unit (ReLU). Taking into consideration the training time required for the gradient descent process, the saturating non-linearities (i.e., tanh and the logistic sigmoid) are much slower to train than the non-saturating ReLU.

The output of the convolutional layer is directly utilized as input to a sub-sampling layer that produces downsampled versions of the input maps. There are several types of pooling; two common types are max-pooling and average-pooling, which partition the input image into a set of non-overlapping or overlapping patches and output the maximum or average value for each such sub-region. For the 2D feature extraction networks, we consider the VGG-16 CNN architecture, which is composed of 13 convolutional layers, five of which are followed by a max-pooling layer, and ends with three fully connected layers [9]. Unlike image detection problems, feature extraction in video must address the challenges associated...
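As a concrete illustration of Eq. (1) and of the pooling step described above, the following minimal NumPy sketch implements a single-channel "valid" convolution with a ReLU activation, followed by non-overlapping max-pooling; it only makes the notation explicit and is not an efficient or faithful reproduction of any layer in the network.

```python
# Minimal NumPy illustration of Eq. (1) and max-pooling (single channel,
# single filter; purely didactic, not an efficient implementation).
import numpy as np

def conv2d_valid(x, w, b, activation=lambda z: np.maximum(z, 0.0)):
    """h_{ij} = sigma( sum_{a,b} w_{ab} * x_{(i+a)(j+b)} + b ), ReLU by default."""
    n, m = x.shape[0], w.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            out[i, j] = np.sum(w * x[i:i + m, j:j + m]) + b
    return activation(out)

def max_pool(h, p=2):
    """Non-overlapping p x p max-pooling."""
    H, W = h.shape
    return h[:H - H % p, :W - W % p].reshape(H // p, p, W // p, p).max(axis=(1, 3))

x = np.random.rand(8, 8)       # n x n input region
w = np.random.randn(3, 3)      # m x m learnable kernel
h = conv2d_valid(x, w, b=0.1)  # 6 x 6 feature map
print(max_pool(h).shape)       # -> (3, 3)
```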
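Finally, one plausible reading of the "concatenate and classify" branch of the pipeline is sketched below: spatial features from a pre-trained VGG-16 (ImageNet weights), simple temporal statistics from the optical flow, concatenation into a single descriptor, and a kernel SVM for the goal/no-goal decision. In the paper the temporal stream is also encoded by a CNN; here crude flow statistics stand in for it to keep the sketch short, and all specific choices (global average pooling, the flow statistics, the RBF kernel) are assumptions made for illustration.

```python
# Illustrative end-to-end sketch of the "concatenate and classify" variant:
# pre-trained VGG-16 spatial features + optical-flow statistics, fused by
# concatenation and classified with a kernel SVM. All design choices here
# (pooling, flow statistics, RBF kernel) are assumptions, not the paper's.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from sklearn.svm import SVC

# Pre-trained spatial feature extractor (ImageNet weights, no classifier head).
cnn = VGG16(weights="imagenet", include_top=False, pooling="avg")  # -> 512-d

def clip_features(frames, flows):
    """frames: (T, 224, 224, 3) uint8; flows: (T-1, H, W, 2) optical flow."""
    spatial = cnn.predict(preprocess_input(frames.astype("float32")), verbose=0)
    spatial = spatial.mean(axis=0)                      # average over the window
    temporal = np.array([np.abs(flows).mean(),          # crude motion statistics
                         np.abs(flows).std()])
    return np.concatenate([spatial, temporal])          # fused descriptor

# X: list of (frames, flows) windows; y: 1 = goal, 0 = no-goal (hypothetical data)
# features = np.stack([clip_features(f, o) for f, o in X])
# clf = SVC(kernel="rbf", C=1.0).fit(features, y)
# print(clf.predict(features[:5]))
```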