1. Introduction
In many countries, pigs are the main source of meat, and pork is an important part of livestock products and food composition and has considerable economic value [1]. By 2021, the total pork output of China had reached 52.959 million tons, which accounted for the largest proportion of total meat production, approximately 58.9%. The huge demand for pork has accelerated the development of large-scale pig farming and raised the requirements for intensive and specialized modern pig breeding technology [2]. With the transformation of the pig breeding mode, the health status and welfare level of pigs have also attracted increasing attention [3]. Particularly in large-scale breeding houses, because of the high feeding density, it is difficult for farmers to take good care of each pig and detect abnormalities in a timely manner based on manual inspection alone [4]. A delay in the treatment of sick pigs may cause heavy production losses. Vocalization is an important way for pigs to transmit real-time health information, and monitoring it can greatly improve the efficiency of sick pig evaluation and environmental regulation and promote healthy and efficient pig breeding [5,6].
At present, studies on pig vocalization recognition have mainly focused on the classification of pig voices. A large number of studies have examined the features and classification models of vocal signals and achieved good results. In most studies, the Mel frequency cepstral coefficient (MFCC) was used as the key acoustic feature in animal sound classification and abnormality recognition [7]. In addition, time and frequency domain features, such as root mean square (RMS) and power spectral density (PSD), were also considered key features in sound classification [8]. Chung et al. [9] extracted the MFCC and used support vector data description as an early abnormality monitor and a sparse representation classifier as a respiratory disease classifier. The results showed that the method could accurately monitor pig diseases (94% monitoring accuracy and 91% classification accuracy). Studies have found obvious differences in the time domain and frequency domain features of different types of pig vocalizations [10]. Exadaktylos et al. [11] studied the frequency features of coughing vocalization in sick pigs using PSD and classified the vocalization; the accuracy of coughing vocalization recognition was 82%. Xu et al. [12] extracted the vocal PSD feature as the clustering center and identified the coughing and squealing vocalizations of pigs; the overall recognition accuracies were approximately 83.4% and 83.1%, respectively. However, because sound data are non-stationary, recognition often shows poor robustness when the signal-to-noise ratio is low [7]. Additionally, sounds generally contain multiple acoustic features, and it is difficult to further improve classification accuracy by relying on a single feature alone [13], especially under real-life production conditions, which restricts certain kinds of classification based on acoustic features. Fusion strategies provide a new direction for boosting the accuracy of pig cough sound recognition [7].
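As a rough illustration of the time domain features mentioned above, the following sketch computes the RMS of a signal and its short-time energy over overlapping frames. It is a minimal example under stated assumptions, not the procedure of any cited study: the function names, the frame length and hop, and the synthetic 100 Hz test tone are all hypothetical choices made for illustration.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples within each frame."""
    return [sum(x * x for x in frame)
            for frame in frame_signal(samples, frame_len, hop)]


# Synthetic stand-in for a recording: a 100 Hz sine sampled at 8 kHz for 0.1 s.
# These values are assumptions for the example, not parameters from the paper.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

print(round(rms(tone), 3))
print(len(short_time_energy(tone, frame_len=200, hop=100)))
```

In practice such per-frame values would be computed on real recordings and combined with frequency domain features such as the PSD or MFCC, which typically require an FFT library.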
Regarding feature fusion, Li et al. [14] combined short-time energy with time domain features and the MFCC dimensionality with frequency domain features, and they further reduced the dimensionality using principal component analysis (PCA) to construct a deep belief network pig coughing vocalization recognition model fine-tuned by a back-propagation (BP) neural network. The recognition rate of pig coughing vocalization was improved and reached 95.8% in the optimal group, which was higher than the results obtained from single features [9,11,12,14]. In addition, many acoustic features have been found to differ among sound categories. The RMS value of a non-infectious pig cough was higher than that of an infectious pig cough, and there were also significant differences in the duration and short-time energy (STE) of coughing vocalization between healthy pigs and pigs with respiratory diseases [15,16]. Researchers have found a significant difference between the mean formant frequency of vocalizations of pigs in a normal state and that of pigs in an abnormal state. When the mean formant frequency is lower than 2671.99 Hz and the signal duration is less than 0.28 s, piglets are in a normal state; otherwise, they are in an abnormal state [15,16]. An in-depth clarification of the features of each type of vocalization will be conducive to vocalization classification and vocalization information extraction. However, introducing more parameters into a classification algorithm is not always better: more parameters mean more noise, which will affect classification performance [17]. Wang et al. [18] reduced the dimensionality of the MFCC features of piglet coughing vocalization using PCA, cutting the input features from 24 to 13, and achieved an accuracy of 95% using a relatively mature and simple support vector machine algorithm. The sound of a pig is an important piece of physical information that closely reflects its growth status and health condition; different sound categories are considered bases for judging the stress state of pigs [19]. In addition to coughing, typical pig sounds include grunting and squealing. In current research on the classification and recognition of abnormal voices in pigs, researchers mainly focus on coughing vocalization, and only a few studies address the classification and monitoring of various sound types of pigs in large-scale breeding houses. As a result, the vocal information of pigs has not been mined effectively, which has seriously weakened the accuracy of vocalization information in reflecting the health condition and breeding environment of pigs. Yu et al. [20] developed a genetic algorithm optimized BP neural network with multi-feature fusion to successfully recognize the typical calls of laying hens, such as egg laying, singing, feeding, and screeching. Although the audio characteristics of pigs differ from those of hens, that study suggests that pig sounds can be classified and recognized using a relatively mature and easy-to-use method.
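The formant frequency and duration thresholds reported above [15,16] can be read as a simple two-condition decision rule. The sketch below encodes that reading; the two threshold values come from the text, whereas the function name, the constant names, and the sample inputs are hypothetical and are not taken from the cited studies.

```python
# Thresholds as quoted in the text from refs. [15,16].
FORMANT_THRESHOLD_HZ = 2671.99
DURATION_THRESHOLD_S = 0.28


def piglet_state(mean_formant_hz: float, duration_s: float) -> str:
    """Return "normal" only when both values fall below their thresholds."""
    if mean_formant_hz < FORMANT_THRESHOLD_HZ and duration_s < DURATION_THRESHOLD_S:
        return "normal"
    return "abnormal"


# Hypothetical inputs chosen to exercise each branch of the rule.
print(piglet_state(2500.0, 0.20))  # both below threshold
print(piglet_state(2500.0, 0.35))  # duration too long
print(piglet_state(3000.0, 0.20))  # formant frequency too high
```

A rule of this kind uses only two features, which is one reason a fused, multi-feature classifier may be preferable when several sound types must be separated.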
With the development of signal processing technology, machine learning algorithms have been gradually applied to pig sound classification. In this study, the main objective was to develop a vocalization classification model based on multi-feature fusion to classify and identify pig grunting, squealing, and coughing. The sub-objectives were (1) to evaluate the effect of a comprehensive evaluation score as a newly introduced feature on pig sound classification and (2) to compare the influence of different feature dimensions on the recognition performance of the model.
4. Conclusions
In this study, a pig vocalization classification and recognition method was proposed based on the GA-BP neural network and multi-feature fusion of time domain features, frequency domain features, and a comprehensive evaluation score. Pig grunting, squealing, and coughing were classified and recognized, and the recognition performances of classification models with various feature combinations were then compared and optimized for each type of pig vocalization. After the dimensionality of the short-time energy, frequency centroid, formant frequency and its first-order difference, and MFCC and its first-order difference features was reduced using PCA, the vocalization classification model constructed using the 16-dimensional features, which included the comprehensive evaluation score of pig vocalization, had the highest recognition performance for the three types of pig vocalization, with an average recognition accuracy of 93.2%, average precision of 92.9%, and average recall of 92.8%. It was feasible and efficient to apply the multi-feature fusion algorithm to the classification of pig vocalization, and the introduction of features that clearly distinguished vocalization types effectively improved the recognition ability of the vocalization classification model for various types of pig vocalization.
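For readers who want to reproduce summary metrics of this kind, the sketch below computes overall accuracy and class-averaged (macro) precision and recall from a three-class confusion matrix. It assumes that "average" means an unweighted mean over the three classes, which the text does not state explicitly. The confusion matrix holds made-up counts for illustration only and is not the data of this study.

```python
def classification_metrics(confusion):
    """Return (accuracy, macro precision, macro recall).

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precisions.append(confusion[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(confusion[k][k] / actual_k if actual_k else 0.0)
    return correct / total, sum(precisions) / n, sum(recalls) / n


# Hypothetical counts for grunting, squealing, and coughing (not the study's data).
example = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
accuracy, precision, recall = classification_metrics(example)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```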