Article

Acoustic Classification of Singing Insects Based on MFCC/LFCC Fusion

by Juan J. Noda 1,†, Carlos M. Travieso-González 1,2,†, David Sánchez-Rodríguez 1,3,*,† and Jesús B. Alonso-Hernández 1,2,†

1 Institute for Technological Development and Innovation in Communications, University of Las Palmas de Gran Canaria, Campus Universitario de Tafira, 35017 Las Palmas de Gran Canaria, Spain
2 Signal and Communications Department, University of Las Palmas de Gran Canaria, Campus Universitario de Tafira, 35017 Las Palmas de Gran Canaria, Spain
3 Telematic Engineering Department, University of Las Palmas de Gran Canaria, Campus Universitario de Tafira, 35017 Las Palmas de Gran Canaria, Spain
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2019, 9(19), 4097; https://doi.org/10.3390/app9194097
Submission received: 11 August 2019 / Revised: 15 September 2019 / Accepted: 23 September 2019 / Published: 1 October 2019
(This article belongs to the Special Issue Recent Advances on Wireless Acoustic Sensor Networks (WASN))

Abstract

This work introduces a new approach for the automatic identification of crickets, katydids and cicadas through the analysis of their acoustic signals, with the aim of building a tool to identify this biodiversity. The study proposes a sound parameterization technique designed specifically for the identification and classification of insect acoustic signals using Mel Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC). These two sets of coefficients are evaluated individually, as in previous studies, and compared with the fusion proposed in this work, which yields an outstanding increase in identification and classification performance at the species level, reaching a success rate of 98.07% on 343 insect species.

1. Introduction

Insects, with more than one million described species, form the largest ecological group on Earth, and there may be as many more yet to be identified. In the field of acoustic species classification, considerable effort has been devoted to creating intelligent, automatic systems to correctly identify and classify these species. This field is mostly based on the ability of these species to create specific sounds, either deliberately or through biological activities such as moving from one place to another, defending against predators, or pursuing other goals, as described in many studies [1,2,3,4].
Research on these sounds has revealed that it is possible to obtain reliable information for taxonomic classification, which could be used to measure the biodiversity of a habitat, as presented in [5]. The development of such tools will provide more information and indicators about the state of biodiversity [6]. In particular, insects play an important role in ecological balance and biodiversity conservation [4,7].
This research has proven to be of great interest to many fields due to the importance of identifying and classifying insect sounds. Insects are beneficial organisms in agriculture and forestry. Nevertheless, in some cases, commercial crop growers consider these species to be pests and eliminate them with insecticides. Moreover, the tasks of detecting and identifying species are performed by biologists, and this human expertise is expensive in terms of time and money. Hence, an intelligent system such as the one presented in this work could help reduce the cost and time of identification tasks.
Insects exhibit a wide variety of sound generation mechanisms [2,8,9,10]. Typically, katydids and crickets (order Orthoptera) produce sounds by stridulation, rubbing a scraper or plectrum against a file. The stridulation organs are located on their forewings: the file is composed of a row of teeth that is dragged over a callous area (the scraper), generating a vibration. The resulting sound signal is then amplified by resonance on the harp and mirror structures, producing species-specific frequency and temporal patterns. This mechanism is driven by descending commands from the brain to the thoracic central pattern generator (CPG). A detailed neurophysiological study of this acoustic behavior can be found in [11]. On the other hand, cicadas (order Hemiptera) mainly make sounds using a pair of tymbal organs situated on the sides of their abdomen. Each tymbal is composed of a ribbed membrane attached to a tymbal muscle. A series of rapid muscle contractions deforms the membrane, making it vibrate and producing the sound.
The signals produced by insects are distributed over a wide frequency range, between 15 Hz and 100 kHz, where the carrier frequency is influenced by the size of the sound production organs [12], although these are not always correlated with insect body size [13]. Crickets usually produce almost pure-tone sinusoidal signals with carrier frequencies around 2–10 kHz. By contrast, katydids normally generate broadband signals up to 100 kHz. Songs vary widely across cicada species and include both wideband and almost pure sinusoidal sound signals.
The aim of this research is to create a bioinformatic expert system based on digital acoustic signal processing and machine learning techniques to classify species of crickets, katydids and cicadas. The contribution of this paper is a methodology based on a new sound parameterization technique that obtains a more robust representation of insect chants. This methodology represents insect sounds more effectively by fusing two types of cepstral coefficients, MFCC and LFCC, to gather information across the whole spectrum. Moreover, this approach can be implemented in a real system to perform the tasks of detection, segmentation, parameterization and classification of these sounds without human intervention.
The remainder of this paper is organized as follows. Section 2 summarizes the state-of-the-art about acoustic classification of insects. Next, the audio signal segmentation and the feature extraction procedure are described in Section 3. Then, two classification systems based on Support Vector Machine (SVM) and Random Forest (RF) algorithms are described in Section 4, particularized for acoustic recognition. The experimental methodology, the sound dataset, and the setup of experiments are shown in Section 5. Next, the experimental results are discussed in Section 6. Finally, the conclusions and future work are shown in Section 7.

2. State of the Art

Despite the significant role that insects play in biology and ecological conservation, the number of studies on the automatic acoustic identification of insects is low. However, in recent years, some interesting related studies have been published. For instance, [14] presents a solid approach, which has been used here as the baseline method for comparison. It relies on three machine learning algorithms, Hidden Markov Models (HMM), Gaussian Mixture Models (GMM) and Probabilistic Neural Networks (PNN), to classify several singing insect species. The authors employed LFCC, the dominant harmonic and segment duration as features to parameterize the insect sounds, achieving an identification accuracy of 86.3% at the species level. That study was conducted on 313 species of crickets, katydids and cicadas from the Insect Sound World collection of crickets and katydids from Japan and the Singing Insects of North America (SINA) collection. An HMM-based method was also proposed by Leqing and Zhen in [15] for insect pest management. That work was based on a sub-band decomposition method to extract parameters from the wavelet transform of the sound signal; the method was successfully tested on 50 insect sound samples collected from the sound library of the United States Department of Agriculture (USDA), yielding 90.98% accuracy. Another approach was followed in [16] to classify 25 British Orthoptera species and ten bird species using a method that relies exclusively on time-domain features. The signal is split into epochs (successive zero-crossings) and encoded in terms of shape and duration using a Time-Domain Signal Coding (TDSC) procedure. These features fed a Neural Network (NN), achieving 99% recognition accuracy.
In [17], the Continuous Wavelet Transform (CWT) and a Convolutional Neural Network (CNN) were used for mosquito detection and bird identification, with promising results. The 57 mosquito recordings used in that work were labeled with a success rate of 92.2% when detecting the presence or absence of mosquitoes, surpassing human labelers. More recently, in [18], the authors identified three bee species and a hornet by analyzing their flight sounds. The method, which employed MFCC and SVM, was able to correctly distinguish flight sounds from environmental sounds, but achieved only 70–90% precision in species-specific identification.
There is no doubt that significant advances have been reported in the literature regarding the recognition of biological acoustic signals. Nevertheless, current methodologies need improvement to achieve better success rates, which could be obtained by merging or fusing different sets of features to enhance the quality of the information supplied to the intelligent system, enabling the recognition of a greater number of species. Hence, this work presents a novel method to successfully identify and classify 255 species of crickets, katydids and cicadas registered in the Singing Insects of North America (SINA) [19] collection and 88 species of cicadas from the web collection Insectsingers [20]. This approach achieved a success rate of 98.07% in the identification stage through the novel use of fused frequency features in the pattern recognition process. This new approach allows a more effective representation of insect sounds, where the MFCC represent the information in lower frequency regions and the LFCC represent the information in higher frequency regions. The fusion of coefficients increases the efficiency and reliability of the system, making the proposed approach unique in terms of insect species recognition. Compared with previous works, this proposal is capable of identifying and classifying 343 species of katydids, crickets or cicadas at the species level with higher success rates than previous systems, as shown in [14,15,16,17,18].

3. Segmentation and Feature Extraction

In order to develop a feature extraction process that yields useful information for identification and classification, a pre-processing stage was performed. This stage involves segmenting the continuous recordings of all the species into sets of samples per species. Afterwards, a feature extraction process was performed to obtain the LFCC and MFCC of the insect sounds.

3.1. Segmentation

Each sound recording is made up of multiple calls of one insect species, so segmentation is performed by taking continuous recordings of singing insects and extracting the calls produced by that species. To achieve this, the Härmä procedure [21] has been applied to each recording. This procedure searches for adjacent signal peaks until a threshold is reached. Once a pattern is found, the recording is sliced to obtain individual samples, which are stored in the database.
To begin with, the original signal is converted from stereo to mono as:

$$x(n) = x_r(n) + x_l(n) \quad (1)$$

where $x_r$ and $x_l$ denote the right and left channels of the original audio signal. The signal $x(n)$ is divided into a set of frequency- and amplitude-modulated pulses by analyzing the audio spectrogram, where each pulse represents an insect call or syllable. The audio spectrogram was calculated using the Short Time Fourier Transform (STFT), applying a Hamming window of 256 samples (signal frequency sample = 1024) with 30% overlap. An example of the resulting spectrogram can be seen in Figure 1. The signal spectrogram is characterized as a frequency-time matrix $M(f,t)$. This matrix is computed as the magnitude of the STFT and therefore contains the power spectral density of the signal. The algorithm scans the matrix for the highest peak amplitude in the spectrogram to localize the calls, $M(f_k, t_k) = \max_{f,t} M(f,t)$, positioning the $k$th syllable at $t_k$. The amplitude of this point is computed as $A_k(0) = 20 \log_{10} M(f_k, t_k)$. From $t_k$, the algorithm explores the matrix seeking the syllable edges for $t < t_k$ and $t > t_k$ until $A_k(t) < A_k(0) - \lambda$ dB is reached on both sides, where $\lambda$ is the stop criterion, set to 30 dB. This time slot is saved as the $k$th syllable and deleted from the spectrogram, $M(f, t_k - t_s, \ldots, t_k + t_e) = 0$. These steps are repeated, incrementing $k = k + 1$, until the end of the spectrogram is reached.
This procedure is carried out iteratively for all 343 species to obtain individual samples of each species' chants from the continuous sound recordings.
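As a rough illustration only, the following Python sketch implements a peak-picking loop of this kind on the STFT magnitude. The Hamming window of 256 samples, 30% overlap and λ = 30 dB follow the text; the function name, the per-frame simplification (tracking the peak amplitude per time frame rather than searching the full frequency-time matrix) and the max_syllables cap are our own assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.signal import stft

def segment_syllables(x, fs, lambda_db=30.0, n_fft=256, overlap=0.3,
                      max_syllables=100):
    """Peak-picking syllable segmentation in the spirit of Härmä [21]."""
    # Magnitude spectrogram M(f, t): Hamming window, 256 samples, 30% overlap.
    _, _, Z = stft(x, fs=fs, window="hamming", nperseg=n_fft,
                   noverlap=int(n_fft * overlap))
    M = np.abs(Z)
    # Peak amplitude per frame in dB: A(t) = 20 log10 max_f M(f, t).
    amp = 20.0 * np.log10(M.max(axis=0) + 1e-12)
    segments = []
    for _ in range(max_syllables):
        t_k = int(np.argmax(amp))        # frame holding the strongest peak
        a0 = amp[t_k]
        if not np.isfinite(a0):          # spectrogram exhausted
            break
        # Grow the syllable left/right until A_k(t) < A_k(0) - lambda dB.
        t_s = t_k
        while t_s > 0 and amp[t_s - 1] > a0 - lambda_db:
            t_s -= 1
        t_e = t_k
        while t_e < len(amp) - 1 and amp[t_e + 1] > a0 - lambda_db:
            t_e += 1
        segments.append((t_s, t_e))      # frame indices of the k-th syllable
        amp[t_s:t_e + 1] = -np.inf       # delete it from further searches
    return segments
```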

3.2. Feature Extraction

In order to develop an efficient pattern recognition system, a sound parameterization technique designed specifically for the acoustic identification and classification of insect sounds is applied in this work. This technique is based on MFCC and LFCC. These parameters are forwarded as feature vectors to the machine learning algorithm to carry out taxonomic identification.
The LFCC is a linear cepstral representation of sound. This representation is known as the cepstrum and is the result of taking the Discrete Fourier Transform (DFT) of the logarithm of the estimated spectrum of a signal. These linear coefficients are calculated using a linear filter bank, which has better resolution in the higher frequency regions. The MFCC is based on a nonlinear cepstral representation of the signal, equally spaced on the Mel scale. This scale approximates the response of the human auditory system more closely than linearly spaced frequencies [22]. The transformation from the Hertz to the Mel scale is given by:

$$mel = 2595 \log_{10}\left(\frac{f}{700} + 1\right) \quad (2)$$
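For instance, a one-line implementation of (2) (function name is ours) confirms that 1 kHz maps to roughly 1000 Mel:

```python
import numpy as np

def hz_to_mel(f):
    """Eq. (2): Hertz-to-Mel conversion used to space the Mel filter bank."""
    return 2595.0 * np.log10(f / 700.0 + 1.0)

# hz_to_mel(1000.0) ≈ 1000: the scale is roughly linear up to 1 kHz,
# then logarithmic above it.
```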
For both LFCC and MFCC computation, a window size of 20 ms has been selected, with an overlap between window segments of 33%. The MFCC used in this proposal are calculated based on the method of Lee et al. [23], with some parameter adjustments to obtain better results from these coefficients. In this study, the Mel filter bank consists of 40 triangular filters. The length of the MFCC for each frame is L = 18. Finally, the average MFCC over all frames is calculated as:

$$f_m = \frac{1}{N}\sum_{n=1}^{N} C(m,n), \quad 0 \le m \le L-1 \quad (3)$$

where $f_m$ is the $m$th Mel coefficient, $C(m,n)$ is the $m$th coefficient of frame $n$, and $N$ is the number of frames.
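A minimal sketch of the averaged MFCC of (3), assuming librosa is available; the 20 ms window, 33% overlap, 40-filter Mel bank and L = 18 follow the text, while the function name and the remaining librosa defaults are assumptions.

```python
import librosa

def averaged_mfcc(x, fs, n_mfcc=18, n_mels=40):
    win = int(0.020 * fs)                  # 20 ms analysis window
    hop = int(win * (1 - 0.33))            # 33% overlap between frames
    # C has shape (L, N): one column of L coefficients per frame.
    C = librosa.feature.mfcc(y=x, sr=fs, n_mfcc=n_mfcc, n_mels=n_mels,
                             n_fft=win, hop_length=hop)
    return C.mean(axis=1)                  # f_m = (1/N) * sum_n C(m, n)
```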
To calculate the LFCC, the discrete cosine transform is applied to the output of a log-energy filter bank, as shown in (4), where $j$ denotes the index of the linear frequency cepstral coefficient, $X_i$ is the log-energy output of the $i$th filter, $B$ is the number of triangular filters and $M$ is the number of coefficients to compute:

$$LFCC_j = \sum_{i=1}^{B} X_i \cos\left(\frac{j\,(i - 1/2)\,\pi}{B}\right), \quad j = 0, \ldots, M \quad (4)$$
The approach proposed in this work is based on a data fusion of the MFCC and LFCC in order to obtain a more robust parameterization of insect sounds. The data fusion technique consists of creating a data matrix containing both the LFCC and the MFCC, formed by horizontally concatenating the MFCC data matrix and the LFCC data matrix, as shown in (5). Each row of the matrix represents a segmented syllable, and the columns are the MFCC and LFCC extracted from each syllable, merged horizontally. Insects can produce sounds almost anywhere in the spectrum, forcing the system to find useful information across many ranges of the acoustic spectrum. These two types of cepstral coefficients have been selected for fusion because several insect species produce their sounds at higher frequencies and others at lower frequencies. Thus, the acoustic signal information is represented with a high degree of precision.

$$Parameters = [\,MFCC \mid LFCC\,] \quad (5)$$
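A hedged sketch of the LFCC of (4) and the fusion of (5): a linearly spaced triangular filter bank, log filter energies $X_i$, and a DCT-II (whose cosine kernel matches the sum in (4) up to normalization). The window, overlap and coefficient count mirror the MFCC settings above; all names and the filter-bank construction details are our assumptions.

```python
import numpy as np
from scipy.fft import dct

def lfcc(x, fs, n_filters=40, n_coeffs=18, win_ms=0.020, overlap=0.33):
    win = int(win_ms * fs)
    hop = int(win * (1 - overlap))
    n_frames = 1 + (len(x) - win) // hop
    # Linearly spaced triangular filter bank over [0, fs/2].
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    edges = np.linspace(0.0, fs / 2.0, n_filters + 2)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, c, hi = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((freqs - lo) / (c - lo),
                                   (hi - freqs) / (hi - c)), 0.0, None)
    coeffs = []
    for k in range(n_frames):
        frame = x[k * hop:k * hop + win] * np.hamming(win)
        power = np.abs(np.fft.rfft(frame, win)) ** 2
        log_e = np.log(fb @ power + 1e-12)        # X_i: log filter energies
        coeffs.append(dct(log_e, norm="ortho")[:n_coeffs])  # Eq. (4) via DCT
    return np.mean(coeffs, axis=0)

# Eq. (5): horizontal concatenation of the two feature vectors per syllable,
# reusing averaged_mfcc() from the previous sketch:
# features = np.hstack([averaged_mfcc(x, fs), lfcc(x, fs)])
```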

4. Classification System

This section contains a brief introduction to the machine learning algorithms selected to identify the samples, SVM and RF, parameterized for the classification of insect sounds.

4.1. SVM

Given a set of classes, the SVM algorithm [24] calculates the optimal hyperplane to separate the samples into two classes. For nonlinear classification, the hyperplane decision boundary can be designed with different kernels; in this work, a Radial Basis Function (RBF), or Gaussian, kernel has been used:

$$k(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right) \quad (6)$$

In addition, a soft margin can also be applied to introduce a penalty parameter, C, for misclassified data points. The parameters $\sigma$ and $C$ have been selected through a two-step grid search, where $C \in \{2^{-10}, 2^{-9}, \ldots, 2^{10}\}$ and $\sigma \in \{2^{-15}, 2^{-14}, \ldots, 2^{3}\}$. Because the SVM only recognizes two classes, the multiclass classification system has been implemented through the one-versus-all strategy [25].
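A sketch of this SVM setup with scikit-learn, under two stated assumptions: the two-step (coarse-then-fine) grid search is collapsed into a single grid, and the σ grid is converted to scikit-learn's kernel parameterization γ = 1/(2σ²).

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

sigmas = 2.0 ** np.arange(-15, 4)                # sigma in {2^-15, ..., 2^3}
param_grid = {
    "estimator__C": 2.0 ** np.arange(-10, 11),   # C in {2^-10, ..., 2^10}
    "estimator__gamma": 1.0 / (2.0 * sigmas ** 2),
}
# One-versus-all wrapping of the binary RBF-kernel SVM.
clf = GridSearchCV(OneVsRestClassifier(SVC(kernel="rbf")), param_grid, cv=3)
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```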

4.2. RF

Random Forest [26] is an ensemble machine learning technique that builds a group of decision trees (DT) by randomly sampling subsets of the training feature space (bagging). For classification, the predicted class is obtained by majority voting:

$$Prediction = \arg\max_c \sum_{i=1}^{T} I(Y_i = c) \quad (7)$$

where $Y_i$ is the $i$th tree output and $I(\cdot)$ is the indicator function. In this study, the number of trees has been set to T = 100 to characterize the insect sounds. In addition, the maximum number of features allowed in an individual tree was defined as $m = \sqrt{N}$, where N is the number of features per syllable.
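A minimal scikit-learn sketch of this configuration (variable names are ours):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100,    # T = 100 trees
                            max_features="sqrt") # m = sqrt(N) per split
# rf.fit(X_train, y_train): each tree votes, and predict() returns the
# majority class, i.e. the argmax of Eq. (7).
```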

5. Experimental Methodology

In order to implement the intelligent automatic system for identifying and classifying insects by their sounds, an experimental methodology has been followed. This methodology covers the sound collections used to test the robustness of the algorithm, the design of the experiments conducted to obtain the classification results, and their analysis.

5.1. Sound Collections

The SINA corpus of insect recordings, obtained directly from [19], has been used to provide a reliable evaluation scheme for the intelligent system for the automatic identification and classification of singing insects. A total of 255 species of katydids, crickets and cicadas from this collection have been used. Additionally, recordings of 88 cicada species have been obtained from the web collection Insectsingers [20], giving a total of 343 species of katydids, crickets and cicadas in the final dataset.

5.2. Experiment Setup

The experiments were organized using a supervised learning method, which consists of using pre-established knowledge to determine the best classification scenario. All experiments were carried out on an Intel i7-4510U processor with 16 GB of RAM. Each experiment was repeated 100 times to achieve statistical significance: in each iteration, the dataset was randomly shuffled and split into training and testing subsets, and the average of the results was calculated. Therefore, in each experiment, the system is trained with different parts of the dataset, avoiding selection bias in the results. Because the number of samples per class is relatively small, the holdout method was applied, as in [27], which builds train and test subsets from non-overlapping parts of the corpora. Specifically, 50% of the samples were taken for training and the remaining samples for testing. The complete methodology followed in this work is shown in the flow diagram in Figure 2.
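A sketch of this evaluation protocol, assuming scikit-learn; X, y and make_classifier are placeholders, and the stratified split is our assumption (the text only specifies random shuffling and a 50/50 holdout).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate(X, y, make_classifier, n_runs=100):
    """Repeat a 50/50 holdout 100 times; report mean and std accuracy."""
    scores = []
    for run in range(n_runs):
        # Reshuffle and split the dataset on every iteration (50% train).
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.5, stratify=y, random_state=run)
        clf = make_classifier().fit(X_tr, y_tr)
        scores.append(accuracy_score(y_te, clf.predict(X_te)))
    return np.mean(scores), np.std(scores)
```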

5.3. Computational Complexity

The segmentation phase and the classifier training phase are the most computationally intensive. Segmentation depends on the length of the audio files, over which it runs a search algorithm to locate the syllables. In big O notation, the complexity of the Härmä segmentation algorithm can be expressed as O(n log(n)). On the other hand, the training complexities of the classification algorithms are as follows: O(n²) for SVM and O(n log(n)) for Random Forest. Therefore, with the SVM solution, the algorithm presents quadratic time complexity.

6. Results and Discussions

A set of experiments carried out using different features, LFCC and MFCC, and different classifiers, RF and SVM, is presented in this section. The experimental results are shown in Table 1, where the success rate is given as mean and standard deviation (std); the accuracy in each experiment was computed by dividing the number of sounds correctly identified by the total number of sounds. The columns TT and TST give the mean training time and testing time in seconds.
As can be observed with respect to the performance of the classification algorithms, both reached a high success rate. A high proportion of the sounds produced by insects are of very short duration and high amplitude, so they generate a clear spectrogram that allows optimal segmentation. Therefore, the input data to the classification phase contain very little noise; thus, the discrimination among classes and the success rate are improved regardless of the classification algorithm. On the other hand, analyzing the impact of the type of acoustic parameterization, the fusion of LFCC and MFCC features gives higher accuracy than either feature set alone. LFCC outperforms MFCC in every experiment because insects emit signals predominantly at high frequencies, with high energy levels that can reach up to 100 dB; hence, the better representation of high frequencies by the LFCC provides more information to the system. Taken together, the results demonstrate that the insect sound classification system presented in this paper can identify the individuals in the database with a better success rate than those reported in the state of the art, showing an improvement in the field.
Figure 3 shows a swarm plot of the per-class results of the confusion matrix for each dataset using the best approach in the experiments (SVM + MFCC/LFCC fusion). As can be seen, the classes are highly concentrated around a 100% success rate, although a few insects were hard to identify across iterations. Specifically, for the SINA database, the katydid Stilpnochlora couloniana was identified only 30% of the time (40% in the full dataset) due to bad segmentation of its calls. Similarly, the audio file of the katydid Scudderia furcata contained only 2 calls, which were divided into 4 by the segmentation algorithm, so it could be recognized only 65% of the time. For the Insectsingers database, all per-class results remain above 86%; the lowest, 86.4%, belongs to the cicada Diceroprocta marevagans. Observing the classes of the full dataset, there is a greater dispersion of the results due to the increased spectral overlap among the frequencies of the insect syllables; consequently, the SVM has difficulty finding an optimal hyperplane to properly separate the sounds of the different species. On the other hand, Figure 4 shows the confusion matrix histogram for the full dataset in terms of accuracy and standard deviation. The standard deviation values are highly concentrated and close to zero, which demonstrates the quality of the solution proposed in this study.
A comparison with references from the state of the art is shown in Table 2. It is not easy to find references on the acoustic identification of insects, given that most publications are based on general approaches [28] or on other kinds of animals. However, the results obtained on the SINA database can be compared with those obtained in [14], where the same database was used. Ganchev et al. were able to recognize insects at the species level with an 86.3% success rate, while the presented data fusion reached 97.99% on the same data. This methodology is clearly capable of discriminating among a greater number of species with higher success rates than the previous techniques in the state of the art.

7. Conclusions and Future Work

This work has introduced a classification system based on a data fusion of Mel and linear cepstral coefficients of the chants of cricket, katydid and cicada species, which accomplishes the detection and identification of these sounds. These insect species are especially suitable for acoustic recognition because they tend to produce loud, pure-tone songs. The results demonstrate that the fusion of these cepstral parameters improves the representation of the insect sound signals, adding information from the high and low parts of the spectrum and enhancing the classification results. The increase in the number of features used by the data fusion leads only to a slight increase in training and classification times. In the experiments conducted, the SVM algorithm with a Gaussian kernel achieved better results than RF due to its greater ability to separate nonlinear relationships in the data; nevertheless, the differences are less than 5% because of the clarity of the sounds emitted by these insect species.
The robustness of the system has been tested by classifying multiple species of insects simultaneously: 343 species, with a high success rate of 98.07%. The proposed method achieves superior results compared with previous methods [14,15,16,17], approaching 100% classification. Moreover, this approach can be applied to other taxonomic groups by adjusting the system parameters to the particularities of their sound emissions.
More work should be done in the field of bioinformatics for automatic species classification to develop more refined methods. In particular, future research should aim to detect, recognize and classify insects (katydids, crickets and cicadas) at the family, suborder, genus and species levels with higher success rates than those presented in the state of the art or in this study.

Author Contributions

Conceptualization, J.J.N., C.M.T.-G. and D.S.-R.; methodology, J.J.N., C.M.T.-G. and D.S.-R.; software, J.J.N.; validation, J.J.N.; investigation, J.J.N., C.M.T.-G., D.S.-R. and J.B.A.-H.; resources, J.J.N., C.M.T.-G. and D.S.-R.; writing—original draft preparation, J.J.N., C.M.T.-G., D.S.-R. and J.B.A.-H.; writing—review and editing, J.J.N., C.M.T.-G., D.S.-R. and J.B.A.-H.; supervision, C.M.T.-G. and D.S.-R.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. ter Hofstede, H.M.; Fullard, J.H. The neuroethology of song cessation in response to gleaning bat calls in two species of katydids, Neoconocephalus ensiger and Amblycorypha oblongifolia. J. Exp. Biol. 2008, 211, 2431–2441.
2. Montealegre-Z, F.; Morris, G.K.; Mason, A.C. Generation of extreme ultrasonics in rainforest katydids. J. Exp. Biol. 2006, 209, 4923–4937.
3. Morris, G.K.; Kerr, G.E.; Fullard, J.H. Phonotactic preferences of female meadow katydids (Orthoptera: Tettigoniidae: Conocephalus nigropleurum). Can. J. Zool. 1978, 56, 1479–1487.
4. Gaston, K.J.; O'Neill, M.A. Automated species identification: Why not? Philos. Trans. R. Soc. Lond. Biol. Sci. 2004, 359, 655–667.
5. Riede, K. Acoustic monitoring of Orthoptera and its potential for conservation. J. Insect Conserv. 1998, 2, 217–223.
6. Prince, P.; Hill, A.; Piña Covarrubias, E.; Doncaster, P.; Snaddon, J.; Rogers, A. Deploying Acoustic Detection Algorithms on Low-Cost, Open-Source Acoustic Sensors for Environmental Monitoring. Sensors 2019, 19, 553.
7. Samways, M.J. Insect Diversity Conservation; Cambridge University Press: Cambridge, UK, 2005.
8. Stephen, R.; Hartley, J. Sound production in crickets. J. Exp. Biol. 1995, 198, 2139–2152.
9. Robinson, D.J.; Hall, M.J. Sound signalling in Orthoptera. Adv. Insect Physiol. 2002.
10. Fonseca, P.J. Cicada acoustic communication. In Insect Hearing and Acoustic Communication; Springer: Berlin/Heidelberg, Germany, 2014; pp. 101–121.
11. Jacob, P.F.; Hedwig, B. Acoustic signalling for mate attraction in crickets: Abdominal ganglia control the timing of the calling song pattern. Behav. Brain Res. 2016, 309, 51–66.
12. Bennet-Clark, H. Size and scale effects as constraints in insect sound communication. Philos. Trans. R. Soc. Lond. Ser. Biol. Sci. 1998, 353, 407–419.
13. Montealegre-Z, F. Scale effects and constraints for sound production in katydids (Orthoptera: Tettigoniidae): Correlated evolution between morphology and signal parameters. J. Evol. Biol. 2009, 22, 355–366.
14. Ganchev, T.; Potamitis, I.; Fakotakis, N. Acoustic monitoring of singing insects. In Proceedings of the 2007 IEEE International Conference on Acoustics, Speech and Signal Processing—ICASSP '07, Honolulu, HI, USA, 15–20 April 2007; Volume 4, p. IV-721.
15. Leqing, Z.; Zhen, Z. Insect sound recognition based on SBC and HMM. In Proceedings of the 2010 International Conference on Intelligent Computation Technology and Automation, Changsha, China, 11–12 May 2010; Volume 2, pp. 544–548.
16. Chesmore, E.D. Application of time domain signal coding and artificial neural networks to passive acoustical identification of animals. Appl. Acoust. 2001, 62, 1359–1374.
17. Kiskin, I.; Zilli, D.; Li, Y.; Sinka, M.; Willis, K.; Roberts, S. Bioacoustic detection with wavelet-conditioned convolutional neural networks. Neural Comput. Appl. 2018, 1–13.
18. Kawakita, S.; Ichikawa, K. Automated Classification of Bees and Hornet Using Acoustic Analysis of their Flight Sounds; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–9.
19. Walker, T.J.; Moore, T.E. Singing Insects of North America (SINA) Collection. University of Florida. Available online: http://entnemdept.ufl.edu/walker/buzz/ (accessed on 24 April 2019).
20. Marshall, D.; Hill, K. Insectsingers. Available online: http://www.insectsingers.com/ (accessed on 23 April 2019).
21. Härmä, A. Automatic identification of bird species based on sinusoidal modeling of syllables. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, China, 6–10 April 2003; Volume 5, p. V-545.
22. Wong, E.; Sridharan, S. Comparison of linear prediction cepstrum coefficients and mel-frequency cepstrum coefficients for language identification. In Proceedings of the 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing (ISIMP 2001), Hong Kong, China, 4 May 2001; pp. 95–98.
23. Lee, C.H.; Chou, C.H.; Han, C.C.; Huang, R.Z. Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis. Pattern Recognit. Lett. 2006, 27, 93–101.
24. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
25. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; Volume 4.
26. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
27. Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79.
28. Kaloudis, S.; Anastopoulos, D.; Yialouris, C.P.; Lorentzos, N.A.; Sideridis, A.B. Insect identification expert system for forest protection. Expert Syst. Appl. 2005, 28, 445–452.
29. Le-Qing, Z. Insect sound recognition based on MFCC and PNN. In Proceedings of the 2011 International Conference on Multimedia and Signal Processing, Guilin, China, 14–15 May 2011; Volume 2, pp. 42–46.
30. Chaves, V.A.E.; Travieso, C.M.; Camacho, A.; Alonso, J.B. Katydids acoustic classification on verification approach based on MFCC and HMM. In Proceedings of the 2012 IEEE 16th International Conference on Intelligent Engineering Systems (INES), Lisbon, Portugal, 13–15 June 2012; pp. 561–566.
Figure 1. Example of two insect sound spectrograms.
Figure 2. Diagram of the proposed methodology.
Figure 3. Classification results by class.
Figure 4. Confusion matrix histogram.
Table 1. Classification results.

| Dataset | Features | Classifier | TT (s) | TST (s) | Accuracy (%) ± std |
|---|---|---|---|---|---|
| SINA (255 species) | MFCC | RF | 51.63 | 0.37 | 91.02 ± 17.66 |
| | MFCC | SVM | 76.18 | 63.43 | 95.20 ± 10.59 |
| | LFCC | RF | 49.34 | 0.35 | 94.18 ± 12.81 |
| | LFCC | SVM | 62.93 | 77.80 | 96.97 ± 7.13 |
| | MFCC + LFCC | RF | 79.35 | 0.40 | 94.78 ± 12.72 |
| | MFCC + LFCC | SVM | 93.00 | 66.55 | 97.99 ± 6.61 |
| Insectsingers (88 species) | MFCC | RF | 6.65 | 0.08 | 94.71 ± 11.66 |
| | MFCC | SVM | 2.31 | 1.41 | 98.23 ± 5.40 |
| | LFCC | RF | 6.45 | 0.08 | 95.44 ± 10.82 |
| | LFCC | SVM | 2.36 | 1.43 | 99.15 ± 2.62 |
| | MFCC + LFCC | RF | 9.66 | 0.09 | 96.72 ± 8.34 |
| | MFCC + LFCC | SVM | 3.57 | 1.61 | 99.40 ± 2.07 |
| Full insect dataset (343 species) | MFCC | SVM | 146.98 | 80.22 | 94.83 ± 14.22 |
| | LFCC | SVM | 155.40 | 77.65 | 96.94 ± 8.29 |
| | MFCC + LFCC | SVM | 246.39 | 170.29 | 98.07 ± 6.25 |
Table 2. Comparison of the proposal vs. the state-of-the-art.

| Reference | Dataset | Features | Classifiers | Results |
|---|---|---|---|---|
| [14] | SINA (313 sounds) | LFCC and frame length | PNN, GMM and HMM | 86.3% |
| [29] | USDA Insect Sound Library (50 species) | MFCC | PNN | 96.17% |
| [15] | USDA Insect Sound Library (50 species) | Sub-band based cepstral | HMM | 90.98% |
| [30] | 26 grasshopper species | MFCC | HMM | 99.31% |
| [16] | British Orthoptera (25 katydids) | Time Domain Signal Coding | Neural Network | 99% |
| This work | Insect dataset (343 species) | MFCC/LFCC fusion | SVM | 98.07% ± 6.25 |
