# Prediction of Head Related Transfer Functions Using Machine Learning Approaches


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Acoustic Measurements

#### 2.2. Anthropometric Data

#### 2.3. Final Dataset

#### 2.4. Multivariate Analysis

#### 2.5. Simple Linear Regression

#### 2.6. Artificial Neural Networks

#### 2.7. Validation Method

#### 2.8. Robustness Criteria

## 3. Results and Discussion

## 4. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


**Figure 2.** Relation between the HRIR and the HRTF. Example: ‘A’ pinna, $\theta = 190^{\circ}$, $\varphi = -45^{\circ}$.
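The HRIR–HRTF relation illustrated in Figure 2 is a Fourier-transform pair: the HRTF is the frequency-domain representation of the measured impulse response. A minimal NumPy sketch (the sampling rate, impulse-response length, and windowed-noise stand-in are assumptions for illustration, not the paper's measurement data):

```python
import numpy as np

fs = 48_000                      # assumed sampling rate [Hz]
n = 256                          # assumed HRIR length [samples]
rng = np.random.default_rng(0)
hrir = rng.standard_normal(n) * np.hanning(n)   # stand-in for a measured HRIR

hrtf = np.fft.rfft(hrir)                        # complex transfer function
freqs = np.fft.rfftfreq(n, d=1 / fs)            # frequency axis [Hz]
magnitude_db = 20 * np.log10(np.abs(hrtf) + 1e-12)  # magnitude spectrum in dB
```

The magnitude spectrum (`magnitude_db` over `freqs`) is what HRTF plots such as Figure 2 typically show.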

**Figure 5.** Graphical correlation analysis of the elevation angle, the azimuth angle, and the frequency against the output, in this case the amplitude at each frequency.

**Figure 6.** Graphical representation of one randomly selected HRTF from those used to test the models (Pinna ‘S’, $\theta = 250^{\circ}$, $\varphi = -45^{\circ}$). Comparison between the real HRTF of the selected case, the HRTF predicted using LR, the HRTF predicted using the MLP ANN, and the real HRTF measured with the standard pinna (Pinna ‘T’).

**Figure 7.** Graphical analysis of the residuals obtained when predicting the testing dataset with the model built using the LR algorithm.

**Figure 8.** RMSE obtained during the training stage for the nonlinear model. The number of neurons and the weight decay were tuned.

**Figure 9.** Graphical analysis of the residuals obtained when predicting the testing dataset with the model built using the multilayer perceptron artificial neural network algorithm.

Elevations [°] | [−45, 45] | [50, 70] | [75, 85] | 90 |
---|---|---|---|---|
Step [°] | 5 | 15 | 45 | 360 |
No. of azimuths | 72 | 24 | 8 | 1 |
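The measurement grid implied by the table above can be enumerated with a short sketch. The 5° elevation step inside each band is an assumption (the table only gives the azimuth step per band); with it, the band sizes multiply out as follows:

```python
# Each elevation band uses a fixed azimuth step, so the direction count is
# (elevations in the band) x (azimuths per elevation).
bands = [
    # (first elev., last elev., elev. step, azimuth step) in degrees
    (-45, 45, 5, 5),    # 19 elevations x 72 azimuths
    (50, 70, 5, 15),    # 5 elevations x 24 azimuths
    (75, 85, 5, 45),    # 3 elevations x 8 azimuths
    (90, 90, 5, 360),   # 1 elevation x 1 azimuth (the pole)
]

total = 0
for lo, hi, el_step, az_step in bands:
    n_elevations = (hi - lo) // el_step + 1
    n_azimuths = 360 // az_step
    total += n_elevations * n_azimuths

print(total)  # 1513 measurement directions in total
```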

Parameter | Definition | Units |
---|---|---|
${d}_{1}$ | Cavum conchae height | Millimeters |
${d}_{2}$ | Cymba conchae height | Millimeters |
${d}_{3}$ | Cavum conchae width | Millimeters |
${d}_{4}$ | Fossa height | Millimeters |
${d}_{5}$ | Pinna height | Millimeters |
${d}_{6}$ | Pinna width | Millimeters |
${d}_{7}$ | Intertragal incisure width | Millimeters |
${d}_{8}$ | Cavum conchae depth | Millimeters |
${d}_{9}$ | Physiognomic pinna length | Millimeters |
${d}_{10}$ | Pinna flaring distance | Millimeters |
${d}_{11}$ | Pinna posterior to tragus distance | Millimeters |
${\theta}_{1}$ | Pinna rotation angle | Euler degrees |
${\theta}_{2}$ | Cavum conchae angle | Euler degrees |
${\theta}_{3}$ | Pinna flare angle | Euler degrees |
${\theta}_{4}$ | Pinna deflection angle | Euler degrees |

Parameter | Mean | SD | Min | Max | 5th | 10th | 25th | 50th | 75th | 90th | 95th |
---|---|---|---|---|---|---|---|---|---|---|---|
${d}_{1}$ | 18.71 | 3.06 | 10.96 | 22.96 | 14.34 | 15.11 | 17.47 | 18.78 | 20.33 | 22.69 | 22.88 |
${d}_{2}$ | 8.83 | 2.43 | 5.36 | 14.53 | 5.76 | 6.08 | 7.42 | 8.33 | 9.72 | 12.44 | 12.97 |
${d}_{3}$ | 18.59 | 3.37 | 13.43 | 25.91 | 13.83 | 14.07 | 16.31 | 18.09 | 19.95 | 23.12 | 23.90 |
${d}_{4}$ | 20.87 | 4.90 | 11.23 | 30.39 | 13.40 | 14.96 | 17.74 | 21.17 | 24.09 | 26.20 | 28.51 |
${d}_{5}$ | 68.03 | 6.15 | 54.66 | 81.95 | 60.90 | 61.53 | 64.28 | 68.23 | 70.40 | 73.42 | 79.18 |
${d}_{6}$ | 33.75 | 3.88 | 27.20 | 41.84 | 28.94 | 29.11 | 30.78 | 34.09 | 36.70 | 37.17 | 40.11 |
${d}_{7}$ | 7.25 | 1.38 | 5.32 | 10.19 | 5.52 | 5.70 | 6.03 | 7.07 | 8.00 | 9.23 | 9.32 |
${d}_{8}$ | 11.27 | 1.95 | 7.65 | 15.00 | 7.97 | 8.73 | 10.38 | 11.02 | 12.47 | 13.76 | 14.18 |
${d}_{9}$ | 66.64 | 6.03 | 53.16 | 79.33 | 58.65 | 58.98 | 63.74 | 67.28 | 69.07 | 71.88 | 77.94 |
${d}_{10}$ | 20.07 | 3.81 | 14.14 | 26.92 | 15.72 | 15.88 | 17.22 | 19.21 | 22.10 | 25.83 | 26.29 |
${d}_{11}$ | 27.92 | 5.33 | 17.98 | 37.06 | 20.61 | 20.79 | 24.32 | 28.93 | 30.58 | 34.96 | 35.22 |
${\theta}_{1}$ | 7.85 | 4.41 | 0.00 | 18.00 | 0.00 | 2.70 | 4.75 | 9.00 | 10.00 | 12.20 | 14.20 |
${\theta}_{2}$ | 25.95 | 6.64 | 14.00 | 42.00 | 15.90 | 19.60 | 21.75 | 25.50 | 30.00 | 32.50 | 37.25 |
${\theta}_{3}$ | 52.50 | 10.93 | 38.00 | 74.00 | 39.90 | 40.00 | 42.00 | 49.50 | 59.50 | 69.00 | 69.25 |
${\theta}_{4}$ | 38.70 | 9.93 | 24.00 | 59.00 | 24.95 | 25.00 | 31.00 | 40.00 | 43.00 | 49.60 | 55.20 |

**Table 4.** Tuned parameters during the training stage for each of the regression techniques applied in the analysis: brief definition and applied range.

Regression Technique | Parameters | Range |
---|---|---|
LR | no tuning parameters | – |
MLP ANN | size: number of units in the hidden layer | 1–20 |
MLP ANN | decay: regularization parameter to avoid over-fitting | 0–0.1 |
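The search over the two MLP parameters can be illustrated with a toy grid search. The sketch below is hypothetical (the paper does not specify an implementation; the tiny NumPy MLP, learning rate, iteration count, and synthetic data are all assumptions), but it shows how hidden-layer size and weight decay are swept jointly:

```python
import numpy as np

def fit_and_score(size, decay, X, y, rng):
    """Train a tiny one-hidden-layer tanh MLP with L2 weight decay by plain
    gradient descent and return the training RMSE (illustrative only)."""
    w1 = rng.standard_normal((X.shape[1], size)) * 0.5
    w2 = rng.standard_normal((size, 1)) / np.sqrt(size)
    for _ in range(300):
        h = np.tanh(X @ w1)                       # hidden activations
        err = h @ w2 - y                          # prediction error
        g2 = h.T @ err / len(y) + decay * w2      # "decay" is an L2 penalty
        g1 = X.T @ ((err @ w2.T) * (1 - h**2)) / len(y) + decay * w1
        w2 -= 0.02 * g2
        w1 -= 0.02 * g1
    return float(np.sqrt(np.mean((np.tanh(X @ w1) @ w2 - y) ** 2)))

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = np.tanh(X @ rng.standard_normal((3, 1)))      # smooth synthetic target

# Sweep a few points of the Table 4 ranges: size 1-20, decay 0-0.1.
best = min((fit_and_score(s, d, X, y, rng), s, d)
           for s in (1, 5, 10, 20) for d in (0.0, 0.05, 0.1))
print(f"best training RMSE={best[0]:.3f} at size={best[1]}, decay={best[2]}")
```

In practice the selection in Figure 8 is made on cross-validated rather than training error; the sweep structure is the same.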

**Table 5.** Results obtained from the ANOVA analysis for the predicted feature. Significance codes according to p-value: ‘***’ < 0.001, ‘*’ < 0.05.

Term | Estimate | Std. Error | t Value | p Value | Sig. |
---|---|---|---|---|---|
Intercept | 0.8738976 | 0.0012773 | 684.192 | $<2\times {10}^{-16}$ | *** |
${d}_{1}$ | −0.0114785 | 0.0013543 | −8.476 | $<2\times {10}^{-16}$ | *** |
${d}_{2}$ | −0.0077947 | 0.0005956 | −13.087 | $<2\times {10}^{-16}$ | *** |
${d}_{3}$ | −0.0084882 | 0.0007583 | −11.193 | $<2\times {10}^{-16}$ | *** |
${d}_{4}$ | 0.0190734 | 0.0009394 | 20.304 | $<2\times {10}^{-16}$ | *** |
${d}_{5}$ | −0.0045556 | 0.0055041 | −0.828 | 0.4079 | |
${d}_{6}$ | −0.0064500 | 0.0008496 | −7.592 | $3.15\times {10}^{-14}$ | *** |
${d}_{7}$ | 0.0082313 | 0.0006021 | 13.672 | $<2\times {10}^{-16}$ | *** |
${d}_{8}$ | 0.0288839 | 0.0015689 | 18.411 | $<2\times {10}^{-16}$ | *** |
${d}_{9}$ | −0.0131845 | 0.0051347 | −2.568 | 0.0102 | * |
${d}_{10}$ | −0.0168657 | 0.0005806 | −29.050 | $<2\times {10}^{-16}$ | *** |
${d}_{11}$ | −0.0027780 | 0.0006965 | −3.989 | $6.65\times {10}^{-5}$ | *** |
${\theta}_{1}$ | −0.0261202 | 0.0015868 | −16.461 | $<2\times {10}^{-16}$ | *** |
${\theta}_{2}$ | 0.0062978 | 0.0010993 | 5.729 | $1.01\times {10}^{-8}$ | *** |
${\theta}_{3}$ | −0.0140979 | 0.0008816 | −15.991 | $<2\times {10}^{-16}$ | *** |
${\theta}_{4}$ | 0.0142957 | 0.0007572 | 18.881 | $<2\times {10}^{-16}$ | *** |
azimuth | 0.1339585 | 0.0002315 | 578.635 | $<2\times {10}^{-16}$ | *** |
elevation | −0.0034120 | 0.0002842 | −12.005 | $<2\times {10}^{-16}$ | *** |
frequency | −0.2717950 | 0.0002313 | −1174.992 | $<2\times {10}^{-16}$ | *** |
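The columns of Table 5 are related by the standard OLS formulas: standard errors come from the diagonal of $\hat{\sigma}^2 (X^\top X)^{-1}$, and $t = \text{estimate}/\text{std. error}$ (a p value then follows from the t distribution). A self-contained NumPy sketch on synthetic data (the coefficients, noise level, and sample size are illustrative assumptions, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])  # intercept + 3 predictors
true_beta = np.array([0.87, -0.01, 0.02, 0.0])                  # illustrative values
y = X @ true_beta + 0.05 * rng.standard_normal(n)               # assumed noise level

beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # "Estimate" column
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])        # residual variance
cov = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov))                  # "Std. Error" column
t_values = beta / std_err                        # "t Value" column
```

A predictor whose estimate is small relative to its standard error (such as $d_5$ above, with t = −0.828) is not significant at the usual levels.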

Model | Training MAE (%) | Training RMSE (%) | Testing MAE (%) | Testing RMSE (%) |
---|---|---|---|---|
LR | 6.52 | 8.76 | 5.82 | 7.57 |
MLP ANN | 2.66 | 3.66 | 3.54 | 4.58 |
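The MAE and RMSE figures above can be reproduced for any prediction with two one-line functions. A minimal sketch (the toy vectors are illustrative, and the amplitudes are assumed to be normalized to [0, 1] so the errors read as percentages):

```python
import numpy as np

def mae_pct(y_true, y_pred):
    """Mean absolute error, as a percentage of the normalized amplitude scale."""
    return 100 * np.mean(np.abs(y_true - y_pred))

def rmse_pct(y_true, y_pred):
    """Root-mean-square error on the same percentage scale."""
    return 100 * np.sqrt(np.mean((y_true - y_pred) ** 2))

y_true = np.array([0.10, 0.50, 0.90])   # toy normalized amplitudes
y_pred = np.array([0.12, 0.45, 0.95])   # toy predictions

print(mae_pct(y_true, y_pred))   # 4.0
print(rmse_pct(y_true, y_pred))  # ~4.24
```

RMSE penalizes large residuals more than MAE, which is why the two columns differ for the same model.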

**Table 7.** Results obtained during the testing stage for pinnae ‘R’ and ‘S’, the pinnae that form the test dataset. Additionally, the error committed when using the standard KEMAR pinna (Pinna ‘T’) instead of the models is shown.

Model | Pinna R MAE (%) | Pinna R RMSE (%) | Pinna S MAE (%) | Pinna S RMSE (%) |
---|---|---|---|---|
LR | 5.84 | 7.62 | 5.81 | 7.52 |
MLP ANN | 4.11 | 5.28 | 2.98 | 3.88 |
Pinna T | 15.35 | 20.66 | 15.33 | 20.02 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Fernandez Martinez, R.; Jimbert, P.; Sumner, E.M.; Riedel, M.; Unnthorsson, R. Prediction of Head Related Transfer Functions Using Machine Learning Approaches. *Acoustics* **2023**, *5*, 254–267. https://doi.org/10.3390/acoustics5010015