Article

Beyond Henssge’s Formula: Using Regression Trees and a Support Vector Machine for Time of Death Estimation in Forensic Medicine

1 Department of Forensic Medicine, University of Pécs Medical School, Szigeti út 12, H-7624 Pécs, Hungary
2 Bánki Donát Faculty of Mechanical and Safety Engineering, Óbuda University, Népszínház u. 8, H-1081 Budapest, Hungary
* Author to whom correspondence should be addressed.
Diagnostics 2023, 13(7), 1260; https://doi.org/10.3390/diagnostics13071260
Submission received: 9 March 2023 / Revised: 22 March 2023 / Accepted: 26 March 2023 / Published: 27 March 2023
(This article belongs to the Special Issue Diagnostic Methods in Forensic Pathology)

Abstract

Henssge’s nomogram is a commonly used method to estimate the time of death. However, uncertainties arising from the graphical solution of the original mathematical formula affect the accuracy of the resulting time interval. Using existing machine learning techniques/tools such as support vector machines (SVMs) and decision trees, we present a more accurate and adaptive method for estimating the time of death than Henssge’s nomogram. Using the Python programming language, we built a synthetic-data-driven model in which most of the selected tools can estimate the time of death with low error rates, even with only 3000 training cases. An SVM with a radial basis function (RBF) kernel and AdaBoost + SVR provided the best results, estimating the time of death with the lowest error: an accuracy of approximately ±20 min or ±9.6 min, respectively, depending on the SVM parameters. The error in the predicted time $t_p$ [h] was $t_p \pm 0.7$ h with a 94.45% confidence interval. Because training requires only a small quantity of data, our model can be easily customized to specific populations with varied anthropometric parameters or living in different climatic zones. The errors produced by the proposed method are an order of magnitude smaller than any previous result.

1. Introduction

The determination of the post mortem interval (PMI) is one of the oldest questions in forensic medicine, which has posed major challenges for experts since its inception and remains the focus of significant research. Both mathematical [1,2,3,4] and nonmathematical methods are used to address the problem [5]. The process of changing the temperature of the body involves a complicated interplay of various biological processes and factors, yet it is nevertheless characterized by physical laws. Early studies suggested that the Newtonian cooling law was unsuitable for mathematically characterizing the process because the cooling curve is sigmoidal rather than exponential due to a plateau phase [6,7,8]. The Marshall–Hoare formula, which was created empirically and contains a linear combination of two exponential functions [9,10,11], can be used to describe this sigmoidal curve. As it is a transcendental equation, if we want to determine the PMI, we can solve it numerically or graphically with the help of Henssge’s nomogram [12,13,14,15] or another simplified graphical solution [16]. Subsequent studies required the extension of the Marshall–Hoare formula with a weight-related correction factor [12,17,18,19,20], known as the Henssge formula, allowing for a more precise estimation of the PMI.
The mathematical description of the process has not changed since the introduction of the Henssge formula; however, multiple solutions for fitting empirical data using diverse methodologies have been developed: nonlinear least squares [21], conditional probability [22], Bayesian estimation [23,24], finite element simulation [25], Laplace transformation [26], numerical simulations [27,28,29], and neural networks [30]. A triple exponential model was another strategy to take into account; however, it did not yield the desired outcomes [31,32]. Brute-force calculations [33,34], heat-transfer modeling [25,35,36], the evaluation of a back-calculation [37], and a computational approximation in PHP [38] are other examples of unique approaches.
Machine learning is widely used in numerous fields of medical diagnostics and prognostics [39,40,41,42]. One possible approach to estimating the parameters is regression. Various mathematical tools are available for this purpose, such as different regression methods, decision trees [43], or neural networks, where the goal is to approximate the model function underlying a given learning set. A support vector machine (SVM) can be considered a special neural network: a supervised learning method whose decision function can use different kernel functions [44,45,46]. The objective of the kernel method is to convert the original problem into a linearly solvable one. The data describing the problem to be solved are transformed into the kernel space through nonlinear transformations, such as radial basis functions (RBFs). The aim of our study was to analyze the accuracy of several different regression methods (decision tree [47], random forests [48], extra trees, bagging, AdaBoost, SVM, AdaBoost + SVM) and their combinations in solving the aforementioned mathematical problem using the Python programming language.
The motivation behind this work comes from our desire to support the work of forensic experts by developing a modernized, flexible, and adaptive method that utilizes existing machine learning tools to enable a more accurate estimation of the PMI, using present-day training data, than the commonly used Henssge nomogram, and that can adapt to a constantly changing population.

2. Materials and Methods

The Henssge formula and its graphical solution, the Henssge nomogram, are commonly used to estimate the PMI:
$$\frac{T_r - T_a}{T_0 - T_a} = A \cdot \exp(Bt) + (1 - A) \cdot \exp\!\left(\frac{AB}{A-1}\,t\right),$$
where $T_r$ and $T_a$ are the rectal and environmental temperatures, respectively, measured at time $t$, and $T_0 = 37.2$ °C is a constant representing the rectal temperature commonly assumed at the time of death. In the formula, $A$ and $B$ are parameters obtained empirically [17]. The value of the parameter $A$ depends on the environmental temperature (Table 1).
The parameter $B$ depends on the body weight $m$ (in kg):
$$B = -1.2815 \cdot m^{-0.625} + 0.0284$$
The Henssge formula in the two temperature ranges is as follows:
For $T_a \le 23.2$ °C,
$$\frac{T_r - T_a}{37.2 - T_a} = 1.25 \cdot \exp(Bt) - 0.25 \cdot \exp(5Bt);$$
for $T_a \ge 23.3$ °C,
$$\frac{T_r - T_a}{37.2 - T_a} = 1.11 \cdot \exp(Bt) - 0.11 \cdot \exp(10Bt).$$
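Since the equation is transcendental in $t$, the PMI must be recovered numerically (or graphically, via the nomogram). The following minimal sketch is our own illustration, not the authors' published code: it evaluates the two-range formula and inverts it for $t$ with SciPy's Brent root finder, assuming the solution is bracketed on the chosen interval.

```python
import numpy as np
from scipy.optimize import brentq

def henssge_ratio(t, B, Ta):
    """Right-hand side of the two-range Henssge formula."""
    if Ta <= 23.2:
        return 1.25 * np.exp(B * t) - 0.25 * np.exp(5 * B * t)
    return 1.11 * np.exp(B * t) - 0.11 * np.exp(10 * B * t)

def estimate_pmi(Tr, Ta, m, T0=37.2):
    """Solve the transcendental equation for the PMI t (hours)."""
    B = -1.2815 * m ** -0.625 + 0.0284      # weight-dependent parameter
    target = (Tr - Ta) / (T0 - Ta)          # measured temperature ratio
    return brentq(lambda t: henssge_ratio(t, B, Ta) - target, 1e-3, 100.0)

print(estimate_pmi(Tr=30.0, Ta=15.0, m=70.0))  # estimated PMI in hours
```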
Any proper mathematical model should be capable of handling the uncertainties that affect the accuracy of a time of death estimated with the Henssge formula. As can be seen in Henssge’s nomogram, uncertainty can be caused by various factors, including the correction factor [49], body weight [12], and variable environmental temperature and humidity [50]. From a practical point of view, a basic source of error may be an incorrect size ratio of the printed nomogram [51]. The Henssge nomogram handles graphically the uncertainties which can affect the accuracy of the determined PMI, while data-based models incorporate these uncertainties within themselves.

2.1. Data-Driven Model

The purpose of creating the data-driven model was to examine the estimation of the PMI using other mathematical methods. We decided to use decision trees (regression trees) and an SVM with an RBF kernel. Our model relied on the assumption that the generated data from which the system learned closely resembled reality. To create the model and perform the calculations, we used the Python programming language to generate data for learning and testing, which formed the basis of the theoretical model. We chose various regression trees and an SVM from the scikit-learn [52] package.

2.1.1. The Generation of Data and Test Data

For training and testing the regression trees and the SVM, we used generated data. For each parameter required by the Henssge formula, we randomly selected a value from a predetermined list. These lists were as follows:
  • Time (h): 1–18, with a step of 0.5 h.
  • Ambient temperature (°C): −10 to 35, in increments of 0.5 °C.
  • Correction factor: 0.7, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, based on Table 5 in [49].
  • Body weight (kg): between 50 and 100 kg, with a precision of 0.5 kg, drawn from a normal distribution with postselection (mean of 70 kg with σ large enough to generate an appropriate quantity of test data close to the upper limit).
  • Rectal temperature (°C): calculated from the Henssge formula, based on the randomly selected data described above, according to Algorithm 1, which uses Algorithm 2.
  • The number of desired data points, which is an approximate value, since some of the weights drawn from a normal distribution were outside of the desired range and therefore were not considered in either the training or test data sets.
According to the literature [49], certain restrictions were taken into account when creating the data:
(1)
The ambient temperature must not be higher than the measured rectal temperature. Since the rectal temperature was calculated during data generation, this case could not occur.
(2)
Correction factor values up to 1.4 do not need to be adjusted based on body weight. Beyond this value, an adjustment is required, but our model is currently not set up for this (see Table 5 in [49]).
(3)
In the case of the weight, the selection of the lower and upper limits, as well as the 70 kg mean, was based on Table 5 in [49].
Steps for generating the training and test data:
(1)
Randomly select one parameter set (weight, correction factor, environmental temperature) from the required sets of parameters for the Henssge formula.
(2)
Determine the rectal temperature by evaluating the Henssge formula.
In the pseudocode of Algorithm 3, the input parameter “count” is an integer that only roughly determines the number of generated data points, since the normal distribution also produced weights outside the lower and upper bounds, and these were not used for training or testing.
The samples were separated into training and test data by dividing the total generated data set $(X, y)$ into two parts: one part is used to train the model, and the other serves as test data to evaluate its performance. $X$ is an $n \times 4$ matrix, where $n$ is the number of actual data points that meet the conditions; each row contains the selected and calculated parameters in the order $(m, cf, T_r, T_a)$. $y$ is an $n$-dimensional vector, where $y_i$, $i = 1, \ldots, n$, contains the randomly selected expected time for $X_i$. During the division, a prespecified percentage of the total data set was chosen as test data and the rest was used for training.
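As an illustration, this split can be performed with scikit-learn's train_test_split. The sketch below is ours (the paper only states that a prespecified percentage, 25% in Section 3, is held out); the stand-in arrays merely reproduce the documented shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the documented shapes: X is n x 4 (m, cf, Tr, Ta), y is n.
X, y = np.random.rand(1000, 4), np.random.rand(1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
```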
Algorithm 1: Calculating rectal temperature.
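The published pseudocode is rendered as an image; the following minimal Python reconstruction (function and variable names are ours) is consistent with the description in the text: it forward-evaluates the Henssge formula for a chosen PMI $t$, applying the correction factor to the body weight as in Algorithm 2.

```python
import numpy as np

def rectal_temperature(t, m, cf, Ta, T0=37.2):
    """Rectal temperature Tr after a PMI of t hours, for body weight m (kg),
    correction factor cf, and ambient temperature Ta (deg C)."""
    m_adj = cf * m                                 # Algorithm 2: corrected weight
    B = -1.2815 * m_adj ** -0.625 + 0.0284
    if Ta <= 23.2:
        ratio = 1.25 * np.exp(B * t) - 0.25 * np.exp(5 * B * t)
    else:
        ratio = 1.11 * np.exp(B * t) - 0.11 * np.exp(10 * B * t)
    return Ta + ratio * (T0 - Ta)
```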
Algorithm 2: Body weight adjusted by correction factor.
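Algorithm 2 is likewise an image in the published version. In Henssge's method the corrective factor scales the body weight before the parameter $B$ is computed, so a reconstruction (our naming) is a one-liner:

```python
def adjusted_weight(m, cf):
    """Body weight (kg) scaled by the correction factor (see Table 5 in [49])."""
    return cf * m
```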
Algorithm 3: Generating training data and test data.
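Again, the published pseudocode is an image; the sketch below is our reconstruction of the generation steps described in this section, reusing rectal_temperature from the Algorithm 1 sketch: draw a parameter set, compute $T_r$, and postselect on the 50–100 kg weight range, so the final count is only approximately the requested one.

```python
import numpy as np

rng = np.random.default_rng()
TIMES = np.arange(1.0, 18.5, 0.5)            # PMI values (h), step 0.5
TEMPS = np.arange(-10.0, 35.5, 0.5)          # ambient temperatures (deg C)
CFS = [0.7, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]    # correction factors

def generate(count, mean=70.0, sigma=10.0):
    """Return roughly `count` samples; weights outside 50-100 kg are discarded."""
    X, y = [], []
    for _ in range(count):
        m = round(rng.normal(mean, sigma) * 2) / 2   # 0.5 kg precision
        if not 50.0 <= m <= 100.0:
            continue                                 # postselection on weight
        t, Ta, cf = rng.choice(TIMES), rng.choice(TEMPS), rng.choice(CFS)
        X.append([m, cf, rectal_temperature(t, m, cf, Ta), Ta])
        y.append(t)
    return np.array(X), np.array(y)

X, y = generate(11000)
```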

2.1.2. Training

We began the process of training our selected regression models by utilizing the partial sample $(X_{\mathrm{train}}, y_{\mathrm{train}})$. This data set served as the input for our model training, which was performed using several different approaches.
One of the approaches we utilized was decision (regression) trees, including bagging, random forests, and extremely randomized trees. These techniques have been shown to be effective at modeling complex relationships between variables and making predictions in a variety of scenarios. In addition to the decision trees, we also used support vector regression (SVR) with a radial basis function (RBF) kernel. This is a powerful method that has been shown to be effective at modeling nonlinear relationships between variables. To further refine the results obtained from our extremely randomized trees and SVM, we applied a tree modified with an adaptive boosting method. This allowed us to improve the accuracy and precision of our model predictions.
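As an illustration of the model families just listed, the following sketch shows how they can be instantiated and fitted with scikit-learn. The hyperparameter values here are illustrative, not the tuned values reported in Appendix A; X_train and y_train are the training subsets from the split above.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              ExtraTreesRegressor, RandomForestRegressor)
from sklearn.svm import SVR

models = {
    "decision_tree": DecisionTreeRegressor(),
    "random_forest": RandomForestRegressor(n_estimators=200),
    "extra_trees": ExtraTreesRegressor(n_estimators=200),
    "bagging": BaggingRegressor(n_estimators=150),
    "svr_rbf": SVR(kernel="rbf", C=5, gamma=1),
    "adaboost_svr": AdaBoostRegressor(SVR(kernel="rbf", C=5, gamma=1),
                                      n_estimators=20),
}
for name, model in models.items():
    model.fit(X_train, y_train)     # train each regressor on the same subset
```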
Once our models had been successfully trained and optimized, we saved them for later use. This ensured that we could quickly and easily access our models and use them to make predictions in new scenarios, without having to repeat the time-consuming and computationally expensive training process.
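The paper does not name a serialization mechanism; joblib, the usual choice alongside scikit-learn, is one way to persist the fitted estimators:

```python
import joblib

joblib.dump(models["adaboost_svr"], "adaboost_svr.joblib")  # save after training
model = joblib.load("adaboost_svr.joblib")                  # reload for prediction
```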

2.1.3. Testing

We tested the trained model with the $(X_{\mathrm{test}}, y_{\mathrm{test}})$ subsets. The performance of the model was determined based on the mean absolute error (MAE), mean squared error (MSE), and coefficient of determination ($R^2$) values (see below). The results of the runs can be found in Appendix A, Tables A1–A6.
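A sketch of this scoring step with the three reported metrics via sklearn.metrics (model and the test subset come from the previous sections):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = model.predict(X_test)
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R2: ", r2_score(y_test, y_pred))
```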

2.1.4. Error Calculation

There are several mathematical tools available for determining the prediction error. Let $N$ be the size of the sample and $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ the estimated value of $y_i$. The residual for the $i$th observation is defined as $e_i = y_i - \hat{y}_i$, that is, the difference between the expected value $y_i$ and the estimated value for the $i$th observation.

Sum of Squared Residuals (SSR)

In most cases, we minimized the sum of squared residuals (least-squares method).
$$SSR = e_1^2 + e_2^2 + \cdots + e_N^2 = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$

Mean Squared Error (MSE)

The average of the squares of the differences between the estimated values and the actual values.
$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$

Mean Absolute Error (MAE)

The average of the absolute differences between the estimated and actual values.
$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

Coefficient of Determination ($R^2$)

$R^2$ is a statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s) in a regression model. $R^2$ ranges from 0 to 1, where 0 indicates that the model explains none of the variance in the dependent variable, and 1 indicates that the model explains all of it.
$$R^2 = \frac{TSS - SSR}{TSS},$$
where
$$TSS = \sum_{i=1}^{N} \left( y_i - \frac{1}{N} \sum_{j=1}^{N} y_j \right)^2 = \sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2$$
is the total sum of squares, where $\bar{y}$ is the mean value of the given data set.
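For reference, the four measures can be written out directly; this is our own minimal NumPy transcription of the formulas above, useful for checking against the library implementations.

```python
import numpy as np

def ssr(y, y_hat):
    """Sum of squared residuals."""
    return float(np.sum((y - y_hat) ** 2))

def mse(y, y_hat):
    """Mean squared error."""
    return ssr(y, y_hat) / len(y)

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination: (TSS - SSR) / TSS."""
    tss = float(np.sum((y - np.mean(y)) ** 2))
    return (tss - ssr(y, y_hat)) / tss
```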

3. Results

The accuracy of the mathematical model we used for estimating the PMI depended on the proper choice of a relatively large number of adjustable parameters. We considered a choice proper if we obtained it through a learning process and the resulting model gave meaningful estimates in cases similar to those it had already encountered. If we tested a case that fell outside of the domain determined by the learned data, the estimation error was expected to grow. The version in this paper used the most common correction factors from the set of all correction factors. The choice of environmental temperature range was based on the Henssge nomograms. One important factor influencing the quality of the generated data was the mean value (70 kg) and standard deviation (σ) of the normal distribution of the body weight, which determined the width of the Gaussian curve and, in turn, the range of the random weights obtained. When σ was very small (<5), the generated weights fell in a very narrow range with very high probability; if σ was chosen large enough, the data were selected from a larger set. The goal was to have enough training data with weights between 50 and 100 kg. We determined σ through multiple trials, and for 11,000 generated data points we found that σ = 10 was already sufficient.
The data were generated such that for each body weight, we randomly selected $T_a$, the correction factor, and the expected time of death using a uniform distribution. Then, $T_r$ was calculated based on these values.
In order to examine the results of the theoretical model, we trained and tested the model using various numbers of cases and methods, so as to find the tool that solved the problem with the smallest error.
We designated 25% of the generated data as test data and utilized the remainder for training the system. As the foundation for the theoretical model, we employed various regression tools with differing configurations and sought out the best parameterization for each tool individually; one possible search setup is sketched after the list below. Following this, by using the mathematical tools in combination, we further improved the results obtained. The methods investigated were as follows:
  • Regression tree;
  • Random forests;
  • Extremely randomized trees;
  • Tree modified with the bagging method;
  • SVR with an RBF kernel;
  • SVR improved with adaptive boosting.
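The paper does not state how the best parameterization per tool was found; one plausible setup (our assumption, not the authors' code) is an exhaustive search with GridSearchCV over the parameter values that appear in Appendix A, shown here for the RBF-kernel SVR:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

param_grid = {                      # values as reported in Appendix A, Table A5
    "C": [1, 5, 10],
    "epsilon": [0.005, 0.01, 0.05],
    "gamma": [1, 2],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)
print(search.best_params_)          # best parameterization for this tool
```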

Results of Training

The training and estimation times, the errors (MAE, MSE, $R^2$), and the best parameterization of the regression tools tested with various parameterizations are presented for each method in Appendix A, Tables A1–A6. The number of generated data points was increased in steps of one thousand, minus the cases that did not fall within the determined range of 50–100 kg.
Based on the results, it can be concluded that with a larger training data set, all methods were capable of estimating the time of death with a decreasing error, as shown in the graphs in Figure 1. According to the MAE, MSE, and $R^2$ values, the best result was achieved by the combined use of SVR and an adaptive regression tree, as this method further improved the results obtained by SVR [53,54]. In the Python implementation used, the C parameter represents the compromise between minimizing misclassification errors and maximizing the decision margin: the higher the value of C, the fewer the misclassifications and the stricter the decision margin. We therefore performed four additional control runs with higher C values (10, 20, 50, 100) to check the accuracy of the SVR estimate improved by adaptive boosting in these four cases as well. The results are presented in two parts: first for SVR alone, and then for the results improved with the adaptive boosting method.
As shown in Figure 1, most of the selected mathematical tools were able to estimate the time of death with low error rates even with a minimum of 3000 training examples, based on the current settings. However, the decision tree was an exception: it still produced high errors compared to the others, even with over 10,000 data points.
The results obtained with the SVR and AdaBoost + SVR models using the parameters C = 50 and C = 100 at a sample size of approximately 11,000 are shown in Table 2 and Table 3; they indicate that increasing the value of C further improved the achieved results. By breaking down the average error of the 25% of test data by correction factor with a 5 kg binning, we determined for the cases C = 5 and C = 100 (see Figure 2) that in the former case the error was approximately ±0.3 h ≈ ±20 min, and in the latter case the two worst results were approximately ±0.16 h = ±9.6 min, while the average errors were below 4 min.
After comparing the results of the various selected methods, it can be concluded that the two best results were obtained with SVR and AdaBoost + SVR, as can be seen in Figure 3 and Table 4: these two methods had the most test results within 1σ of the mean.

4. Discussion

The Henssge formula and its graphical solution, the Henssge nomogram, are commonly used to estimate the PMI. However, uncertainties, including the correction factor, body weight, and variable environmental conditions, can affect the accuracy of the resulting time interval. The Henssge nomogram handles these uncertainties graphically, while our model incorporated them within itself. In other words, our model did not require the use or knowledge of the Marshall–Hoare or Henssge formula with correction factors, and it did not contain any empirical variables. The generated data closely resembled reality and formed the basis of our theoretical model. Our data-driven model showed that an SVM with an RBF kernel and the AdaBoost + SVR method provided the best results, estimating the time of death with the lowest error. The estimated accuracy of the time of death was approximately within ±20 min or ±9.6 min, depending on the SVM parameters used. The predicted time error was $t_p \pm 0.7$ h with a 94.45% confidence interval. Compared to the Henssge nomogram, whose claimed accuracy is ±2.8 h for both temperature ranges when correction factors are applied, the created model was capable of estimating the time of death with sufficient accuracy while taking into account the constraints based on the learned data set. The significant differences and errors arose in cases the model encountered with fewer samples during the learning process, but even these fell within the accuracy zone determined by the nomogram.
The current limitations of the theoretical model are the number of correction factors, a maximum time interval of 18 h, the need for the training data to be provided with an accuracy of 30 min, and a body weight limited to 50–100 kg. Based on the results presented in Figure 1, it can be inferred that most of the mathematical tools used in this study were able to estimate the time of death with relatively low error rates, even with a minimum of 3000 training examples under the current settings.
One notable advantage of our models is that they required very few data for training, which means that they can be applied in various geographical regions, including smaller areas. This feature makes the model highly versatile and adaptable to specific populations with differing anthropometric characteristics or living in different climate zones, because it can be trained with real, available data. Moreover, the model can be easily adapted to suit one’s needs, making it an ideal tool for a range of settings and situations.
Most articles on this topic determine the PMI using basic physics or numerical calculations. However, these results cannot be compared to ours because we used the Henssge formula to generate synthetic data. To the best of our knowledge, there is only one paper that used neural networks (multilayer feedforward networks) to address this problem [30]. Zerdazi and coworkers constructed a network using MATLAB 2012 with two layers. The first layer, called the hidden layer, contained 10 neurons, each using the hyperbolic tangent as an activation function. The second layer, known as the output layer, had only one neuron, which employed a linear activation function.
Their method achieved much better results on 257 cases than Henssge’s solution. While our model used a different machine learning approach, we can compare our results to theirs (see Table 5 and Table 6). For the best comparability, we trained our model again with the same features as those described in that paper [30]. Two scenarios were investigated with common features: an environmental temperature ranging from 4.5 °C to 18 °C, 20% of the data used for validation, and 20% for testing.
Scenario 1: 275 observations, time of death between 20 min and 18 h. The results obtained are as follows:
Table 5. Comparison of our SVM and AdaBoost + SVM results with the result from the multilayer feedforward network (neural method) by Zerdazi et al. [30] in case of Scenario 1.
Name | MAE | MSE
Neural method | 1.85 | 5.69
SVR | 0.17 | 0.14
AdaBoost + SVR | 0.17 | 0.12
Scenario 2: 184 observations, time of death less than 7 h. The results obtained are as follows:
Table 6. Comparison of our SVM and AdaBoost + SVM results with the result of the multilayer feedforward network (neural method) by Zerdazi et al. [30] in case of Scenario 2.
Name | MAE | MSE
Neural method | 0.86 | 1.21
SVR | 0.18 | 0.08
AdaBoost + SVR | 0.14 | 0.05
From these results, we can conclude that the proposed method performed much better: all the errors were at least one order of magnitude smaller than the results in [30].
Further testing with real data is needed. The future goal is to develop a phone or web application based on the model with a graphical interface for easier use and to create a database for storing anonymized training data.

5. Conclusions

Our research demonstrated that the estimated PMIs produced by our models using existing machine learning tools such as SVMs and decision trees were far more satisfactory than those produced by the Henssge formula or the method utilizing neural networks. In contrast to traditional mathematical methods, including the Henssge nomogram, that yield fixed formulas and whose performance remains constant, our models can be continuously improved because training can be resumed whenever new additional data are available. As our models estimated the PMI with low error rates even with only 3000 training cases, they can be easily adapted to specific populations with different characteristics or living in different climatic zones.

Author Contributions

Conceptualization, L.M.D., D.T. and A.B.F.; methodology, A.B.F.; software, L.M.D.; validation, L.M.D., D.T. and A.B.F.; writing—original draft, L.M.D.; writing—review and editing, L.M.D., D.T., A.B.F. and Z.K.; visualization, L.M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code of our model is available: https://github.com/livdan/TOD (accessed on 6 March 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MAE: Mean absolute error
MSE: Mean squared error
PMI: Post mortem interval
R²: Coefficient of determination
RBF: Radial basis function
SSR: Sum of squared residuals
SVM: Support vector machine
SVR: Support vector regression
TSS: Total sum of squares

Appendix A

Table A1. Results of running the model with DecisionTreeRegressor.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 3.210 | 0.028 | 1.4494 | 4.1032 | 0.8285 | random_state = 10
2 | 1955 | 4.079 | 0.035 | 1.1186 | 2.7904 | 0.8814 | criterion = 'friedman_mse'
3 | 2925 | 4.773 | 0.085 | 1.0143 | 2.2565 | 0.9114 | criterion = 'friedman_mse', random_state = 10
4 | 3909 | 3.410 | 0.066 | 1.0092 | 2.2439 | 0.9048 | criterion = 'poisson', random_state = 10
5 | 4881 | 3.994 | 0.084 | 0.9533 | 1.8468 | 0.9199 | criterion = 'friedman_mse', random_state = 50
6 | 5864 | 3.757 | 0.094 | 0.9168 | 1.7343 | 0.9275 | criterion = 'friedman_mse', random_state = 10
7 | 6806 | 5.299 | 0.106 | 0.8515 | 1.4768 | 0.9386 | default
8 | 7824 | 5.210 | 0.122 | 0.8436 | 1.5394 | 0.9380 | criterion = 'friedman_mse', random_state = 100
9 | 8785 | 4.753 | 0.139 | 0.7854 | 1.2311 | 0.9476 | random_state = 50
10 | 9759 | 5.269 | 0.150 | 0.7842 | 1.2724 | 0.9480 | criterion = 'friedman_mse', random_state = 50
11 | 10755 | 4.995 | 0.166 | 0.7522 | 1.2489 | 0.9505 | criterion = 'poisson', random_state = 25
12 | 11708 | 6.811 | 0.184 | 0.7371 | 1.1654 | 0.9523 | criterion = 'poisson', random_state = 10
Table A2. Results of running the model with RandomForestRegressor.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 39.188 | 2.749 | 0.9662 | 1.7966 | 0.9249 | criterion = 'poisson', max_features = None, n_estimators = 200, random_state = 10
2 | 1955 | 61.648 | 2.630 | 0.7092 | 1.0064 | 0.9572 | max_features = None, random_state = 25
3 | 2925 | 106.174 | 8.661 | 0.6797 | 0.9713 | 0.9619 | criterion = 'poisson', max_features = None, n_estimators = 200, random_state = 25
4 | 3909 | 120.701 | 9.031 | 0.6110 | 0.8032 | 0.9659 | criterion = 'poisson', max_features = None, n_estimators = 150, random_state = 25
5 | 4881 | 148.617 | 14.466 | 0.5761 | 0.6445 | 0.9720 | criterion = 'poisson', max_features = None, n_estimators = 200, random_state = 25
6 | 5864 | 154.820 | 15.039 | 0.5370 | 0.5759 | 0.9759 | max_features = None, n_estimators = 200, random_state = 50
7 | 6806 | 180.887 | 15.659 | 0.4801 | 0.4774 | 0.9802 | criterion = 'friedman_mse', max_features = None, n_estimators = 150
8 | 7824 | 207.772 | 15.076 | 0.4967 | 0.5081 | 0.9795 | criterion = 'poisson', max_features = None, n_estimators = 150
9 | 8785 | 266.933 | 24.548 | 0.4510 | 0.4271 | 0.9818 | criterion = 'poisson', max_features = None, n_estimators = 200
10 | 9759 | 342.604 | 31.534 | 0.4608 | 0.4486 | 0.9817 | criterion = 'poisson', max_features = None, n_estimators = 200, random_state = 10
11 | 10755 | 401.349 | 31.706 | 0.4252 | 0.3816 | 0.9849 | criterion = 'friedman_mse', max_features = None, n_estimators = 200, random_state = 10
12 | 11708 | 344.101 | 29.897 | 0.4134 | 0.3625 | 0.9852 | criterion = 'poisson', max_features = None, n_estimators = 200, random_state = 10
Table A3. Results of running the model with ExtraTreesRegressor.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 12.355 | 0.883 | 0.8422 | 1.3935 | 0.9417 | max_features = None, n_estimators = 50
2 | 1955 | 24.224 | 5.954 | 0.5849 | 0.7086 | 0.9699 | criterion = 'friedman_mse', max_features = None, n_estimators = 200
3 | 2925 | 29.064 | 9.108 | 0.5909 | 0.7653 | 0.9700 | criterion = 'friedman_mse', max_features = None, n_estimators = 200
4 | 3909 | 40.331 | 13.009 | 0.5108 | 0.5983 | 0.9746 | criterion = 'friedman_mse', max_features = None, n_estimators = 200
5 | 4881 | 48.345 | 10.611 | 0.4896 | 0.5011 | 0.9783 | criterion = 'poisson', max_features = None, n_estimators = 125
6 | 5864 | 60.539 | 17.568 | 0.4568 | 0.4608 | 0.9807 | criterion = 'poisson', max_features = None, n_estimators = 200
7 | 6806 | 67.474 | 11.464 | 0.4143 | 0.3983 | 0.9834 | criterion = 'poisson', max_features = None, n_estimators = 125
8 | 7824 | 81.665 | 23.157 | 0.4052 | 0.3798 | 0.9847 | max_features = None, n_estimators = 200
9 | 8785 | 103.041 | 15.619 | 0.3745 | 0.3372 | 0.9856 | criterion = 'poisson', max_features = None, n_estimators = 125
10 | 9759 | 132.059 | 32.412 | 0.3645 | 0.3218 | 0.9869 | criterion = 'friedman_mse', max_features = None, n_estimators = 200
11 | 10755 | 144.701 | 33.263 | 0.3419 | 0.2745 | 0.9891 | max_features = None, n_estimators = 200
12 | 11708 | 139.604 | 30.451 | 0.3437 | 0.2850 | 0.9883 | max_features = None, n_estimators = 200
Table A4. Results of running the model with BaggingRegressor.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 2.176 | 1.593 | 0.9907 | 1.8638 | 0.9221 | n_estimators = 100
2 | 1955 | 3.515 | 4.360 | 0.6796 | 0.9420 | 0.9600 | n_estimators = 125
3 | 2925 | 4.198 | 4.913 | 0.6961 | 1.0027 | 0.9606 | n_estimators = 100
4 | 3909 | 6.989 | 13.231 | 0.6193 | 0.8344 | 0.9646 | n_estimators = 200
5 | 4881 | 7.111 | 10.565 | 0.5726 | 0.6419 | 0.9722 | n_estimators = 125
6 | 5864 | 7.989 | 18.109 | 0.5387 | 0.5820 | 0.9757 | n_estimators = 200
7 | 6806 | 6.822 | 10.743 | 0.4857 | 0.4862 | 0.9798 | n_estimators = 100
8 | 7824 | 9.390 | 24.057 | 0.4859 | 0.4912 | 0.9802 | n_estimators = 200
9 | 8785 | 9.695 | 19.855 | 0.4592 | 0.4441 | 0.9811 | n_estimators = 150
10 | 9759 | 9.830 | 17.622 | 0.4616 | 0.4524 | 0.9815 | n_estimators = 150
11 | 10755 | 11.064 | 24.376 | 0.4313 | 0.3870 | 0.9847 | n_estimators = 150
12 | 11708 | 11.208 | 24.229 | 0.4153 | 0.3629 | 0.9851 | n_estimators = 200
Table A5. Results of running the model with SVR.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 0.749 | 0.070 | 0.9406 | 2.5332 | 0.8941 | C = 5, epsilon = 0.005, gamma = 1
2 | 1955 | 3.184 | 0.163 | 0.5779 | 1.2309 | 0.9477 | C = 5, epsilon = 0.01, gamma = 1
3 | 2925 | 5.612 | 0.333 | 0.4774 | 0.8372 | 0.9671 | C = 5, epsilon = 0.01, gamma = 1
4 | 3909 | 9.611 | 0.360 | 0.4257 | 0.9012 | 0.9618 | C = 5, epsilon = 0.05, gamma = 1
5 | 4881 | 13.404 | 0.626 | 0.3987 | 0.7310 | 0.9683 | C = 5, epsilon = 0.05, gamma = 1
6 | 5864 | 18.982 | 0.612 | 0.3873 | 0.6430 | 0.9731 | C = 5, epsilon = 0.05, gamma = 1
7 | 6806 | 28.362 | 1.058 | 0.3181 | 0.4621 | 0.9808 | C = 5, epsilon = 0.01, gamma = 1
8 | 7824 | 28.212 | 0.824 | 0.3372 | 0.4831 | 0.9805 | C = 5, gamma = 1
9 | 8785 | 36.948 | 1.110 | 0.2882 | 0.3819 | 0.9837 | C = 5, epsilon = 0.05, gamma = 1
10 | 9759 | 50.228 | 1.318 | 0.3093 | 0.4550 | 0.9814 | C = 5, epsilon = 0.05, gamma = 1
11 | 10755 | 58.958 | 1.651 | 0.2763 | 0.3306 | 0.9869 | C = 5, epsilon = 0.01, gamma = 1
12 | 11708 | 70.528 | 2.318 | 0.2753 | 0.3388 | 0.9861 | C = 5, epsilon = 0.01, gamma = 2
Table A6. Results of running the model with AdaBoostRegressor + SVR.

(#) | Case Number | Training Time (s) | Prediction Time (s) | MAE | MSE | R² | Best Parameters
1 | 986 | 30.903 | 1.401 | 0.9026 | 2.1245 | 0.9112 | loss = 'exponential', n_estimators = 20, random_state = 15
2 | 1955 | 132.368 | 2.591 | 0.5437 | 0.8068 | 0.9657 | loss = 'exponential', n_estimators = 20, random_state = 15
3 | 2925 | 262.288 | 3.775 | 0.4243 | 0.4630 | 0.9818 | loss = 'square', n_estimators = 20
4 | 3909 | 474.196 | 7.863 | 0.3575 | 0.4367 | 0.9815 | loss = 'exponential', n_estimators = 20
5 | 4881 | 737.671 | 9.702 | 0.3358 | 0.3134 | 0.9864 | loss = 'exponential', n_estimators = 20, random_state = 15
6 | 5864 | 1163.730 | 13.094 | 0.3190 | 0.2559 | 0.9893 | loss = 'exponential', n_estimators = 20, random_state = 25
7 | 6806 | 1507.816 | 16.351 | 0.2619 | 0.1962 | 0.9918 | loss = 'exponential', n_estimators = 20, random_state = 10
8 | 7824 | 1778.717 | 19.738 | 0.2667 | 0.1764 | 0.9929 | loss = 'exponential', n_estimators = 20
9 | 8785 | 2378.614 | 25.049 | 0.2325 | 0.1434 | 0.9939 | loss = 'exponential', n_estimators = 20, random_state = 25
10 | 9759 | 3199.345 | 31.656 | 0.2607 | 0.1788 | 0.9927 | n_estimators = 20, random_state = 25
11 | 10755 | 3350.830 | 33.012 | 0.2102 | 0.1115 | 0.9956 | loss = 'exponential', n_estimators = 20, random_state = 25
12 | 11708 | 4460.481 | 44.018 | 0.2177 | 0.1323 | 0.9946 | loss = 'square', n_estimators = 20, random_state = 2

References

  1. Knight, B. The evolution of methods for estimating the time of death from body temperature. Forensic Sci. Int. 1988, 36, 47–55.
  2. Nokes, L.; Flint, T.; Williams, J.; Knight, B. The application of eight reported temperature-based algorithms to calculate the postmortem interval. Forensic Sci. Int. 1992, 54, 109–125.
  3. Madea, B. Methods for determining time of death. Forensic Sci. Med. Pathol. 2016, 12, 451–485.
  4. Laplace, K.; Baccino, E.; Peyron, P.A. Estimation of the time since death based on body cooling: A comparative study of four temperature-based methods. Int. J. Leg. Med. 2021, 135, 2479–2487.
  5. Mathur, A.; Agrawal, Y. An overview of methods used for estimation of time since death. Aust. J. Forensic Sci. 2011, 43, 275–285.
  6. Rainy, H. On the cooling of dead bodies as indicating the length of time since death. Glasg. Med. J. 1868, 1, 323–330.
  7. Brown, A.; Marshall, T. Body temperature as a means of estimating the time of death. Forensic Sci. 1974, 4, 125–133.
  8. Al-Alousi, L.M. A study of the shape of the post-mortem cooling curve in 117 forensic cases. Forensic Sci. Int. 2002, 125, 237–244.
  9. Marshall, T.K. Estimating the time since death: The rectal cooling after death and its mathematical representation. J. Forensic Sci. 1962, 7, 56–81.
  10. Marshall, T.K. The use of the cooling formula in the study of post mortem body cooling. J. Forensic Sci. 1962, 7, 189–210.
  11. Marshall, T.K. The use of body temperature in estimating the time of death. J. Forensic Sci. 1962, 7, 211–221.
  12. Henssge, C. Death time estimation in case work. I. The rectal temperature time of death nomogram. Forensic Sci. Int. 1988, 38, 209–236.
  13. Henssge, C.; Madea, B.; Gallenkemper, E. Death time estimation in case work. II. Integration of different methods. Forensic Sci. Int. 1988, 39, 77–87.
  14. Henßge, C.; Madea, B. Estimation of the time since death in the early post-mortem period. Forensic Sci. Int. 2004, 144, 167–175.
  15. Leinbach, C. Beyond Newton’s law of cooling: Estimation of time since death. Int. J. Math. Educ. Sci. Technol. 2011, 42, 765–774.
  16. Potente, S.; Kettner, M.; Verhoff, M.; Ishikawa, T. Minimum time since death when the body has either reached or closely approximated equilibrium with ambient temperature. Forensic Sci. Int. 2017, 281, 63–66.
  17. Henßge, C. Die Präzision von Todeszeitschätzungen durch die mathematische Beschreibung der rektalen Leichenabkühlung. Z. für Rechtsmed. 1979, 83, 49–67.
  18. Henssge, C. Todeszeitschätzungen durch die mathematische Beschreibung der rektalen Leichenabkühlung unter verschiedenen Abkühlungsbedingungen. Z. für Rechtsmed. 1981, 87, 147–178.
  19. Hubig, M.; Muggenthaler, H.; Sinicina, I.; Mall, G. Body mass and corrective factor: Impact on temperature-based death time estimation. Int. J. Leg. Med. 2011, 125, 437–444.
  20. Potente, S.; Kettner, M.; Ishikawa, T. Time since death nomographs implementing the nomogram, body weight adjusted correction factors, metric and imperial measurements. Int. J. Leg. Med. 2019, 133, 491–499.
  21. Rodrigo, M.R. A Nonlinear Least Squares Approach to Time of Death Estimation Via Body Cooling. J. Forensic Sci. 2016, 61.
  22. Biermann, F.M.; Potente, S. The deployment of conditional probability distributions for death time estimation. Forensic Sci. Int. 2011, 210, 82–86.
  23. Hubig, M.; Muggenthaler, H.; Mall, G. Conditional probability distribution (CPD) method in temperature based death time estimation: Error propagation analysis. Forensic Sci. Int. 2014, 238, 53–58.
  24. Giana, F.E.; Onetto, M.A.; Pregliasco, R.G. Uncertainty in the estimation of the postmortem interval based on rectal temperature measurements: A Bayesian approach. Forensic Sci. Int. 2020, 317, 110505.
  25. Bartgis, C.; LeBrun, A.; Ma, R.; Zhu, L. Determination of Time of Death in Forensic Science via a 3-D Whole Body Heat Transfer Model. J. Therm. Biol. 2016, 62, 109–115.
  26. Rodrigo, M.R. Time of death estimation from temperature readings only: A Laplace transform approach. Appl. Math. Lett. 2015, 39, 47–52.
  27. Muñoz-Barús, J.I.; Rodríguez-Calvo, M.S.; Suárez-Peñaranda, J.M.; Vieira, D.N.; Cadarso-Suárez, C.; Febrero-Bande, M. PMICALC: An R code-based software for estimating post-mortem interval (PMI) compatible with Windows, Mac and Linux operating systems. Forensic Sci. Int. 2010, 194, 49–52.
  28. Nedugov, G.V. Numerical method for solving double exponential models of corpse cooling in the determination of the time of death. Sud. Med. Ekspert 2021, 64, 25–28.
  29. Abraham, J.; Wei, T.; Cheng, L. Validation of a new method of providing case-specific time-of-death estimates using cadaver temperatures. J. Forensic Sci. 2023, Early View.
  30. Zerdazi, D.; Chibat, A.; Rahmani, F.L. Estimation of Postmortem Period by Means of Artificial Neural Networks. Electron. J. Appl. Stat. Anal. 2016, 9, 326.
  31. Al-Alousi, L.M.; Anderson, R.A.; Worster, D.M.; Land, D.V. Factors influencing the precision of estimating the postmortem interval using the triple-exponential formulae (TEF): Part I. A study of the effect of body variables and covering of the torso on the postmortem brain, liver and rectal cooling rates in 117 forensic cases. Forensic Sci. Int. 2002, 125, 223–230.
  32. Al-Alousi, L.M.; Anderson, R.A.; Worster, D.M.; Land, D.V. Factors influencing the precision of estimating the postmortem interval using the triple-exponential formulae (TEF): Part II. A study of the effect of body temperature at the moment of death on the postmortem brain, liver and rectal cooling in 117 forensic cases. Forensic Sci. Int. 2002, 125, 231–236.
  33. Potente, S.; Henneicke, L.; Schmidt, P. Prism—A novel approach to dead body cooling and its parameters. Forensic Sci. Int. 2021, 325, 110870.
  34. Potente, S.; Henneicke, L.; Schmidt, P. Prism (II): 127 cooling dummy experiments. Forensic Sci. Int. 2022, 333, 111238.
  35. Wilk, L.S.; Hoveling, R.J.M.; Edelman, G.J.; Hardy, H.J.J.; van Schouwen, S.; van Venrooij, H.; Aalders, M.C.G. Reconstructing the time since death using noninvasive thermometry and numerical analysis. Sci. Adv. 2020, 6, eaba4243.
  36. Sharma, P.; Kabir, C.S. A Simplified Approach to Understanding Body Cooling Behavior and Estimating the Postmortem Interval. Forensic Sci. 2022, 2, 403–416.
  37. Bovenschen, M.; Schwender, H.; Ritz-Timme, S.; Beseoglu, K.; Hartung, B. Estimation of time since death after a post-mortem change in ambient temperature: Evaluation of a back-calculation approach. Forensic Sci. Int. 2020, 319, 110656.
  38. Schweitzer, W.; Thali, M.J. Computationally approximated solution for the equation for Henssge’s time of death estimation. BMC Med. Inform. Decis. Mak. 2019, 19, 201.
  39. Franchuk, V.; Mikhaylichenko, B.; Franchuk, M. Application of the decision tree method in forensic-medical practice in the analysis of doctors’ proceedings. Sud.-Meditsinskaia Ekspertiza 2020, 63, 9–14.
  40. Lenin Fred, A.; Kumar, S.; Padmanabhan, P.; Gulyas, B.; Haridhas, A.K.; Dayana, N. Chapter 8—Multiview decision tree-based segmentation of tumors in MR brain medical images. In Handbook of Decision Support Systems for Neurological Disorders; Jude, H.D., Ed.; Academic Press: Cambridge, MA, USA, 2021; pp. 125–142.
  41. Murdaca, G.; Caprioli, S.; Tonacci, A.; Billeci, L.; Greco, M.; Negrini, S.; Cittadini, G.; Zentilin, P.; Ventura Spagnolo, E.; Gangemi, S. A Machine Learning Application to Predict Early Lung Involvement in Scleroderma: A Feasibility Evaluation. Diagnostics 2021, 11, 1880.
  42. Shehab, M.; Abualigah, L.; Shambour, Q.; Abu-Hashem, M.A.; Shambour, M.K.Y.; Alsalibi, A.I.; Gandomi, A.H. Machine learning in medical applications: A review of state-of-the-art methods. Comput. Biol. Med. 2022, 145, 105458.
  43. Gareth, J.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning, with Applications in R; Springer: New York, NY, USA, 2021; p. 607.
  44. Vapnik, V.N. The Nature of Statistical Learning Theory, 2nd ed.; Springer: New York, NY, USA, 2000.
  45. Haykin, S. Neural Networks and Learning Machines; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
  46. Chen, B.B. Comprehensive Chemometrics, Chemical and Biochemical Data Analysis; Elsevier: Amsterdam, The Netherlands, 2009; Volume 3.
  47. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
  48. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  49. Henssge, C. Rectal temperature time of death nomogram: Dependence of corrective factors on the body weight under stronger thermic insulation conditions. Forensic Sci. Int. 1992, 54, 51–66.
  50. Mall, G.; Hubig, M.; Eckl, M.; Buettner, A.; Eisenmenger, W. Modelling postmortem surface cooling in continuously changing environmental temperature. Leg. Med. 2002, 4, 164–173.
  51. Burger, E.H.; Dempers, J.J.; Steiner, S.; Shepherd, R. Henssge nomogram typesetting error. Forensic Sci. Med. Pathol. 2013, 9, 615–617.
  52. scikit-learn 1.2.1. Available online: https://scikit-learn.org/stable/ (accessed on 10 February 2023).
  53. Hastie, T.J.; Rosset, S.; Zhu, J.; Zou, H. Multi-class AdaBoost. Stat. Its Interface 2009, 2, 349–360.
  54. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
Figure 1. MAE, MSE, and R² of the theoretical model.
Figure 2. Average error with a 5 kg windowing as a function of the correction factor for C = 5 and C = 100 cases with the AdaBoost + SVR model.
Figure 3. The results of the different methods at distances of 1σ and 2σ.
Table 1. The value of the parameter A.

T_a | A
≤ 23.2 °C | 1.25
≥ 23.3 °C | 1.11
Table 2. The errors of SVR.

 | MAE | MSE | R²
C = 10 | 0.2578 | 0.2746 | 0.9886
C = 20 | 0.2255 | 0.2252 | 0.9906
C = 50 | 0.1979 | 0.1828 | 0.9924
C = 100 | 0.1683 | 0.1290 | 0.9947
Table 3. The errors of AdaBoost + SVR.

 | MAE | MSE | R²
C = 10 | 0.2177 | 0.1340 | 0.9944
C = 20 | 0.1875 | 0.0987 | 0.9959
C = 50 | 0.1820 | 0.1109 | 0.9954
C = 100 | 0.1606 | 0.0762 | 0.9969
Table 4. The results of the different methods at distances of 1σ and 2σ.

Name | 1σ Value | 2σ Value | 1σ | 2σ
Decision tree | −1.1751 to 1.0571 | −2.2912 to 2.1732 | 2161 (80.36%) | 2546 (94.68%)
Bagging | −0.64297 to 0.60864 | −1.2688 to 1.2344 | 2004 (74.53%) | 2506 (93.19%)
Random forests | −0.64995 to 0.59144 | −1.2706 to 1.2121 | 2034 (75.64%) | 2504 (93.12%)
Extra trees | −0.5316 to 0.51464 | −1.0601 to 1.0395 | 2064 (76.76%) | 2514 (93.49%)
SVR | −0.60545 to 0.54442 | −1.1804 to 1.1194 | 2292 (85.24%) | 2569 (95.54%)
AdaBoost + SVR | −0.3423 to 0.32924 | −0.67807 to 0.66501 | 2076 (77.2%) | 2552 (94.91%)
