# Some Insights about the Applicability of Logistic Factorisation Machines in Banking


## Abstract


## 1. Introduction

## 2. Logistic Factorisation Machines

**β** is a vector of parameters and $p$ is the (inverse) link function in a generalised linear model. Let $\mathit{x}$ denote a particular realisation of $\mathit{X}$; then the odds of a positive event are $\frac{p\left(\mathit{x}\right)}{1-p\left(\mathit{x}\right)}$. Since $p\left(\mathit{x}\right)\in \left(0,1\right)$, the odds take values in $\left(0,\infty \right)$ and their logarithm ranges over the real numbers, thus constituting an unrestricted continuous quantity to which a linear regression model can be fitted as

$$\log\frac{p\left(\mathit{x}\right)}{1-p\left(\mathit{x}\right)}={\beta }_{0}+\sum _{k=1}^{K}{\beta }_{k}{x}_{k}.$$
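To make the link between probabilities, odds and log-odds concrete, the short Python sketch below (using NumPy) computes $p(\mathit{x})$ from a linear predictor $\beta_0+\sum_k\beta_k x_k$ and checks that the logarithm of the odds recovers that linear predictor. The intercept, coefficients and realisation $\mathit{x}$ are illustrative values only, not estimates from the paper.

```python
import numpy as np

def logistic(eta):
    """Inverse logit: maps a linear predictor in R to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def log_odds(p):
    """Logit: log(p / (1 - p)), finite only for p strictly inside (0, 1)."""
    return np.log(p / (1.0 - p))

# Illustrative (hypothetical) intercept and coefficients for K = 3 predictors.
beta0 = -0.5
beta = np.array([1.0, 2.0, -3.0])
x = np.array([0.2, -0.1, 0.4])   # one realisation of X

eta = beta0 + beta @ x           # beta_0 + sum_k beta_k x_k
p = logistic(eta)                # P(Y = 1 | X = x)
odds = p / (1.0 - p)             # takes values in (0, inf)
print(f"eta = {eta:.4f}, p = {p:.4f}, odds = {odds:.4f}, log-odds = {log_odds(p):.4f}")
```

Whatever value the linear predictor takes, the logistic transform keeps $p(\mathit{x})$ inside $(0,1)$, which is why the linear model is placed on the log-odds scale rather than on $p$ itself.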

**ϕ**’s ($KG$ factor loadings) have to be estimated, and that $G$ is an input parameter of the fitting procedure. Therefore, the LFM in Equation (4) has $1+K+KG$ parameters (or coefficients), while the regression model with interactions in Equation (3) has $1+K+K\left(K-1\right)/2$ parameters. Hence, LFM requires fewer parameters to be estimated whenever $G<\left(K-1\right)/2$. The advantages and disadvantages of FMs compared with regression models with interactions are discussed in Slabber et al. (2021) and Slabber et al. (2022). As mentioned in these papers, when the number of predictors is large, ordinary two-way interaction regression models suffer from a combinatorial explosion of parameters, which makes them impractical to fit (see, e.g., James et al. (2021)). In such cases, LFM provides an attractive alternative, since its number of parameters grows linearly rather than quadratically in $K$.
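The parameter counts above, and the reason an LFM remains cheap to evaluate, can be checked with a small Python sketch. It assumes that Equation (4) takes the standard factorisation machine form of Rendle (2010) on the log-odds scale, $\beta_0+\sum_k\beta_k x_k+\sum_{k<l}\langle\boldsymbol{\phi}_k,\boldsymbol{\phi}_l\rangle x_k x_l$, with each $\boldsymbol{\phi}_k$ a vector of $G$ factor loadings; the function names and parameter values are hypothetical. With $K=30$ (consistent with the 31 LR parameters reported for the credit card fraud example) the counts reproduce the 466, 91, 151 and 211 parameters listed there for LRI, LFM2, LFM4 and LFM6, and with $K=38$ they reproduce the 742, 115, 191 and 267 of the credit scoring example.

```python
import numpy as np

def n_params_lri(K):
    """Logistic regression with all two-way interactions: 1 + K + K(K-1)/2."""
    return 1 + K + K * (K - 1) // 2

def n_params_lfm(K, G):
    """Logistic factorisation machine with G factors: 1 + K + KG."""
    return 1 + K + K * G

def lfm_log_odds(x, beta0, beta, Phi):
    """Log-odds of an LFM in the (assumed) Rendle (2010) form
    beta0 + sum_k beta_k x_k + sum_{k<l} <phi_k, phi_l> x_k x_l,
    where row k of the K x G matrix Phi holds the loadings phi_k.
    The pairwise sum is evaluated in O(KG) operations via
    0.5 * sum_g [(sum_k Phi[k, g] x_k)^2 - sum_k Phi[k, g]^2 x_k^2]."""
    s = Phi.T @ x                  # sum_k Phi[k, g] x_k, shape (G,)
    s2 = (Phi ** 2).T @ (x ** 2)   # sum_k Phi[k, g]^2 x_k^2, shape (G,)
    return beta0 + beta @ x + 0.5 * np.sum(s ** 2 - s2)

def lfm_log_odds_naive(x, beta0, beta, Phi):
    """Same quantity from the explicit O(K^2 G) double sum, for checking."""
    total = beta0 + beta @ x
    for k in range(len(x)):
        for l in range(k + 1, len(x)):
            total += (Phi[k] @ Phi[l]) * x[k] * x[l]
    return total

# Hypothetical parameter values with K = 30 predictors and G = 2 factors.
rng = np.random.default_rng(0)
K, G = 30, 2
x = rng.normal(size=K)
beta0, beta = 0.1, rng.normal(size=K)
Phi = rng.normal(scale=0.1, size=(K, G))

eta = lfm_log_odds(x, beta0, beta, Phi)
p = 1.0 / (1.0 + np.exp(-eta))    # predicted probability of a positive event
print(n_params_lri(K), [n_params_lfm(K, g) for g in (2, 4, 6)])  # 466 [91, 151, 211]
```

For $K=30$ the LFM needs fewer parameters than LRI for every $G\le 14$, in line with the condition $G<(K-1)/2$. The vectorised form of the pairwise term mirrors the linear growth in the number of parameters: it costs $O(KG)$ rather than $O(K^2G)$ operations.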

## 3. Fitting Logistic Factorisation Machines

## 4. Performance Measures

#### 4.1. Popular Measures

**Remark 1.**

#### 4.2. Other Measures

#### 4.2.1. Goodness-of-Fit

#### 4.2.2. R-Square Measures

#### 4.2.3. The H Measure (H)

#### 4.3. Summary

## 5. Simulation Study

**Remark 2.**

## 6. Analysis of Prediction Performance on Some Data Sets

#### 6.1. Artificial Recommender System Example

**Remark 3.**

- (a) When inspecting the results of the LFM fits, we noticed that, when all the observations on the edges of a block structure are removed, LFMs struggle to predict those values correctly.
- (b) A straightforward implementation of RF on this problem produced poor results. This is to be expected, since RFs are well known to struggle with sparse data sets. Research on adapting RFs to such problems is ongoing (see, e.g., Hariharan et al. 2017; Wang et al. 2018). However, given that LFM2 provides a perfect fit on both the training and validation sets, it is clearly the winner in this case.

#### 6.2. Credit Card Fraud Example

**Remark 4.**

- (a) To run PROC OPTMODEL, we had to edit the SAS configuration file and increase MEMSIZE from 2G to 10G; otherwise, memory problems arise, especially when fitting LFM6.
- (b) We also considered several smaller data sets, obtained by reducing the number of observations while keeping the numbers of frauds and non-frauds equal. As the sample size decreases, the advantage of the FMs over LR, LRI, and BRF becomes more pronounced.
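Balanced subsets like those described in Remark 4(b) can be drawn along the following lines. This NumPy sketch assumes the reduction was done by simple random sampling without replacement within each class; the function name, the synthetic labels and the sample sizes are hypothetical and not taken from the study.

```python
import numpy as np

def balanced_subsample(y, n_per_class, rng):
    """Indices of a subsample with n_per_class frauds (y == 1) and
    n_per_class non-frauds (y == 0), each drawn without replacement."""
    y = np.asarray(y)
    picks = [rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=False)
             for c in (0, 1)]
    return np.sort(np.concatenate(picks))

# Synthetic, highly imbalanced labels: 100 frauds among 10,000 transactions.
y = np.zeros(10_000, dtype=int)
y[::100] = 1

rng = np.random.default_rng(2023)
for n in (100, 50, 20):          # progressively smaller balanced data sets
    idx = balanced_subsample(y, n, rng)
    print(n, len(idx), int(y[idx].sum()))
```

Each subsample has exactly half frauds, so shrinking `n_per_class` reduces the sample size without changing the class balance.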

#### 6.3. Credit Scoring Example

**Remark 5.**

## 7. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

- Agusta, Zahra Putri. 2019. Modified balanced random forest for improving imbalanced data prediction. International Journal of Advances in Intelligent Informatics 5: 58–65.
- Ai, Chunrong, and Edward C. Norton. 2003. Interaction terms in logit and probit models. Economics Letters 80: 123–29.
- Allison, Paul D. 2014. Measures of fit for logistic regression. Paper presented at SAS Global Forum 2014 Conference, Washington, DC, USA, March 23–26.
- Baesens, Bart, Daniel Roesch, and Harald Scheule. 2016. Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS. Hoboken: John Wiley & Sons.
- Breiman, Leo. 2001. Random forests. Machine Learning 45: 5–32.
- Crook, Jonathan. 2014. Kruger National Park, Skukuza, South Africa. Personal communication.
- De Jongh, Riaan, Erika De Jongh, Marius Pienaar, Heather Gordon-Grant, Marien Oberholzer, and Leonard Santana. 2015. The impact of pre-selected variance inflation factor thresholds on the stability and predictive power of logistic regression models in credit scoring. ORiON 31: 17–37.
- Engelmann, Bernd, and Robert Rauhmeier. 2006. The Basel II Risk Parameters: Estimation, Validation, and Stress Testing. Berlin/Heidelberg: Springer Science & Business Media.
- Frost, Jim. 2019. Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries. State College: Jim Publishing. ISBN 978-1-7354311-0-9.
- Gilpin, Leilani H., David Bau, Ben Z. Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. 2018. Explaining explanations: An overview of interpretability of machine learning. Paper presented at 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, October 1–3; pp. 80–89.
- Giner-Baixauli, Carlos, Juan Tinguaro Rodríguez, Alejandro Álvaro-Meca, and Daniel Vélez. 2021. Modelling Interaction Effects by Using Extended WOE Variables with Applications to Credit Scoring. Mathematics 9: 1903.
- Hand, David J. 2009. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning 77: 103–23.
- Hand, David J., and Christoforos Anagnostopoulos. 2022. Notes on the H-measure of classifier performance. Advances in Data Analysis and Classification.
- Hand, David J., and William E. Henley. 1997. Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society) 160: 523–41.
- Hariharan, Siddharth, Siddhesh Tirodkar, Alok Porwal, Avik Bhattacharya, and Aurore Joly. 2017. Random forest-based prospectivity modelling of greenfield terrains using sparse deposit data: An example from the Tanami Region, Western Australia. Natural Resources Research 26: 489–507.
- Hilbe, Joseph M. 2009. Logistic Regression Models. Boca Raton: Chapman and Hall/CRC.
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning with Applications in R, 2nd ed. New York: Springer.
- Jiang, Yixiao. 2021. Semiparametric Estimation of a Corporate Bond Rating Model. Econometrics 9: 23.
- Kaggle. 2021. Credit Card Fraud Detection Dataset. Available online: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (accessed on 12 June 2021).
- Kleinbaum, David, and Mitchel Klein. 2005. Logistic Regression: A Self-Learning Text. New York: Springer, p. 22.
- Lessmann, Stefan, Bart Baesens, Hsin-Vonn Seow, and Lyn C. Thomas. 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247: 124–36.
- McCullagh, Peter, and John A. Nelder. 1989. Generalized Linear Models, 2nd ed. Monographs on Statistics and Applied Probability. London and New York: Chapman and Hall. Available online: https://www.utstat.toronto.edu/~brunner/oldclass/2201s11/readings/glmbook.pdf (accessed on 12 February 2023).
- McFadden, Daniel, and Paul Zarembka. 1974. Frontiers in Econometrics. New York: Academic Press.
- Prorokowski, Lukasz. 2019. Validation of the backtesting process under the targeted review of internal models: Practical recommendations for probability of default models. Journal of Risk Model Validation 13: 109–47.
- Rendle, Steffen. 2010. Factorization machines. Paper presented at 2010 IEEE International Conference on Data Mining, Sydney, NSW, Australia, December 13–17; pp. 995–1000.
- Rendle, Steffen. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology (TIST) 3: 1–22.
- SAS Institute Inc. 2010. Predictive Modelling Using Logistic Regression (SAS Course Notes). Cary: SAS Institute Inc.
- Schaeben, Helmut. 2014. A mathematical view of weights-of-evidence, conditional independence, and logistic regression in terms of Markov random fields. Mathematical Geosciences 46: 691–709.
- Schaeben, Helmut. 2020. Comment on “Modified Weights-of-Evidence Modeling with Example of Missing Geochemical Data”. Complexity 2020: 1–4.
- Sharma, Dhruv. 2011. Evidence in favor of weight of evidence and binning transformations for predictive modeling. Available online: https://ssrn.com/abstract=1925510 (accessed on 12 February 2023).
- Shtatland, Ernest S., Sara Moore, and Mary B. Barton. 2000. Why we need an R-square measure of fit (and not only one) in PROC LOGISTIC and PROC GENMOD. Paper presented at Twenty-Fifth Annual SAS® Users Group International Conference, Indianapolis, Indiana, April 9–12; Cary: SAS Institute Inc., pp. 256–25.
- Siddiqi, Naeem. 2012. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken: John Wiley & Sons.
- Siddiqi, Naeem. 2017. Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. Hoboken: John Wiley & Sons.
- Slabber, Erika, Tanja Verster, and Riaan De Jongh. 2021. Advantages of Using Factorisation Machines as a Statistical Modelling Technique. South African Statistical Journal 55: 125–44.
- Slabber, Erika, Tanja Verster, and Riaan De Jongh. 2022. Algorithms for estimating the parameters of factorisation machines. South African Statistical Journal 56: 69–89.
- Tjur, Tue. 2009. Coefficients of determination in logistic regression models—A new proposal: The coefficient of discrimination. The American Statistician 63: 366–72.
- Venter, Hennie, and Riaan De Jongh. 2023. Variable selection by searching for good subsets. South African Statistical Journal. Accepted.
- Wang, Qiang, Thanh-Tung Nguyen, Joshua Z. Huang, and Thuy Thi Nguyen. 2018. An efficient random forests algorithm for high dimensional data classification. Advances in Data Analysis and Classification 12: 953–72.
- Zeng, Guoping. 2014. A necessary condition for a good binning algorithm in credit scoring. Applied Mathematical Sciences 8: 3229–42.

Predicted \ Actual | Positive | Negative |
---|---|---|
Positive | True positives (TP) | False positives (FP) |
Negative | False negatives (FN) | True negatives (TN) |
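The four cells of the confusion matrix above follow from comparing predicted classes, obtained by applying a cut-off to the predicted probabilities, with the actual classes. A minimal NumPy sketch, with made-up labels and probabilities and an assumed cut-off of 0.5 (in practice the cut-off is a modelling choice):

```python
import numpy as np

def confusion_counts(y_true, p_hat, cutoff=0.5):
    """Return (TP, FP, FN, TN) when cases with p_hat >= cutoff are predicted positive."""
    actual = np.asarray(y_true).astype(bool)
    predicted = np.asarray(p_hat) >= cutoff
    tp = int(np.sum(predicted & actual))
    fp = int(np.sum(predicted & ~actual))
    fn = int(np.sum(~predicted & actual))
    tn = int(np.sum(~predicted & ~actual))
    return tp, fp, fn, tn

# Made-up actual classes (1 = positive) and predicted probabilities.
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.4, 0.6, 0.1, 0.7, 0.3]

tp, fp, fn, tn = confusion_counts(y, p, cutoff=0.5)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
accuracy = (tp + tn) / len(y)
print(tp, fp, fn, tn)          # 2 1 1 2
```

Moving the cut-off trades false positives against false negatives, which is why threshold-free summaries are often preferred when comparing classifiers.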

**Table 2.** Average of the predictor coefficient estimates over 200 repetitions for $N=1000$ for logistic regression (LR), logistic regression with interactions (LRI), two-factor factorisation machines based on logit loss (LLFM2) and on maximum likelihood (MLEFM2), and four-factor factorisation machines (LLFM4 and MLEFM4).

Betas | True Value | LR | LRI | LLFM2 | MLEFM2 | LLFM4 | MLEFM4 |
---|---|---|---|---|---|---|---|

0 | 0 | 0.225 | 0.008 | 0.014 | 0.013 | 0.012 | 0.013 |

1 | 1 | 0.576 | 1.157 | 1.075 | 1.075 | 1.136 | 1.136 |

2 | 2 | 1.046 | 2.321 | 2.155 | 2.155 | 2.278 | 2.278 |

3 | −3 | −1.578 | −3.486 | −3.240 | −3.240 | −3.419 | −3.419 |

4 | 1 | 0.561 | 1.171 | 1.092 | 1.092 | 1.149 | 1.149 |

5 | 1 | 0.565 | 1.172 | 1.080 | 1.080 | 1.149 | 1.149 |

6 | 0 | 0.002 | 0.009 | 0.005 | 0.005 | 0.008 | 0.008 |

7 | 0 | −0.005 | 0.002 | 0.002 | 0.002 | 0.000 | 0.000 |

8 | 0 | 0.003 | −0.005 | −0.007 | −0.007 | −0.006 | −0.006 |

9 | 0 | −0.008 | 0.003 | −0.004 | −0.004 | 0.003 | 0.003 |

10 | 0 | 0.005 | 0.006 | 0.011 | 0.011 | 0.007 | 0.007 |

**Table 3.** Standard deviation of the predictor coefficient estimates over 200 repetitions for $N=1000$ for logistic regression (LR), logistic regression with interactions (LRI), two-factor factorisation machines based on logit loss (LLFM2) and on maximum likelihood (MLEFM2), and four-factor factorisation machines (LLFM4 and MLEFM4).

Betas | LR | LRI | LLFM2 | MLEFM2 | LLFM4 | MLEFM4 |
---|---|---|---|---|---|---|

0 | 0.082 | 0.158 | 0.129 | 0.129 | 0.148 | 0.148 |

1 | 0.088 | 0.154 | 0.136 | 0.136 | 0.149 | 0.149 |

2 | 0.111 | 0.256 | 0.225 | 0.225 | 0.248 | 0.248 |

3 | 0.134 | 0.345 | 0.311 | 0.311 | 0.328 | 0.328 |

4 | 0.093 | 0.169 | 0.153 | 0.153 | 0.164 | 0.164 |

5 | 0.095 | 0.170 | 0.147 | 0.147 | 0.164 | 0.164 |

6 | 0.078 | 0.148 | 0.127 | 0.127 | 0.142 | 0.142 |

7 | 0.089 | 0.147 | 0.130 | 0.130 | 0.143 | 0.143 |

8 | 0.080 | 0.143 | 0.127 | 0.127 | 0.136 | 0.136 |

9 | 0.095 | 0.154 | 0.135 | 0.135 | 0.149 | 0.149 |

10 | 0.087 | 0.154 | 0.138 | 0.138 | 0.149 | 0.149 |

**Table 4.** Mean squared error of the predictor coefficient estimates over 200 repetitions for $N=1000$ for the logistic regression and factorisation machine models.

Betas | LR | LRI | LLFM2 | MLEFM2 | LLFM4 | MLEFM4 |
---|---|---|---|---|---|---|

0 | 0.057 | 0.025 | 0.017 | 0.017 | 0.022 | 0.022 |

1 | 0.188 | 0.049 | 0.024 | 0.024 | 0.041 | 0.041 |

2 | 0.921 | 0.169 | 0.074 | 0.074 | 0.139 | 0.139 |

3 | 2.039 | 0.355 | 0.154 | 0.154 | 0.284 | 0.284 |

4 | 0.201 | 0.058 | 0.032 | 0.032 | 0.049 | 0.049 |

5 | 0.198 | 0.058 | 0.028 | 0.028 | 0.049 | 0.049 |

6 | 0.006 | 0.022 | 0.016 | 0.016 | 0.020 | 0.020 |

7 | 0.008 | 0.021 | 0.017 | 0.017 | 0.021 | 0.021 |

8 | 0.006 | 0.020 | 0.016 | 0.016 | 0.018 | 0.018 |

9 | 0.009 | 0.024 | 0.018 | 0.018 | 0.022 | 0.022 |

10 | 0.008 | 0.024 | 0.019 | 0.019 | 0.022 | 0.022 |
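Tables 2 to 4 are linked by the decomposition $\mathrm{MSE}=\mathrm{bias}^2+\mathrm{variance}$. This explains why LR, despite having the smallest standard deviations, has by far the largest MSE for the non-zero coefficients $\beta_1,\dots,\beta_5$. The Python sketch below recomputes the LR and LLFM2 columns of Table 4 from the rounded averages and standard deviations in Tables 2 and 3. Agreement is therefore only up to rounding, and the sketch treats the reported standard deviation as the square root of the variance term.

```python
import numpy as np

true_beta = np.array([0, 1, 2, -3, 1, 1, 0, 0, 0, 0, 0], dtype=float)

# Averages (Table 2) and standard deviations (Table 3) of the estimates.
mean = {
    "LR":    [0.225, 0.576, 1.046, -1.578, 0.561, 0.565, 0.002, -0.005, 0.003, -0.008, 0.005],
    "LLFM2": [0.014, 1.075, 2.155, -3.240, 1.092, 1.080, 0.005, 0.002, -0.007, -0.004, 0.011],
}
sd = {
    "LR":    [0.082, 0.088, 0.111, 0.134, 0.093, 0.095, 0.078, 0.089, 0.080, 0.095, 0.087],
    "LLFM2": [0.129, 0.136, 0.225, 0.311, 0.153, 0.147, 0.127, 0.130, 0.127, 0.135, 0.138],
}
# Reported MSEs (Table 4), used only for comparison.
reported_mse = {
    "LR":    [0.057, 0.188, 0.921, 2.039, 0.201, 0.198, 0.006, 0.008, 0.006, 0.009, 0.008],
    "LLFM2": [0.017, 0.024, 0.074, 0.154, 0.032, 0.028, 0.016, 0.017, 0.016, 0.018, 0.019],
}

recomputed = {}
for model in mean:
    bias = np.array(mean[model]) - true_beta
    var = np.array(sd[model]) ** 2
    recomputed[model] = bias ** 2 + var          # MSE = bias^2 + variance
    print(model, np.round(recomputed[model], 3))
```

For LR the squared bias dominates (for example, $(-1.578+3)^2\approx 2.02$ against a variance of about $0.018$ for $\beta_3$), whereas for LLFM2 the variance term dominates.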

| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.06821 | 0.08044 | 0.02662 | 0.02950 | 0.02522 | 0.02432 | 0.02343 | 0.02842 | 0.02549 |
| 2 | | 0.20197 | 0.03057 | 0.03199 | 0.03260 | 0.03029 | 0.02791 | 0.03498 | 0.03719 |
| 3 | | | 0.04024 | 0.04144 | 0.04289 | 0.03534 | 0.04172 | 0.04525 | 0.04125 |
| 4 | | | | 0.03165 | 0.02307 | 0.02285 | 0.02166 | 0.02837 | 0.02934 |
| 5 | | | | | 0.02131 | 0.02465 | 0.02273 | 0.02747 | 0.02052 |
| 6 | | | | | | 0.02056 | 0.02078 | 0.02142 | 0.02436 |
| 7 | | | | | | | 0.01717 | 0.02657 | 0.02172 |
| 8 | | | | | | | | 0.02304 | 0.02339 |
| 9 | | | | | | | | | 0.16936 |

| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00510 | 0.00747 | 0.00001 | 0.00003 | 0.00009 | 0.00001 | 0.00001 | 0.00010 | 0.00017 |
| 2 | | 0.02911 | 0.00005 | 0.00008 | 0.00040 | 0.00009 | 0.00005 | 0.00014 | 0.00013 |
| 3 | | | 0.00006 | 0.00008 | 0.00031 | 0.00005 | 0.00005 | 0.00034 | 0.00014 |
| 4 | | | | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00009 | 0.00028 |
| 5 | | | | | 0.00000 | 0.00000 | 0.00000 | 0.00011 | 0.00006 |
| 6 | | | | | | 0.00000 | 0.00000 | 0.00009 | 0.00011 |
| 7 | | | | | | | 0.00000 | 0.00015 | 0.00006 |
| 8 | | | | | | | | 0.00008 | 0.00014 |
| 9 | | | | | | | | | 0.02429 |

**Table 7.** Time that the estimation algorithms of the models took to converge over the simulation runs.

Model | Average | Standard Deviation | Minimum | Maximum |
---|---|---|---|---|

LR | 0.057 | 0.015 | 0.032 | 0.185 |

LRI | 0.095 | 0.065 | 0.062 | 0.825 |

LLFM2 | 5.807 | 7.158 | 0.364 | 28.634 |

MLEFM2 | 5.310 | 6.526 | 0.336 | 37.872 |

LLFM4 | 8.572 | 3.820 | 1.013 | 21.079 |

MLEFM4 | 8.491 | 3.917 | 1.101 | 20.204 |

Model | Average | Standard Deviation | Minimum | Maximum |
---|---|---|---|---|

LR | 0.750 | 0.023 | 0.679 | 0.803 |

LRI | 0.935 | 0.011 | 0.904 | 0.964 |

LLFM2 | 0.926 | 0.014 | 0.846 | 0.956 |

MLEFM2 | 0.926 | 0.013 | 0.844 | 0.956 |

LLFM4 | 0.933 | 0.011 | 0.903 | 0.962 |

MLEFM4 | 0.933 | 0.011 | 0.902 | 0.962 |

Model | Average | Standard Deviation | Minimum | Maximum |
---|---|---|---|---|

LR | 0.689 | 0.031 | 0.605 | 0.763 |

LRI | 0.877 | 0.018 | 0.821 | 0.926 |

LLFM2 | 0.859 | 0.021 | 0.791 | 0.913 |

MLEFM2 | 0.859 | 0.022 | 0.768 | 0.913 |

LLFM4 | 0.873 | 0.018 | 0.816 | 0.924 |

MLEFM4 | 0.873 | 0.018 | 0.816 | 0.924 |

RF | 0.999 | 0.001 | 0.999 | 1.000 |

Model | Average | Standard Deviation | Minimum | Maximum |
---|---|---|---|---|

LR | 0.671 | 0.045 | 0.546 | 0.801 |

LRI | 0.800 | 0.035 | 0.658 | 0.886 |

LLFM2 | 0.816 | 0.036 | 0.657 | 0.901 |

MLEFM2 | 0.816 | 0.036 | 0.676 | 0.901 |

LLFM4 | 0.245 | 0.235 | −0.290 | 0.789 |

MLEFM4 | 0.251 | 0.222 | −0.231 | 0.788 |

RF | 0.696 | 0.049 | 0.537 | 0.798 |

**Table 11.** The full binary ratings data set, where the shaded zeros and ones indicate the removed ratings.

| | ${\mathit{I}}_{1}$ | ${\mathit{I}}_{2}$ | ${\mathit{I}}_{3}$ | ${\mathit{I}}_{4}$ | ${\mathit{I}}_{5}$ | ${\mathit{I}}_{6}$ | ${\mathit{I}}_{7}$ | ${\mathit{I}}_{8}$ | ${\mathit{I}}_{9}$ | ${\mathit{I}}_{10}$ | ${\mathit{I}}_{11}$ | ${\mathit{I}}_{12}$ | ${\mathit{I}}_{13}$ | ${\mathit{I}}_{14}$ | ${\mathit{I}}_{15}$ | ${\mathit{I}}_{16}$ | ${\mathit{I}}_{17}$ | ${\mathit{I}}_{18}$ | ${\mathit{I}}_{19}$ | ${\mathit{I}}_{20}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

${\mathit{U}}_{\mathbf{1}}$ | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{2}}$ | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{3}}$ | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{4}}$ | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{5}}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{6}}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{7}}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{8}}$ | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{9}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{10}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{11}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{12}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{13}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{14}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{15}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{16}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |

${\mathit{U}}_{\mathbf{17}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

${\mathit{U}}_{\mathbf{18}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

${\mathit{U}}_{\mathbf{19}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |

${\mathit{U}}_{\mathbf{20}}$ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
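The block structure of Table 11 (five groups of four users, each rating only its own group of four items positively) can be regenerated in a few lines. The sketch also shows one common way of presenting such ratings to an LFM, following Rendle (2010): each rating becomes a row with a one-hot user indicator followed by a one-hot item indicator. This encoding and the helper name `one_hot_row` are assumptions for illustration, and which ratings were removed (the shaded cells) is not reproduced here.

```python
import numpy as np

n_users, n_items, block = 20, 20, 4

# Rating is 1 exactly when user u and item i belong to the same block of four.
R = np.array([[int(u // block == i // block) for i in range(n_items)]
              for u in range(n_users)])

def one_hot_row(u, i):
    """Hypothetical FM design row: user indicators, then item indicators."""
    x = np.zeros(n_users + n_items)
    x[u] = 1.0
    x[n_users + i] = 1.0
    return x

# One design row and one binary target per (user, item) cell, in row-major order.
X = np.array([one_hot_row(u, i) for u in range(n_users) for i in range(n_items)])
y = R.ravel()
print(R[:5, :8])
print(X.shape, int(y.sum()))
```

With this encoding only one user indicator and one item indicator are non-zero in each row, so the pairwise term of an FM reduces to a single inner product between that user's and that item's factor vectors, i.e. a low-rank factorisation of the ratings matrix.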

Modelling Technique | Train | Validation | Number of Parameters to Estimate |
---|---|---|---|

LR | 0.987 | 0.929 | 31 |

LRI | 0.994 | 0.925 | 466 |

LFM2 | 0.998 | 0.824 | 91 |

LFM4 | 0.999 | 0.835 | 151 |

LFM6 | 0.999 | 0.741 | 211 |

BRF | 0.996 | 0.737 |

Modelling Technique | Train | Validation |
---|---|---|

LR | 0.815 | 0.803 |

LRI | 0.648 | 0.507 |

LFM2 | 0.782 | 0.752 |

BRF | 1.000 | 0.860 |

Modelling Technique | Train | Validation | Number of Parameters to Estimate |
---|---|---|---|

LR | 0.995 | 0.827 | 31 |

LRI | |||

LFM2 | 0.999 | 0.816 | 91 |

LFM4 | 0.999 | 0.857 | 151 |

LFM6 | 0.999 | 0.853 | 211 |

BRF | 0.999 | 0.843 |

Modelling Technique | Train | Validation |
---|---|---|

LR | 0.989 | 0.900 |

LRI | ||

LFM2 | 1.000 | 0.881 |

BRF | 1.000 | 0.916 |

Modelling Technique | Standardised Original Variables: Train | Standardised Original Variables: Validation | WoE Transformed Variables: Train | WoE Transformed Variables: Validation | Number of Parameters |
---|---|---|---|---|---|

LR | 0.772 | 0.783 | 0.814 | 0.824 | 39 |

LRI | 0.805 | 0.800 | 0.847 | 0.825 | 742 |

LFM2 | 0.797 | 0.802 | 0.828 | 0.829 | 115 |

LFM4 | 0.804 | 0.804 | 0.832 | 0.830 | 191 |

LFM6 | 0.809 | 0.807 | 0.836 | 0.830 | 267 |

BRF | 0.999 | 0.883 | 0.999 | 0.862 |

Modelling Technique | Standardised Original Variables: Train | Standardised Original Variables: Validation | WoE Transformed Variables: Train | WoE Transformed Variables: Validation |
---|---|---|---|---|

LR | 0.383 | 0.386 | 0.398 | 0.399 |

LRI | 0.400 | 0.386 | 0.433 | 0.401 |

LFM2 | 0.393 | 0.391 | 0.411 | 0.406 |

BRF | 0.999 | 0.391 | 0.977 | 0.402 |
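The credit scoring results above compare standardised original variables with weight-of-evidence (WoE) transformed variables. A minimal sketch of both encodings follows. It assumes the convention WoE = ln(share of goods in the bin / share of bads in the bin), noting that some texts reverse the ratio or multiply by 100, and it takes the binning as given; the toy data and function names are hypothetical, and the binning used in the paper is not reproduced.

```python
import numpy as np

def woe_by_bin(bins, y):
    """WoE per bin, with y == 1 a 'bad' (e.g. default) and y == 0 a 'good'.
    Undefined (infinite) for bins that contain no goods or no bads."""
    bins, y = np.asarray(bins), np.asarray(y)
    n_good, n_bad = np.sum(y == 0), np.sum(y == 1)
    woe = {}
    for b in np.unique(bins):
        in_bin = bins == b
        share_good = np.sum(in_bin & (y == 0)) / n_good
        share_bad = np.sum(in_bin & (y == 1)) / n_bad
        woe[b] = float(np.log(share_good / share_bad))
    return woe

def woe_transform(bins, woe):
    """Replace each bin label by the WoE of its bin."""
    return np.array([woe[b] for b in bins])

def standardise(x):
    """Alternative encoding: z-scores of an original (numeric) variable."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Toy example: one binned predictor with two bins.
bins = ["A", "A", "A", "B", "B", "B"]
y = [0, 0, 1, 0, 1, 1]
woe = woe_by_bin(bins, y)
x_woe = woe_transform(bins, woe)
print(woe, x_woe)
```

Because each WoE value is a log ratio of class shares, the transformed predictor is already on a log-odds-like scale, which fits naturally with a model that is linear in the log-odds.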

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Slabber, E.; Verster, T.; de Jongh, R.
Some Insights about the Applicability of Logistic Factorisation Machines in Banking. *Risks* **2023**, *11*, 48.
https://doi.org/10.3390/risks11030048

**AMA Style**

Slabber E, Verster T, de Jongh R.
Some Insights about the Applicability of Logistic Factorisation Machines in Banking. *Risks*. 2023; 11(3):48.
https://doi.org/10.3390/risks11030048

**Chicago/Turabian Style**

Slabber, Erika, Tanja Verster, and Riaan de Jongh.
2023. "Some Insights about the Applicability of Logistic Factorisation Machines in Banking" *Risks* 11, no. 3: 48.
https://doi.org/10.3390/risks11030048