1. Introduction
Healthcare-associated infections (HAIs) are among the major complications of modern medical therapy [1]. The most important HAIs are those related to invasive devices, namely central line-associated bloodstream infections (CLABSI), catheter-associated urinary tract infections (CAUTI) and ventilator-associated pneumonia (VAP), as well as surgical site infections (SSIs). SSIs are infections that arise after surgery in the part of the body where the operation took place, and they can involve the skin, deeper tissues or an organ [2]. The latest ECDC report on SSIs shows that in 2017, 10,149 SSIs were reported through patient-based or unit-based surveillance. Of these, 47% were superficial, 30% deep and 22% organ/space SSIs. Thirty-four per cent of the SSIs were diagnosed in hospital, whereas 52% were detected after discharge [3]. The reported prevalence ranges from 0.5% to 10.1% [3]. The latest Italian HAI prevalence study reports a prevalence of 8.03%; 16% of these infections were SSIs, corresponding to a prevalence of 1.27% [4].
Most SSIs arise in secondary care and are often associated with antibiotic-resistant organisms, such as methicillin-resistant Staphylococcus aureus. A long list of potential patient risk factors for SSI has been identified, but few have been confirmed in randomized clinical trials. SSIs represent about a fifth of all healthcare-associated infections [5]. The cost of an SSI is usually defined as the hospital charges for additional goods and resources. In patients with SSI, costs are divided into direct and indirect costs. Direct costs include prolonged hospitalization, readmission, outpatient visits, emergency department visits, additional surgery and prolonged antibiotic therapy, whereas indirect costs are difficult to quantify because they include lost productivity not only of the patient but also of family members or friends. For this reason, the true cost of an SSI is often unknown [6]. Multiple factors contribute to HAIs, including healthcare-associated, environmental and patient-related factors. Healthcare factors include the use of invasive devices, the type of surgical procedure and the selective pressure created by excessive use of antibiotics as prophylaxis. Environmental factors include contaminated air-conditioning systems and the physical layout of the facility (e.g., open units with beds close together). Patient-related factors include genetic factors, the severity of the underlying illness, the use of immunosuppressive agents and prolonged hospital stays.
It is now recognized that many SSIs are partially preventable and that healthcare can become safer. A passive strategy can be used for SSI reduction, in which surveillance protocols lead to infection reduction through timely feedback [7]. The classic study demonstrating the importance of this approach was the SENIC study, funded by the United States Centers for Disease Control and Prevention (CDC), which included 338 randomly selected hospitals stratified by geography, bed capacity and teaching status. It showed that infection control programs with dedicated hospital epidemiologists and surveillance programs reduced nosocomial infections by 32% compared with facilities without infection control programs [8]. To design an effective prevention program, it is necessary to consider the impact of SSIs on the length of hospital stay [9,10,11,12], which is a performance indicator of the quality of health processes [13,14,15,16,17,18,19]. In addition, identifying the risk factors associated with SSIs can help reduce their incidence [11,20] and add value to health technology assessment (HTA) studies, which are widely used to support health decision-making [21,22,23,24,25]. This paper presents a logistic regression model to study the impact of different clinical, demographic and organizational factors on the risk of SSI occurrence in surgical departments, together with artificial intelligence (AI) models to predict the risk of infection.
2. Materials and Methods
The study population included all patients who underwent surgery in the surgical departments of the “Federico II” University Hospital in Naples (Italy) between 2015 and 2019. Active, patient-based surveillance for SSIs is continuously performed by trained healthcare staff in all surgical departments of the hospital. The protocol adopted in the Campania Region for the surveillance of SSIs corresponds to that of the SNICh national surveillance system [26,27]. It defines which interventions to monitor and how and for how long to carry out surveillance; it specifies the information to be collected for each intervention and provides definitions for each variable of interest, such as the diagnosis of surgical site infection, class and type of intervention, duration of intervention, ASA score and risk index. Data collection is carried out prospectively for all patients undergoing the selected surgical interventions. All patients who meet the inclusion criteria in the chosen surveillance period (1 year) are included without any selection. If necessary, patients must be followed up after discharge: for 30 days after surgery for interventions that do not involve the placement of prostheses, and for 90 days for interventions with implantation of prosthetic material. The ICAARO web IT platform has been in use since 2015. All healthcare trusts in the Campania Region can access the platform and manage the data from the surveillance of healthcare-associated infections. ICAARO web allows direct entry of the data collected during surveillance activities, as well as the extraction of specific reports for each surveillance [27].
For this study, neither patients’ informed consent nor local Ethics Committee authorization was required, as all the data come from HAI surveillance regulated by the Regional Health Authority, as defined in the Regional Plan for the Prevention and Control of Healthcare-associated Infections [28]. Data were collected retrospectively using ICARO web IT reports and QuaniSDO, i.e., the system used for the computerization of hospital discharge records. A risk folder must be completed for every hospitalized patient, from admission to discharge; this folder contains epidemiological data relevant to the study of hospital infections. All diabetic patients and patients treated with corticosteroids were excluded, because these two conditions slow wound healing and predispose to bacterial infections. The following information was extracted from the medical records:
Gender (male/female);
Age;
Length of stay (days);
Hospital regime;
Surgery department;
Number of antibiotics;
SSI (yes/no).
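As an illustration only, the extracted variables and the exclusion criteria could be held in a simple record type. The class `SurgicalRecord`, its field names, the `diabetic` and `corticosteroids` flags and the `apply_exclusions` helper are hypothetical and do not describe the actual export format of the surveillance platform or of QuaniSDO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurgicalRecord:
    """One patient record with the variables listed above (hypothetical field names)."""
    gender: str                    # "male" or "female"
    age: int                       # years
    length_of_stay: int            # days
    hospital_regime: str           # admission regime (category labels assumed)
    department: str                # surgery department
    n_antibiotics: int             # number of antibiotics given as prophylaxis
    ssi: bool                      # outcome: SSI yes/no
    diabetic: bool = False         # exclusion criterion
    corticosteroids: bool = False  # exclusion criterion


def apply_exclusions(records):
    """Drop diabetic patients and patients treated with corticosteroids."""
    return [r for r in records if not (r.diabetic or r.corticosteroids)]


# Toy records (not study data)
kept = SurgicalRecord("female", 64, 7, "ordinary", "Orthopedics", 1, False)
dropped = SurgicalRecord("male", 58, 12, "ordinary", "Maxillo-facial", 2, True, diabetic=True)
print(len(apply_exclusions([kept, dropped])))  # 1
```

Keeping the exclusion flags on the record, rather than filtering at extraction time, makes it easy to report how many patients each criterion removed.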
In accordance with the definition of the National Nosocomial Infections Surveillance (NNIS) System, an SSI is defined as an infection that occurs within 30 days of surgery (or within 1 year if an implant, i.e., an implantable foreign body of non-human origin, is left in place during the surgical procedure) and that may involve the incisional or deep tissue at the operative site [29].
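The time window in the NNIS definition can be expressed as a small date check. This is a minimal sketch under stated assumptions: the function `within_ssi_window` and its constants are hypothetical, “1 year” is approximated as 365 days, both window boundaries are treated as inclusive, and the clinical criteria for incisional or deep involvement are not modelled.

```python
from datetime import date

NO_IMPLANT_DAYS = 30  # infection within 30 days of surgery
IMPLANT_DAYS = 365    # "within 1 year" if an implant is left in place (365 days assumed)


def within_ssi_window(surgery: date, onset: date, implant_left: bool) -> bool:
    """True if infection onset falls in the NNIS time window (inclusive bounds assumed)."""
    elapsed = (onset - surgery).days
    limit = IMPLANT_DAYS if implant_left else NO_IMPLANT_DAYS
    return 0 <= elapsed <= limit


print(within_ssi_window(date(2018, 3, 1), date(2018, 3, 20), implant_left=False))  # True
print(within_ssi_window(date(2018, 3, 1), date(2018, 6, 1), implant_left=False))   # False
print(within_ssi_window(date(2018, 3, 1), date(2018, 6, 1), implant_left=True))    # True
```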
All data relating to SSIs were collected by the surgeon who performed the operation. The surgeon also followed the patient both during hospitalization and for 30 days after discharge, as recommended by the Centers for Disease Control and Prevention (CDC) criteria [2]. Post-operative follow-up could be performed during an outpatient visit or by telephone interview. In addition, upon discharge, each patient received a pre-printed questionnaire to record the onset of any symptoms of HAI during the follow-up period.
2.1. Statistical Analysis
Before performing the statistical tests, the distributions were analyzed using the Shapiro-Francia test. For age and length of stay, the test showed a non-normal distribution; therefore, non-parametric tests were used. Specifically, the Kruskal-Wallis test and other statistical tests were used to describe the population characteristics. The Kruskal-Wallis test is a rank-based non-parametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. A p-value below 0.05 was considered significant for the above tests [30,31,32,33,34,35].
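The two tests named above can be sketched in pure Python to show what they compute. This is an illustrative sketch, not the STATA routines used in the study: the function names are hypothetical, the Shapiro-Francia p-value uses Royston’s normal approximation for ln(1 - W') (intended for roughly 5 to 5000 observations), and the Kruskal-Wallis p-value uses the usual chi-square approximation with k - 1 degrees of freedom.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def shapiro_francia(x):
    """Shapiro-Francia W' with Royston's normal approximation for the p-value."""
    xs = sorted(x)
    n = len(xs)
    # Approximate expected normal order statistics: plotting positions (i - 3/8) / (n + 1/4)
    m = [_N.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mm = sum(xs) / n, sum(m) / n
    sxy = sum((a - mx) * (b - mm) for a, b in zip(xs, m))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - mm) ** 2 for b in m)
    w = sxy * sxy / (sxx * syy)  # W' = squared correlation of data and normal scores
    if w >= 1.0:
        return w, 1.0
    u = math.log(n)
    v = math.log(u)
    mu = -1.2725 + 1.0521 * (v - u)
    sigma = 1.0308 - 0.26758 * (v + 2.0 / u)
    z = (math.log(1.0 - w) - mu) / sigma
    return w, 1.0 - _N.cdf(z)  # upper tail: a small p suggests non-normality


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed forms)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    chi = math.sqrt(x)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)  # chi^(2r-1) / (1*3*...*(2r-1))
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * _N.pdf(chi) * total


def _average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value with k - 1 df."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = _average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


h, p = kruskal_wallis([1, 2, 3], [4, 5, 6])
print(round(h, 3), round(p, 4))  # 3.857 0.0495
```

For two groups (for example, age in patients with and without SSI) the chi-square distribution has one degree of freedom, so the p-value reduces to erfc(sqrt(H/2)).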
Logistic regressions were used to test the association between SSIs (dependent variable) and the different risk factors under study (explanatory variables). The explanatory variables were sex, age, hospital regime, surgery department, length of preoperative and postoperative hospital stay, and antibiotic prophylaxis. The multivariate model was adjusted for the risk factors considered. Associations were deemed significant if p-values were below 0.05. Sensitivity analyses were performed using Firth’s penalized maximum likelihood logistic regression. Data were analyzed using STATA version 15.
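To make the sensitivity analysis more concrete, the sketch below fits a Firth-penalized logistic regression by Newton iterations on the modified score U*(b) = X'(y - pi + h * (1/2 - pi)), where pi are the fitted probabilities and h is the diagonal of the hat matrix. It is a minimal pure-Python illustration, not the STATA implementation used in the study: the function names, the starting values (all coefficients 0) and the step cap are choices made here, and confidence intervals (usually obtained from the profile penalized likelihood) are omitted.

```python
import math


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        a[col], a[piv] = a[piv], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression; each row of X must start with 1.0 (intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information X'WX and its inverse
        info = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
                for r in range(p)]
        inv = _invert(info)
        # Leverages h_i = w_i * x_i' (X'WX)^-1 x_i (diagonal of the hat matrix)
        h = [w[i] * sum(X[i][r] * inv[r][c] * X[i][c] for r in range(p) for c in range(p))
             for i in range(n)]
        # Firth-modified score U* = X'(y - pi + h * (1/2 - pi))
        score = [sum(X[i][r] * (y[i] - pi[i] + h[i] * (0.5 - pi[i])) for i in range(n))
                 for r in range(p)]
        step = [sum(inv[r][c] * score[c] for c in range(p)) for r in range(p)]
        largest = max(abs(s) for s in step)
        if largest > max_step:  # simple safeguard against overshooting (assumption)
            step = [s * max_step / largest for s in step]
        beta = [b + s for b, s in zip(beta, step)]
        if largest < tol:
            return beta
    raise RuntimeError("Firth iterations did not converge")


# Toy 2x2 data (not study data): x = 1 could flag, e.g., "two or more antibiotics"
X = [[1.0, 0.0]] * 20 + [[1.0, 1.0]] * 10
y = [1] * 2 + [0] * 18 + [1] * 4 + [0] * 6
b0, b1 = firth_logistic(X, y)
print(round(math.exp(b1), 3))  # 5.123, i.e. (4.5 * 18.5) / (6.5 * 2.5)
```

With a single binary covariate the model is saturated, and the Firth odds ratio coincides with the odds ratio computed after adding 0.5 to each cell of the 2x2 table. This gives a convenient check, and it shows why the method still returns a finite estimate when one cell is empty, which is typical of rare events.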
2.2. Predictive Analysis
Machine learning algorithms learn functions that map input variables to an output value in order to make predictions. This initial learning task then makes it possible to classify new samples described by the same input variables. Several artificial intelligence models were used: Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), K-Nearest Neighbors (KNN), Gradient Boosted Tree (GBT), XGBoost (XGB) and Naive Bayes (NB). The target variable (SSI) is predicted from the input variables. The model was built from the knowledge acquired by analyzing an initial subset of the data, called the training set; for this reason, the dataset was divided into training (75%) and test (25%) sets. Since the dataset was unbalanced with respect to the presence of infections, the Synthetic Minority Over-sampling Technique (SMOTE) was used [36]. Some supervised learning algorithms (such as decision trees and neural networks) need a balanced class distribution to generalize well, i.e., to achieve good classification performance. When the input data are unbalanced, e.g., when there are only a few objects of the “active” class but many of the “inactive” class, SMOTE adjusts the class distribution by adding artificial rows (in this example, rows of the “active” class).
The algorithm works approximately as follows: it creates synthetic rows by interpolating between a real object of a given class (in the above example, “active”) and one of its nearest neighbors of the same class. It chooses a random point on the line segment between these two objects and assigns the attributes (cell values) of this point to the new object.
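The 75/25 split and the SMOTE step can be sketched as follows. This is a simplified illustration, not the tool used in the study: the function names are hypothetical, features are assumed to be numeric (categorical variables such as the department would need encoding or a variant such as SMOTE-NC), the base row is drawn at random rather than cycling over every minority row, and oversampling is applied to the training set only, so that no synthetic rows leak into the test set.

```python
import math
import random


def stratified_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle each class separately and hold out `test_fraction` of it for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += [(rows[i], cls) for i in idx[:n_test]]
        train += [(rows[i], cls) for i in idx[n_test:]]
    return train, test


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic rows on segments joining minority rows to nearby minority rows."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        # k nearest neighbors of `base` within the minority class (Euclidean distance)
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # random position in [0, 1) along the segment base -> nb
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic


def balance_training_set(train, minority_label, k=5, seed=0):
    """Oversample the minority class of the training set only, up to the majority size."""
    minority = [x for x, lab in train if lab == minority_label]
    majority = [x for x, lab in train if lab != minority_label]
    extra = smote(minority, max(0, len(majority) - len(minority)), k=k, seed=seed)
    return train + [(x, minority_label) for x in extra]


# Toy data (not study data): 40 rows, 8 of them in the minority class 1
rows = [[float(i), float(i % 7)] for i in range(40)]
labels = [1 if i < 8 else 0 for i in range(40)]
train, test = stratified_split(rows, labels)
balanced = balance_training_set(train, minority_label=1)
print(len(train), len(test), len(balanced))  # 30 10 48
```

Splitting before oversampling keeps the test set made of real patients only, so performance measured on it is not inflated by synthetic neighbors of training rows.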
4. Discussion
The factors that most influence the risk of SSIs were evaluated on an unbalanced dataset, in which the class of “infected patients” was smaller than that of “uninfected patients”. For this reason, Firth’s penalized maximum likelihood logistic regression was used, since this type of regression is well suited to unbalanced datasets and to the regression analysis of rare events. The results show that the type of surgery, with its associated level of contamination, can significantly affect the infection rate (p-value < 0.005); as expected, Maxillo-facial and Orthopedic surgery carry a higher risk of infection. Indeed, several studies in the scientific literature report that surgical procedures in the maxillo-facial area are at high risk of infection. For example, Cunha et al. [37] discuss the risk of infection in head and neck oncological surgery, and Cousin et al. [38] in orthognathic surgery.
In addition, the length of preoperative hospital stay was not significantly associated with the rate of surgical infections (p-value = 0.071). Conversely, the length of postoperative stay is known to be correlated with the risk of infection. Mujagic et al. [12] show that, although there is no significant independent association between preoperative length of stay (LOS) and the risk of SSI, postoperative LOS is significantly associated with SSI. The latter is also confirmed by the studies reviewed by Manoukian et al. [39]. With regard to orthopaedic surgery, there is evidence of increasing infection rates due to Enterobacteriaceae, together with the high use of hip and knee replacement surgery and an increasingly obese surgical population, which carries a higher risk [40].
Furthermore, the data revealed that antibiotic prophylaxis is a possible determining factor; in particular, prophylaxis with two or more antibiotics is the most significant predictor in the model (p-value = 0.004). This is likely related to the characteristics of the patients who are prescribed combined antibiotic therapies, who tend to be more fragile. This result is in line with what Nasuzione et al. [41] reported in patients with chronic kidney disease. Hawn et al. [42], on the other hand, show that the risk of SSI varies with patient and procedure factors, as well as with antibiotic properties, but is not significantly associated with the timing of antibiotic prophylaxis.
Risk factors for SSIs have been analyzed in several articles. Fisichella et al. [43] show that, in orthopedic surgery, SSI risk is correlated with age (contrary to our findings) and with diabetes and smoking. Among these, diabetes is also significantly correlated with SSI risk in cardiothoracic surgery, as shown by Latham et al. [44].
In sum, the goal of this work is to increase the knowledge of health professionals on an important topic such as SSIs. In particular, classic statistical analysis is combined with predictive algorithms to estimate whether or not a patient will develop an infection. Although the use of machine learning in this field is not new (e.g., Montella et al. [45] have used it to study healthcare-associated bloodstream infections in neonatal intensive care, and Tunthanathip et al. [46] specifically for neurosurgical operations), in our work we analyze the situation of the whole hospital by including both organizational and clinical-demographic factors of the patients. Knowing the risk of SSI in advance through these algorithms could have a direct impact on the care practices implemented by the hospital.
Our work has several limitations. One of the main limitations is the retrospective nature of the analysis: although the surveillance is active and carried out in real time, the data analysis is performed retrospectively, together with the data from the hospital discharge records. It would be desirable to analyze the data in parallel with their collection, in order to create an active, flexible and responsive surveillance system. Another limitation is the short observation period and the limited set of variables included in the model. As discussed earlier, several additional clinical factors, such as diabetes [42,43], have been shown to be significantly correlated with SSI risk but were not included in this work. In addition, further mathematical tools could be applied to the new dataset [46]. Finally, processes to identify the causes of, and solutions to, the most critical cases were not analyzed. Approaches such as Lean Six Sigma [47,48,49] or Fuzzy Logic [50] have been shown to be valid supports for reducing the risk of infection and will therefore be the subject of future studies.