Prediction on Domestic Violence in Bangladesh during the COVID-19 Outbreak Using Machine Learning Methods

Hossain, Md. Murad; Asadullah, Md.; Rahaman, Abidur; Miah, Md. Sipon; Hasan, M. Zahid; Paul, Tonmay; Hossain, Mohammad Amzad

doi:10.3390/asi4040077


by Md. Murad Hossain ¹,², Md. Asadullah ², Abidur Rahaman ³, Md. Sipon Miah ⁴, M. Zahid Hasan ², Tonmay Paul ² and Mohammad Amzad Hossain ³,⁵,*

¹ Department of Computer Science, University of Turin, 10124 Turin, Italy
² Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
³ Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali 3814, Bangladesh
⁴ Department of Information and Communication Technology (ICT), Islamic University, Kushtia 7003, Bangladesh
⁵ School of Computer Science, National University of Ireland Galway, H91 TK33 Galway, Ireland
* Author to whom correspondence should be addressed.

Appl. Syst. Innov. 2021, 4(4), 77; https://doi.org/10.3390/asi4040077

Submission received: 12 April 2021 / Revised: 30 September 2021 / Accepted: 1 October 2021 / Published: 13 October 2021

(This article belongs to the Section Information Systems)


Abstract:

The COVID-19 outbreak led to preventative measures and restrictions in Bangladesh during the summer of 2020. These unstable and stressful times contributed to multiple social problems (e.g., domestic violence and divorce). Globally, researchers, policymakers, governments, and civil societies have been concerned about the increase in domestic violence against women and children during the ongoing COVID-19 pandemic, and Bangladesh has seen a similar increase. In this article, we investigated family violence among 511 families during the COVID-19 outbreak. Participants answered an online questionnaire over a period of ten days, and we predicted family violence from their responses using a machine learning-based model. To predict domestic violence from our data set, we applied the random forest, logistic regression, and Naive Bayes machine learning algorithms. We employed the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance problem and the chi-squared statistical test to assess the feature importance of our data set. The performance of the machine learning algorithms was evaluated based on accuracy, precision, recall, and F-score criteria. Finally, receiver operating characteristic (ROC) curves and confusion matrices were developed and analyzed for the three algorithms. On average, our model with the random forest, logistic regression, and Naive Bayes algorithms predicted family violence with 77%, 69%, and 62% accuracy, respectively, for our data set. The findings of this study indicate that domestic violence has increased and is highly related to two features: family income level during the COVID-19 pandemic and education level of the family members.

Keywords:

family violence; machine learning; classification; ROC; accuracy; COVID-19

1. Introduction

COVID-19 is the most devastating global epidemic recorded in recent times. Although the Black Death—which first spread across Europe from 1347 to 1351—had a higher mortality rate than COVID-19, resulting in the deaths of 75–200 million people in Eurasia and North Africa, there are some parallels between the pandemics, including changes in society. COVID-19 first broke out in Wuhan, China, in December 2019, and has since spread throughout the world, causing an increase in global fatalities. Focusing on Bangladesh—the country has economic, psychological, medical, and social problems (including domestic violence). Family violence can be defined as a form of abuse/mistreatment that a family member experiences from another family member. It involves the establishment of control and fear in a relationship through violence and other forms of abuse. Physical assault, psychological abuse, social abuse, financial abuse, and sexual assault are all examples of family violence. The frequency of the violence can be occasional or chronic [1]. Suicides, violent events, and female and child torture—as a result of family violence—are typically reported [2]. Family violence has increased during pandemic circumstances. There is no simple concept in the literature that is able to provide valuable guidance for clinicians who treat family violence survivors [3].

A review of the recent literature shows that some studies have discussed the factors behind family abuse. Other studies have projected family violence using distinct verification and authentication methods involving children, older people, and couples. Some studies have clarified the ways in which COVID-19 increased family violence during the lockdown period. However, few papers have used machine learning techniques, and, to our knowledge, none have attempted to identify and forecast family violence in this setting through machine learning. In our study, we use machine learning algorithms to predict the occurrence of family violence and to identify the critical factors behind it. Throughout this article, we also apply several descriptive, validation, and accuracy analyses. Our objective is to show that the proposed model represents the data set well, as measured by precision, accuracy, sensitivity, specificity, F1-score, and area under the curve (AUC). We hope to unveil the variables and explanations for the increase in family violence during the lockdown. We also hope that this paper helps to remedy these heinous incidents so that people in Bangladesh are able to lead stable lives.

This study addresses the following research questions:
How can we forecast family violence in Bangladesh during the COVID-19 outbreak?
What are the most important reasons for the increase in domestic violence in Bangladesh during the COVID-19 outbreak?
How can we use machine learning (ML) algorithms to measure family violence?
Which ML algorithm is more applicable for measuring family violence?

The rest of the paper is organized as follows: the related works are summarized in Section 2. In Section 3, we describe the materials and methods, which consist of the data set collection and processing as well as the machine learning techniques. In Section 4, we explain the proposed model architecture. In Section 5, we describe the performance evaluation criteria of our proposed model. The simulation results are given in Section 6. Finally, our conclusions are presented in Section 7.

2. Related Work

Several researchers have analyzed the domestic violence activities during the COVID-19 outbreak. Babvey et al. [4] gathered information from social media users on family violence. Solórzano et al. [5] investigated the instances of domestic violence in Bangladesh during the ongoing COVID-19 global pandemic. Both quantitative–qualitative approaches were clarified through bibliographic analysis, taking into account accurate and recent sources. Berk et al. [6] considered whether it was possible to obtain reliable predictions of domestic abuse. They introduced algorithms to evidence more than 28,000 court date cases from major metro areas in which perpetrators faced sexual violence charges. As part of a broader initiative to improve pre-trial practices and findings in a wide major city, Xue, Jia et al. [7] investigated the pandemic-related debates, fears, and feelings shared by Twitter users. Machine learning tools were used throughout the collected tweets to identify joint embedding and post-tagging, popular topics, trends, and thoughts. The objective of the authors was to include a vast overview—a national conversation—on family abuse and the coronavirus on social media. They also used the Latent Dirichlet Allocation machine learning method and defined salient trends, issues, and representative tweets [8]. In order to test patients for intimate partner violence (IPV) and injury, Chen et al. [9] introduced machine learning models. The authors also provided information on the advanced models, involving diagnostic files with IPV tags centered on violence reduction initiatives and accident marks by emergency pediatrics congregation clinicians. Random forest, logistic regression, gradient enhanced forest, waistcoat frame neural network, and neural network clinical bidirectional encoder representations from transformer-42 (BERT-42) were used.

Gosangi et al. [10] determined the frequency, rates, and seriousness of injuries in intimate partner abuse relative to the previous three years at the time of the pandemic (in 2020). Gebrewahd et al. [11] determined the severity of domestic violence in northern Ethiopia towards pregnant women. For the causal relationship variables, binary and simultaneous logistic regressions were used to forecast. The authors used central tendency and descriptive analyses based on a cross-sectional community based study. Pfitzner et al. [12] surveyed 166 Victorian practitioners to share their voices and perspectives, in regard to the abuse experienced by women during the COVID-19 lockdown in Victoria, Australia. The authors in [3] determined the link between COVID-19 and domestic abuse. They also attempted to reveal the reasons for increased cases of violence due to COVID-19. The vacancies, loss of earnings, extended residential stays, and vulnerability to actions due to stay-at-home orders were deemed responsible for the increased incidences of family violence. Sediri et al. [13] determined the effects of the COVID-19 lockdown on the mental health and gender-based violence of Tunisian women. They performed an online survey using the depression, anxiety, and stress scale, and the Facebook Bergen scale through the sampling method of networking. Various statistical techniques, such as frequencies, mean, standard deviation, chi-squared tests, odds ratio, analysis of variance (ANOVA), and correlation were used. Evans et al. [14] examined the disparity in records of cases of domestic abuse from police statistics in Atlanta, Georgia, by compiling the residential felony counts mainstreamed to the metropolitan area. They analyzed the fluctuations and severity of the residents and crossed the rows with these studies. A summary of substance abuse and behavioral condition issues was identified to examine the hidden viewpoints in an emerging economic downturn in [15]. 
They discussed particular ideas that fostered integrity and accountability, security, respect for compatriots, solidarity, mutuality, autonomy, environmental, historical, and sexuality politics. Xiang, Xiaoling et al. [16] analyzed public debates and sentiment regarding older peoples, in regard to, e.g., the pandemic and social networks, and assessed the extent of age discrimination in civil debates. They used a mixture of qualitative thematic analyses from data science approaches and traditional statistics. However, the authors did not use machine learning techniques for data analysis.

Buttell et al. [17] ignored how other natural disasters are not identical to pandemics. During COVID-19, the authors discussed the shifts and times of intimate partner abuse. The authors also attempted to find the same violence scenarios, rather than find incidences of pandemic era teenagers who faced multiple types of bullying, ignorance, and domestic violence (which are U.S. public health problems of concern). It has a greater effect on low-income communities and race. In the recent public health crisis, researchers described adverse childhood experiences and avoided health and social issues. The main health issues most often identified with IPV are addressed at the start of this paper. However, the authors used traditional techniques for data analysis; it is time-consuming and less effective for large data sets than the machine learning techniques. Moreira et al. [18] outlined the current problems faced by healthcare practitioners and offered future guidance on steps to be taken to avoid such cases during and after the COVID-19 pandemic. The purpose of the authors was to outline the coronavirus documentation and juvenile psychological problems associated with the shutdown. They also identified psychiatric problems, such as post-traumatic stress, psychological, and severe anxiety, as well as signs relating to sadness (harmful to teens) when lockdowns extended due to COVID-19 [19]. Abuhammad et al. [20] defined the prejudice among women in Jordan to assess the potential correlation of violence among women during COVID-19. However, the authors did not apply machine learning techniques to detect violence. Amusa et al. [21] addressed the growing academic literature by monitoring the framework connected with the hazards of IPV exposure. The authors also aimed for machine learning approaches that understood concealed and dynamic data trends and regularities. The review by [22] aimed to establish a predictive method that is clinically applicable. 
However, the main focus of our research is that we used machine learning approaches to detect domestic abuse during the COVID-19 outbreak while taking into account oversampling (SMOTE) difficulties. No study has employed Bangladeshi survey data from students to predict family violence, which we used in our research.

3. Materials and Methods

During the COVID-19 outbreak, domestic violence has increased in Bangladesh. To forecast domestic violence in Bangladesh during this outbreak, we collected data through an internet questionnaire and applied a machine learning-based model to this data set. In this section, we describe the data collection, the data processing, and the machine learning algorithms.

3.1. Data Description

To collect data, we conducted an internet questionnaire survey on “Domestic Violence in Bangladesh During the COVID-19 Outbreak”. For this survey, we first developed a series of family violence-related questions using the Google Docs online platform. The questionnaire is available at https://bit.ly/3h12A7b (accessed on 31 January 2021). Thereafter, we forwarded this link to the respondents via email, Messenger, and Facebook to collect the data. Within ten days, we received 511 replies. Our data set consisted of the following variables: age, gender, marital status, respondent education, profession, family type, number of family members, number of earning persons, head of family, religion, residence location, wealth status, income before corona, income after corona, and lost job during coronavirus. All variables with corresponding definitions are detailed in Table 1. Based on the values of these variables, we predicted family violence in Bangladesh during the COVID-19 outbreak. For data processing and analysis, we used the R package version 4.03.

The descriptive characteristics of the study respondents are given in Table 2. The survey included 511 respondents, of whom 69.67% were male and 30.33% were female. With respect to religion, 438 respondents (85.71%) were Muslims, while 73 (14.28%) were Hindus. Unmarried respondents made up 80.43% of the sample, and 19.57% were married. Table 2 shows that most of the participants (77.49%) were between the ages of 15 and 25. Regarding educational qualification, the largest group of respondents (63.99%) were undergraduate students. In relation to profession, the largest group (81.02%) were students. In our data set, 70.45% of the respondents are members of a joint family, 59.30% live in rural areas, 82.20% belong to the middle-class wealth status, and 89.04% come from households with 1–2 earners. In addition, 20.35% of respondents reported that they or a family member had lost a job due to the COVID-19 pandemic.

We computed the variable importance to determine which variables were most strongly associated with family violence. The chi-squared values of the variables are presented in Table 3 to clarify the role of each variable in family violence prediction. From Figure 1 and Table 3, we can see that the features income after corona, income before corona, education, age, residence location, occupation, marital status, and wealth status are the most strongly associated with the increase in domestic violence in Bangladesh during the COVID-19 outbreak, whereas religion, gender, and family type are less influential in our data set.

3.2. Data Pre-Processing: Data Normalization

In the first step, the data are pre-processed to reduce the run time and improve the results. For this purpose, we performed min–max feature scaling (normalization) for all of the features. This is a scaling method in which values are shifted and rescaled so that they lie in the range from 0 to 1. The normalization rule is given as follows:

X_{normalized} = \frac{X - X_{min}}{X_{max} - X_{min}}

(1)
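As an illustration of Equation (1), the following is a minimal Python sketch of min–max scaling for a single feature. The function name and the income values are hypothetical and are not taken from the survey data; the handling of a constant feature is also an assumption, since Equation (1) is undefined when the maximum and minimum coincide.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] following Equation (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant feature is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical monthly incomes (illustrative values, not survey data).
incomes = [10000, 25000, 40000]
print(min_max_normalize(incomes))  # [0.0, 0.5, 1.0]
```

The smallest value maps to 0, the largest to 1, and every other value keeps its relative position between them.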

3.3. Feature Importance Plot

The feature importance describes which features are more helpful or important than other features in the data set. It can help to better understand the problem being solved and sometimes leads to model improvements through feature selection. Basically, feature importance is a technique that assigns a score to each input feature based on how useful it is for predicting a target variable. We computed the feature importance and the chi-squared test for our data set. The results are presented in Figure 1 and Table 3.
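To make the chi-squared test concrete, the sketch below computes the Pearson chi-squared statistic for a contingency table of one categorical feature against the violence outcome. It is a generic textbook formulation, not the authors' R code, and the counts in the example are hypothetical. A larger statistic (for the same degrees of freedom) indicates a stronger association between the feature and the outcome.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts, e.g. rows = feature levels
    and columns = outcome classes. All row and column totals are assumed
    to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and outcome.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = two income levels, columns = (violence, no violence).
observed = [[30, 10],
            [20, 40]]
stat, dof = chi_squared_statistic(observed)
print(round(stat, 3), dof)
```

Ranking the features by this statistic gives one simple way to order them by their association with the target.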

3.4. Machine Learning Technique

In this sub-section, we briefly explain the three machine learning (ML) algorithms with mathematical expressions.

3.4.1. Random Forest

The random forest (RF) method is an easy-to-use classifier that, even without hyper-parameter tuning, produces a good result most of the time. It is also one of the most widely used algorithms because of its simplicity and versatility. A significant advantage of RF is that it can be applied to both classification and regression problems, which make up most supervised learning tasks. Its hyper-parameters are almost the same as those of a decision tree or a bagging classifier. The RF algorithm builds a multitude of tree classifiers, each constructed using a random vector sampled independently of the input vector, and each tree casts a unit vote so that the input vector is assigned to the most common class.
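The two ingredients described above, random resampling for each tree and unit voting across trees, can be sketched in a few lines of Python. This is only an illustration of the aggregation idea; it does not train decision trees, and the helper names and the example votes are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as done for each tree in bagging."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the most common class label among the per-tree unit votes.

    On a tie, Counter.most_common keeps the label that was seen first.
    """
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
rows = ["r1", "r2", "r3", "r4", "r5"]
print(bootstrap_sample(rows, rng))  # a resampled training set for one tree

# Hypothetical predictions of five trees for one family (1 = violence, 0 = none).
tree_votes = [1, 0, 1, 1, 0]
print(majority_vote(tree_votes))  # 1
```

In a full random forest, each tree would be fitted on its own bootstrap sample with a random subset of features considered at each split, and `majority_vote` would combine their predictions.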

3.4.2. Logistic Regression

Logistic regression (LR) is a parametric classification model with a fixed number of parameters that depends on the number of input features, and it outputs categorical predictions. It is a binary classification model. The LR model is based on the logistic function, which is defined as follows [23]:

f(X) = \frac{1}{1 + e^{-X}}

(2)

where X is the weighted sum of the input features, defined as X = w_{1} x_{1} + w_{2} x_{2} + \dots + w_{n} x_{n}; here, w_{i} is the weight of the i-th feature, and n is the number of input features.

Now, the logit form of the logistic model can be obtained by the following formula [24]:

y = logit(p) = \ln\left(\frac{p}{1 - p}\right) = w^{T} x + b

(3)

where logit(p) is the natural logarithm of the odds p/(1 - p), p is the probability of the positive class, x is the data feature vector, w is the weight vector, and b is the bias of the model.

Therefore, the benefits of using LR include its flexibility, reliability, and the ability to resist over-fitting without any hyper-parameter tuning in small-scaled data sets.
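The relationship between Equations (2) and (3) can be checked numerically with a short Python sketch: the logistic function maps a weighted sum to a probability, and the logit maps that probability back to the weighted sum. The weights, bias, and feature values below are hypothetical, and the implementation is intended only for moderate input values.

```python
import math


def sigmoid(z):
    """Logistic function of Equation (2)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of Equation (3); the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights and normalized features (not fitted to the survey data).
p = predict_proba(weights=[1.5, -2.0], bias=0.25, features=[0.8, 0.3])
print(p, logit(p))  # logit(p) recovers w^T x + b (about 0.85)
```

A family would be classified as positive when the predicted probability exceeds a chosen threshold, commonly 0.5.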

3.4.3. Naive Bayes

The Naive Bayes (NB) classifier is easy to construct and is well suited to very large volumes of data. It is a probabilistic model based on Bayes' rule, with the assumption that the predictors are independent. In simple terms, an NB classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. The key problem of the Naive Bayes approach is the estimation of the class-conditional density [25], which is typically estimated from the data points. The class-conditional density can therefore be learned from the available data and then applied to classification problems with unknown class labels. Bayes' rule provides a way to calculate the posterior probability P(c | x) from P(c), P(x | c), and P(x):

P (c | x) = \frac{P (x | c) P (c)}{P (x)}

(4)

where P(.) and P(. | .) denote the probability and the conditional probability, respectively; P(c | x) is the posterior probability of the class (target) given the predictor (attribute); P(c) is the prior probability of the class; P(x | c) is the likelihood, i.e., the probability of the predictor given the class; and P(x) is the prior probability of the predictor.
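As a small worked example of Equation (4) under the independence assumption, the Python sketch below multiplies the prior of each class by the per-feature likelihoods and normalizes by the evidence P(x). All probabilities and class names in the example are hypothetical and are not estimated from the survey data.

```python
def naive_bayes_posteriors(priors, likelihoods):
    """Return P(c | x) for every class c.

    priors:      {class: P(c)}
    likelihoods: {class: [P(x_1 | c), P(x_2 | c), ...]} for the observed x.
    Features are treated as conditionally independent given the class.
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for feature_likelihood in likelihoods[c]:
            p *= feature_likelihood
        joint[c] = p  # P(x | c) * P(c)
    evidence = sum(joint.values())  # P(x), summed over all classes
    return {c: joint[c] / evidence for c in joint}


# Hypothetical values for one respondent with two observed features.
priors = {"violence": 0.4, "no_violence": 0.6}
likelihoods = {"violence": [0.5, 0.5], "no_violence": [0.25, 0.5]}
print(naive_bayes_posteriors(priors, likelihoods))
```

The predicted class is the one with the largest posterior probability.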

4. Proposed Model Architecture

In this section, we describe how machine learning algorithms are used to predict domestic violence from our data set. The proposed machine learning-based model architecture is illustrated in Figure 2. The architecture comprises several blocks: data collection, data pre-processing, a data splitter, the Synthetic Minority Oversampling Technique (SMOTE), and the ML algorithms for model training and testing. Normalization and scaling techniques are used to pre-process the collected data. SMOTE is used to solve the class imbalance problem of the data set. The data splitter divides the balanced data set into a training data set (80%) and a testing data set (20%) for the ML algorithms. Each ML algorithm is trained and tested individually using the training data set and the testing data set, respectively. Finally, we calculate the performance of the proposed model with each ML algorithm on our data set.
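The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following Python sketch illustrates that idea only; it is not the authors' implementation, and the function name, the default of k = 5 neighbours, the seed, and the example points are assumptions made for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of numeric feature vectors (at least two samples).
    k:        number of nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical normalized minority-class samples with two features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2, seed=1)
print(len(new_points))  # 6
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay within the region already occupied by the minority class.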

5. Performance Evaluation Criteria

The results of applying the proposed model architecture on the data set are evaluated through accuracy, precision, recall/sensitivity, and F-Measure criteria calculated from confusion matrix values. A confusion matrix for a typical two-value classification problem is presented in Figure 3.

The values of accuracy, precision, recall, specificity, and F1-score are determined based on the true positive value, true negative value, false positive value, and false negative value.

TP (true positive): the actual observation indicates that domestic violence has occurred and the ML algorithm detects domestic violence from the given data (i.e., the detection result is a true positive).
TN (true negative): the actual observation indicates that no domestic violence occurred and the ML algorithm indicates that no domestic violence is detected from the given data (i.e., the detection result is a true negative).
FP (false positive): the actual observation indicates that no domestic violence occurred whereas the ML algorithm detects domestic violence from the given data (i.e., the detection result is a false positive).
FN (false negative): the actual observation indicates that domestic violence has occurred whereas the ML algorithm cannot detect domestic violence from the given data (i.e., the detection result is a false negative).

Accuracy is one of the important classification evaluation criteria that is defined as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(5)

Precision is the ratio between the true positives and all predicted positives (true positives plus false positives). The precision is defined as follows:

Precision = \frac{TP}{TP + FP}

(6)

Recall/sensitivity is the true positive rate. It measures how frequently the experiment detects the domestic violence from the given data when the actual domestic violence has occurred. The recall is defined as follows:

Recall = \frac{TP}{TP + FN}

(7)

The F1-score is a measure of a model’s accuracy on a data set. It is used to evaluate binary classification systems, which classify examples into ‘positive’ or ‘negative’. The F-score is a way to combine the precision and recall of the model, and it is defined as the harmonic mean of the model’s precision and recall. The F1-score is defined as follows:

F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(8)

Specificity is the true negative rate, i.e., the ability of an experiment to correctly classify the cases in which no domestic violence occurred. The specificity is calculated as follows:

Specificity = \frac{TN}{TN + FP}

(9)
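Equations (5)–(9) can be evaluated directly from the four confusion matrix counts, as in the following Python sketch. The counts used in the example are hypothetical and do not correspond to any of the confusion matrices reported in this paper; all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the criteria of Equations (5)-(9) from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1_score": f1_score,
    }


# Hypothetical counts for a test set of 100 families.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, the accuracy is 0.85, the recall is 0.80, and the specificity is 0.90, which shows how a single confusion matrix yields all of the criteria at once.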

6. Simulation Results and Discussion

In this section, we conduct comprehensive experiments to evaluate the family violence prediction accuracy of the three ML algorithms (RF, LR, and NB) on our data set. For our study, 511 responses from the data collection are considered. In our data set, family violence occurred in 229 families and did not occur in 282 families. We split the data into two parts: 80% is used as the training set and the remaining 20% is used as the testing set. We used a 10-fold cross-validation approach to assess the prediction performance of the three ML algorithms.
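The splitting procedure can be sketched with index bookkeeping alone. The Python example below shuffles the 511 response indices, holds out 20% for testing, and builds 10 folds. It assumes that the cross-validation is run on the training portion, which the text does not state explicitly; the function names and the seed are also assumptions.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train, test = train_test_split_indices(511)
print(len(train), len(test))  # 409 102

for fold_train, fold_val in k_fold_indices(train, k=10):
    # A model would be fitted on fold_train and scored on fold_val here.
    pass
```

Each response appears in exactly one validation fold, so averaging the ten fold scores uses every training response once for validation.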

We created confusion matrices for the test data to evaluate the prediction performance of the three ML algorithms, both for the imbalanced data set and for the balanced data set.

Figure 4, Figure 5 and Figure 6 show the confusion matrices before data balancing and normalization for the RF, LR, and NB algorithms, respectively.

Next, we examine the four measurement criteria of the three ML algorithms for our data set. From Figure 7, we can see that the prediction accuracy of the RF, LR, and NB algorithms for the imbalanced data set is 64%, 61%, and 58%, respectively. In this case, the RF algorithm provided higher accuracy, precision, recall, and F1-score values than the LR and NB algorithms, whereas the NB algorithm provided the lowest accuracy, precision, recall, and F1-score values. Therefore, the RF algorithm provided better detection performance than the other two algorithms for our imbalanced data set.

Figure 8, Figure 9 and Figure 10 show the confusion matrices after data balancing and normalization for the RF, LR, and NB algorithms, respectively.

From Figure 4, Figure 5 and Figure 6 and Figure 8, Figure 9 and Figure 10, it can be summarized that the prediction performances of RF, LR, and NB algorithms are better for the balanced data set compared to the imbalanced data set.

We used the SMOTE technique to handle the imbalanced data set. For the balanced data set, the prediction performances of the RF, LR, and NB algorithms are improved when compared to the imbalanced data set. From Figure 11, we observe that the prediction accuracy of the RF, LR, and NB algorithms for the balanced data set is 77%, 69%, and 62%, respectively. In this case, the RF algorithm provided higher detection accuracy, precision, recall, and F1-score values than the LR and NB algorithms, whereas the NB algorithm provided the lowest detection accuracy, precision, recall, and F1-score values. Therefore, the RF algorithm provided better detection performance than the other two algorithms for our balanced data set.

From Figure 7 and Figure 11, it can be summarized that the prediction performances of the RF, LR, and NB algorithms are better for the balanced data set when compared to the imbalanced data set. Moreover, the RF algorithm provided good family violence detection accuracy for both the imbalanced and the balanced data sets.

ROC Curve

The receiver operating characteristic (ROC) curve is a graphical plot used to show the diagnostic ability of a binary classifier. The main objective of this paper was to identify family violence and compare the results for the balanced and imbalanced data sets. From Figure 12, we can observe that the diagnostic abilities of the three ML algorithms are poor for the imbalanced data set. This means that fitting the ML algorithms to the imbalanced data set decreases the diagnostic ability and increases the classification error. Nevertheless, with the imbalanced data set, the RF algorithm performs best compared to the other ML algorithms, LR and NB.

After balancing the data with SMOTE, we can observe from Figure 13 that the diagnostic abilities of the three ML algorithms increased compared to the imbalanced data. As a result, the classification error decreased. Here, the RF algorithm provided better results for the balanced data set than both the LR and NB algorithms.
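For readers who wish to reproduce this kind of analysis, an ROC curve can be traced by sweeping the decision threshold over the predicted scores, and the area under the curve (AUC) can be obtained with the trapezoidal rule. The Python sketch below is a generic illustration; the labels and scores are hypothetical and are not the outputs of the models in Figure 12 or Figure 13.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    labels: 1 for the positive class (violence), 0 for the negative class.
    scores: predicted probability or score of the positive class.
    Both classes are assumed to be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical test labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

An AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to perfect separation, so a curve that moves toward the top-left corner after balancing indicates improved diagnostic ability.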

7. Conclusions

Domestic violence is a critical social problem across the globe (in both developed and developing countries). The results of this study indicate that domestic violence increased in Bangladesh during the COVID-19 pandemic. In this paper, we proposed the ML algorithm-based model for predicting domestic violence in Bangladesh during the COVID-19 pandemic. We used the chi-squared statistical test to find the feature importance of our data set. We applied the SMOTE technique for data balancing to enhance model performance. We monitored the effectiveness of our model for the three ML algorithms (for the imbalanced and balanced data sets). Our model for the three algorithms provided better performance results for the balanced data set than the unbalanced data set. From the experimental results, for the imbalanced data, we observed that the accuracy of the domestic violence prediction of our model for the RF, LR, and NB algorithms is 64%, 63%, and 58%, respectively. For the balanced data, the accuracy of domestic violence prediction of the RF, LR, and NB classifiers is 77%, 69%, and 62%, respectively. Therefore, the maximum prediction accuracy of our model was achieved by the RF classifier and the lowest prediction accuracy was achieved by NB for both data sets. As a result, we can conclude that the RF algorithm is more applicable to our data set than the other two algorithms for measuring domestic violence. Until now, achieving such predictions has proven complex, but with the increased knowledge and application of ML algorithms, periodic data collections that reflect the state and evolution of society provide new ways to address the challenges of predicting family violence. In this work, the possibility of predicting family violence with acceptable accuracy was proven; we presented the most appropriate technique for selecting features and the best predictive algorithm performance. 
Moreover, this work, rather than showing concrete results in a specific period of time in Bangladesh, presents a specific methodology to study its viability. With the conclusions drawn, the aim of our study was to create a machine learning based model for predicting domestic violence, not only in Bangladesh, but also in other countries/regions.

In future work, we will use other oversampling techniques with ML algorithms to improve the results.

Author Contributions

Conceptualization: M.M.H. and M.A.; formal analysis: M.M.H., M.A. and M.A.H.; methodology, software, and validation: M.M.H., M.A., M.Z.H., T.P. and M.A.H.; writing—original draft preparation: M.M.H.; writing—review and editing; M.S.M., A.R. and M.A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Acknowledgments

This research was supported in part by the Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; in part by the Department of Information and Communication Engineering, Noakhali Science and Technology University, Noakhali 3814, Bangladesh; and in part by the Department of Information and Communication Technology, Islamic University, Kushtia 7003, Bangladesh.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Feature importance score.

Figure 2. A block diagram of the proposed model architecture.

Figure 3. A confusion matrix of the proposed model architecture.

Figure 4. Confusion matrix before data balancing and normalization for RF.

Figure 5. Confusion matrix before data balancing and normalization for LR.

Figure 6. Confusion matrix before data balancing and normalization for NB.

Figure 7. Percentage of classification results with imbalanced data set.

Figure 8. Confusion matrix after data balancing and normalization for RF.

Figure 9. Confusion matrix after data balancing and normalization for LR.

Figure 10. Confusion matrix after data balancing and normalization for NB.

Figure 11. Percentage of classification results with balanced data set.

Figure 12. The diagnostic result with imbalanced data.

Figure 13. The diagnostic result with the balanced data set (using the SMOTE technique).
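
The confusion matrices captioned above are the basis for the accuracy, precision, recall, and F-score criteria used in the evaluation. As a hedged illustration of how those four numbers follow from the four cells of a binary confusion matrix, the sketch below uses made-up counts. The function name `classification_metrics` and the counts are assumptions, not values from the figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard metrics from the cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts for illustration only (not taken from the figures).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, accuracy alone can look acceptable while recall for the minority class is poor, which is one reason to report all four criteria together.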

Table 1. Series of questions in the study dealing with family violence.

Items	Corresponding Definitions
Age	The participant’s age
Gender	The participant’s gender
Marital status	Marital status of the participant
Respondent education	Educational qualification of the participant
Profession	Occupation of the respondent
Family type	Family type of the respondent
Number of family members	Number of family members of the respondent
Number of earners	Number of earners in the family of the respondent
Head of family	Head of family of the respondent
Religion	Religion of the respondent
Residence location	Place of residence of the respondent
Wealth status	Wealth status of the participant
Income before coronavirus	The participant’s monthly family income before coronavirus
Income after coronavirus	The participant’s monthly family income after coronavirus
Lost job during coronavirus	Respondent or any family member lost job during the pandemic

Table 2. Descriptive characteristics of study respondents.

Indicator	Category	N = 511, n (%)
Age	15–25	396 (77.49)
	26–35	100 (19.57)
	36–45	11 (2.15)
	46–55	3 (0.59)
	56–65	1 (0.19)
Gender	Male	356 (69.67)
Gender	Female	155 (30.33)
Marital Status	Unmarried	411 (80.43)
Marital Status	Married	100 (19.57)
Respondent Education	HSC	34 (6.65)
	Undergraduate	327 (63.99)
	Graduate	46 (9.00)
	Post Graduate	93 (18.20)
	PhD	11 (2.15)
Profession	Student	414 (81.02)
	Private Employee	51 (9.98)
	Government Employee	46 (9.00)
Family Type	Joint	360 (70.45)
Family Type	Single	151 (29.55)
Number of Family Members	1–4	82 (16.05)
	5–8	348 (68.10)
	8+	81 (15.85)
Number of Earners	1–2	455 (89.04)
	3–4	48 (9.40)
	4+	8 (1.56)
Head of Family	Father	405 (79.28)
	Husband	43 (8.41)
	Mother	34 (6.65)
	Others	29 (5.67)
Religion	Muslim	438 (85.71)
Religion	Hinduism	73 (14.28)
Residence Location	Rural	303 (59.30)
Residence Location	Urban	208 (40.70)
Wealth Status	Middle	420 (82.20)
	Poor	68 (13.31)
	Poorest	14 (2.74)
	Rich	9 (1.76)
Income before Coronavirus	Below 5000	31 (6.07)
	5000–15,000	147 (28.77)
	16,000–25,000	107 (20.94)
	26,000–35,000	66 (12.91)
	36,000–50,000	66 (12.91)
	50,000+	94 (18.40)
Income after Coronavirus	Below 5000	115 (22.50)
	5000–15,000	125 (24.46)
	16,000–25,000	80 (15.66)
	26,000–35,000	60 (11.74)
	36,000–50,000	57 (11.15)
	50,000+	74 (14.48)
Lost Job in Coronavirus Pandemic	No	407 (79.65)
Lost Job in Coronavirus Pandemic	Yes	104 (20.35)

Table 3. Chi-squared table for the variables.

Variables	p-Value	Association with Family Violence
Age	$3.964 \times 10^{-5}$ *	Associated
Gender	0.5029	Not Associated
Marital Status	$8.886 \times 10^{-6}$ *	Associated
Education	$1.717 \times 10^{-6}$ *	Associated
Profession	$1.536 \times 10^{-5}$ *	Associated
Family Type	0.7448	Not Associated
Family Members	0.1010	Not Associated
Number of Earners	0.4894	Not Associated
Head of Family	0.2999	Not Associated
Religion	0.3451	Not Associated
Wealth Status	$8.053 \times 10^{-5}$ *	Associated
Residence Location	$1.164 \times 10^{-5}$ *	Associated
Income before Coronavirus	$5.058 \times 10^{-9}$ *	Associated
Income after Coronavirus	$1.979 \times 10^{-11}$ *	Associated
Lost Job in Coronavirus	$1.829 \times 10^{-5}$ *	Associated

* Indicates the statistical significance of the variable. If the p-value of the variable falls below the significance level (5%), then the variable is statistically significant.
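
To make the decision rule in the footnote concrete, the following sketch runs a Pearson chi-squared test of independence on a 2 × 2 contingency table and compares the p-value with the 5% significance level. It is a simplified illustration under stated assumptions. The counts are invented, the function name `chi_squared_2x2` is hypothetical, no continuity correction is applied, and the closed-form p-value holds only for one degree of freedom. Most variables in the table have more than two categories and would need the general chi-squared distribution.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 contingency table.

    Returns (statistic, p_value). With one degree of freedom the survival
    function of the chi-squared distribution equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts (not the survey data):
# rows = lost job (yes, no), columns = family violence reported (yes, no).
observed = [[30, 20], [20, 30]]
statistic, p_value = chi_squared_2x2(observed)

alpha = 0.05  # the 5% significance level used in the table footnote
verdict = "Associated" if p_value < alpha else "Not Associated"
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}, {verdict}")
```

A p-value below the threshold leads to the label "Associated"; otherwise the variable is labelled "Not Associated", mirroring the third column of the table.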

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
