
A Machine Learning Algorithm Predicting Infant Psychomotor Developmental Delay Using Medical and Social Determinants

School of Medicine, Faculty of Health Sciences, Bond University, Gold Coast, QLD 4226, Australia
Reprod. Med. 2023, 4(2), 106–117
Received: 29 March 2023 / Revised: 30 May 2023 / Accepted: 2 June 2023 / Published: 5 June 2023


Psychomotor developmental delay in infants includes failure to acquire abilities such as sitting, walking, grasping objects and communication at the ages when most infants have acquired these abilities. Known risk factors include a large number of aspects of family environment, socioeconomic position, problems in pregnancy and birth and maternal health. It is clinically useful to be able to screen for developmental delay so that healthcare interventions can be considered. The present research used machine learning (random forest) to create an algorithm predicting psychomotor delay in 9-month-old infants using information ascertainable at birth and in early infancy. The dataset was the UK longitudinal Millennium Cohort study. In total, 53 predictors measuring socioeconomic indicators, paternal, family and social support for the mother, beliefs about good parenting, maternal health, pregnancy and birth were included in the initial algorithm. Feature reduction showed that of the 53 variables, birthweight, gestational age at birth, pre-pregnancy BMI, family income and parents’ ages had the highest feature importance scores and could alone correctly predict developmental delay with over 99% sensitivity and 100% specificity. No features measuring aspects of early infant care or environment meaningfully added to algorithm performance. The relationships between delay and some of the predictors, particularly income, were nonlinear and complex. The results suggest that the risk of psychomotor developmental delay can be identified in early infancy using machine learning, and that the best predictors are factors present prior to and at birth.

1. Introduction

Children’s progress in achieving developmental milestones in infancy and childhood is dependent on a large number of factors. These include growth in utero, size at birth, maternal health, socioeconomic position, genetically inherited developmental patterns, and many family and social factors [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17]. With so many potentially important causes, and with the relative importance of each unclear, it is difficult to predict developmental delay early enough for preventive steps to be taken. For an increasing number of health conditions with complex aetiologies, artificial intelligence (AI) has been successfully applied to identify when an individual is at high risk of a future adverse health outcome, e.g., [18,19]. In the discipline of developmental psychology, the machine learning approach of random forests (RF) has been applied to predict future psychiatric conditions [20] and to predict infant growth using inflammatory markers [21]. The present study applied RF to predict psychomotor developmental delay in 9-month-old infants using data on a wide array of factors in pregnancy, birth and early infancy. The intent was to achieve higher sensitivity and specificity than has been achieved in prior studies approaching similar problems using regression methods, which rarely achieve greater than 80% sensitivity [15].

1.1. Predictors of Developmental Delay

Developmental delay has been found to be statistically associated with an array of factors that span several time periods (prior to pregnancy, during the perinatal period and later in infancy) and that include both maternal and foetal causes. Preterm birth and low birthweight have each been shown to at least double the risk of delay [11,12,13,14]. Maternal factors including health, gravidity, young motherhood, use of assisted reproductive technologies and bodyweight have also been found to predict delay [8,12,13,14,15,16]. Other factors with large statistical effects on the likelihood of developmental delay include maternal education and socioeconomic position [12,13,15,17]. Some of the reported effects are large; for example, Ozkan et al. found more than a tenfold increased risk of developmental delay in infants of mothers with the lowest level of completed education [13]. The neonatal period and early childcare may also be important determinants of delay, including social support provided, paternal involvement, breastfeeding and other infant care behaviour [1,2,3,4,5,6,7].

1.2. Data Analytic Approach

Several statistical techniques are potentially appropriate for classification problems such as predicting developmental delay from a large number of predictors. Most commonly, regression-based approaches have been used. Van Dokkum et al. [15] used logistic regression to predict developmental delay at age four, producing an algorithm with 73% sensitivity and 80% specificity. Another promising linear modelling approach when there is a large number of predictor variables is principal component analysis (PCA). However, both statistical techniques assume linear relationships between the predictor variables and the outcome: PCA is based on linear transformation using orthogonal matrices, and logistic regression assumes that the log-odds of the outcome are a linear function of each predictor. There is no reason to believe that predictors have linear associations with developmental delay: for example, both low and very high birthweight are associated with an increased risk of developmental delay [15,22], and socioeconomic position may not be important for health outcomes above a threshold level [23]. For the present research, random forest (RF), which is an ensemble decision-tree classifier, was chosen. RF can handle large numbers of predictors (features) simultaneously and does not assume linear or monotonic relationships between predictors and an outcome [24,25]. While most AI approaches have a barrier to entry in that they cannot be implemented in statistical programs commonly used by researchers, RF is straightforward to implement in Stata, as well as in some open-source statistical software such as BlueSky Statistics. Here, RF was applied to an existing national dataset on the birth and lives of a sample of infants born in the UK, the UK Millennium cohort.
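The linearity assumption of logistic regression mentioned above can be written out explicitly. The notation below (p for the probability of delay, x_j for the k predictors, and β_j for their coefficients) is introduced here for illustration and does not appear in the original study.

```latex
\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
\]
```

Under this form, each one-unit increase in x_j shifts the log-odds by the same amount β_j wherever it occurs on the scale of x_j. A U-shaped association, such as the one described for birthweight, is therefore not captured unless extra terms (for example a quadratic term) are specified by hand.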

2. Methods

2.1. Population and Sample

The UK Millennium cohort sample (henceforth MCS) consists of infants born in the United Kingdom from September 2000 to August 2001, identified using Universal Child Benefit records and NHS Health Visitors [26]. In the British healthcare system, Health Visitors are usually registered nurses who provide ante- and post-natal care and advice in the home. The sample was not a random sample: ethnic minority and low socio-economic groups were oversampled to compensate for loss to follow-up of these segments of the population that occurred in Britain’s earlier longitudinal cohort studies. Here, data were analysed using the first survey of the cohort, which took place when the infants were around 9 months old. The maximum possible sample size for analysis using this cohort is 18,467. A cohort profile is available providing far more detail about the sample and sampling methods [27].

2.2. Outcome Variable

Developmental delay is typically identified in clinical settings using parental questionnaires. The 9-month MCS interview with parents or the main caregiver included questions about infant psychomotor development which are very similar in content and format to the Ages and Stages 12-month questionnaire [28]. The first aim in creating the dependent variable was to capture infant development across a number of cognitive and motor skill domains. Second, variation in reaching developmental milestones has the most practical or clinical significance if a statistical model is created to predict substantial delay versus the range of normal development. With these aims in mind, a dependent variable was created using parental or main caregiver reports of achievement of developmental milestones. The interview contained 12 questions on cognitive and motor skills development. Responses to the 12 questions were on three-point scales, coded as “1” for the infant frequently demonstrates the developmental milestone, “2” for sometimes, and “3” for the infant has not yet demonstrated the milestone. The 12 items were the following: sits up; smiles; stands up holding on; puts hands together; grabs objects; holds small objects; passes a toy; walks a few steps; gives toy; waves bye-bye; extends arms; nods for yes. The responses were summed into a single score, followed by statistical correction for age in days of the infant. The resulting standardised residuals representing age-corrected developmental delay were then split into a binary variable using two standard deviations as the cut-point.
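The construction of the outcome variable can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's own code: the text does not specify the form of the age correction, so a simple linear regression of the summed score on age in days is assumed here, and delay is taken to be a standardised residual above +2 because higher item codes mean a milestone has not yet been reached. The function name `age_corrected_delay` is hypothetical.

```python
from statistics import mean, pstdev


def age_corrected_delay(scores, ages_days, cutoff_sd=2.0):
    """Flag delay (1) or no delay (0) from summed milestone scores.

    Assumptions not stated in the source: the age correction is an
    ordinary least squares regression of score on age in days, and
    delay means a standardised residual above +cutoff_sd.
    """
    age_bar, score_bar = mean(ages_days), mean(scores)
    sxx = sum((a - age_bar) ** 2 for a in ages_days)
    sxy = sum((a - age_bar) * (s - score_bar)
              for a, s in zip(ages_days, scores))
    slope = sxy / sxx
    intercept = score_bar - slope * age_bar
    # Residual = observed score minus the score expected for that age.
    residuals = [s - (intercept + slope * a)
                 for a, s in zip(ages_days, scores)]
    sd = pstdev(residuals)  # OLS residuals already have mean zero
    return [1 if r / sd > cutoff_sd else 0 for r in residuals]


# Toy data: 20 infants with identical scores except one high scorer.
toy_ages = list(range(270, 290))
toy_scores = [12] * 20
toy_scores[10] = 24
toy_flags = age_corrected_delay(toy_scores, toy_ages)
```

In the toy data only the infant with the much higher summed score is flagged, since its residual lies more than two standard deviations above the age-expected value.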

2.3. Predictor Variables (Features)

The first MCS survey was broad in scope, covering aspects of pregnancy, labour, birth and children’s and their parents’ social, work and economic situations. Many of the variables included in the MCS have been demonstrated to be or could plausibly be associated with child development. Covariates were selected by reading through the MCS variable list and selecting all that appeared appropriate for analysis. The variable selection process is illustrated in Figure 1. Some additive combining of variables was performed where two or more variables carried repeated information about a single concept. For example, paternal involvement in infant care was represented in the original data as questions about each individual act of care, such as nappy changing, getting up in the night, etc. These were additively combined to create a single variable. Of note, a decision was made to combine medical problems in pregnancy into a single variable. In descending order of their prevalence in the dataset, the most common were as follows: bleeding in pregnancy, eclampsia, hyperemesis, urinary tract infections, anaemia, and non-trivial infections. These were combined because, conceptually, they should all affect foetal nutrition, and because in initial testing of algorithms they performed very poorly as predictors of developmental delay when included separately. In total, 53 variables were included. For ease of reading, variables were classified into groupings based on the concept that each represented: family and social support; socioeconomic indicators; infant characteristics; beliefs about parenting; medical circumstances in pregnancy and birth; maternal factors; and paternal and family factors. Supplementary material Table S1 includes details of variable coding, the MCS names and any changes made to the original MCS variables.

2.4. Data Analysis

The MCS data were analysed using random forests (RF), a supervised machine learning decision tree algorithm easily implemented in statistical software such as Stata. In building each decision tree, the RF algorithm used half of the data (the training set), drawing bootstrapped subsets of these data to grow trees that apply a decision rule at each branch node. The remaining half of the data for each tree (the test set) was used to test how well the algorithm classified observations. Missing data occurred due to unanswered interview items on a small number of variables, particularly paternal support. The RF algorithm contained a proximity algorithm to handle missing observations for features. Observations with a missing value for the outcome variable were dropped from the analysis, and continuous predictors were transformed to z-scores.
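The idea of bootstrapped subsets, and of the observations left out of them, can be illustrated with a short sketch. It shows generic bootstrap sampling as used in bagging [25]; whether the Stata plugin partitions the data in exactly this way is not stated in the text and is not assumed here. The function name `bootstrap_split` and the sample size of 10,000 are hypothetical.

```python
import random


def bootstrap_split(n_obs, rng):
    """Draw one bootstrap sample of row indices.

    Returns (in_bag, out_of_bag): in_bag holds n_obs indices sampled
    with replacement; out_of_bag holds the indices never drawn.
    """
    in_bag = [rng.randrange(n_obs) for _ in range(n_obs)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_obs) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, oob = bootstrap_split(10_000, rng)
oob_fraction = len(oob) / 10_000  # close to 1/e, roughly 0.37
```

Because sampling is with replacement, roughly a third of the observations are left out of any one bootstrap sample. Predictions for those left-out observations are what an out-of-bag error rate is computed from.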
All analyses were carried out in Stata 16. For the RF model, the plugin Rforest was used [29]. Hyperparameter tuning of the number of variables included at each split and the number of iterations was performed using Stata code developed by Schonlau and Zou [29]. Forward selection was applied to produce a reduced model which maximised the number correctly classified using the fewest variables.
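The forward selection step can be sketched as a simple greedy loop. This is an illustration rather than the Stata code used in the study: the function `forward_select`, the `n_correct` callback and the `min_gain` stopping rule are all hypothetical, and in practice the callback would refit the classifier on the candidate feature set and return the number of correctly classified observations.

```python
def forward_select(ranked_features, n_correct, min_gain=1):
    """Greedy forward selection over features ranked by importance.

    ranked_features: feature names, most important first.
    n_correct: callable taking a list of feature names and returning
        the number of correctly classified observations (hypothetical
        hook standing in for a model refit).
    Stops at the first feature whose addition gains fewer than
    min_gain correct classifications (an assumed stopping rule).
    """
    selected = []
    best = None
    for feature in ranked_features:
        candidate = selected + [feature]
        score = n_correct(candidate)
        if best is not None and score - best < min_gain:
            break
        selected, best = candidate, score
    return selected, best


# Toy scorer: correct classifications depend only on model size.
toy_scores_by_size = {1: 90, 2: 95, 3: 97, 4: 97, 5: 99}
chosen, correct = forward_select(
    ["a", "b", "c", "d", "e"],
    lambda feats: toy_scores_by_size[len(feats)],
)
```

In the toy run the loop keeps the first three features and stops when the fourth adds nothing. A fixed stopping rule like this one can miss later gains (the fifth feature here), which is one reason to inspect the full curve of model size against classification accuracy as well.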

3. Results

3.1. Descriptive Statistics

Descriptive statistics are displayed for all variables in Table 1, split into groups of variables as described above.

3.2. RF Algorithms

After hyperparameter tuning to 23 variables at each split and 30 iterations, the RF algorithm for all 53 predictors had an out-of-bag error rate of 0.001. Only 19 of 18,432 infants were classified incorrectly, with all incorrect classifications being cases of delay which were not classified correctly (false negatives). Given that an algorithm with 53 features would take significant computing time to make a prediction for new cases, the algorithm was reduced by forward selection beginning with the most important feature identified in the 53-feature algorithm, which was birthweight. Figure 2 displays the results of this process, in which including six features resulted in misclassification of 24 infants. Figure 3 displays feature importance scores in the 53-feature algorithm. Higher scores indicate higher importance in the algorithm. When running algorithms with between 7 and all 53 features, there was only a very gradual drop in the number of cases misclassified (not shown in Figure 2), and thus, 6 features were deemed to successfully combine computational efficiency and classification accuracy. The top six features in relation to developmental delay are shown in Figure 4. Decision tree algorithms do not produce a statistic or parameter estimate showing the direction of association, as they are not linear models. To overcome this, two-way prediction plots are displayed for the reduced (six-variable) algorithm in Figure 4 and for all features in Supplementary Material Figure S1. The plots shown are two-way prediction plots with either a Lowess smooth fit line, a quadratic fit line or as a linear plot for binary predictors (whichever best described the observed relationship). The direction or shape of relationships between developmental delay and all predictors are described in writing in Figure 3.
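The arithmetic behind the reported figures can be checked with a short sketch. The three helper functions are hypothetical names introduced for illustration. Only the counts stated in the text (18,432 infants, 19 misclassified, no false positives) are used; sensitivity is defined but not evaluated on study data, because it requires the number of truly delayed infants, which this paragraph does not report.

```python
def error_rate(n_misclassified, n_total):
    """Proportion of observations classified incorrectly."""
    return n_misclassified / n_total


def sensitivity(tp, fn):
    """Proportion of true cases of delay that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-delayed infants correctly classified."""
    return tn / (tn + fp)


# Counts taken from the text: 19 of 18,432 infants misclassified.
full_model_error = error_rate(19, 18_432)

# All errors were false negatives, so fp = 0 and specificity is 1.0
# whatever the number of true negatives (100 is an arbitrary value).
spec_no_false_positives = specificity(tn=100, fp=0)
```

With these counts the error rate rounds to 0.001, matching the value reported for the 53-feature algorithm, and a specificity of 100% follows directly from the absence of false positives.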

4. Discussion

4.1. Summary

The RF machine learning approach allowed simultaneous analysis of a large number of maternal, paternal, social and health-related factors. The algorithm performed very well when applied to the test data, with sensitivity at the level of a very good diagnostic medical test. The results were consistent with developmental delay having a complex aetiology: 47 variables had importance scores above 0.2. However, prediction did not improve substantially beyond the top six features, which were birthweight, the infant’s gestational age at birth, maternal BMI, household income and maternal and paternal ages. None of the six most important features had linear relationships with developmental delay. Maternal pre-pregnancy BMI was a risk factor for developmental delay below a BMI of 18 and above 45. Household income had an important but complex nonlinear relationship with developmental delay.

4.2. Comparisons to Prior Studies

Preterm birth and low birth weight often occur together and are associated with at least a doubling of the odds of developmental delay [11,12,13,14]. The results reported here showed that these two variables were not only the most important in the RF algorithms but were the most important variables by a substantial margin (see Figure 2, Figure 3 and Figure 4). Figure 4 displays the proportion of infants with developmental delay at 9 months of age for the top six most important predictors. It shows that infants born below 1 kg had around a 50% probability of delay, while birth weights above 3 kg were associated with less than a 5% probability of delay. Preterm birth had a similarly large effect on the probability of delay (see Figure 4, top right).
While the effects of low birth weight and preterm birth in the Millennium Cohort were consistent with patterns observed in prior research, the effects of socioeconomic indicators on developmental delay were not as clear. For example, Ozkan et al. [13] found that across the range of maternal education there was a tenfold increase in risk of delay. In the RF model, parental income was an important but non-linear predictor of delay, associated with around a 3% risk of developmental delay at its lowest (middle income) and a 4% risk at low income (see Figure 4, centre right). This indicates that although income was a useful variable for classifying cases of delay in the Millennium Cohort despite its non-linear association, the size of the effect was small. It should be noted that the increased risk of delay at high incomes visible in Figure 4 may be confounded if older mothers are more likely to have high household income.
The relationship between maternal age and developmental delay found here was the opposite of that reported in prior research: here, there was a monotonic trend towards lower risk of delay beginning with the youngest mothers (see Figure 4, bottom left), whereas in prior research infants of teenage mothers had an increased risk of delay [13]. Prior research additionally highlighted the importance of maternal education [16,17]. Here, income had a higher importance score than maternal education.
Pre-existing maternal obesity has been found to predict developmental delay in linear statistical models [15]. Maternal pre-pregnancy BMI was an important predictor of developmental delay using RF, but the association in the Millennium Cohort was nonlinear, with increased risk evident below a BMI of 18 and above 45. Risk was relatively constant between a BMI of 18 and 45, and the majority of women in the sample fell within this range: a BMI of 18 represents the third percentile, and a BMI of 45 represents the 99.8th percentile.

4.3. Study Limitations

A prospective longitudinal study design would be necessary to confirm algorithm performance in a clinical setting. Psychomotor delay in the MCS 9-month interview was measured using fewer items than are typically found in established scales such as Ages and Stages. The same data quality issue applies more generally to most of the concepts in this analysis: national cohort study data allow for large analysis sample sizes and the potential for high statistical power, but this comes at a cost to the level of detail gathered about each concept. For example, family support variables came from interview responses; methods that directly measure or change social support would be preferable.

5. Conclusions

RF can be easily implemented in statistical software such as Stata, as well as in open source software such as BlueSky Statistics and JASP. It is preferable to regression when there is a large number of potentially important predictors of an outcome and substantial nonlinearity in relationships between predictors and an outcome. A disadvantage is that other than producing classification accuracy, sensitivity and specificity values, the underlying concepts and results interpretation are not familiar to the majority of medical and social science researchers. The results of the RF modelling here showed remarkably high sensitivity and specificity of almost 100%, which is far in excess of existing regression-based algorithms predicting developmental delay [15]. The features with the highest importance scores can all be discerned at birth: no features measuring aspects of early infant care or environment meaningfully added to algorithm performance. This implies that screening for developmental delay can be successfully implemented in the neonatal period. Maternal health problems during pregnancy, including eclampsia, bleeding and non-trivial infections also had lower importance scores than expected. This may be because eclampsia and other problems during pregnancy are associated with preterm birth and low birthweight rather than directly predicting developmental delay [30].

Supplementary Materials

The following supporting information can be downloaded: Figure S1: Two-way prediction plots for all features; Table S1: Variable coding and data decisions.


This research received no external funding.

Institutional Review Board Statement

Detailed information on ethical approval can be accessed here: (accessed on 1 November 2022).

Informed Consent Statement

Informed consent was given by the MCS cohort members. For details see: (accessed on 1 November 2022).

Data Availability Statement

The data used in this study are available free of charge via the UK Data Service (accessed on 15 January 2021).

Conflicts of Interest

The author declares no conflict of interest.


  1. Sadruddin, A.F.; Ponguta, L.A.; Zonderman, A.L.; Wiley, K.S.; Grimshaw, A.; Panter-Brick, C. How do grandparents influence child health and development? A systematic review. Soc. Sci. Med. 2019, 239, 112476.
  2. Erel, O.; Oberman, Y.; Yirmiya, N. Maternal versus nonmaternal care and seven domains of children’s development. Psychol. Bull. 2000, 126, 727–747.
  3. Crnic, K.A.; Greenberg, M.T.; Ragozin, A.S.; Robinson, N.M.; Basham, R.B. Effects of Stress and Social Support on Mothers and Premature and Full-Term Infants. Child Dev. 1983, 54, 209.
  4. Shaver, J.H.; Power, E.A.; Purzycki, B.G.; Watts, J.; Sear, R.; Shenk, M.K.; Sosis, R.; Bulbulia, J.A. Church attendance and alloparenting: An analysis of fertility, social support and child development among English mothers. Philos. Trans. R. Soc. B 2020, 375, 20190428.
  5. Sacker, A.; Quigley, M.A.; Kelly, Y.J. Breastfeeding and Developmental Delay: Findings from the Millennium Cohort Study. Pediatrics 2006, 118, e682–e689.
  6. Chiu, W.C.; Liao, H.F.; Chang, P.J.; Chen, P.C.; Chen, Y.C. Duration of breast feeding and risk of developmental delay in Taiwanese children: A nationwide birth cohort study. Paediatr. Perinat. Epidemiol. 2011, 25, 519–527.
  7. Belsky, J. Early child care and early child development: Major findings of the NICHD study of early child care. Eur. J. Dev. Psychol. 2006, 3, 95–110.
  8. Waynforth, D. Effects of Conception Using Assisted Reproductive Technologies on Infant Health and Development: An Evolutionary Perspective and Analysis Using UK Millennium Cohort Data. Yale J. Biol. Med. 2018, 91, 225–235.
  9. Brown, M.A.; McIntyre, L.L.; Crnic, K.A.; Baker, B.L.; Blacher, J. Preschool Children With and Without Developmental Delay: Risk, Parenting, and Child Demandingness. J. Ment. Health Res. Intellect. Disabil. 2011, 4, 206–226.
  10. Conde-Agudelo, A.; Castaño, F.; Norton, M.H.; Rosas-Bermudez, A. Effects of Birth Spacing on Maternal, Perinatal, Infant, and Child Health: A Systematic Review of Causal Mechanisms. Stud. Fam. Plan. 2012, 43, 93–114.
  11. McIntire, D.D.; Bloom, S.L.; Leveno, K.; Casey, B.M. Birth Weight in Relation to Morbidity and Mortality among Newborn Infants. N. Engl. J. Med. 1999, 340, 1234–1238.
  12. Ketterlinus, R.D.; Henderson, S.H.; Lamb, M.E. Maternal age, sociodemographics, prenatal health and behavior: Influences on neonatal risk status. J. Adolesc. Health Care 1990, 11, 423–431.
  13. Ozkan, M.; Senel, S.; Arslan, E.A.; Karacan, C.D. The socioeconomic and biological risk factors for developmental delay in early childhood. Eur. J. Pediatr. 2012, 171, 1815–1821.
  14. Liu, X.; Sun, Z.; Neiderhiser, J.M.; Uchiyama, M.; Okawa, M. Low birth weight, developmental milestones, and behavioral problems in Chinese children and adolescents. Psychiatry Res. 2001, 101, 115–129.
  15. Van Dokkum, N.H.; Reijneveld, S.A.; Heymans, M.W.; Bos, A.F.; de Kroon, M.L.A. Development of a Prediction Model to Identify Children at Risk of Future Developmental Delay at Age 4 in a Population-Based Setting. Int. J. Environ. Res. Public Health 2020, 17, 8341.
  16. Abubakar, A.; Holding, P.; Van de Vijver, F.J.R.; Newton, C.; van Baar, A. Children at risk for developmental delay can be recognised by stunting, being underweight, ill health, little maternal schooling or high gravidity. J. Child Psychol. Psychiatry 2009, 51, 652–659.
  17. Najman, J.M.; Bor, W.; Morrison, J.; Andersen, M.; Williams, G. Child developmental delay and socio-economic disadvantage in Australia: A longitudinal study. Soc. Sci. Med. 1992, 34, 829–835.
  18. Myszczynska, M.A.; Ojamies, P.N.; Lacoste, A.M.B.; Neil, D.; Saffari, A.; Mead, R.; Hautbergue, G.M.; Holbrook, J.D.; Ferraiuolo, L. Applications of machine learning to diagnosis and treatment of neurodegenerative diseases. Nat. Rev. Neurol. 2020, 16, 440–456.
  19. Goecks, J.; Jalili, V.; Heiser, L.M.; Gray, J.W. How Machine Learning Will Transform Biomedicine. Cell 2020, 181, 92–101.
  20. Usta, M.B.; Karabekiroglu, K.; Say, G.N.; Gumus, Y.Y.; Aydın, M.; Sahin, B.; Bozkurt, A.; Karaosman, A.A.; Cobanoglu, C.; Kurt, D.A.; et al. Can We Predict Psychiatric Disorders at the Adolescence Period in Toddlers? A Machine Learning Approach. Psychiatry Behav. Sci. 2020, 10, 7–12.
  21. Harrison, E.; Syed, S.; Ehsan, L.; Iqbal, N.T.; Sadiq, K.; Umrani, F.; Ahmed, S.; Rahman, N.; Jakhro, S.; Ma, J.Z.; et al. Machine learning model demonstrates stunting at birth and systemic inflammatory biomarkers as predictors of subsequent infant growth—A four-year prospective study. BMC Pediatr. 2020, 20, 498.
  22. Vora, N.; Bianchi, D.W. Genetic considerations in the prenatal diagnosis of overgrowth syndromes. Prenat. Diagn. 2009, 29, 923–929.
  23. Patel, M.; Waynforth, D. Influences of zero hour contracts and disability—Analysis of the 1970 British Cohort study. SSM-Popul. Health 2022, 19, 101182.
  24. Grömping, U. Variable Importance Assessment in Regression: Linear Regression versus Random Forest. Am. Stat. 2009, 63, 308–319.
  25. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  26. Ketende, S.; Jones, E. User Guide to Analysing MCS Data Using Stata; Centre for Longitudinal Studies: London, UK, 2011.
  27. Connelly, R.; Platt, L. Cohort Profile: UK Millennium Cohort Study: MCS. Int. J. Epidemiol. 2014, 43, 1719–1725.
  28. Bricker, D.; Squires, J.; Mounts, L.; Potter, L.; Nickel, R.; Twombly, E.; Farrell, J. Ages and Stages Questionnaire; Paul H. Brookes: Baltimore, MD, USA, 1999.
  29. Schonlau, M.; Zou, R.Y. The Random Forest Algorithm for Statistical Learning. Stata J. 2020, 20, 3–29.
  30. Bilano, V.L.; Ota, E.; Ganchimeg, T.; Mori, R.; Souza, J.P. Risk Factors of Pre-Eclampsia/Eclampsia and Its Adverse Outcomes in Low- and Middle-Income Countries: A WHO Secondary Analysis. PLoS ONE 2014, 9, e91198.
Figure 1. Variable selection procedure for RF algorithm.
Figure 2. Number of correctly classified observations plotted with out of bag error for RF algorithms. The top 1 feature model includes birthweight only, and the top 2 includes birthweight and gestational age at birth. Beyond the six features with the highest importance scores in the full model, very little improvement in classification was evident.
Figure 3. Importance plot using the feature importance scores from the 53-feature RF algorithm. Red bars = family and social support variables; green = socioeconomic indicators; dark blue = infant characteristics; light blue = beliefs about parenting; purple = medical factors in pregnancy and birth; yellow = maternal factors; orange = paternal and family factors.
Figure 4. Two-way prediction plots displaying the shapes of the associations between the features with the highest importance scores and psychomotor delay. Lines are Lowess smoothed.
Table 1. Variable coding and descriptive statistics. All variables are from maternal or main care provider interviews.
| Variable | Coding | Obs | Mean (Std. Dev.) | Min–Max |
|---|---|---|---|---|
| Outcome and its constituent child development measures | | | | |
| Development below 2SD (age-adjusted) | | 18,432 | 0.038 (0.191) | 0–1 |
| Smiles | 1 = often, 2 = sometimes, 3 = not yet | 18,432 | 1.006 (0.082) | 1–3 |
| Sits up | | 18,432 | 1.066 (0.318) | 1–3 |
| Stands up holding on | | 18,432 | 1.475 (0.78) | 1–3 |
| Puts hands together | | 18,432 | 1.209 (0.532) | 1–3 |
| Grabs objects | | 18,432 | 1.01 (0.117) | 1–3 |
| Holds small objects | | 18,432 | 1.147 (0.454) | 1–3 |
| Passes a toy | | 18,432 | 1.065 (0.295) | 1–3 |
| Walks a few steps | | 18,432 | 2.81 (0.519) | 1–3 |
| Gives a toy | | 18,432 | 1.52 (0.717) | 1–3 |
| Waves bye-bye | | 18,432 | 1.912 (0.839) | 1–3 |
| Extends arms | | 18,432 | 1.205 (0.499) | 1–3 |
| Nods for yes | | 18,432 | 2.72 (0.617) | 1–3 |
| Family and social support | | | | |
| Frequency mother sees her mother | 0 = lives with mother, 1 = every day, to 8 = never | 18,544 | 3.277 (2.352) | 0–8 |
| Mother has other parents to talk to | 1 = most, to 5 = least | 17,805 | 2.096 (1.016) | 1–5 |
| Family would help if financial problems | Strongly agree = 1 to strongly disagree = 5 | 17,803 | 1.747 (0.971) | 1–5 |
| Number of types of financial help from grandparents | Gifts, money for daycare, essentials, trust funds, household items, other | 18,547 | 1.235 (1.057) | 0–6 |
| Frequency mother reports spending time with friends | 1 = every day, to 5 = never or no friends | 18,527 | 2.958 (0.974) | 1–5 |
| Number of people who attended birth | | 18,432 | 1.12 (0.495) | 0–4 |
| Family-based infant care in work hours | 1 = no, 2 = yes | 18,387 | 1.17 (0.375) | 1–2 |
| Grandparent lives in household | 1 = yes, 2 = no | 18,432 | 1.921 (0.269) | 1–2 |
| Socioeconomic indicators | | | | |
| Equivalised household income | McClements equivalised income | 18,432 | 296.833 (217.102) | 14.31–1250.78 |
| Age mother left full time education | | 18,341 | 17.578 (2.848) | 5–36 |
| Partner’s SES from job | NS-SEC 7 classes, 1 = highest, 7 = lowest, 8 = not in work | 18,432 | 5.352 (2.641) | 1–8 |
| Partner’s employment status | 1 = employed, 2 = self-employed, 3 = looking for work, 4 = not seeking work due to health, 5 = New Deal/apprenticeship, 6 = student, 7 = no partner/unknown | 18,432 | 3.388 (3.084) | 1–8 |
| Mother employed | Mother in paid work at 9 month interview = 1, else = 2 | 18,399 | 1.448 (0.497) | 1–2 |
| Winter temperature in room where baby sleeps | 5-point scale where 1 = warmest and 5 = cold | 18,310 | 2.301 (0.745) | 1–5 |
| Mother’s report of pollution & grime in neighbourhood | Reported on a 4-point scale, 1 = most, to 4 = least pollution | 18,218 | 3.089 (0.892) | 1–4 |
| Infant characteristics | | | | |
| Infant’s sex | 1 = male, 2 = female | 18,432 | 1.487 (0.5) | 1–2 |
| Infant has all immunisations | 1 = yes, 2 = no | 18,175 | 1.039 (0.194) | 1–2 |
| Infant’s age in days when mother was interviewed | | 18,432 | 295.487 (15.23) | 243–382 |
| Infant’s number of reported illnesses | | 18,422 | 1.633 (1.992) | 0–50 |
| Infant’s number of accidents | | 18,430 | 0.083 (0.296) | 0–5 |
| Beliefs about parenting & parenting practices | | | | |
| Beliefs: Baby should be picked up when cries | 1 = strongly agree, to 5 = strongly disagree | 17,810 | 2.966 (1.045) | 1–5 |
| Beliefs: Stimulation is important for infant development | 1 = strongly agree, to 5 = strongly disagree | 17,806 | 1.431 (0.626) | 1–5 |
| Beliefs: Talking to infants is important | 1 = strongly agree, to 5 = strongly disagree | 17,814 | 1.200 (0.448) | 1–5 |
| Beliefs: Cuddling infants is important | 1 = strongly agree, to 5 = strongly disagree | 17,815 | 1.191 (0.452) | 1–5 |
| Bed co-sleeping main sleeping arrangement in first 9 months | 1 = no, 2 = yes | 18,431 | 1.089 (0.285) | 1–2 |
| Breastfed at least 1 week | 1 = no, 2 = yes | 18,431 | 1.536 (0.499) | 1–2 |
| Work hours infant care is daycare centre | 1 = no, 2 = yes | 18,432 | 1.115 (0.319) | 1–2 |
| Main work hours infant care is mother | 1 = no, 2 = yes | 18,432 | 1.691 (0.462) | 1–2 |
| Factors in pregnancy & birth | | | | |
| Birthweight (kg) | | 18,382 | 3.344 (0.589) | 0.39–7.23 |
| Estimated gestational age at birth (days) | | 18,201 | 275.727 (14.056) | 168–301 |
| Number of pharmacological pain interventions in labour | | 18,293 | 0.731 (0.667) | 0–4 |
| Infant conceived using fertility treatment | 1 = no, 2 = yes | 18,425 | 1.974 (0.159) | 1–2 |
| Duration of labour | In hours, C-section = 0 | 17,680 | 9.160 (11.145) | 0–100 |
| Type of delivery | 1 = normal, C-section & emergency = 2 | 18,398 | 1.313 (0.464) | 1–2 |
| Singleton birth | 1 = singleton, 2 = twin, 3 = triplet | 18,432 | 1.014 (0.123) | 1–3 |
| Pregnancy illnesses (e.g., preeclampsia) | 1 = yes, 2 = no | 18,396 | 1.623 (0.485) | 1–2 |
| Place of birth | Hospital = 1, else = 2 | 18,401 | 1.020 (0.142) | 1–2 |
| How long mother and infant stayed in hospital after birth | 1 = weeks, 2 = days, 3 = hours | 18,020 | 2.046 (0.421) | 1–3 |
| Received full ante-natal care | 1 = yes, 2 = no | 18,391 | 1.038 (0.192) | 1–2 |
| Maternal factors | | | | |
| Mother’s pre-pregnancy body mass index | | 16,813 | 23.649 (4.451) | 11.65–59.18 |
| Mother’s birth year | | 18,426 | 1972 (5.95) | 1949–1987 |
| Mother reports being tired all the time | 1 = yes, 2 = no | 17,805 | 1.509 (0.5) | 1–2 |
| Mother reports being depressed | 1 = yes, 2 = no | 17,802 | 1.849 (0.358) | 1–2 |
| Average number of cigarettes mother smokes per day | | 18,420 | 3.315 (6.271) | 0–60 |
| Frequency mother drinks alcohol | Every day = 1 to never = 7 | 18,429 | 5.134 (1.49) | 1–7 |
| Mother has longstanding illness | 1 = yes, 2 = no | 18,425 | 1.789 (0.408) | 1–2 |
| Number of months pregnant at interview | | 18,423 | 0.196 (1.013) | 0–10 |
| Paternal & family factors | | | | |
| Ethnicity | 1 = white, 2 = mixed, 3 = Indian, 4 = Pakistani, 5 = Bangladeshi, 6 = Caribbean, 7 = African, 8 = East Asian & others | 18,402 | 1.627 (1.609) | 1–8 |
| Father present in household | 0 = yes, 1 = no | 18,403 | 0.172 (0.378) | 0–1 |
| Father’s age when infant was born | | 18,395 | 31.91 (5.713) | 15–68 |
| Paternal involvement score: how much help father is | Summed score of how often father does: general childcare, feeding, getting up in night, changing nappies. 1 = least, to 21 = most | 16,255 | 10.205 (5.868) | 1–21 |
| Birth interval in months from older sibling | | 8,997 | 42.803 (27.86) | 9–318 |
| Number of siblings in household | | 18,432 | 0.938 (1.081) | 0–9 |
| Mother reports partner sensitive and aware of her needs | Strongly agree = 1 to strongly disagree = 5 | 14,358 | 1.986 (0.929) | 1–5 |
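
Table 1 lists the binary outcome ("Development below 2SD (age-adjusted)") alongside its twelve constituent items, but does not spell out how the items were combined. The sketch below shows one plausible construction, offered only as an illustration: it sums the twelve items (coded 1 = often to 3 = not yet, so higher totals mean fewer milestones reached), removes a linear trend on the infant's age in days, and flags infants whose residual lies more than two standard deviations towards delay. The function name `flag_delay`, the parameter `n_sd`, and the choice of a linear age adjustment are assumptions made here, not details taken from the article.

```python
import numpy as np


def flag_delay(items, age_days, n_sd=2.0):
    """Flag infants whose age-adjusted milestone total suggests delay.

    items    : (n_infants, n_items) array, each item coded
               1 = often, 2 = sometimes, 3 = not yet.
    age_days : (n_infants,) infant age in days at interview.
    n_sd     : threshold in standard deviations (assumed to be 2).
    Returns an integer array with 1 = flagged as delayed, 0 = not flagged.
    """
    items = np.asarray(items, dtype=float)
    age_days = np.asarray(age_days, dtype=float)

    # Higher total = more "not yet" answers = fewer milestones reached.
    total = items.sum(axis=1)

    # Assumed age adjustment: subtract a least-squares linear fit on age.
    slope, intercept = np.polyfit(age_days, total, deg=1)
    residual = total - (slope * age_days + intercept)

    # Standardise the residuals and flag the high (delayed) tail.
    z = (residual - residual.mean()) / residual.std(ddof=1)
    return (z > n_sd).astype(int)


# Toy data: 50 infants who do every item "often", except one who
# answers "not yet" on all twelve items.
items = np.ones((50, 12))
items[25] = 3
age_days = 280 + np.arange(50)

flags = flag_delay(items, age_days)
print(int(flags.sum()), "infant(s) flagged")
```

On real data, a construction of this kind would be expected to flag only a small share of infants, in line with the outcome mean of 0.038 reported in the table, although the exact share depends on how the age adjustment is done.
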
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Waynforth, D. A Machine Learning Algorithm Predicting Infant Psychomotor Developmental Delay Using Medical and Social Determinants. Reprod. Med. 2023, 4, 106-117.
