Review

Over 20 Years of Machine Learning Applications on Dairy Farms: A Comprehensive Mapping Study

Department of Process, Energy and Transport Engineering, Munster Technological University, T12 P928 Cork, Ireland
*
Author to whom correspondence should be addressed.
Sensors 2022, 22(1), 52; https://doi.org/10.3390/s22010052
Submission received: 2 December 2021 / Revised: 17 December 2021 / Accepted: 19 December 2021 / Published: 22 December 2021
(This article belongs to the Collection Machine Learning in Agriculture)

Abstract

Machine learning applications are becoming increasingly common in dairy farming decision support, in areas such as feeding, animal husbandry, healthcare, animal behavior, milking and resource management. Thus, the objective of this mapping study was to collate and assess studies published in journals and conference proceedings between 1999 and 2021 that applied machine learning algorithms to dairy farming-related problems, in order to identify trends in the geographical origins of data, as well as the algorithms, features and evaluation metrics and methods used. This mapping study was carried out in line with PRISMA guidelines, with six pre-defined research questions (RQ) and a broad and unbiased search strategy that explored five databases. In total, 129 publications passed the pre-defined selection criteria, from which relevant data required to answer each RQ were extracted and analyzed. This study found that Europe (43% of studies) produced the largest number of publications (RQ1), while the largest number of articles were published in the Computers and Electronics in Agriculture journal (21%) (RQ2). The largest number of studies addressed problems related to the physiology and health of dairy cows (32%) (RQ3), while the most frequently employed feature data were derived from sensors (48%) (RQ4). The largest number of studies employed tree-based algorithms (54%) (RQ5), while RMSE (56%) (regression) and accuracy (77%) (classification) were the most frequently used metrics, and hold-out cross-validation (39%) was the most frequently employed evaluation method (RQ6). Since 2018, there has been more than a sevenfold increase in the number of studies that focused on the physiology and health of dairy cows, compared to almost a threefold increase in the overall number of publications, suggesting an increased focus on this subdomain. In addition, a fivefold increase in the number of publications that employed neural network algorithms was identified since 2018, in comparison to a threefold increase in the use of both tree-based algorithms and statistical regression algorithms, suggesting an increasing utilization of neural network-based algorithms.

1. Introduction

Animal agriculture is responsible for 14.5% of global anthropogenic greenhouse gas emissions, 20% of which are due to dairy production [1]. With a 22% increase in global milk production forecast between 2018 and 2027 [2], it is essential that the dairy sector adequately addresses the significant challenges ahead to ensure the future sustainability of the global dairy industry. These challenges are compounded by the rapid intensification of milk production systems that has taken place over the past 20 years. This increased intensification may be due to the principles associated with modern agricultural systems that define progress in terms of efficiency and productivity [3]. This has led to economies of scale throughout the dairy industry, with increasing herd sizes reducing fixed costs per unit output, coupled with an emphasis on maximizing output per hectare of farmland and per unit of input (e.g., concentrate feed). However, increased numbers of dairy livestock will naturally result in an increased workload for farmers, which may reduce income per hour worked or potentially reduce animal health and wellbeing, as farmers must care for increased numbers of livestock. Thus, dairy farmers are required to improve productivity (e.g., reduced production costs per litre of milk) without sacrificing milk production volumes, milk quality, or animal health and wellbeing. To achieve this, every aspect of the milk production cycle must be continuously monitored, evaluated and corrected to minimise the probability of undesirable farm events that can impact productivity and profitability.
The use of software and hardware technologies that support dairy farmers through the automation of on-farm decision making can help farmers manage increased herd sizes without added labor requirements. Machine learning algorithms and cognate methodologies can provide the necessary prediction accuracy to power these technologies through the ability to self-learn and improve over time when new data become available. Thus, there has also been an increased prevalence of machine learning algorithms employed throughout the dairy literature. As on-farm data collection technologies improve and become more commonplace in line with the rollout of the 5G network, the potential of these machine learning powered technologies will also increase [4]. Machine learning algorithms provide flexibility regarding data multicollinearity, input data distributions and missing data points while also having the ability to quantify interactions and non-linearities between features (i.e., independent variables) for regression and classification problems [5,6,7]. Machine learning algorithms include both supervised techniques (e.g., random forests), which require labelled training data to find patterns, and unsupervised techniques (e.g., k-means clustering), which find patterns without labelled data [8]. The ability of a machine learning model to provide accurate predictions and/or insights for on-farm decision making is directly related to the quality of input data used for model training. In addition, careful consideration must be given to ensure that a robust validation procedure is carried out (supervised learning), as model overfitting may result in a drastic overestimation of predictive capabilities. To realise the full potential of these algorithms, it is essential that best practice methodologies are identified and employed throughout the entire dairy research domain.
With the increased prevalence of machine learning algorithms throughout the dairy literature, the future direction of the research domain can be guided through the systematic mapping of the problems, features, algorithms and evaluation metrics and methods that have been employed to date. Recently, two studies have focused on reviewing the literature related to the applications of machine learning on dairy farms. Cockburn [9] summarised 97 studies related to dairy farm management, animal physiology, cow reproduction (animal husbandry), behavior analysis and feeding. The author followed a pre-defined search strategy and selection criteria, reviewed articles published between 2015 and 2020, and individually discussed each study within each subdomain. Study parameters, such as the dataset used, dependent variables, features and algorithms used, the prediction accuracy calculated and research design pitfalls, were discussed. Concurrently, Slob et al. [10] carried out a systematic mapping study of 38 primary studies published between 2010 and 2020 that applied machine learning for either disease detection in milk, forecasting milk production, or quantifying milk quality on dairy farms. Slob et al. [10] also followed a pre-defined search protocol and selection criteria to allow for reproducibility, as per the review guidelines outlined by Kitchenham et al. [11]. Similar to Cockburn [9], Slob et al.’s review focused on the problems addressed, the features and the machine learning algorithms used. However, Slob et al.’s mapping study contained a broader overview of the methodologies employed by highlighting common trends throughout the literature, as opposed to discussing the methods used by each individual study. Slob et al. also investigated the types of problems addressed (e.g., regression or classification), the evaluation parameters and validation approaches used, the most accurate algorithm per study and the challenges identified. 
However, Slob et al.’s mapping study did not incorporate studies from other subdomains within the dairy literature, such as animal health and wellbeing, farm management, feeding, animal reproduction and behavior analysis.
This systematic mapping study focused on collating and assessing studies published in journals and conference proceedings between 1999 and 2021 that applied machine learning algorithms to dairy farming-related problems. Similar to Slob et al. [10], this mapping study followed guidelines outlined by Kitchenham et al. [11], whereby the research questions, search strategy and selection criteria were pre-defined. However, in contrast to Slob et al., (i) this study was not limited to publications solely within the cow milk subdomain; and (ii) this study did not exclude publications based on a quality score, to ensure maximum coverage. In contrast to Cockburn [9], (iii) this study was a mapping study, not a summary of the literature; and (iv) this study assessed the geographical location by categorising studies according to the continent of origin. In addition, in contrast to both Slob et al. and Cockburn, this study (v) employed a much longer search period and far broader search strategy, allowing for a greater number of publications to be identified and assessed; (vi) assessed publications over time in terms of research areas, algorithms and validation methods used, to identify trends throughout the study period; and (vii) quantified and presented research areas according to publication sources, as well as the feature categories used with different types of algorithms, through the use of Sankey diagrams. Lastly, (viii) the top 10 most frequently used evaluation metrics for classification and regression problems were assessed separately, rather than together, to allow for a more accurate representation of their respective popularities.
This article has four primary components: (1) a methodology section detailing the research questions, search strategy and selection criteria, in conjunction with data collection and data synthesis procedures; (2) a results section presenting findings that help answer each research question; (3) a discussion section highlighting common trends evident throughout the dairy literature; and (4) a concise conclusion to this review.

2. Methodology

This mapping study followed three primary stages: planning, conducting and reporting, as outlined by Kitchenham et al. [11]. In the planning stage, the research questions were defined, suitable databases were identified and a robust search strategy was selected to identify the journal articles and conference papers (hereafter referred to as publications) that could be used to answer the research questions. The databases were selected based on institutional access, their use in prior systematic literature reviews in the dairy research domain [9,10,12] and whether bulk downloads of publications could be carried out easily. A heuristic approach was taken to identify the search string that provided a broad and unbiased search of the dairy literature without returning an unfeasible number of publications. During the conducting stage, the document search was carried out using the specifically defined search strings within each online database. The identified publications were filtered according to pre-determined selection criteria prior to analysis, whereby no quality assessment was performed in order to ensure maximum coverage. Relevant data required to answer each research question were then extracted from each publication and synthesised in the reporting stage via applicable charts, figures and tables.

2.1. Research Questions

The following research questions (RQ) were defined:
RQ1.
What countries/regions are responsible for the largest number of publications?
RQ2.
What journals and conference proceedings are research publications being published in?
RQ3.
What problem areas are being addressed using machine learning in the dairy farming domain?
RQ4.
What features are being employed to develop the machine learning models?
RQ5.
What machine learning algorithms are being utilised to develop the models?
RQ6.
Which evaluation metrics and methods are used?

2.2. Databases and Search Strategy

The literature search was carried out using five databases: Scopus, ScienceDirect, IEEE, Google Scholar and MDPI. These databases were selected as each (except for Google Scholar) allowed for the bulk downloading of publications while providing wide coverage of dairy-related research publications. Google Scholar returned a small number of publications; thus, bulk downloading was not required. A broad and unbiased search of the literature was undertaken to capture a wide range of publications from various areas within the dairy research domain [11] by using the search string “Dairy” AND (“machine learning” OR “artificial intelligence”). By default, each database’s search function also searched for the approximate search phrase “machine-learning”. This search string ensured that (1) preference was not given to any particular machine learning algorithm, and (2) a broad search of the literature was carried out without returning an unfeasible number of publications. Publications that contained the search string in their abstract, title and/or keywords fields were identified using each database’s search function. However, Google Scholar did not allow for searches to be carried out on the publication’s abstract and keywords fields, so only the publication title field was used. The search strategy focused on identifying studies published between 1999 and 2021, whereby the last search was carried out on 9 June 2021. Initially, the search strategy aimed to identify studies published between 1990 and 2021; however, no studies were found prior to 1999. In total, 746 studies were identified across the Scopus (n = 382), ScienceDirect (n = 109), IEEE (n = 189), Google Scholar (n = 45) and MDPI (n = 21) databases.
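The Boolean search string and the per-database totals can be sketched in a few lines of Python. This is a minimal illustration, not the databases' own search logic: the function name `matches_search_string` is hypothetical, and the hyphen-tolerant pattern is an assumption standing in for the approximate phrase matching described above.

```python
import re

def matches_search_string(text: str) -> bool:
    """Hypothetical check mirroring: "Dairy" AND ("machine learning" OR "artificial intelligence")."""
    lowered = text.lower()
    has_dairy = "dairy" in lowered
    # Accept "machine learning" and "machine-learning" (assumed approximate matching).
    has_ml = re.search(r"machine[\s-]learning", lowered) is not None
    has_ai = "artificial intelligence" in lowered
    return has_dairy and (has_ml or has_ai)

# Per-database hit counts as reported in this section.
hits = {
    "Scopus": 382,
    "ScienceDirect": 109,
    "IEEE": 189,
    "Google Scholar": 45,
    "MDPI": 21,
}
total_hits = sum(hits.values())
```

Summing the per-database counts in this way gives a quick consistency check against the number of documents entering the screening stage.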

2.3. Selection Criteria

To filter only relevant publications required to answer the research questions (defined in Section 2.1), exclusion and inclusion criteria were determined, similar to Slob et al. [10]. To be included in the study, all exclusion criteria must be false, and all inclusion criteria must be true [11].
The exclusion criteria were:
  • The publication was not related to machine learning applied to dairy farming
  • The publication did not report empirical findings
  • The publication was not written in English
  • The publication was a duplicate study
  • There was no full text available
  • The publication was a review or survey study
  • The publication was published before 1999
The inclusion criteria were:
  • The publication features the development of machine learning models related to dairy farming
  • The publication is a primary study
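The selection rule (every exclusion criterion false, every inclusion criterion true) can be written as a single predicate. The sketch below is illustrative only: the `Publication` class and its field names are hypothetical, with one field per criterion listed above.

```python
from dataclasses import dataclass

@dataclass
class Publication:
    # Hypothetical fields, one per criterion in Section 2.3.
    about_ml_in_dairy: bool
    reports_empirical_findings: bool
    in_english: bool
    is_duplicate: bool
    full_text_available: bool
    is_review_or_survey: bool
    year: int
    develops_ml_model: bool
    is_primary_study: bool

def exclusion_flags(p: Publication) -> list:
    """One boolean per exclusion criterion; True means the criterion applies."""
    return [
        not p.about_ml_in_dairy,
        not p.reports_empirical_findings,
        not p.in_english,
        p.is_duplicate,
        not p.full_text_available,
        p.is_review_or_survey,
        p.year < 1999,
    ]

def inclusion_flags(p: Publication) -> list:
    """One boolean per inclusion criterion; True means the criterion is met."""
    return [p.develops_ml_model, p.is_primary_study]

def is_selected(p: Publication) -> bool:
    # Selected only if no exclusion criterion applies and all inclusion criteria hold.
    return not any(exclusion_flags(p)) and all(inclusion_flags(p))
```

Keeping the two lists separate makes it straightforward to report which exclusion criterion removed a given publication, as is done at the screening stage.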

2.4. Data Collection

Each publication identified by the search strategy outlined in Section 2.2 was analyzed relative to the exclusion and inclusion criteria (Section 2.3). The search strategy was carried out in line with PRISMA guidelines, as shown in Figure 1 [13]. The flow of documents from initial identification to the manuscript screening/filtering stage to the final subset of documents included in the mapping study is shown in Figure 1. The number of studies excluded due to each exclusion criterion is also highlighted at the screening stage. Of the 746 documents initially identified, 10 were not written in English, 78 were review/survey studies and 294 had no full text available for downloading from the database website. In addition, 210 publications were found to be outside the scope of developing machine learning models for dairy farming, while 32 documents were removed due to being duplicate studies. In addition, seven publications were included through snowballing, as employed by Slob et al. [10]. Cumulatively, 129 individual publications passed the selection criteria stage and were then included in the mapping study (Appendix E).
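The screening arithmetic in the paragraph above can be checked with a short tally. The sketch assumes that each excluded document was counted under exactly one exclusion criterion, which the text does not state explicitly; the variable names are illustrative.

```python
# Counts as reported in the paragraph above.
identified = 746
excluded = {
    "not written in English": 10,
    "review or survey study": 78,
    "no full text available": 294,
    "outside the scope": 210,
    "duplicate study": 32,
}
added_by_snowballing = 7

# Assumption: exclusion categories do not overlap.
total_excluded = sum(excluded.values())
remaining_after_screening = identified - total_excluded
included = remaining_after_screening + added_by_snowballing
```

Under that assumption, the tally reproduces the 129 publications included in the mapping study.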
Relevant data were extracted from each of the 129 studies to respond to each of the six research questions. This was carried out by reading each publication and extracting the following information: (1) the year of publication, (2) publication source (name of the journal or conference proceedings), (3) whether the publication was a journal article or conference paper, (4) the country of origin (identified as country or countries where data collection took place), (5) the dependent variable or variables used, (6) the problem type (e.g., classification, regression or clustering), (7) the features employed, (8) the machine learning algorithms utilised, (9) the evaluation metrics used for synthesising model performance and (10) the validation technique used to quantify model performance.

2.5. Data Analysis

To ease the synthesis of information, research categories, algorithm categories and feature categories were determined for each study. Categorisation was necessary to ensure each research question was addressed clearly and concisely. Firstly, each study was categorised according to its specific area of dairy research, whereby six categories were identified (RQ3) based on cognate review studies in the field: physiology and health, animal husbandry, milk, feeding, management and behavior analysis. Cockburn [9] employed physiology and health, animal husbandry, feeding, management and behavior analysis, while Slob et al. [10] assessed studies on milk disease detection, quantifying milk production and milk quality. The range of dependent variables that were used to determine which of the six categories of dairy research each study related to is shown in Appendix A. Secondly, the machine learning algorithms used within each study were also categorised accordingly, whereby eight categories were identified (RQ5): trees (e.g., decision trees), statistical regression (e.g., multiple linear regression, ridge regression), neural networks (e.g., multi-layered perceptron, deep learning networks), Bayes (e.g., naïve Bayes, Bayesian-LASSO), meta (e.g., bagging, boosting), rule-based (e.g., Jrip, OneR), clustering (e.g., k-means, DBSCAN) and other (e.g., support vector machine, KNN). The full list of machine learning algorithms used and their corresponding category is shown in Appendix C. Additionally, the features used within each study were categorised accordingly, whereby 11 categories were identified (RQ4): calving/pregnancy information, cow characteristics and clinical information, diet/feeding, farm characteristics and management, lactation information, meteorological conditions, milk characteristics, milking parameters, sensors, soil characteristics and other. The full list of features used and their corresponding categories is shown in Appendix B.
Lastly, the categorisation of journals and conference proceedings was also carried out to help improve data synthesis. The other journal category represented journals that had fewer than four published articles included in this study, while all conference papers were included in a conference paper category (RQ2). The full lists of journals and conference proceedings are shown in Appendix D.
Categorisation was straightforward when a publication focused on only one dependent variable. However, 13 publications focused on the prediction of multiple dependent variables. In these cases, the problem type, algorithms employed and features used were recorded for each dependent variable. Each dependent variable was categorised according to its specific area of dairy research. When a publication focused on the prediction of multiple dependent variables, each attributable to a different area of dairy research, each dependent variable was treated as a separate study. Otherwise, information would be excluded when assessing the frequency of studies published in different research areas over time (RQ3), when investigating the geographical locations attributed to different research areas (RQ1) and when evaluating the research problem type and popular journals and conference proceedings associated with different areas of dairy research (RQ2).
When a publication focused on the prediction of multiple dependent variables in the same dairy research area but utilising different features, each unique set of features was treated as a separate study. However, this was only applicable when addressing RQ4, whereby the features employed in different research areas in conjunction with the machine learning algorithms used were investigated. Otherwise, information related to the features used within each research area would be excluded.
Three studies involved the collection of data in more than one country/region. In such instances, each country was treated as though it had independently carried out the study. This was applicable when assessing the geographical distribution of the publications (RQ1). Assessing the geographical locations of publications was carried out on an individual publication basis, irrespective of the number of dependent variables. Likewise, the algorithms used (RQ5) and the validation methods and model performance metrics used (RQ6) throughout the literature were assessed on an individual publication basis, as these were found to be consistent throughout each publication irrespective of the number of dependent variables.
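The counting rules described above can be illustrated with a small sketch. The records, identifiers and field names below are hypothetical and are not taken from the reviewed publications; the code shows only how a multi-country publication contributes once per country (RQ1) and how a publication spanning several research areas contributes once per area (RQ3).

```python
from collections import Counter

# Hypothetical records for illustration only.
publications = [
    {"id": "A", "countries": ["Ireland"], "areas": ["milk"]},
    {"id": "B", "countries": ["Australia", "Canada", "Denmark", "Ireland"],
     "areas": ["physiology and health"]},
    {"id": "C", "countries": ["Germany"],
     "areas": ["animal husbandry", "physiology and health"]},
]

def country_counts(pubs):
    """RQ1: each country that supplied data is credited with the publication once."""
    return Counter(country for p in pubs for country in set(p["countries"]))

def research_area_counts(pubs):
    """RQ3: dependent variables in different research areas count as separate studies."""
    return Counter(area for p in pubs for area in set(p["areas"]))
```

With these rules, totals per country or per research area can exceed the number of publications, which is why percentages are reported per research question rather than against a single denominator.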

3. Results

3.1. Geographical Distribution

The geographical distribution of the publications included in this study is shown in Figure 2. The geographical location was determined by the origin of the data used for model development. In total, 30 countries contributed data to machine learning in the dairy farming research domain. Data originated from one single country for 126 of the studies, with the remaining three studies having cross-border collaboration. These were collaborations between (1) the United Kingdom, Italy, Sweden and Finland; (2) Australia, Canada, Denmark and Ireland; and (3) Belgium, Canada, Ireland, Denmark and Germany. In relation to RQ1, the largest number of studies utilised data originating from the United States (n = 19), followed by Ireland (n = 15), Germany (n = 13) and the United Kingdom (n = 13), and Australia (n = 10) and China (n = 10). The remaining 24 countries contributed data to five or fewer research publications. However, from a continental perspective, Europe (n = 60) was by far the largest contributor of data, followed by Asia (n = 27), North America (n = 24), Oceania (n = 13), South America (n = 8) and Africa (n = 2). Data originating from Europe were used in studies focusing on the physiology and health of dairy cattle (n = 19), analysing animal behavior (n = 13), animal husbandry (n = 12), farm management (n = 8), milk (n = 5) and feeding (n = 3), as shown in Figure 3. Applying machine learning algorithms to assess the physiology and health of dairy cattle was also the most popular research category for North America (n = 10) and Asia (n = 8), and the joint most popular category in Oceania (n = 3) and South America (n = 3).

3.2. Publications Timeline

The number of research studies published per year from 1999 to 2021, categorised according to each research area, is shown in Figure 4. Prior to 2018, the animal husbandry category was the largest research area representing 35% of all publications in that period, followed by behavior analysis (19%), management (15%) and physiology and health (15%). A significant increase in the number of publications occurred in 2018, whereby a total of 15 journal articles and conference papers were published, representing a 114% increase compared to 2017. This trend continued in 2019 and 2020, whereby year-on-year increases of 80% and 41% were recorded, respectively. This resulted in 74% of the publications included in this mapping study being published after 2017, representing a threefold increase. On average, between 2018 and 2021, the physiology and health research category was the largest research area (38%) (up from 15% between 1999 and 2017), followed by research related to behavior analysis (19%) and animal husbandry (14%). The physiology and health research category represented the largest research area in each year between 2018 and 2021, representing 40%, 37%, 39% and 35% of publications, respectively. Behavior analysis was the second-largest research category in 2018 (27%), 2019 (22%) and the first five months of 2021 (24%), while animal husbandry was the second-largest research category in 2020 (21%).
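The year-on-year figures above follow the usual percentage-change formula. The sketch below is illustrative: the function names are hypothetical, and the 2017 count is inferred from the reported 2018 count and the reported increase rather than stated in the text.

```python
def percent_increase(previous: float, current: float) -> float:
    """Change from previous to current, as a percentage of the previous value."""
    return (current - previous) / previous * 100.0

def implied_previous(current: float, pct_increase: float) -> float:
    """Back out the previous value from the current value and a reported increase."""
    return current / (1.0 + pct_increase / 100.0)

count_2018 = 15  # reported in the text
# Inferred, not reported: the 2017 count implied by a 114% increase to 15.
count_2017 = implied_previous(count_2018, 114.0)
```

Rounding the inferred value gives seven publications for 2017, and `percent_increase(7, 15)` recovers the reported figure to the nearest percent.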

3.3. Publications Breakdown

The following section has two primary components: the first component provides a breakdown of the type of problems addressed in relation to the source journals that published the research studies and the areas of research that machine learning has been applied to throughout the literature. The second component provides a breakdown of each research area in relation to the features considered for model development and machine learning algorithms employed.

3.3.1. Problem Type, Journals/Conferences and Research Area

The flow of research studies from the type of problem addressed, to the publication destination, to the area of research carried out is shown in Figure 5. Overall, 65% of the research studies focused on addressing classification problems, 33% addressed regression problems, while 2% and 1% focused on clustering and tree analysis problems, respectively. In relation to RQ2, the Computers and Electronics in Agriculture journal was responsible for publishing the largest number of research studies (21%), followed by the Journal of Dairy Science (16%). In addition, 27% of all research studies were published in other journals (Appendix D), whereby each journal was responsible for publishing fewer than four research articles included in this study, while 15% of all publications (20 conference papers) were published in 18 different conference proceedings. Consistent with Section 3.2, and in relation to RQ3, the largest share of studies focused on physiology and health research (32%), followed by animal husbandry (20%), behavior analysis (18%), milk (13%), management (11%) and feeding (6%). No clear trend or bias was found between the types of problems addressed and the publication sources, whereby the most popular destination for both classification and regression problems was the other journals category, followed by the Computers and Electronics in Agriculture journal. Regarding the destination of each publication in relation to the research area, the largest number of research publications published in other journals and the Computers and Electronics in Agriculture journal focused on physiology and health applications (n = 12 and n = 8, respectively). However, this differed for articles published in the Journal of Dairy Science, where the largest number of research articles focused on animal husbandry applications (n = 9).

3.3.2. Research Area, Features and Algorithms Used

The flow of research studies from a research category to the category of features considered to the category of machine learning algorithms is shown in Figure 6. Overall, 48% of research studies utilised sensor data for model development (RQ4), predominantly for physiology and health (n = 24) and behavior analysis (n = 24) applications. Accelerometer (n = 27), image (n = 7) and pedometer (n = 6) data were the three most frequently employed types of data collected by sensors, as shown in Appendix B. Sensor data were most frequently employed as feature data when developing artificial neural network models (n = 35), tree-based models (n = 32) and other model types (n = 31), whereby other models included the application of support vector machine and k-nearest neighbor algorithms (full list shown in Appendix C). In addition, cow characteristics (34%), milk characteristics (37%), calving information (23%) and lactation information (19%) were also commonly employed as feature data, followed by meteorological data (14%), diet and feeding (10%), farm characteristics (16%), milking parameters (10%), soil characteristics (1%) and other variables (7%). Regarding the algorithms employed (RQ5), tree-based algorithms were employed in the largest number of studies (54%), followed by neural network algorithms (50%), statistical regression-based algorithms (43%), other model types (37%), Bayes algorithms (17%), meta (10%), rule (4%) and clustering (1%). A full breakdown of the specific algorithms employed within each algorithm category is shown in Appendix C, in conjunction with the number of studies in which each algorithm was employed.
The number of research studies published per year from 1999 to 2021, categorised according to each algorithm method, is shown in Figure 7. Prior to 2018, tree-based algorithms were the most frequently employed algorithm category (employed in 25% of all publications), followed by statistical regression-based algorithms (22%). This trend continued in the period between 2018 and 2021, whereby the percentage of publications that employed tree-based algorithms increased to 26%. However, the percentage of publications that employed statistical regression algorithms reduced to 17%, while the percentage of publications that employed neural network-based algorithms increased to 25% during the 2018 to 2021 period (up from 16% between 1999 and 2017). This equated to a fivefold (5.2), or 420%, increase in the number of publications that employed neural network algorithms since 2018, in comparison to a threefold (3.3) increase for tree-based algorithms and a 2.5-fold increase for statistical regression algorithms.
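The relationship between a fold increase and a percentage increase used above is simple: an x-fold increase corresponds to a rise of (x − 1) × 100%. A minimal sketch, with hypothetical function and variable names, applied to the three fold values reported in this section:

```python
def fold_to_percent_increase(fold: float) -> float:
    """An x-fold increase is a rise of (x - 1) * 100 percent."""
    return (fold - 1.0) * 100.0

def percent_increase_to_fold(pct: float) -> float:
    """Inverse conversion: a p percent rise is a (1 + p / 100)-fold increase."""
    return 1.0 + pct / 100.0

# Fold increases reported in the text for the period since 2018.
reported_folds = {
    "neural networks": 5.2,
    "tree-based": 3.3,
    "statistical regression": 2.5,
}
as_percent = {name: round(fold_to_percent_increase(fold))
              for name, fold in reported_folds.items()}
```

Note that these fold values describe growth in publication counts, whereas the percentages quoted earlier in the paragraph (for example, 25% to 26% for tree-based algorithms) are shares of publications within each period; a share can stay almost flat while the count grows several-fold because the total number of publications also grew.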

3.4. Evaluation Metrics Used

In relation to RQ6, the ten most frequently used evaluation metrics for assessing regression and classification problems are shown in Table 1, in conjunction with the percentage of studies in which each metric was used. For studies that focused on regression problems (n = 41), root mean squared error (RMSE) was the most frequently employed metric, used in 56% of studies, followed by the coefficient of determination (R²) (46%), correlation coefficient (r) (27%), mean absolute error (MAE) (24%), concordance correlation coefficient (CCC) (17%), mean absolute percentage error (MAPE) (15%), mean squared error (MSE) (15%), relative prediction error (RPE) (15%), mean percentage error (MPE) (10%) and mean squared percentage error (MSPE) (7%). In relation to studies that focused on classification problems (n = 85), classification accuracy was the most commonly employed evaluation metric (77%), followed by recall (66%), specificity (49%), positive predictive value (PPV) (48%), F1 score (27%), the area under the ROC curve (AUC) (26%), negative predictive value (NPV) (15%), Cohen’s kappa (κ) (12%), false positives (FP) (9%) and false negatives (FN) (6%).
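For readers less familiar with these metrics, the sketch below computes several of them from first principles using their standard textbook definitions. It is an illustration only, not code from any of the reviewed studies; the function names are hypothetical, and zero denominators (for example, no positive predictions) are deliberately not handled.

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MSE, MAE and R2 for paired lists of observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae,
            "R2": 1.0 - ss_res / ss_tot}

def classification_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)   # sensitivity
    ppv = tp / (tp + fp)      # precision
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "PPV": ppv,
        "NPV": tn / (tn + fn),
        "F1": 2 * ppv * recall / (ppv + recall),
    }
```

Because accuracy pools both classes, it can look favourable on data where the event of interest is rare; reporting recall, specificity and PPV alongside it, as many of the studies did, gives a fuller picture.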

3.5. Validation Methods

In relation to RQ6, six evaluation methods were identified throughout the 127 studies that addressed classification, regression and clustering (n = 1) problems: hold-out cross-validation (n = 49), leave-out-one-animal (LOOA) (n = 4), leave-one-out cross-validation (LOOCV) (n = 3), nested cross-validation (Nested CV) (n = 7), Train/Validation/Test (n = 17) and k-fold cross-validation (n = 30), as shown in Table 2. The k-fold cross-validation method was employed with a mean k value of 10, the hold-out method was employed with, on average, 71% of data used for training and 29% used as a test dataset, while the train/validation/test method used 65%, 17% and 18% of data for training, validation and testing, respectively. In 21 research studies, these evaluation methods were carried out repeatedly to reduce the probability of biased results associated with a single hold-out, train/validation/test or k-fold CV split. The number of studies that repeatedly carried out each particular evaluation method is highlighted in brackets. On average, the hold-out method was repeated 38 times, the train/validation/test method was repeated 10 times and k-fold cross-validation was repeated 14 times. In addition, 16 research studies employed a combination of two evaluation methods to further separate training and testing stages, which is particularly important when tuning hyper-parameters. For example, 15 studies employed k-fold CV for model training to select features and/or hyper-parameters and calculated prediction accuracy on separate test data using hold-out cross-validation. One study employed two different evaluation methods for two different dependent variables.
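Three of the evaluation methods above can be sketched as index-splitting routines using only the Python standard library. This is an illustration under stated assumptions: the function names are hypothetical, the default proportions (71% training, k = 10) are the mean values reported above rather than recommendations, and LOOA is interpreted here as holding out all records belonging to one animal in each round.

```python
import random

def hold_out_split(n_samples, train_fraction=0.71, seed=0):
    """Single random split into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def leave_one_animal_out(animal_ids):
    """Yield (animal, train, test); all records of one animal form the test set."""
    for animal in sorted(set(animal_ids)):
        test = [i for i, a in enumerate(animal_ids) if a == animal]
        train = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train, test
```

Repeating the hold-out or k-fold procedure, as 21 of the studies did, amounts to calling these functions with different `seed` values and averaging the resulting performance metrics. Grouping by animal matters when several records come from the same cow, since a purely random split can place records from one animal in both the training and test sets.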
The number of research studies published per year from 1999 to 2021, categorised according to each validation method, is shown in Figure 8. Prior to 2018, the hold-out method was the most frequently employed validation method (employed in 43% of all publications), followed by k-fold cross-validation (30%) and the train/validation/test method (19%). This trend continued throughout the 2018 to 2021 period, whereby the percentage of publications that employed the hold-out method increased slightly to 46%, as did the use of k-fold cross-validation (33%). However, this period also saw a reduction in the percentage of publications that employed the train/validation/test method (10%). The hold-out cross-validation method was the most frequently employed method each year between 2014 and 2020, while the k-fold cross-validation method was the most frequently used method (45%) in the first five months of 2021. In 2019 and 2020, the use of the hold-out method increased by 100% and 19%, year-on-year, respectively, while the use of k-fold cross-validation increased by 80% and 33%, year-on-year, respectively.
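
The splitting schemes identified in this section can be sketched in a few lines of Python. The sketch below is illustrative only and is not drawn from any reviewed study: the function names, the seed and the example animal identifiers are hypothetical, the 71% training fraction and k = 10 simply reuse the mean values reported above, and LOOA is assumed here to mean that all records belonging to one animal are held out together in each iteration.

```python
import random


def hold_out_split(n_samples, train_fraction=0.71, seed=0):
    """Single random split of sample indices into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = round(n_samples * train_fraction)
    return idx[:cut], idx[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_animal_out(animal_ids):
    """Yield (animal, train, test); all records of one animal form the test set."""
    for animal in sorted(set(animal_ids)):
        test = [i for i, a in enumerate(animal_ids) if a == animal]
        train = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train, test


# Hypothetical dataset of 100 records.
train_idx, test_idx = hold_out_split(100)
folds = list(k_fold_splits(100, k=10))

# Hypothetical records from three animals; record i belongs to animal_ids[i].
animal_ids = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_c", "cow_c"]
looa = list(leave_one_animal_out(animal_ids))

print(len(train_idx), len(test_idx), len(folds), len(looa))
```

When hyper-parameters are tuned, the k-fold splits would typically be generated from the hold-out training indices only, so that the hold-out test set is never used during model selection.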

4. Discussion Overview

This study represents the largest and broadest systematic mapping review to date, focusing on published literature related to the application of machine learning algorithms in the dairy research domain. In total, 129 publications were included and assessed, made possible by a combination of broad search terms and an extended search period spanning over 21 years. However, it is still plausible that additional publications that focused on the application of machine learning algorithms on dairy farms were not captured by the search strategy employed. The search strategy involved five databases chosen to provide wide coverage of dairy-related research while allowing for the bulk downloading of publications. It is likely that some publications located in other databases were not included. Snowballing was carried out to help reduce the number of publications not included. However, the largest barrier to including publications in this study was the availability of a full text from the Scopus database. This was due to restrictions on the publisher’s side, which accounted for 93% of the total number of excluded publications.
Throughout the 129 publications included in this mapping study, a considerably wide range of dependent variables (n = 66), features (n = 251) and algorithms (n = 90) were employed in 35 journals and 18 conference proceedings. It was, therefore, necessary to categorise dependent variables, features, algorithms, journal articles and conference papers to ensure findings could be easily digested and each research question could be adequately addressed. Categorisation was based on the experience of the authors while considering the categorisation approaches employed in cognate studies. This included the categorisation of: (1) each dependent variable into one of six research categories, (2) each feature into one of 11 feature categories, (3) each algorithm into one of eight algorithm categories and (4) journals that published fewer than four articles included in this study into the other journals category, and all conference papers into a separate Conference Paper category. For full transparency, the full lists of dependent variables, features and algorithms employed and their respective categories, as well as the journal/conference proceedings, are presented in Appendix A, Appendix B, Appendix C and Appendix D, respectively.
All neural network-based models, including multilayer perceptron networks, convolutional neural networks and long short-term memory networks, were included in the Neural Network category to minimise the over-categorisation of algorithms. The number of studies that employed each neural network-based algorithm can be found in Appendix C.
The research categories, algorithm categories and validation methods employed per year were assessed between 1999 and 2021 to allow for trends in research areas and methodologies to be identified over time. Firstly, regarding the research categories, the largest number of publications prior to 2018 were related to animal husbandry (35%). However, since 2018, the largest number of publications have been related to physiology and health (38%), with the percentage of publications focusing on animal husbandry research reducing to 14%. This suggested a trend throughout this research domain, with studies moving away from animal husbandry-related problems to focus on improving the physiology and health of dairy cows. The number of studies that focused on the physiology and health of dairy cows has increased seven-fold since 2018. Concurrently, the smallest number of publications both prior to 2018 (6%) and after 2018 (6%) were related to feeding, suggesting an opportunity for future research to be carried out in this largely unexplored subdomain. Secondly, in relation to the types of algorithms employed, tree-based algorithms were the most frequently employed algorithm category, being used in 25% and 26% of studies prior to 2018 and since 2018, respectively. However, the use of statistical regression-based algorithms reduced from 22% to 17%, before and after 2018, respectively, while at the same time, the use of neural network-based algorithms increased from 16% to 25%. This suggested a move away from statistical regression-based algorithms towards the utilisation of neural network-based algorithms. Lastly, regarding the validation methods employed, both prior to 2018 and after 2018, hold-out cross-validation was the most frequently employed validation method, being used in 43% and 46% of publications, respectively. In addition, the use of k-fold cross-validation also increased from 30% to 33% during these periods. However, the percentage of studies that used the train/validation/test method reduced from 19% to 10% before and after 2018, respectively, suggesting a trend away from the train/validation/test method towards hold-out and k-fold cross-validation.
This mapping study was carried out in line with PRISMA guidelines, with six pre-defined research questions outlined in Section 2.1. The search strategy produced results that adequately addressed each research question. In relation to RQ1, the country responsible for the greatest number of publications was the USA (n = 19); however, when the geographical location of studies was assessed on a continent basis, Europe was by far the leading region, producing 60 publications. Regarding RQ2, the greatest number of publications was published in the Computers and Electronics in Agriculture journal (21%), followed by the Journal of Dairy Science (16%). Additionally, 35 publications (27%) were published across 28 other journals that each published fewer than four papers included in this study, while the 20 conference papers were published in 18 different conference proceedings. RQ3 focused on determining what research areas were being addressed in the dairy research domain using machine learning methodologies, where results showed that the greatest number of studies addressed problems focused on the physiology and health of dairy cows (32%). In relation to RQ4, the most frequently employed feature data throughout the literature were derived from sensor data (48%), with 27 studies employing accelerometer data. Additionally, RQ5 focused on identifying the most frequently utilised machine learning algorithms throughout the dairy literature. The greatest number of studies employed tree-based algorithms (54%), followed by neural network-based algorithms (50%). Lastly, RQ6 focused on identifying the evaluation metrics and methods employed throughout the dairy literature. Assessing the literature showed that RMSE (56%) and R2 (46%) were the most frequently employed metrics for regression problems, while accuracy (77%) and recall (66%) were the most frequently employed metrics for classification problems. In addition, hold-out cross-validation was the most frequently employed evaluation method throughout the literature.

5. Conclusions

The results show that there has been a considerable increase in the prevalence of published literature applying machine learning algorithms to help solve problems on dairy farms, with 74% of the publications included in this study published since 2018. Europe was responsible for the production of data utilised in 45% of the research studies assessed, highlighting the need for an increase in research studies in other regions, in particular Africa, Oceania and South America. In addition, 32% of the studies included in this review applied machine learning to problems related to the physiology and health of dairy cows, with a seven-fold increase in publications in this area occurring since 2018. Concurrently, this study has also highlighted a reduction in the percentage of studies that used statistical regression algorithms coupled with an increased percentage of studies that used neural network-based algorithms since 2018, when compared with the 1999 to 2017 period. As machine learning algorithms are more frequently applied to problems in the dairy domain, it is important that best practice guidelines are followed to ensure their potential impact is realised. This mapping study may be used as the basis for future research in the dairy domain to identify studies that may have focused on a similar problem, whereby an identical, similar or improved methodology may be suitable.

Author Contributions

Conceptualisation, P.S. and M.D.M.; methodology, P.S. and M.D.M.; software, P.S.; validation, P.S. and M.D.M.; formal analysis, P.S.; investigation, P.S.; resources, P.S. and M.D.M.; data curation, P.S.; writing—original draft preparation, P.S.; writing—review and editing, P.S. and M.D.M.; visualisation, P.S.; supervision, M.D.M.; project administration, M.D.M.; funding acquisition, P.S. and M.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been funded by the Sustainable Energy Authority of Ireland under the SEAI Research, Development and Demonstration Funding Program 2019, Grant number 19/RDD/453.

Data Availability Statement

Data were compiled from five databases: Scopus (https://www.scopus.com/, accessed on 9 June 2021), Science Direct (https://www.sciencedirect.com/, accessed on 9 June 2021), IEEE (https://ieeexplore.ieee.org/, accessed on 9 June 2021), Google Scholar (https://scholar.google.com/, accessed on 9 June 2021) and MDPI (https://www.mdpi.com/, accessed on 9 June 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The specific dependent variables used per research category are shown in Table A1, with the number of studies that used each dependent variable presented in brackets next to each variable. The total number of dependent variables per category is also presented.
Table A1. Specific dependent variables used per research category.
Category | Dependent Variables (Number of Studies) | n
Animal Husbandry | Estrus Detection (7), Pregnancy Status (6), Calving Prediction (3), Cow Survival (2), Abortion Incidence (1), Calving Difficulty (1), Conception Performance (1), Conception Probability (1), Conception Success (1), Conception Rate (1), First-Service Conception Rate (1), Genomic Evaluation (1), Service Rates (1), Submission Rate (1) | 14
Behavior Analysis | Cow Activity (17), Cow Detection (3), Cow Identification (2), Jaw Movements (1), Sleep Stages (1) | 6
Feeding | Dry Matter Intake (2), Concentrate Feed Intake (1), Diet Energy Digestion (1), Feeding Behavior (1), Insufficient Herbage Allowance (1), Residual Feed Intake (1), Volatile Fatty Acids (1) | 7
Management | Electricity Use (6), Energy Output (3), Methane Emissions (2), Water Use (2), Diesel Use (1), Faecal Nitrogen (1), Faeces Output (1), Herbage Production (1), Manure Temperature (1), Nutrient Concentration (1), Urinary Nitrogen (1), Urine Output (1) | 12
Milk | Milk Production (6), Milk Adulteration (4), Milk Quality Parameters (2), Fat EBV (1), Milk Bacterial Index (1), Milk EBV (1), Milk Metabolites (1), Milk Parameters (1), Outlier Lactations (1) | 9
Physiology and Health | Mastitis Detection (11), Lameness Detection (10), Body Condition Score (7), Heat Stress (4), Bodyweight (2), Metabolic Status (2), Animal Dimensions (1), Digital Dermatitis (1), Ketosis Detection (1), Milk Productivity (1), Noxious Events (1), Respiration Rate (1), Rumen and Blood Metabolites (1), Skin Temperature (1), Teat Cleanliness (1), Tuberculosis Status (1), Vaginal Temperature (1) | 17

Appendix B

The specific features used per feature category are shown in Table A2, with the number of studies that utilised each feature presented in brackets next to each feature. The total number of features per category is also shown.
Table A2. Specific features used per feature category.
Independent Variable Category | Features (Number of Studies) | n
Calving/Pregnancy Information | Parity (24), Calving Interval (5), Previous Calving (2), AI Season (1), AI Stage (1), Calf Sex (1), Calving Age (1), Calving Month (1), Conception Rate (1), Days Since Previous AI (1), Displaced Abomasum (1), Duration of The Voluntary Waiting Period (1), Fertility EBI (1), Length of Pregnancy (1), Month of Insemination (1), Negative Energy Balance (1), No. of Heifers Calved (1), No. of Lactating Cows (1), No. of Previous Inseminations (1), Number of Cows In The Maternity Pen (1), Pregnancy Status (1), Pregnancy Stage (1), Previous Abortion (1), Previous Year’s Conception Rate (1), Reproduction Performance (1), Strategy For Using A Clean-Up Bull (1), Temperature For Thawing Semen (1) | 27
Cow Characteristics and Clinical Information | Bodyweight (11), Age (5), Breed (5), Genetics (5), BCS (4), Heart Rate (4), Body Temperature (3), Mastitis Detected (3), Phenotype Data (3), Breeding Values (2), Core Rumen Microbiome (2), Ketosis (2), Survival (2), Veterinary Treatments (2), Accumulated Number of Mastitis Cases (1), Back Fat Thickness (1), Bacteriological Analysis (1), Blood Oxygen Saturation (1), Body Mass (1), Bodyweight Leg Distribution (1), Breathing Rate (1), Clinical Case Ratio (1), Clinical Mastitis (1), Core Temperature (1), Cytometric Fingerprint (1), EBV (1), Estrus Detected (1), Health (1), Lameness (1), Longevity (1), Medical Conditions (1), Medication (1), Metritis (1), MicroRNA Gene Expression Data (1), Percentage of Cows With Low BCSs (1), Previous BCS (1), Proportion of HF Genes In Cow Genotype (1), Retained Placenta (1), Reticulorumen Temperature (1), Ruminal pH (1), Sire and Dam Fat EBV (1), Sire and Dam Milk EBV (1), Teat Sanitation (1), The Frequency of Hoof Trimming Maintenance (1), Udder Depth (1) | 45
Diet/Feeding | Diet Composition (3), Feed Intake (2), Programmed Concentrate Feed (2), Concentrate Feed (1), DMI (1), Drinking Duration (1), Eating Duration (1), Feed Bin Visits (1), Forage Species (1), Mean Duration of Trough Visits (1), Nutrient Management (1), Pasture Composition (1), Roughage Feed (1), Rumination Time (1), TMR Composition (1), Total Feed Intake (1), Vitamins (1), Water Bin Visits (1), Water Intake (1) | 19
Farm Characteristics and Management | Herd Size (9), No. of Parlour Units (7), Frequency of Hot Wash (6), Hot Water Tank Volume (6), Milk Cooling System (4), Milk Tank Volume (4), No. of Air Compressors (4), No. of Scrapers (3), Electricity Energy (2), Field Troughs (2), Flow Rate (2), Fossil Fuel Energy (2), Housing (2), Milk Pre-Cooling (2), Parlour Washing (2), Rainwater Collection (2), Air Conditioning (1), Bunk Space Per Cow (1), Facilities (1), Fan (1), Farm Management (1), Feed Energy (1), Feed Supply Energy (1), Fuel Energy (1), Grazing Management (1), Hectares (1), Herd Management (1), Human Labour Energy (1), Indoor Temperature (1), Labour (1), Labour Energy (1), Lime Management (1), Logistics Pickup (1), Machinery Energy (1), Manure Depth (1), Mechanised Feeding (1), No. of Scrapers (1), Pasture Management (1), Room Temperature (1), Stocking Rate (1), Tank Cleaning (1), Tank Level (1), Type of Bedding In The Dry Cow Pen (1), Type of Cow Restraint System (1), Water Energy (1) | 45
Lactation Information | DIM (19), Complete Lactation (1), Dry Period (1), Dry Period Cure Rate (1), Dry Period Length (1), Early Lactation (1), Freshening Date (1), Lactation Stage (1), Week of Lactation (1) | 9
Meteorological Conditions | Ambient Temperature (15), Relative Humidity (11), Rainfall (6), Wind Speed (6), Wind Direction (4), Dewpoint Temperature (3), Solar Radiation (3), Wet Bulb Temperature (3), Dry Bulb Temperature (2), Air Pressure (1), Air Temperature (1), Black Globe Temperature (1), Degree Days Below 15 °C (1) | 13
Milk Characteristics | Milk Yield (34), Milk Fat (20), Milk Protein (19), Milk Lactose (10), SCC (10), Milk Conductivity (5), Milk MIR Spectral Data (5), Milk Temperature (5), Milk Fatty Acids (3), Milk Flow (3), 305 Day MY Equivalent (2), Milk Density (2), Milk pH (2), Milk SNF (2), 305 Day FPCM Equivalent (1), Blood In Milk (1), Fat Corrected Milk (1), Max Fat/Protein Ratio of Previous Lactation (1), Metabolite Data (1), Milk Acetone (1), Milk Casein (1), Milk Fever (1), Milk Freezing Point (1), Milk Genetics (1), Milk Infrared Spectroscopy Data (1), Milk Mineral Content (1), Milk Persistency (1), Milk Urea (1), Non-Esterified Fatty Acids (1), Saturated Fatty Acids (1), Single Nucleotide Polymorphism Markers (1), Specific Gravity (1), Unsaturated Fatty Acids (1), Urea (1) | 34
Milking Parameters | Milking Frequency (4), No. of Vacuum Pumps (3), Milking Duration (2), Milking Time (2), Peak Milk Flow (2), Cups Kicked off During Milking (Yes/No) (1), Expected Milk Yield (1), No. of Clusters (1), Start/End of Milking (1) | 9
Other | Month Number (3), Time (2), Cow ID (1), Date (1), Day Length (1), Herd ID (1), Test Day (1), Weekday (1), Year (1) | 9
Sensors | Accelerometer (27), Image Data (7), Pedometer (6), Depth Image Data (4), GPS Data (4), Magnetometer Data (3), Gyroscope Data (2), Mass Spectrometry Data (2), RGB Image Data (2), Sound Data (2), 2D Image Data (1), 3D Depth Image Data (1), Audio Data (1), Differential Scanning Calorimetry (DSC) Data (1), Ear Surface Temperature (1), ECG (1), Electromyography (1), Fourier Transformed Infrared Spectroscopy (FTIR) Data (1), Locomotion Score (1), Near Infrared Reflectance (NIR) Spectrophotometer Data (1), NIR Image Data (1), Pressure Sensor (1), Radar (1), RFID Data (1), Spectroscopic Data (1), Thermal Imaging Data (1), Thermo-Hygrometric Sensor Data (1) | 27
Soil Characteristics | Soil Boron (1), Soil Calcium (1), Soil Characteristics (1), Soil Copper (1), Soil Iron (1), Soil Magnesium (1), Soil Manganese (1), Soil Organic Matter (1), Soil pH (1), Soil Phosphorus (1), Soil Potassium (1), Soil Sodium (1), Soil Sulphur (1), Soil Zinc (1) | 14

Appendix C

The specific algorithms used per algorithm category are shown in Table A3, with the number of studies that employed each algorithm presented in brackets next to the algorithm. The total number of algorithms per category is also shown.
Table A3. Specific algorithms used per algorithm category.
Algorithm Category | Algorithms (Number of Studies) | n
Bayes | Naïve Bayes (21), Bayes net (5), Gaussian Naïve Bayes (2), Bayes-A (1), Bayesian-LASSO (1), Naïve Bayes updatable (1) | 6
Clustering | DBSCAN (1), k-means clustering (1) | 2
Meta | Bagging (5), Adaboost (4), Random Subspace (2), rotation forest (2), Boosting (1), Bootstrap Aggregation (1), Super Learner (1), Stacking (1), Voting (1) | 9
Neural Network | ANN (46), CNN (10), LSTM (5), Adaptive Neuro-Fuzzy Inference System (2), Faster R-CNN (2), YOLOv2 CNN (2), ANFIS (1), Bi-LSTM (1), CNN Ensemble (1), Extreme Learning Machine (1), Kernel Extreme Learning Machine (1), MLANFIS (1), Mask R-CNN (1), Neuro-Fuzzy Systems (1), Radial Basis Function Network (1), YOLOv3 CNN (1) | 16
Other | SVM (31), KNN (20), ANOVA (2), SMO (2), 3-dimensional surface fitting (1), Genetic Algorithm (1), Gaussian Processes (1), Kstar (1), LWL (1), multi-class SVM (1), Multivariate Adaptive Regression Spline (1), one-class SVM (1), Quick Classifier (1) | 13
Rule | OneR (3), Jrip (2), PART (2), Classification Based on Associations (1), Majority Voting Rule (1), ZeroR (1) | 6
Statistical Regression | Logistic Regression (18), Multiple Linear Regression (13), Linear Discriminant Analysis (6), PLS (6), Linear Regression (4), GAM (3), Multivariate Logistic Regression (3), Ridge Regression (2), Genomic BLUP (1), General Linear Model (1), Logistics (1), MLR with Regularization (1), Multinomial Regression (1), Penalised Linear Regression (1), PLS Discriminant Analysis (1), PLS Regression (1), Simple Logistic (1), Stochastic Gradient Descent (1) | 18
Tree | RF (50), DT (26), Gradient Boosting Machine (4), C4.5 (3), CART (3), XGBoost (3), Alternating DT (2), Binary Tree (2), ExtraTrees (2), Gradient Boosted DT (2), J48 (2), M5P Tree (2), Decision Stumps (1), Hoeffding (1), Logistic Model Trees (1), Parallel DT (1), Predictive Clustering Trees (1), Random Tree (1), REPTree (1), Stump DT (1) | 20
ANFIS = Adaptive Neuro-Fuzzy Inference System; ANN = Artificial Neural Network; ANOVA = Analysis of variance; CART = Classification and Regression Tree; CNN = Convolutional Neural Network; DBSCAN = Density-Based Spatial Clustering of Applications with Noise; DT = Decision Tree; GAM = Generalised Additive Model; KNN = k-Nearest Neighbor; LSTM = Long Short Term Memory Network; LWL = Locally weighted learning; MLANFIS = Multi-Layered Adaptive Neural Fuzzy Inference System; PART = Projective Adaptive Resonance Theory; PLS = Partial Least Squares; RF = Random Forest; SMO = Sequential Minimal Optimization; SVM = Support Vector Machine.

Appendix D

The journals that published fewer than four studies included in this mapping study and all conference proceedings are shown in Table A4, with the number of studies published in each journal/conference presented in brackets next to each journal/conference.
Table A4. List of journals that published fewer than four studies included in this study and all conference proceedings.
Category | Source (Number of Studies) | n
Journals | Applied Animal Behavior Science (3), Biosystems Engineering (3), International Journal of Agricultural and Biological Engineering (2), Irish Veterinary Journal (2), Science Advances (2), African Journal of Science, Technology, Innovation and Development (1), Agricultural Systems (1), Agronomy (1), Animal (1), Applied Energy (1), Applied Sciences (1), Archives Animal Breeding (1), BMC Veterinary Research (1), BioData Mining (1), Ciencia Rural (1), Computational and Mathematical Methods in Medicine (1), Food Control (1), Genetics Selection Evolution (1), Genetics and Molecular Research (1), IEEE Geoscience and Remote Sensing Letters (1), Information Processing in Agriculture (1), Journal of Energy Technology and Policy (1), Journal of Food Composition and Analysis (1), Journal of Systems Architecture (1), Livestock Science (1), Multimodal Technologies and Interaction (1), Research in Veterinary Science (1), Theriogenology (1) | 28
Conferences | IEEE Sensors (2), International Conference on Unmanned Systems and Artificial Intelligence (ICUSAI) (2), ASABE Annual International Meeting (1), Africa Week Conference (IST) (1), Consumer Communications and Networking Conference (CCNC) (1), European Conference on Electrical Engineering and Computer Science (EECS) (1), International Conference on Big Data Computing Service and Applications (1), International Conference on Biometrics Theory, Applications and Systems (BTAS) (1), International Conference on Computers and Their Applications (CATA) (1), International Conference on Computing for Sustainable Global Development (INDIACom) (1), International Conference on Data Mining Workshops (1), International Conference on Data and Software Engineering (ICoDSE) (1), International Conference on Intelligent Robots and Systems (IROS) (1), International Electronics Symposium (IES) (1), International Seminar on Application for Technology of Information and Communication (iSemantic) (1), International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) (1), International Conference on Bio Signals, Images, and Instrumentation (ICBSII) (1), Journal of Physics: Conference Series (1) | 18

Appendix E

The specific feature data, dependent variables, machine learning algorithms, evaluation metrics and evaluation methods used per research category for each of the 129 publications included in this mapping study are shown in Table A5.
Table A5. Feature data, dependent variables, algorithms, evaluation metrics and methods used per study.
Animal Husbandry
Study | Features | Dependent | Algorithms a | Evaluation Metrics b | Evaluation Methods c
[14] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Milk Characteristics, Sensor Data | Calving Difficulty | multinomial regression, DT, RF, ANN | Recall, Specificity, F1 Score, Accuracy | Hold-Out
[15] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Submission Rate | DT, KNN, RF, ANN, LR | Accuracy, Balanced Accuracy, Recall, Specificity, PPV, NPV, F1 Score, Cohen’s Kappa, Prevalence, AUC, MAE | Repeated k-fold CV, Hold-Out
[5] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Farm Characteristics and Management | First-Service Conception Rate | Alternating DT, LR | Accuracy, FP, FN | k-fold CV
[5] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Farm Characteristics and Management, Milk Characteristics | Pregnancy Status | Alternating DT, LR | Accuracy, FP, FN | k-fold CV
[16] | Diet/Feeding | Estrus Detection | GLM, ANN, RF | Accuracy, Recall, Specificity, PPV, NPV, Error Rate | Nested CV
[17] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Milk Characteristics | Pregnancy Status | DT | Accuracy, Recall, Specificity, PPV, NPV | Hold-Out
[18] | Cow Characteristics and Clinical Information | Cow Survival | Naïve Bayes, RF, LR | Accuracy, Recall, Specificity, AUC | k-fold CV, Hold-Out
[19] | Cow Characteristics and Clinical Information, Milk Characteristics | Genomic Evaluation | Random-Boosting, Genomic BLUP, Bayesian-LASSO, Bayes-A | MSE, r | Hold-Out
[20] | Cow Characteristics and Clinical Information, Milk Characteristics, Milking Parameters | Estrus Detection | DT, Naïve Bayes, SVM, RF, LR | Accuracy, PPV, Recall, F1 Score, Specificity | Train/Validation/Test
[21] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information | Conception Performance | ANN, multivariate adaptive regression spline, LR | RMSE, AIC, AUC, Bayesian Information Criterion, Generalized Cross-Validation Error, Accuracy | k-fold CV, Hold-Out
[22] | Sensor Data | Calving Prediction | LSTM, Bi-LSTM | Recall, Specificity, PPV, NPV | Hold-Out
[23] | Calving/Pregnancy Information | Estrus Detection | Multivariate LR | Accuracy | Hold-Out
[24] | Sensor Data | Estrus Detection | Pre-trained | Recall, Specificity, PPV, NPV, Accuracy, Error Rate | Hold-Out
[25] | Sensor Data | Estrus Detection | K-means clustering | n/a | Hold-Out
[26] | Cow Characteristics and Clinical Information | Cow Survival | majority voting rule, multivariate LR, RF, Naïve Bayes | PPV, Recall, Balanced Accuracy, AUC | Repeated k-fold CV, Hold-Out
[27] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics, Other | Conception Success | C4.5 DT, Naïve Bayes, Bayesian network, LR, SVM, PLS, RF, rotation forest | AUC | Repeated k-fold CV
[28] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics, Other | Abortion Incidence | Naïve Bayes, Bayesian network, DT, RF, OneR, PART, LR, ANN, stochastic gradient descent, bagging, boosting, rotation forest | F1 Score, AUC, PPV, MCC, Recall, Lift | Hold-Out
[29] | Sensor Data | Estrus Detection | KNN, ANN, LDA, DT | Recall, Specificity, PPV, NPV, Accuracy, F1 Score | k-fold CV
[30] | Sensor Data | Calving Prediction | RF, LDA, ANN | Accuracy, Recall, Specificity | LOOCV, Hold-Out
[31] | Diet/Feeding, Farm Characteristics and Management | Conception Rate | M5P Tree, ANOVA | r, RMSE | k-fold CV
[31] | Diet/Feeding, Farm Characteristics and Management | Service Rates | M5P Tree, ANOVA | r, RMSE | k-fold CV
[32] | Sensor Data | Estrus Detection | LSTM, CNN, KNN | Recall, Specificity, PPV | Hold-Out
[33] | Milk Characteristics | Pregnancy Status | PLS discriminant analysis, CNN | PPV, Recall, F1 Score | k-fold CV
[34] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Milk Characteristics | Pregnancy Status | Naïve Bayes, Bayesian networks, DT, DT ensemble, RF | AUC, FP, TP | k-fold CV
[35] | Sensor Data | Pregnancy Status | not specified | Recall, Specificity | Hold-Out
[36] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Pregnancy Status | GAM, LR, bagging | PPV, Recall, F1 Score, AUC | Hold-Out
[37] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Milk Characteristics, Other | Conception Probability | GAM, LR | Recall, Specificity, Accuracy, PPV, NPV, AUC, MCC | Hold-Out
[38] | Sensor Data | Calving Prediction | RF | MCC, AUC, Recall, Specificity | Hold-Out
Behavior Analysis
Study | Features | Dependent | Algorithms | Evaluation Metrics | Evaluation Methods
[39] | Sensor Data | Cow Activity | RF, Naïve Bayes, Jrip, J48 | Accuracy, FP, F1 Score, AUC | Repeated k-fold CV
[40] | Sensor Data | Cow Activity | RF | Accuracy | k-fold CV
[41] | Diet/Feeding, Sensor Data | Jaw Movements | DT, RF, ANN, radial basis function network, SVM, extreme learning machine | Accuracy, Recall, PPV | LOOCV
[42] | Sensor Data | Cow Detection | YOLOv2 CNN | Accuracy | Hold-Out
[43] | Sensor Data | Cow Activity | KNN, SVM, ANN | Accuracy, PPV, Recall, Specificity, F1 Score, Cohen’s Kappa | LOOA
[44] | Sensor Data | Cow Activity | SVM, Naïve Bayes, KNN, RF, LR | F1 Score, Recall, PPV | Nested CV
[45] | Cow Characteristics and Clinical Information, Sensor Data | Cow Activity | RF, LDA, ANN | Recall, Specificity, Accuracy | k-fold CV, Hold-Out
[46] | Sensor Data | Cow Activity | Bagging, Random Subspace, AdaBoost, Binary Tree, LDA classifier, Naïve Bayes, KNN, Adaptive Neuro-Fuzzy Inference System | Accuracy, Recall, Specificity, F1 Score, FDR | Hold-Out
[47] | Sensor Data | Cow Activity | DT, SVM | PPV, Recall, Specificity | Nested CV
[48] | Cow Characteristics and Clinical Information, Sensor Data | Cow Activity | SVM, DT | Accuracy | Hold-Out
[49] | Sensor Data | Cow Detection | ANN, KNN | PPV, Recall, F1 Score, Accuracy, Hamming loss | Hold-Out
[50] | Sensor Data | Cow Activity | DT, ANN | Accuracy, Recall, Specificity | k-fold CV, Train/Validation/Test
[51] | Sensor Data | Cow Activity | Extreme Boosting Algorithm, SVM, Adaboost, RF | Accuracy, Cohen’s Kappa, Recall, Specificity | Repeated k-fold CV
[52] | Sensor Data | Cow Activity | Bagging, Random Subspace, AdaBoost, Binary Tree, LDA, Naïve Bayes, KNN, Adaptive Neuro-Fuzzy Inference System | Accuracy, Recall, Specificity, F1 Score, FDR | Hold-Out
[53] | Sensor Data | Cow Detection | Faster Region CNN, k-means clustering, DBSCAN | n/a | n/a
[54] | Sensor Data | Cow Identification | Mask R-CNN | TP, FP, FN, IoU, PPV, Recall, Averaged PPV, mAP, AR | Hold-Out
[55] | Sensor Data | Cow Activity | KNN | PPV, Recall | Repeated Hold-Out
[56] | Sensor Data | Cow Activity | Adaboost | Accuracy, Specificity, Recall, PPV, F1 Score, Cohen’s Kappa | k-fold CV
[57] | Cow Characteristics and Clinical Information, Sensor Data | Sleep Stages | ANN, RF | AUC, Accuracy, F1 Score, PPV, Recall | k-fold CV
[58] | Cow Characteristics and Clinical Information, Sensor Data | Cow Udder Anomalies | KNN, ANN, LSTM, DT | Recall, FPR | Repeated Train/Validation/Test
[59] | Sensor Data | Cow Activity | KNN, Naïve Bayes, SVM | PPV, Recall, Accuracy | LOOA
[60] | Sensor Data | Cow Activity | CNN, LSTM | Accuracy | Train/Validation/Test
[61] | Sensor Data | Cow Identification | KNN, SVM, RF, DT, LR | Accuracy | Hold-Out
[62] | Sensor Data | Cow Activity | Naïve Bayes, Bayes net, SVM, ANN, Jrip, PART, OneR, Naïve Bayes, J48, logistic model trees, meta (super learner), LR, Simple Logistic | Accuracy, Recall, Specificity, PPV, F1 Score, Training Speed | k-fold CV
Feeding
Study | Features | Dependent | Algorithms | Evaluation Metrics | Evaluation Methods
[63] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Meteorological Conditions, Milking Parameters | Concentrate Feed Intake | ANN | MSE | Hold-Out
[64] | Milk Characteristics | Volatile Fatty Acids | ANN, MLR | MSPE, RMSE, RMSE % | Train/Validation/Test
[65] | Sensor Data | Insufficient Herbage Allowance | SVM, RF, XGBoost | AUC, Recall, Specificity, Accuracy, PPV, F1 Score | LOOA
[66] | Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Dry Matter Intake | ANN, PLS | CCC, RMSE, Mean Bias, R² | k-fold CV, Hold-Out
[67] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Dry Matter Intake | ANN, PLS | R², RMSE, RPD | Repeated k-fold CV
[68] | Diet/Feeding | Diet Energy Digestion | kernel extreme learning machine, Linear Regression, ANN, SVM, Extreme Learning Machine | MAE, MAPE, RMSE, R², Training Speed | k-fold CV, Repeated Hold-Out
[69] | Sensor Data | Feeding Behavior | CNN | Accuracy | Hold-Out
[70] | Cow Characteristics and Clinical Information, Diet/Feeding, Milk Characteristics | Residual Feed Intake | SVM | MSE, r | Repeated Hold-Out
Management
Study | Features | Dependent | Algorithms | Evaluation Metrics | Evaluation Methods
[71] | Cow Characteristics and Clinical Information | Methane Emissions | Ridge Regression, RF | R² | Repeated k-fold CV
[72] | Farm Characteristics and Management, Milk Characteristics | Electricity use | SVM | RPE, CCC, MAPE, MAE, MPE, r, RMSE | Hold-Out
[73] | Farm Characteristics and Management | Energy Output | ANN | R², RMSE, MAPE | Train/Validation/Test
[74] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics, Milking Parameters | Electricity use | MLR, SVM | RPE, CCC, MPE, RMSE | Hold-Out
[75] | Calving/Pregnancy Information, Farm Characteristics and Management | Electricity use | MLR | RPE, R² | LOOCV
[75] | Calving/Pregnancy Information, Farm Characteristics and Management | Diesel use | MLR | RPE, R² | LOOCV
[76] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics, Milking Parameters | Electricity use | ANN, RF, DT, SVM, MLR | RMSE, RPE, CCC, MSPE, MPE, r | Nested CV
[76] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics | Water use | ANN, RF, DT, SVM, MLR | RMSE, RPE, CCC, MSPE, MPE, r | Nested CV
[77] | Farm Characteristics and Management | Energy Output | MLANFIS | R², RMSE, MAPE | Train/Validation/Test
[78] | Farm Characteristics and Management | Energy Output | ANFIS | R², RMSE, MAPE | Train/Validation/Test
[79] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics | Electricity use | MLR | R² | Hold-Out
[80] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics, Milking Parameters | Electricity use | MLR | RMSE, RPE, CCC, MSPE, MPE, r | k-fold CV
[80] | Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics | Water use | MLR | RMSE, RPE, CCC, MSPE, MPE, r | k-fold CV
[81] | Diet/Feeding | Faeces Output | SVM, ANN, LR | RMSE, norm-RMSE | Repeated k-fold
[81] | Diet/Feeding | Urine Output | SVM, ANN, LR | RMSE, norm-RMSE | Repeated k-fold
[81] | Diet/Feeding | Faecal Nitrogen | SVM, ANN, LR | RMSE, norm-RMSE | Repeated k-fold
[81] | Diet/Feeding | Urinary Nitrogen | SVM, ANN, LR | RMSE, norm-RMSE | Repeated k-fold
[82] | Meteorological Conditions, Other | Methane Emissions | SVM, RF, ensemble, gradient boosting, ridge regression, ANN, gaussian processes, MLR with regularization, MLR | RMSE, R², MAE | Nested CV
[83] | Farm Characteristics and Management, Meteorological Conditions, Other | Manure Temperature | gradient boosted trees, bagged tree ensembles, RF, ANN | MAE, RMSE, R² | Train/Validation/Test
[84] | Diet/Feeding, Farm Characteristics and Management, Meteorological Conditions, Soil Characteristics¹ | Herbage Production | predictive clustering trees, RF | R², RRMSE | k-fold CV
[84] | Diet/Feeding, Farm Characteristics and Management, Meteorological Conditions, Soil Characteristics¹ | Nutrient Concentration | predictive clustering trees, RF | R², RRMSE | k-fold CV
Milk
Study | Features | Dependent | Algorithms | Evaluation Metrics | Evaluation Methods
[85] | Farm Characteristics and Management, Milk Characteristics, Milking Parameters, Other | Milk Bacterial Index | C4.5, REPTree, RF, Random Tree, Hoeffding, Decision Stumps, ANN, SVM, Logistics, SMO, LWL, Kstar, KNN, Naïve Bayes, Naïve Bayes updateable, OneR, ZeroR, Adaboost, Bagging, Stacking, Voting | MAPE | Hold-Out
[86] | Cow Characteristics and Clinical Information, Meteorological Conditions | Milk Production | ANN | MSE | Train/Validation/Test
[63] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Meteorological Conditions, Milking Parameters | Milk Parameters | ANN | MSE | Hold-Out
[87] | Diet/Feeding, Farm Characteristics and Management, Soil Characteristics¹ | Milk Production | CART | n/a | Tree Analysis
[88] | Calving/Pregnancy Information, Diet/Feeding, Lactation Information, Milking Parameters | Milk Production | SVM, ANN, RF, MLR | RMSE, MAE, R² | k-fold CV
[89] | Calving/Pregnancy Information, Lactation Information, Milk Characteristics, Milking Parameters | Milk Quality Parameters | GAM, RF, ANN | MSE | k-fold CV
[90] | Sensor Data | Milk Adulteration | DT, Naïve Bayes, LDA, SVM, ANN | Accuracy, Recall, Specificity, FP, FN, FPR, AUC | Train/Validation/Test
[91] | Sensor Data | Milk Adulteration | RF, gradient boosting machine, ANN | Accuracy, Specificity, Recall | Hold-Out
[92] | Milk Characteristics | Milk Production | DT, ANN | Accuracy | k-fold CV, Hold-Out
[93] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Outlier Lactations | CART | Recall, Specificity, TP, FP, PPV | k-fold CV
[94] | Milk Characteristics | Milk Metabolites | RF, PLS | r | k-fold CV
[95] | Milk Characteristics | Milk Adulteration | ANN | r | Train/Validation/Test
[96] | Sensor Data | Milk Quality Parameters | ANN, PLS | MSE | Train/Validation/Test
[97] | Sensor Data | Milk Adulteration | CNN, RF, Gradient Boosting Machine, LR, Linear Regression, PLS | Accuracy, AUC | Hold-Out
[98] | Cow Characteristics and Clinical Information, Lactation Information, Other | Milk Production | RF, ANN, MLR | CCC, r | k-fold CV, Hold-Out
[99] | Cow Characteristics and Clinical Information, Lactation Information, Meteorological Conditions, Milk Characteristics, Other | Fat EBV | ANN, neuro-fuzzy systems | RMSE, r | Train/Validation/Test
[99] | Cow Characteristics and Clinical Information, Lactation Information, Meteorological Conditions, Milk Characteristics, Other | Milk EBV | ANN, neuro-fuzzy systems | RMSE, r | Train/Validation/Test
[100] | Calving/Pregnancy Information, Farm Characteristics and Management, Lactation Information, Meteorological Conditions, Milk Characteristics, Sensor Data | Milk Production | RF | RPE | k-fold CV, Hold-Out
Physiology and Health
Study | Features | Dependent | Algorithms | Evaluation Metrics | Evaluation Methods
[101] | Sensor Data | Animal Dimensions | MLR | R², RMSE, MRAE | Hold-Out
[71] | Cow Characteristics and Clinical Information | Milk Productivity | Ridge Regression, RF | R² | Repeated k-fold CV
[71] | Cow Characteristics and Clinical Information | Rumen and Blood Metabolites | Ridge Regression, RF | R² | Repeated k-fold CV
[102] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Farm Characteristics and Management, Lactation Information, Milk Characteristics | Lameness Detection | CART, gradient boosted machine, extreme gradient boosting, RF, Multivariate LR | AUC, Recall, Specificity | Repeated k-fold CV
[103] | Sensor Data | Lameness Detection | one-class SVM | Accuracy, Specificity, Recall | LOOCV
[104] | Sensor Data | Body Condition Score | CNN, YOLO-v3 CNN | IoU, Mean IoU, Accuracy, PPV, fps, Model Size | Hold-Out
[105] | Sensor Data | Lameness Detection | SVM, KNN | Accuracy, TN, TP, FN, FP | Repeated Hold-Out
[106] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Mastitis Detection | RF | Accuracy, Recall, Specificity, F1 Score, Cohen’s Kappa, PPV, NPV | Repeated k-fold CV, Hold-Out
[107] | Sensor Data | Body Condition Score | DT, ANN, Linear Regression | MAE, R² | k-fold CV
[108] | Sensor Data | Body Condition Score | 3-dimensional surface fitting | MAE, MBE, R² | Hold-Out
[109] | Sensor Data | Body Condition Score | CNN | Accuracy, PPV, Recall, F1 Score | Hold-Out
[110] | Cow Characteristics and Clinical Information, Diet/Feeding, Farm Characteristics and Management, Meteorological Conditions, Milk Characteristics | Heat Stress | DT | Accuracy | Hold-Out
[111] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Milk Characteristics, Sensor Data | Ketosis Detection | Naïve Bayes | Accuracy, Recall, Specificity, PPV, Youden’s Index, Cohen’s Kappa, MCC, NPV | k-fold CV
[112] | Sensor Data | Body Condition Score | Faster R-CNN | IoU, TP, TN, FP, FN, Accuracy, PPV, Average PPV, fps | Hold-Out
[113] | Cow Characteristics and Clinical Information | Mastitis Detection | SVM, RF, Naïve Bayes, ANN | Accuracy, AUC | Nested CV
[114] | Sensor Data | Body Condition Score | CNN (pre-trained) | Accuracy, Training Speed, Model Size | Hold-Out
[115] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Lameness Detection | ANN | Accuracy | Hold-Out
[116] | Sensor Data | Mastitis Detection | GA, Supervised ANN, quick classifier | Cohen’s Kappa, Recall, Specificity, PPV, NPV, Accuracy | Repeated Hold-Out
[117] | Sensor Data | Lameness Detection | multi-class SVM | Accuracy, PPV | k-fold CV
[118] | Sensor Data | Body Condition Score | CNN, ensemble | Accuracy, PPV, Recall, F1 Score | Hold-Out
[119] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Bodyweight | RF | r, CCC, R², RMSE, MAE, RPD, RPIQ | Repeated k-fold CV
[120] | Milk Characteristics, Milking Parameters | Mastitis Detection | DT, Stump DT, Parallel DT, RF | Accuracy, Info Gain, Gini Index, Gain Ratio | k-fold CV
[121] | Sensor Data | Digital Dermatitis | YOLOv2 architecture | Accuracy, Cohen’s Kappa | Hold-Out
[122] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Mastitis Detection | M5P Tree, ANOVA | Accuracy | Train/Validation/Test
[123] | Sensor Data | Lameness Detection | SVM, RF, KNN, DT | Accuracy | Hold-Out
[124] | Cow Characteristics and Clinical Information, Farm Characteristics and Management, Milk Characteristics, Milking Parameters | Mastitis Detection | C4.5 | Accuracy | Repeated k-fold CV
[125] | Sensor Data | Lameness Detection | RF, KNN, SVM, DT | Accuracy | Hold-Out
[126] | Milk Characteristics | Metabolic Status | SMO, RF, alternating DT, Naïve Bayes Updatable | Accuracy, Recall, Specificity, PPV, F1 Score | LOOA
[127] | Cow Characteristics and Clinical Information, Lactation Information, Meteorological Conditions, Milk Characteristics | Heat Stress | DT, MLR | Recall, Specificity, Balanced Accuracy, Accuracy | Hold-Out
[128] | Cow Characteristics and Clinical Information, Lactation Information | Mastitis Detection | DT, RF, Naïve Bayes | Accuracy, Recall, Specificity, AUC | k-fold CV, Hold-Out
[129] | Milk Characteristics | Tuberculosis Status | CNN | Accuracy, Specificity, PPV, NPV, Recall, MCC | Hold-Out
[130] | Sensor Data | Mastitis Detection | SVM, RF, ANN, Adaboost, Naïve Bayes, LR | Recall, Specificity, Accuracy, Cohen’s Kappa | Nested CV
[131] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics, Sensor Data | Lameness Detection | Gradient Boosted DT | Accuracy, AUC, Recall, Specificity | k-fold CV, Hold-Out
[132] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Lactation Information, Milk Characteristics | Metabolic Status | DT, Naïve Bayes, Bayesian Network, SVM, ANN, Bootstrap Aggregation, RF, KNN | PPV, NPV, Recall, Specificity, Error Rate | Repeated k-fold CV
[133] | Meteorological Conditions | Respiration Rate | penalized linear regression, RF, gradient boosted machines, ANN | RMSE, MAE, R² | Train/Validation/Test
[133] | Meteorological Conditions | Skin Temperature | penalized linear regression, RF, gradient boosted machines, ANN | RMSE, MAE, R² | Train/Validation/Test
[133] | Meteorological Conditions | Vaginal Temperature | penalized linear regression, RF, gradient boosted machines, ANN | RMSE, MAE, R² | Train/Validation/Test
[134] | Sensor Data | Teat Cleanliness | KNN | Cohen’s Kappa | k-fold CV, Hold-Out
[135] | Milk Characteristics, Milking Parameters | Mastitis Detection | classification based on associations | Accuracy, Recall, Specificity, F1 Score, PPV, AUC | Repeated k-fold CV
[136] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Milk Characteristics, Milking Parameters, Sensor Data | Mastitis Detection | RF, Gaussian Naïve Bayes, ExtraTrees, LR | PPV, AUC, Recall, Specificity | Repeated k-fold CV
[136] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Milk Characteristics, Milking Parameters, Sensor Data | Lameness Detection | RF, Gaussian Naïve Bayes, ExtraTrees, LR | PPV, AUC, Recall, Specificity | Repeated k-fold CV
[137] | Meteorological Conditions, Sensor Data | Heat Stress | ANN, Linear Regression | Mean Error, RMSE, R² | Train/Validation/Test
[138] | Cow Characteristics and Clinical Information, Diet/Feeding, Sensor Data | Noxious Events | RF, SVM, DT, KNN, Naïve Bayes | PPV, NPV, Accuracy | Hold-Out
[139] | Sensor Data | Heat Stress | LSTM | MAE, RMSE | Train/Validation/Test
[140] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Meteorological Conditions, Milk Characteristics, Milking Parameters, Other, Sensor Data | Mastitis Detection | RF, SVM, KNN, Gaussian Naïve Bayes, Extra Trees Classifier, LR | AUC, Recall, Specificity, Accuracy, PPV, F1 Score | Hold-Out
[140] | Calving/Pregnancy Information, Cow Characteristics and Clinical Information, Diet/Feeding, Lactation Information, Meteorological Conditions, Milk Characteristics, Milking Parameters, Other, Sensor Data | Lameness Detection | RF, SVM, KNN, Gaussian Naïve Bayes, Extra Trees Classifier, LR | AUC, Recall, Specificity, Accuracy, PPV, F1 Score | Hold-Out
[141] | Calving/Pregnancy Information, Lactation Information, Milk Characteristics | Bodyweight | PLS Regression | RMSE | k-fold CV, Hold-Out
a Descriptions of the algorithm abbreviations can be found in Appendix D.
b AR = averaged recall score; AUC = area under the ROC curve; CCC = concordance correlation coefficient; FDR = false discovery rate; FPR = false positive rate; FN = false negative; FP = false positive; fps = frames per second; IoU = intersection over union; mAP = averaged precision score; MAE = mean absolute error; MAPE = mean absolute percentage error; MBE = mean bias error; MCC = Matthews correlation coefficient; MPE = mean percentage error; MSE = mean square error; MSPE = mean square percentage error; NPV = negative predictive value; PPV = positive predictive value; R² = coefficient of determination; RMSE = root mean squared error; RPE = relative prediction error; RPD = ratio of performance to deviation; RPIQ = ratio of performance to the interquartile range; TN = true negative; TP = true positive.
c LOOA = leave-out-one-animal; LOOCV = leave-one-out cross-validation; Nested CV = nested cross-validation; k-fold CV = k-fold cross-validation.

References

  1. Gerber, P.J.; Steinfeld, H.; Henderson, B.; Mottet, A.; Opio, C.; Dijkman, J.; Falcucci, A.; Tempio, G. Tackling Climate Change through Livestock – A Global Assessment of Emissions and Mitigation Opportunities; Food and Agriculture Organization of the United Nations (FAO): Rome, Italy, 2013.
  2. OECD/FAO. Agricultural Outlook 2018–2027; Paris/Food and Agriculture Organization of the United Nations, OECD Publishing: Rome, Italy, 2018.
  3. Clay, N.; Garnett, T.; Lorimer, J. Dairy intensification: Drivers, impacts and alternatives. Ambio 2020, 49, 35–48.
  4. Tang, Y.; Dananjayan, S.; Hou, C.; Guo, Q.; Luo, S.; He, Y. A survey on the 5G network and its impact on agriculture: Challenges and opportunities. Comput. Electron. Agric. 2021, 180, 105895.
  5. Caraviello, D.Z.; Weigel, K.A.; Craven, M.; Gianola, D.; Cook, N.B.; Nordlund, K.V.; Fricke, P.M.; Wiltbank, M.C. Analysis of Reproductive Performance of Lactating Cows on Large Dairy Farms Using Machine Learning Algorithms. J. Dairy Sci. 2006, 89, 4703–4722.
  6. Bishop, C.M. Pattern Recognition and Machine Learning, 2nd ed.; Jordan, M., Kleinberg, J., Scholkopf, B., Eds.; Springer: Cambridge, UK, 2006; ISBN 0-387-31073-8.
  7. Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.M.; Gerber, J.S.; Reddy, V.R.; et al. Random forests for global and regional crop yield predictions. PLoS ONE 2016, 11, e0156571.
  8. Bermúdez-Chacón, R.; Gonnet, G.H.; Smith, K. Automatic Problem-Specific Hyperparameter Optimization and Model Selection for Supervised Machine Learning: Technical Report; ETH Zurich: Zurich, Switzerland, 2015.
  9. Cockburn, M. Review: Application and prospective discussion of machine learning for the management of dairy farms. Animals 2020, 10, 1690.
  10. Slob, N.; Catal, C.; Kassahun, A. Application of machine learning to improve dairy farm management: A systematic literature review. Prev. Vet. Med. 2021, 187, 105237.
  11. Kitchenham, B.; Charters, S.; Budgen, D.; Brereton, P.; Turner, M.; Linkman, S.; Jorgensen, M.; Mendes, E.; Visaggio, G. Guidelines for Performing Systematic Literature Reviews in Software Engineering. 2007, Volume 2.3. Available online: http://citeseerx.ist.psu.edu/viewdoc/citations?doi=10.1.1.117.471 (accessed on 9 June 2021).
  12. Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D. Machine Learning in Agriculture: A Comprehensive Updated Review. Sensors 2021, 21, 3758.
  13. Page, M.; McKenzie, J.; Bossuyt, P.; Boutron, I.; Hoffmann, T.; Mulrow, C. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.
  14. Fenlon, C.; O’Grady, L.; Mee, J.F.; Butler, S.T.; Doherty, M.L.; Dunnion, J. A comparison of 4 predictive models of calving assistance and difficulty in dairy heifers and cows. J. Dairy Sci. 2017, 100, 9746–9758.
  15. Bates, A.J.; Saldias, B. A comparison of machine learning and logistic regression in modelling the association of body condition score and submission rate. Prev. Vet. Med. 2019, 171, 104765.
  16. Cairo, F.C.; Pereira, L.G.R.; Campos, M.M.; Tomich, T.R.; Coelho, S.G.; Lage, C.F.A.; Fonseca, A.P.; Borges, A.M.; Alves, B.R.C.; Dorea, J.R.R. Applying machine learning techniques on feeding behavior data for early estrus detection in dairy heifers. Comput. Electron. Agric. 2020, 179, 105855.
  17. Bogado Pascottini, O.; Probo, M.; LeBlanc, S.J.; Opsomer, G.; Hostens, M. Assessment of associations between transition diseases and reproductive performance of dairy cows using survival analysis and decision tree algorithms. Prev. Vet. Med. 2020, 176, 104908.
  18. Van der Heide, E.M.M.; Veerkamp, R.F.; van Pelt, M.L.; Kamphuis, C.; Athanasiadis, I.; Ducro, B.J. Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle. J. Dairy Sci. 2019, 102, 9409–9421.
  19. Jiménez-Montero, J.A.; González-Recio, O.; Alenda, R. Comparison of methods for the implementation of genome-assisted evaluation of Spanish dairy cattle. J. Dairy Sci. 2013, 96, 625–634.
  20. Hemalatha, R.; SonaShree, S.; Thamizhvani, T.; Vijayabaskar, V. Detection Of Estrus In Bovine Using Machine Learning. In Proceedings of the 2021 Seventh International Conference on Bio Signals, Images, and Instrumentation (ICBSII), Kalavakkam, India, 25–27 March 2021; pp. 1–5.
  21. Grzesiak, W.; Zaborski, D.; Sablik, P.; Zukiewicz, A.; Dybus, A.; Szatkowska, I. Detection of cows with insemination problems using selected classification models. Comput. Electron. Agric. 2010, 74, 265–273.
  22. Keceli, A.S.; Catal, C.; Kaya, A.; Tekinerdogan, B. Development of a recurrent neural networks-based calving prediction model using activity and behavioral data. Comput. Electron. Agric. 2020, 170, 105285.
  23. Romadhonny, R.A.; Gumelar, A.B.; Fahrudin, T.M.; Adi Setiawan, W.P.; Cahaya Putra, F.D.; Nugroho, R.D.; Budiani, J.R. Estrous Cycle Prediction of Dairy Cows for Planned Artificial Insemination (AI) Using Multiple Logistic Regression. In Proceedings of the 2019 International Seminar on Application for Technology of Information and Communication: Industry 4.0: Retrospect, Prospect, and Challenges, Semarang, Indonesia, 21–22 September 2019; pp. 157–162.
  24. Schweinzer, V.; Gusterer, E.; Kanz, P.; Krieger, S.; Süss, D.; Lidauer, L.; Berger, A.; Kickinger, F.; Öhlschuster, M.; Auer, W.; et al. Evaluation of an ear-attached accelerometer for detecting estrus events in indoor housed dairy cows. Theriogenology 2019, 130, 19–25.
  25. Shahriar, M.S.; Smith, D.; Rahman, A.; Henry, D.; Bishop-Hurley, G.; Rawnsley, R.; Freeman, M.; Hills, J. Heat event detection in dairy cows with collar sensors: An unsupervised machine learning approach. In Proceedings of the 2015 IEEE SENSORS, Busan, Korea, 1–4 November 2015.
  26. Van der Heide, E.M.M.; Kamphuis, C.; Veerkamp, R.F.; Athanasiadis, I.N.; Azzopardi, G.; van Pelt, M.L.; Ducro, B.J. Improving predictive performance on survival in dairy cattle using an ensemble learning approach. Comput. Electron. Agric. 2020, 177, 105675.
  27. Hempstalk, K.; McParland, S.; Berry, D.P. Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows. J. Dairy Sci. 2015, 98, 5262–5273.
  28. Keshavarzi, H.; Sadeghi-Sefidmazgi, A.; Mirzaei, A.; Ravanifard, R. Machine learning algorithms, bull genetic information, and imbalanced datasets used in abortion incidence prediction models for Iranian Holstein dairy cattle. Prev. Vet. Med. 2020, 175, 104869.
  29. Wang, J.; Bell, M.; Liu, X.; Liu, G. Machine-learning techniques can enhance dairy cow estrus detection using location and acceleration data. Animals 2020, 10, 1160.
  30. Borchers, M.R.; Chang, Y.M.; Proudfoot, K.L.; Wadsworth, B.A.; Stone, A.E.; Bewley, J.M. Machine-learning-based calving prediction from activity, lying, and ruminating behaviors in dairy cattle. J. Dairy Sci. 2017, 100, 5664–5674.
  31. Schefers, J.M.; Weigel, K.A.; Rawson, C.L.; Zwald, N.R.; Cook, N.B. Management practices associated with conception rate and service rate of lactating Holstein cows in large, commercial dairy herds. J. Dairy Sci. 2010, 93, 1459–1467.
  32. Ma, N.; Pan, L.; Chen, S.; Liu, B. NB-IoT estrus detection system of dairy cows based on LSTM networks. In Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC, London, UK, 31 August–3 September 2020.
  33. Brand, W.; Wells, A.T.; Smith, S.L.; Denholm, S.J.; Wall, E.; Coffey, M.P. Predicting pregnancy status from mid-infrared spectroscopy in dairy cow milk using deep learning. J. Dairy Sci. 2021, 104, 4980–4990.
  34. Shahinfar, S.; Page, D.; Guenther, J.; Cabrera, V.; Fricke, P.; Weigel, K. Prediction of insemination outcomes in Holstein dairy cattle using alternative machine learning algorithms. J. Dairy Sci. 2014, 97, 731–742.
  35. Gargiulo, G.D.; Shephard, R.W.; Tapson, J.; McEwan, A.L.; Bifulco, P.; Cesarelli, M.; Jin, C.; Al-Ani, A.; Wang, N.; van Schaik, A. Pregnancy detection and monitoring in cattle via combined foetus electrocardiogram and phonocardiogram signal processing. BMC Vet. Res. 2012, 8, 164.
  36. Heitmann, B.; Augustin, D.-S.; Hayes, C. Regression Techniques for Modelling Conception in Seasonally Calving Dairy Cows. In Proceedings of the 16th International Conference on Data Mining Workshops, Barcelona, Spain, 12–15 December 2016; pp. 1191–1196.
  37. Fenlon, C.; O’Grady, L.; Butler, S.; Doherty, M.L.; Dunnion, J. The creation and evaluation of a model to simulate the probability of conception in seasonal-calving pasture-based dairy heifers. Ir. Vet. J. 2017, 70, 32.
  38. Miller, G.A.; Mitchell, M.; Barker, Z.E.; Giebel, K.; Codling, E.A.; Amory, J.R.; Michie, C.; Davison, C.; Tachtatzis, C.; Andonovic, I.; et al. Using animal-mounted sensor technology and machine learning to predict time-to-calving in beef and dairy cows. Animal 2020, 14, 1304–1312.
  39. Williams, M.L.; Mac Parthaláin, N.; Brewer, P.; James, W.P.J.; Rose, M.T. A novel behavioral model of the pasture-based dairy cow from GPS data using data mining and machine learning techniques. J. Dairy Sci. 2016, 99, 2063–2075.
  40. Ono, Y.; Ohwada, H.; Nishiyama, H. Status discrimination of dairy cows using activity meter and machine learning. In Proceedings of the 33rd International Conference on Computers and Their Applications, CATA, Las Vegas, NV, USA, 19–21 March 2018.
  41. Chelotti, J.O.; Vanrell, S.R.; Galli, J.R.; Giovanini, L.L.; Rufiner, H.L. A pattern recognition approach for detecting and classifying jaw movements in grazing cattle. Comput. Electron. Agric. 2018, 145, 83–91.
  42. Andrew, W.; Greatwood, C.; Burghardt, T. Aerial Animal Biometrics: Individual Friesian Cattle Recovery and Visual Identification via an Autonomous UAV with Onboard Deep Inference. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems, Macau, China, 4–6 November 2019.
  43. Shen, W.; Cheng, F.; Zhang, Y.; Wei, X.; Fu, Q.; Zhang, Y. Automatic recognition of ingestive-related behaviors of dairy cows based on triaxial acceleration. Inf. Process. Agric. 2020, 7, 427–443.
  44. Smith, D.; Rahman, A.; Bishop-Hurley, G.J.; Hills, J.; Shahriar, S.; Henry, D.; Rawnsley, R. Behavior classification of cows fitted with motion collars: Decomposing multi-class classification into a set of binary problems. Comput. Electron. Agric. 2016, 131, 40–50.
  45. Dolecheck, K.A.; Silvia, W.J.; Heersche, G.; Chang, Y.M.; Ray, D.L.; Stone, A.E.; Wadsworth, B.A.; Bewley, J.M. Behavioral and physiological changes around estrus events identified using multiple automated monitoring technologies. J. Dairy Sci. 2015, 98, 8723–8731.
  46. Dutta, R.; Smith, D.; Rawnsley, R.; Bishop-Hurley, G.; Hills, J. Cattle behaviour classification using 3-axis collar sensor and multi-classifier pattern recognition. In Proceedings of the IEEE Sensors, Valencia, Spain, 2–5 November 2014.
  47. Benaissa, S.; Tuyttens, F.A.M.; Plets, D.; Cattrysse, H.; Martens, L.; Vandaele, L.; Joseph, W.; Sonck, B. Classification of ingestive-related cow behaviours using RumiWatch halter and neck-mounted accelerometers. Appl. Anim. Behav. Sci. 2019, 211, 9–16.
  48. Pratama, Y.P.; Kurnia Basuki, D.; Sukaridhoto, S.; Yusuf, A.A.; Yulianus, H.; Faruq, F.; Putra, F.B. Designing of a Smart Collar for Dairy Cow Behavior Monitoring with Application Monitoring in Microservices and Internet of Things-Based Systems. In Proceedings of the IES 2019 – International Electronics Symposium: The Role of Techno-Intelligence in Creating an Open Energy System Towards Energy Democracy, Piscataway, NJ, USA, 27–28 September 2019; pp. 527–533.
  49. Salau, J.; Haas, J.H.; Junge, W.; Thaller, G. Determination of body parts in holstein friesian cows comparing neural networks and k nearest neighbour classification. Animals 2021, 11, 50.
  50. Busch, P.; Stupmann, F.; Ewald, H. Determination of cattle standing-time with decscion trees and neural nets by using only acceleration data from collar. In Proceedings of the 2018 2nd European Conference on Electrical Engineering and Computer Science, EECS, Bern, Switzerland, 20–22 December 2018; pp. 178–182.
  51. Riaboff, L.; Poggi, S.; Madouasse, A.; Couvreur, S.; Aubin, S.; Bédère, N.; Goumand, E.; Chauvin, A.; Plantier, G. Development of a methodological framework for a robust prediction of the main behaviours of dairy cows using a combination of machine learning algorithms on accelerometer data. Comput. Electron. Agric. 2020, 169, 105179. [Google Scholar] [CrossRef]
  52. Dutta, R.; Smith, D.; Rawnsley, R.; Bishop-Hurley, G.; Hills, J.; Timms, G.; Henry, D. Dynamic cattle behavioural classification using supervised ensemble classifiers. Comput. Electron. Agric. 2015, 111, 18–28. [Google Scholar] [CrossRef]
  53. Ismail, Z.H.; Chun, A.K.K.; Shapiai Razak, M.I. Efficient Herd—Outlier Detection in Livestock Monitoring System Based on Density—Based Spatial Clustering. IEEE Access 2019, 7, 175062–175070. [Google Scholar] [CrossRef]
  54. Salau, J.; Krieter, J. Instance segmentation with mask R-CNN applied to loose-housed dairy cows in a multi-camera setting. Animals 2020, 10, 2402. [Google Scholar] [CrossRef]
  55. Wa Maina, C. IoT at the grassroots—Exploring the use of sensors for livestock monitoring. In Proceedings of the 2017 IST-Africa Week Conference, IST-Africa 2017, Windhoek, Namibia, 31 May–2 June 2017; pp. 1–8. [Google Scholar]
  56. Carslake, C.; Vázquez-Diosdado, J.A.; Kaler, J. Machine learning algorithms to classify and quantify multiple behaviours in dairy calves using a sensor–moving beyond classification in precision livestock. Sensors 2021, 21, 88. [Google Scholar] [CrossRef]
  57. Hunter, L.B.; Baten, A.; Haskell, M.J.; Langford, F.M.; O’Connor, C.; Webster, J.R.; Stafford, K. Machine learning prediction of sleep stages in dairy cows from heart rate and muscle activity measures. Sci. Rep. 2021, 11, 10938. [Google Scholar] [CrossRef]
  58. Wagner, N.; Antoine, V.; Mialon, M.M.; Lardy, R.; Silberberg, M.; Koko, J.; Veissier, I. Machine learning to detect behavioural anomalies in dairy cows under subacute ruminal acidosis. Comput. Electron. Agric. 2020, 170, 105233. [Google Scholar] [CrossRef]
  59. Benaissa, S.; Tuyttens, F.A.M.; Plets, D.; de Pessemier, T.; Trogh, J.; Tanghe, E.; Martens, L.; Vandaele, L.; Van Nuffel, A.; Joseph, W.; et al. On the use of on-cow accelerometers for the classification of behaviours in dairy barns. Res. Vet. Sci. 2019, 125, 425–433. [Google Scholar] [CrossRef] [Green Version]
  60. Ren, K.; Bernes, G.; Hetta, M.; Karlsson, J. Tracking and analysing social interactions in dairy cattle with real-time locating system and machine learning. J. Syst. Archit. 2021, 116, 102139. [Google Scholar] [CrossRef]
  61. Schilling, B.; Bahmani, K.; Li, B.; Banerjee, S.; Smith, J.S.; Moshier, T.; Schuckers, S. Validation of biometric identification of dairy cows based on udder NIR images. In Proceedings of the 2018 IEEE 9th International Conference on Biometrics Theory, Applications and Systems (BTAS), Redondo Beach, CA, USA, 22–25 October 2018. [Google Scholar]
  62. Williams, M.L.; James, W.P.; Rose, M.T. Variable segmentation and ensemble classifiers for predicting dairy cow behaviour. Biosyst. Eng. 2019, 178, 156–167. [Google Scholar] [CrossRef]
  63. Fuentes, S.; Viejo, C.G.; Cullen, B.; Tongson, E.; Chauhan, S.S.; Dunshea, F.R. Artificial intelligence applied to a robotic dairy farm to model milk productivity and quality based on cow data and daily environmental parameters. Sensors 2020, 20, 2975. [Google Scholar] [CrossRef]
  64. Craninx, M.; Fievez, V.; Vlaeminck, B.; De Baets, B. Artificial neural network models of the rumen fermentation pattern in dairy cattle. Comput. Electron. Agric. 2008, 60, 226–238. [Google Scholar] [CrossRef]
  65. Shafiullah, A.Z.; Werner, J.; Kennedy, E.; Leso, L.; O’brien, B.; Umstätter, C. Machine learning based prediction of insufficient herbage allowance with automated feeding behaviour and activity data. Sensors 2019, 19, 4479. [Google Scholar] [CrossRef] [Green Version]
  66. Dórea, J.R.R.; Rosa, G.J.M.; Weld, K.A.; Armentano, L.E. Mining data from milk infrared spectroscopy to improve feed intake predictions in lactating dairy cows. J. Dairy Sci. 2018, 101, 5878–5889. [Google Scholar] [CrossRef] [Green Version]
  67. Tedde, A.; Grelet, C.; Ho, P.N.; Pryce, J.E.; Hailemariam, D.; Wang, Z.; Plastow, G.; Gengler, N.; Froidmont, E.; Dehareng, F.; et al. Multiple country approach to improve the test-day prediction of dairy cows’ dry matter intake. Animals 2021, 11, 1316. [Google Scholar] [CrossRef]
  68. Fu, Q.; Shen, W.; Wei, X.; Zhang, Y.; Xin, H.; Su, Z.; Zhao, C. Prediction of the diet energy digestion using kernel extreme learning machine: A case study with Holstein dry cows. Comput. Electron. Agric. 2020, 169, 105231. [Google Scholar] [CrossRef]
  69. Chen, Z.; Cheng, X.; Wang, X.; Han, M. Recognition method of dairy cow feeding behavior based on convolutional neural network. J. Phys. Conf. Ser. 2020, 1693, 012166. [Google Scholar] [CrossRef]
  70. Yao, C.; Zhu, X.; Weigel, K.A. Semi-supervised learning for genomic prediction of novel traits with small reference populations: An application to residual feed intake in dairy cattle. Genet. Sel. Evol. 2016, 48, 84. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  71. John Wallace, R.; Sasson, G.; Garnsworthy, P.C.; Tapio, I.; Gregson, E.; Bani, P.; Huhtanen, P.; Bayat, A.R.; Strozzi, F.; Biscarini, F.; et al. A heritable subset of the core rumen microbiome dictates dairy cow productivity and emissions. Sci. Adv. 2019, 5, eaav8391. [Google Scholar] [CrossRef] [Green Version]
  72. Shine, P.; Scully, T.; Upton, J.; Murphy, M.D. Annual electricity consumption prediction and future expansion analysis on dairy farms using a support vector machine. Appl. Energy 2019, 250, 1110–1119. [Google Scholar] [CrossRef]
  73. Sefeedpari, P.; Rafiee, S.; Akram, A. Application of artificial neural network to model the energy output of dairy farms in Iran. Int. J. Energy Technol. Policy 2013, 9, 82. [Google Scholar] [CrossRef]
  74. Shine, P.; Upton, J.; Scully, T.; Shalloo, L.; Murphy, M.D. Comparing multiple linear regression and support vector machine models for predicting electricity consumption on pasture based dairy farms. In Proceedings of the Annual International Meeting; American Society of Agricultural and Biological Engineers, Detroit, MI, USA, 29 July–1 August 2018. [Google Scholar]
  75. Todde, G.; Murgia, L.; Caria, M.; Pazzona, A. Dairy Energy Prediction (DEP) model: A tool for predicting energy use and related emissions and costs in dairy farms. Comput. Electron. Agric. 2017, 135, 216–221. [Google Scholar] [CrossRef]
  76. Shine, P.; Murphy, M.D.; Upton, J.; Scully, T. Machine-learning algorithms for predicting on-farm direct water and electricity consumption on pasture based dairy farms. Comput. Electron. Agric. 2018, 150, 74–87. [Google Scholar] [CrossRef]
  77. Sefeedpari, P.; Rafiee, S.; Akram, A.; Chau, K.-W.; Komleh, S.H.P. Modeling Energy Use in Dairy Cattle Farms by Applying Multi-Layered Adaptive Neuro-Fuzzy Inference System (MLANFIS). Int. J. Dairy Sci. 2015, 10, 173–185. [Google Scholar] [CrossRef]
  78. Sefeedpari, P.; Rafiee, S.; Akram, A.; Komleh, S.H.P. Modeling output energy based on fossil fuels and electricity energy consumption on dairy farms of Iran: Application of adaptive neural-fuzzy inference system technique. Comput. Electron. Agric. 2014, 109, 80–85. [Google Scholar] [CrossRef]
  79. Mhundwa, R.; Simon, M.; Tangwe, S.L. Modelling of an on-farm direct expansion bulk milk cooler to establish baseline energy consumption without milk pre-cooling: A case of Fort Hare Dairy Trust, South Africa. African J. Sci. Technol. Innov. Dev. 2017, 1338, 62–68. [Google Scholar] [CrossRef]
  80. Shine, P.; Scully, T.; Upton, J.; Murphy, M.D. Multiple linear regression modelling of on-farm direct water and electricity consumption on pasture based dairy farms. Comput. Electron. Agric. 2018, 148, 337–346. [Google Scholar] [CrossRef] [Green Version]
  81. Fu, Q.; Shen, W.; Wei, X.; Yin, Y.; Zheng, P.; Zhang, Y.; Su, Z.; Zhao, C. Predicting the excretion of feces, urine and nitrogen using support vector regression: A case study with Holstein dry cows. Int. J. Agric. Biol. Eng. 2020, 13, 48–56. [Google Scholar] [CrossRef]
  82. Hempel, S.; Adolphs, J.; Landwehr, N.; Willink, D.; Janke, D.; Amon, T. Supervised machine learning to assess methane emissions of a dairy building with natural ventilation. Appl. Sci. 2020, 10, 6938. [Google Scholar] [CrossRef]
  83. Genedy, R.A.; Ogejo, J.A. Using machine learning techniques to predict liquid dairy manure temperature during storage. Comput. Electron. Agric. 2021, 187, 106234. [Google Scholar] [CrossRef]
  84. Nikoloski, S.; Murphy, P.; Kocev, D.; Džeroski, S.; Wall, D.P. Using machine learning to estimate herbage production and nutrient uptake on Irish dairy farms. J. Dairy Sci. 2019, 102, 10639–10656. [Google Scholar] [CrossRef] [Green Version]
  85. Zakeri, A.; Saberi, M.; Hussain, O.K.; Chang, E. An early detection system for proactive management of raw milk quality: An Australian case study. IEEE Access 2018, 6, 64333–64349. [Google Scholar] [CrossRef]
  86. Sugiono, S.; Soenoko, R.; Andriani, D.P. Analysis the relationship of physiological, environmental, and cow milk productivity using AI. In Proceedings of the International Conference on Data and Software Engineering, ICoDSE, Denpasar, Indonesia, 26–27 October 2016. [Google Scholar]
  87. Zegler, C.H.; Renz, M.J.; Brink, G.E.; Ruark, M.D. Assessing the importance of plant, soil, and management factors affecting potential milk production on organic pastures using regression tree analysis. Agric. Syst. 2020, 180, 102776. [Google Scholar] [CrossRef]
  88. Nguyen, Q.T.; Fouchereau, R.; Frénod, E.; Gerard, C.; Sincholle, V. Comparison of forecast models of production of dairy cows combining animal and diet parameters. Comput. Electron. Agric. 2020, 170, 105258. [Google Scholar] [CrossRef] [Green Version]
  89. Anglart, D.; Hallén-Sandgren, C.; Emanuelson, U.; Rönnegård, L. Comparison of methods for predicting cow composite somatic cell counts. J. Dairy Sci. 2020, 103, 8433–8442. [Google Scholar] [CrossRef]
  90. Sowmya, N.; Ponnusamy, V. Development of Spectroscopic Sensor System for an IoT Application of Adulteration Identification on Milk Using Machine Learning. IEEE Access 2021, 9, 53979–53995. [Google Scholar] [CrossRef]
  91. Farah, J.S.; Cavalcanti, R.N.; Guimarães, J.T.; Balthazar, C.F.; Coimbra, P.T.; Pimentel, T.C.; Esmerino, E.A.; Duarte, M.C.K.H.; Freitas, M.Q.; Granato, D.; et al. Differential scanning calorimetry coupled with machine learning technique: An effective approach to determine the milk authenticity. Food Control. 2021, 121, 107585. [Google Scholar] [CrossRef]
  92. Rodríguez, E.; Waissman, J.; Mahadevan, P.; Villa, C.; Flores, B.L.; Villa, R. Genome-wide classification of dairy cows using decision trees and artificial neural network algorithms. Genet. Mol. Res. 2019, 18, gmr18407. [Google Scholar] [CrossRef]
  93. Pietersma, D.; Lacroix, R.; Lefebvre, D.; Wade, K.M. Induction and evaluation of decision trees for lactation curve analysis. Comput. Electron. Agric. 2003, 38, 19–32. [Google Scholar] [CrossRef]
  94. Melzer, N.; Wittenburg, D.; Hartwig, S.; Jakubowski, S.; Kesting, U.; Willmitzer, L.; Lisec, J.; Reinsch, N.; Repsilber, D. Investigating associations between milk metabolite profiles and milk traits of Holstein cows. J. Dairy Sci. 2013, 96, 1521–1534. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  95. Condé, V.A.; Silva Valente, G.d.F.; Minighin, E.C. Milk fraud by the addition of whey using an artificial neural network. Cienc. Rural 2020, 50, 1–8. [Google Scholar] [CrossRef]
  96. Muñiz, R.; Cuevas-Valdés, M.; de la Roza-Delgado, B. Milk quality control requirement evaluation using a handheld near infrared reflectance spectrophotometer and a bespoke mobile application. J. Food Compos. Anal. 2020, 86, 103388. [Google Scholar] [CrossRef]
  97. Neto, H.A.; Tavares, W.L.F.; Ribeiro, D.C.S.Z.; Alves, R.C.O.; Fonseca, L.M.; Campos, S.V.A. On the utilization of deep and ensemble learning to detect milk adulteration. BioData Min. 2019, 12, 13. [Google Scholar] [CrossRef] [PubMed]
  98. Dallago, G.M.; de Figueiredo, D.M.; Andrade, P.C.d.R.; dos Santos, R.A.; Lacroix, R.; Santschi, D.E.; Lefebvre, D.M. Predicting first test day milk yield of dairy heifers. Comput. Electron. Agric. 2019, 166, 105032. [Google Scholar] [CrossRef]
  99. Shahinfar, S.; Mehrabani-Yeganeh, H.; Lucas, C.; Kalhor, A.; Kazemian, M.; Weigel, K.A. Prediction of breeding values for dairy cattle using artificial neural networks and neuro-fuzzy systems. Comput. Math. Methods Med. 2012, 2012, 127130. [Google Scholar] [CrossRef]
  100. Bovo, M.; Agrusti, M.; Benni, S.; Torreggiani, D.; Tassinari, P. Random Forest Modelling of Milk Yield of Dairy Cows under Heat Stress Conditions. Animals 2021, 11, 1305. [Google Scholar] [CrossRef]
  101. Nir, O.; Parmet, Y.; Werner, D.; Adin, G.; Halachmi, I. 3D Computer-vision system for automatically estimating heifer height and body mass. Biosyst. Eng. 2018, 173, 4–10. [Google Scholar] [CrossRef]
  102. Warner, D.; Vasseur, E.; Lefebvre, D.M.; Lacroix, R. A machine learning based decision aid for lameness in dairy herds using farm-based records. Comput. Electron. Agric. 2020, 169, 105193. [Google Scholar] [CrossRef]
  103. Haladjian, J.; Haug, J.; Nüske, S.; Bruegge, B. A wearable sensor system for lameness detection in dairy cattle. Multimodal Technol. Interact. 2018, 2, 27. [Google Scholar] [CrossRef] [Green Version]
  104. Huang, X.; Hu, Z.; Wang, X.; Yang, X.; Zhang, J.; Shi, D. An improved single shot multibox detector method applied in body condition score for dairy cows. Animals 2019, 9, 470. [Google Scholar] [CrossRef] [Green Version]
  105. Shrestha, A.; Loukas, C.; Le Kernec, J.; Fioranelli, F.; Busin, V.; Jonsson, N.; King, G.; Tomlinson, M.; Viora, L.; Voute, L. Animal lameness detection with radar sensing. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1189–1193. [Google Scholar] [CrossRef]
  106. Hyde, R.M.; Down, P.M.; Bradley, A.J.; Breen, J.E.; Hudson, C.; Leach, K.A.; Green, M.J. Automated prediction of mastitis infection patterns in dairy herds using machine learning. Sci. Rep. 2020, 10, 4289. [Google Scholar] [CrossRef] [Green Version]
  107. Zhao, K.; Shelley, A.N.; Lau, D.L.; Dolecheck, K.A.; Bewley, J.M. Automatic body condition scoring system for dairy cows based on depth-image analysis. Int. J. Agric. Biol. Eng. 2020, 13, 45–54. [Google Scholar] [CrossRef]
  108. Li, W.Y.; Shen, Y.; Wang, D.J.; Yang, Z.K.; Yang, X.T. Automatic dairy cow body condition scoring using depth images and 3D surface fitting. In Proceedings of the 2019 IEEE International Conference on Unmanned Systems and Artificial Intelligence (ICUSAI), Xi’an, China, 22–24 November 2019; pp. 155–159. [Google Scholar]
  109. Rodríguez Alvarez, J.; Arroqui, M.; Mangudo, P.; Toloza, J.; Jatip, D.; Rodríguez, J.M.; Teyseyre, A.; Sanz, C.; Zunino, A.; Machado, C.; et al. Body condition estimation on cows from depth images using Convolutional Neural Networks. Comput. Electron. Agric. 2018, 155, 12–22. [Google Scholar] [CrossRef] [Green Version]
  110. Ilapakurti, A.; Vuppalapati, C. Building an IoT framework for connected dairy. In Proceedings of the 2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService), Redwood City, CA, USA, 30 March–2 April 2015; pp. 275–285. [Google Scholar] [CrossRef]
  111. Sturm, V.; Efrosinin, D.; Öhlschuster, M.; Gusterer, E.; Drillich, M.; Iwersen, M. Combination of sensor data and health monitoring for early detection of subclinical Ketosis in dairy cows. Sensors 2020, 20, 1484. [Google Scholar] [CrossRef] [Green Version]
  112. Huang, X.; Li, X.; Hu, Z. Cow tail detection method for body condition score using Faster R-CNN. In Proceedings of the 2019 IEEE International Conference on Unmanned Systems and Artificial Intelligence (ICUSAI), Xi’an, China, 22–24 November 2019; pp. 347–351. [Google Scholar]
  113. Dhoble, A.S.; Ryan, K.T.; Lahiri, P.; Chen, M.; Pang, X.; Cardoso, F.C.; Bhalerao, K.D. Cytometric fingerprinting and machine learning (CFML): A novel label-free, objective method for routine mastitis screening. Comput. Electron. Agric. 2019, 162, 505–513. [Google Scholar] [CrossRef]
  114. Cevik, K.K. Deep Learning Based Real-Time Body Condition Score Classification System. IEEE Access 2020, 8, 213950–213957. [Google Scholar] [CrossRef]
  115. Gupta, R.K.; Lathwal, S.S.; Mohanty, T.K.; Ruhil, A.P.; Singh, Y. Detection of lameness of cow based on body weight using artificial neural network. In Proceedings of the 2014 International Conference on Computing for Sustainable Global Development, INDIACom 2014, New Delhi, India, 5–7 March 2014. [Google Scholar]
  116. Esener, N.; Green, M.J.; Emes, R.D.; Jowett, B.; Davies, P.L.; Bradley, A.J.; Dottorini, T. Discrimination of contagious and environmental strains of Streptococcus uberis in dairy herds by means of mass spectrometry and machine-learning. Sci. Rep. 2018, 8, 17517. [Google Scholar] [CrossRef] [Green Version]
  117. Alsaaod, M.; Römer, C.; Kleinmanns, J.; Hendriksen, K.; Rose-Meierhöfer, S.; Plümer, L.; Büscher, W. Electronic detection of lameness in dairy cows through measuring pedometric activity and lying behavior. Appl. Anim. Behav. Sci. 2012, 142, 134–141. [Google Scholar] [CrossRef]
  118. Alvarez, J.R.; Arroqui, M.; Mangudo, P.; Toloza, J.; Jatip, D.; Rodriguez, J.M.; Teyseyre, A.; Sanz, C.; Zunino, A.; Machado, C.; et al. Estimating body condition score in dairy cows from depth images using convolutional neural networks, transfer learning and model ensembling techniques. Agronomy 2019, 9, 90. [Google Scholar] [CrossRef] [Green Version]
  119. Dettmann, F.; Warner, D.; Buitenhuis, B.; Kargo, M.; Kjeldsen, A.M.H.; Nielsen, N.H.; Lefebvre, D.M.; Santschi, D.E. Fatty acid profiles from routine milk recording as a decision tool for body weight change of dairy cows after calving. Animals 2020, 10, 1958. [Google Scholar] [CrossRef]
  120. Ebrahimie, E.; Ebrahimi, F.; Ebrahimi, M.; Tomlinson, S.; Petrovski, K.R. Hierarchical pattern recognition in milking parameters predicts mastitis prevalence. Comput. Electron. Agric. 2018, 147, 6–11. [Google Scholar] [CrossRef]
  121. Cernek, P.; Bollig, N.; Anklam, K.; Döpfer, D. Hot topic: Detecting digital dermatitis with computer vision. J. Dairy Sci. 2020, 103, 9110–9115. [Google Scholar] [CrossRef]
  122. Kim, T.; Heald, C.W. Inducing inference rules for the classification of bovine mastitis. Comput. Electron. Agric. 1999, 23, 27–42. [Google Scholar] [CrossRef]
  123. Byabazaire, J.; Olariu, C.; Taneja, M.; Davy, A. Lameness Detection as a Service: Application of Machine Learning to an Internet of Cattle. In Proceedings of the 2019 16th IEEE Annual Consumer Communications and Networking Conference (CCNC), Las Vegas, NV, USA, 11–14 January 2019. [Google Scholar]
  124. Goyache, F.; Díez, J.; López, S.; Pajares, G.; Santos, B.; Fernández, I.; Prieto, M. Machine Learning as an aid to management decisions on high somatic cell counts in dairy farms. Arch. Anim. Breed. 2005, 48, 138–148. [Google Scholar] [CrossRef] [Green Version]
  125. Taneja, M.; Byabazaire, J.; Jalodia, N.; Davy, A.; Olariu, C.; Malone, P. Machine learning based fog computing assisted data-driven approach for early lameness detection in dairy cattle. Comput. Electron. Agric. 2020, 171, 105286. [Google Scholar] [CrossRef]
  126. Ghaffari, M.H.; Jahanbekam, A.; Sadri, H.; Schuh, K.; Dusel, G.; Prehn, C.; Adamski, J.; Koch, C.; Sauerwein, H. Metabolomics meets machine learning: Longitudinal metabolite profiling in serum of normal versus overconditioned cows and pathway analysis. J. Dairy Sci. 2019, 102, 11561–11585. [Google Scholar] [CrossRef] [PubMed]
  127. Ji, B.; Banhazi, T.; Ghahramani, A.; Bowtell, L.; Wang, C.; Li, B. Modelling of heat stress in a robotic dairy farm. Part 2: Identifying the specific thresholds with production factors. Biosyst. Eng. 2020, 199, 43–57. [Google Scholar] [CrossRef]
  128. Srikok, S.; Patchanee, P.; Boonyayatra, S.; Chuammitri, P. Potential role of MicroRNA as a diagnostic tool in the detection of bovine mastitis. Prev. Vet. Med. 2020, 182, 105101. [Google Scholar] [CrossRef]
  129. Denholm, S.J.; Brand, W.; Mitchell, A.P.; Wells, A.T.; Krzyzelewski, T.; Smith, S.L.; Wall, E.; Coffey, M.P. Predicting bovine tuberculosis status of dairy cows from mid-infrared spectral data of milk using deep learning. J. Dairy Sci. 2020, 103, 9355–9367. [Google Scholar] [CrossRef]
  130. Maciel-Guerra, A.; Esener, N.; Giebel, K.; Lea, D.; Green, M.J.; Bradley, A.J.; Dottorini, T. Prediction of Streptococcus uberis clinical mastitis treatment success in dairy herds by means of mass spectrometry and machine-learning. Sci. Rep. 2021, 11, 7736. [Google Scholar] [CrossRef]
  131. Borghart, G.M.; O’Grady, L.E.; Somers, J.R. Prediction of lameness using automatically recorded activity, behavior and production data in post-parturient Irish dairy cows. Ir. Vet. J. 2021, 74, 4. [Google Scholar] [CrossRef]
  132. Xu, W.; van Knegsel, A.T.M.; Vervoort, J.J.M.; Bruckmaier, R.M.; van Hoeij, R.J.; Kemp, B.; Saccenti, E. Prediction of metabolic status of dairy cows in early lactation with on-farm cow data and machine learning algorithms. J. Dairy Sci. 2019, 102, 10186–10201. [Google Scholar] [CrossRef]
  133. Gorczyca, M.T.; Gebremedhin, K.G. Ranking of environmental heat stressors for dairy cows using machine learning algorithms. Comput. Electron. Agric. 2020, 168, 105124. [Google Scholar] [CrossRef]
  134. Douphrate, D.I.; Fethke, N.B.; Nonnenmann, M.W.; Rodriguez, A.; de Porras, D.G.R. Reliability of observational- and machine-based teat hygiene scoring methodologies. J. Dairy Sci. 2019, 102, 7494–7502. [Google Scholar] [CrossRef]
  135. Ebrahimie, E.; Mohammadi-dehcheshmeh, M.; Laven, R.; Petrovski, K.R. Rule Discovery in Milk Content towards Mastitis Diagnosis: Dealing with Farm Heterogeneity over Multiple Years through Classification Based on Associations. Animals 2021, 11, 1638. [Google Scholar] [CrossRef]
  136. Post, C.; Rietz, C.; Büscher, W.; Müller, U. The importance of low daily risk for the prediction of treatment events of individual dairy cows with sensor systems. Sensors 2021, 21, 1389. [Google Scholar] [CrossRef]
  137. Pacheco, V.M.; de Sousa, R.V.; da Silva Rodrigues, A.V.; de Souza Sardinha, E.J.; Martello, L.S. Thermal imaging combined with predictive machine learning based model for the development of thermal stress level classifiers. Livest. Sci. 2020, 241, 104244. [Google Scholar] [CrossRef]
  138. Salzer, Y.; Honig, H.H.; Shaked, R.; Abeles, E.; Kleinjan-Elazary, A.; Berger, K.; Jacoby, S.; Fishbain, B.; Kendler, S. Towards on-site automatic detection of noxious events in dairy cows. Appl. Anim. Behav. Sci. 2021, 236, 105260. [Google Scholar] [CrossRef]
  139. Chung, H.; Li, J.; Kim, Y.; Van Os, J.M.C.; Brounts, S.H.; Choi, C.Y. Using implantable biosensors and wearable scanners to monitor dairy cattle’s core body temperature in real-time. Comput. Electron. Agric. 2020, 174, 105453. [Google Scholar] [CrossRef]
  140. Post, C.; Rietz, C.; Büscher, W.; Müller, U. Using sensor data to detect lameness and mastitis treatment events in dairy cows: A comparison of classification models. Sensors 2020, 20, 3863. [Google Scholar] [CrossRef]
  141. Tedde, A.; Grelet, C.; Ho, P.; Pryce, J.; Hailemariam, D.; Wang, Z.; Plastow, G.; Gengler, N.; Brostaux, Y.; Froidmont, E.; et al. Validation of Dairy Cow Bodyweight Prediction Using Traits Easily Recorded by Dairy Herd Improvement Organizations and Its Potential Improvement Using Feature Selection Algorithms. Animals 2021, 11, 1288. [Google Scholar] [CrossRef]
Figure 1. The flow of documents from identification to inclusion stage, in line with exclusion criteria.
Figure 2. Geographical distribution of research studies (n = 139).
Figure 3. The flow of studies from geographical location to research categories (n = 134).
Figure 4. Number of publications per year labelled according to research category (n = 131). * Data collected up to June 2021.
Figure 5. The flow of studies from problem type to publication source to research categories (n = 131).
Figure 6. Flow of studies from research area to features categories to algorithm categories (n = 134).
Figure 7. Number of publications per year labelled according to algorithm category (n = 269). * Data collected up to June 2021.
Figure 8. Number of publications per year labelled according to validation method used (n = 127). * Data collected up to June 2021.
Table 1. Percentage of studies using each evaluation metric for classification and regression problems.
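Regression (n = 41)

| Metric | RMSE | R2 | r | MAE | CCC | MAPE | MSE | RPE | MPE | MSPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| % of studies | 56% | 46% | 27% | 24% | 17% | 15% | 15% | 15% | 10% | 7% |

Classification (n = 85)

| Metric | Accuracy | Recall | Specificity | PPV | F1 Score | AUC | NPV | Cohen’s K | FP | FN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| % of studies | 77% | 66% | 49% | 48% | 27% | 26% | 15% | 12% | 9% | 6% |
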
RMSE = root mean squared error; R2 = coefficient of determination; MAE = mean absolute error; MSE = mean square error; CCC = concordance correlation coefficient; MAPE = mean absolute percentage error; RPE = relative prediction error; MPE = mean percentage error; MSPE = mean square percentage error; PPV = positive predictive value; AUC = area under the ROC curve; NPV = negative predictive value; FP = false positive; FN = false negative.
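The abbreviations above can be made concrete with a small sketch. The following Python functions compute several of the listed metrics using their common textbook definitions; the article itself does not give formulas, so these definitions (in particular R2 as 1 minus the ratio of residual to total sum of squares) are assumptions, and the function names and sample numbers are purely illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Common regression metrics (textbook definitions, assumed here).

    MAPE divides by the true values, so y_true must not contain zeros.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n            # mean square error
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot                   # coefficient of determination
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae,
            "MAPE": mape, "R2": r2}


def classification_metrics(tp, fp, tn, fn):
    """Metrics derived from the counts of a binary confusion matrix."""
    recall = tp / (tp + fn)         # sensitivity
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    npv = tn / (tn + fn)            # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * recall / (ppv + recall)
    return {"Accuracy": accuracy, "Recall": recall,
            "Specificity": specificity, "PPV": ppv, "NPV": npv, "F1": f1}


if __name__ == "__main__":
    # Made-up values for illustration only; not data from any reviewed study.
    print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 37]))
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

Reporting several of these together is usually more informative than accuracy or RMSE alone, since accuracy can look high on imbalanced classes while recall or PPV remains poor.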
Table 2. Number of studies employing each evaluation method(s) (n = 127).
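| Evaluation Method a | Hold-Out | LOOA | LOOCV | Nested CV | Train/Validation/Test | k-Fold CV |
| --- | --- | --- | --- | --- | --- | --- |
| Hold-Out | 49 (5) b | - | - | - | - | - |
| LOOA | - | 4 | - | - | - | - |
| LOOCV | 1 | - | 3 | - | - | - |
| Nested CV | - | - | - | 7 | - | - |
| Train/Validation/Test | - | - | - | - | 17 (1) | - |
| k-Fold CV | 15 (4) | - | - | - | 1 c | 30 (11) |
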
LOOA = leave-out-one-animal; LOOCV = leave-one-out cross-validation; Nested CV = nested cross-validation; k-fold CV = k-fold cross-validation. a Values along the diagonal refer to the number of studies that used that particular evaluation method. Values not along the diagonal refer to the number of studies that used a combination of evaluation methods corresponding to the value’s vertical and horizontal position. b Bracketed values represent the number of studies where that particular evaluation method was carried out repeatedly (i.e., more than once). c One study employed two different evaluation methods for two different dependent variables.
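As a rough illustration of how some of these evaluation methods partition a dataset, the sketch below generates index splits for hold-out, k-fold CV (of which LOOCV is the special case k = number of samples) and LOOA. It is a minimal standard-library sketch under assumed conventions, not the procedure of any reviewed study; the function names, the 20% default test fraction and the animal labels are hypothetical.

```python
import random


def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Single random split into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once.

    With k == n_samples this reduces to leave-one-out cross-validation.
    """
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def leave_one_animal_out(animal_ids):
    """Yield (animal, train, test): all records of one animal are held out."""
    for animal in sorted(set(animal_ids)):
        test = [i for i, a in enumerate(animal_ids) if a == animal]
        train = [i for i, a in enumerate(animal_ids) if a != animal]
        yield animal, train, test


if __name__ == "__main__":
    # Hypothetical records: six measurements from three cows.
    cows = ["cow_a", "cow_a", "cow_b", "cow_c", "cow_c", "cow_c"]
    for animal, train, test in leave_one_animal_out(cows):
        print(animal, "train:", train, "test:", test)
```

Splitting by animal rather than by record keeps measurements from the same cow out of both the training and test sets at once, which is the motivation usually given for animal-level splits when records are repeated per animal.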
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Shine, P.; Murphy, M.D. Over 20 Years of Machine Learning Applications on Dairy Farms: A Comprehensive Mapping Study. Sensors 2022, 22, 52. https://doi.org/10.3390/s22010052
