1. Introduction
Coccidiosis, a disease caused by
Eimeria sp. parasites, is one of the most common diseases faced by the poultry industry worldwide [
1,
2]. Coccidia parasites can cause decreased growth rates, diarrhea, or even death. In addition,
Eimeria sp. harm gut health and enable other enteric conditions, including clostridial enteritis. It is estimated that the annual economic impact of coccidiosis, including prevention, treatment, and production loss, is almost USD 2 billion in the United States alone, and over USD 15.5 billion worldwide [
3].
The basis for successful control measures against coccidiosis relies on identification of these protozoal parasites. Typical methods for identifying and determining the severity of the protozoal infection include intestinal lesion scoring or enumeration of the protozoal oocysts in fecal samples [
4]. While many poultry producers give little consideration to the species of coccidia, knowledge of the species can be important for developing vaccines or improving management strategies [
5,
6]. Standard methods to manually identify and enumerate these protozoal parasites require highly skilled technicians or veterinarians. This process is labor intensive, time-consuming, and susceptible to human error.
Automatic and semi-automatic protocols, including molecular tests, cytometry, and digital image analysis, for the enumeration and identification of coccidia are available [
7,
8,
9,
10]. Most PCR reactions target the internal transcribed spacer region 1 (ITS1) of the ribosomal RNA (rRNA) gene operon [
7], a multi-copy gene, which makes these protocols unsuitable for estimation of the relative abundance of species in mixed infections of
Eimeria sp. Vrba et al. [
11] validated a quantitative PCR (qPCR) protocol to quantify samples with mixed
Eimeria populations. However, to improve the sensitivity, the oocysts need to be sporulated beforehand, which can generate a delay in the results [
12]. On the other hand, while there is no delay for protocols based on cytometry and digital image analysis, sample preparation can be cumbersome. Furthermore, if the samples contain debris of similar size to coccidia parasites, the number of oocysts can be overestimated. While improved preparation protocols may help to minimize these errors, the accuracy of the machine to differentiate between oocysts and debris cannot improve over time.
Poultry management and diagnostics are undergoing a significant transition with the introduction of machine learning or Artificial Intelligence models. These models have the potential to be an excellent diagnostic tool for coccidia identification, by exploiting Eimerian oocyst morphology. A human observer scans for oocyst shapes and, at the same time, makes decisions based on the observed morphological features (size, shape, and internal and external features). This simultaneous process is best modeled using a deep-learning-based approach, such as the Region-based Convolutional Neural Network (RCNN) model. This project aimed to demonstrate a computer vision approach that would enable fast and accurate detection, speciation, and determination of sporulation status of coccidia oocysts without the extensive personnel training or subjectivity of traditional microscopic methods.
3. Results
To evaluate the performance of the trained model we used mRPD, which assessed the relative difference between the counts from the manually labelled images and the model’s automatic counts (
Table 1). Models trained on one species generally exhibited higher accuracy than models trained on multiple species. With a confidence threshold of 0.7 (on a scale of 0–1), all the models for individual groups were over 90% accurate. In general, models accurately recognized oocysts and ignored debris in images (
Figure 2). While there were a few false positive instances and misclassifications of oocysts, the model indicated a lower likelihood that these predictions were accurate by giving them a lower confidence score (below 0.7).
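The relative difference behind mRPD can be illustrated with a short sketch. The exact formula is not restated in this section, so the version below assumes a signed relative percent difference normalized by the mean of the two counts; the function names and the example counts are hypothetical.

```python
from statistics import mean

def rpd(manual: int, automated: int) -> float:
    """Signed relative percent difference between two counts (assumed formula).

    Positive values mean the automated count is higher than the manual count.
    """
    if manual == 0 and automated == 0:
        return 0.0  # both methods agree that nothing is present
    return 100.0 * (automated - manual) / ((automated + manual) / 2)

def mean_rpd(count_pairs) -> float:
    """Average the RPD over (manual, automated) count pairs, one pair per image."""
    return mean(rpd(manual, automated) for manual, automated in count_pairs)

# Hypothetical counts for three images: (manual, automated).
example_pairs = [(10, 10), (12, 11), (8, 9)]
example_mrpd = mean_rpd(example_pairs)
```

Under this assumed definition, a value near 0 indicates that the manual and automated counts are similar, and the sign shows the direction of any miscount.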
Unsurprisingly, models produced the most accurate results when detecting only
E. maxima oocysts. The multi-species model, when set to a confidence threshold of 0.5, correctly identified all the instances marked in our sample ground-truth reference for
E. maxima (
Figure 3). In a few instances, in samples that only had
E. maxima, there was a mismatch between manual and automated identification. These mismatches generally occurred when an oocyst lay on the border of the image or when two oocysts overlapped almost entirely. In the latter case, the automated system did not identify one of the instances. Very few of the visibly labeled instances were misclassified for sporulation.
On the other hand, when set to a confidence level of 0.5, the same multi-species model revealed discrepancies when identifying
E. acervulina (
Figure 4) and
E. tenella oocysts. Most notably, this model over-identified sporulated
E. acervulina oocysts, and under-identified non-sporulated
E. tenella instances. Despite these tendencies, reducing the confidence threshold for the multi-species model from 0.7 to 0.5 resulted in a measurable improvement in overall RPD metrics. Considering all groups, the multi-species model with a confidence threshold of 0.5 had mean RPD values that were generally closer to 0, meaning that the manual and automatic counts are more similar.
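The effect of the confidence threshold on the per-group counts can be sketched as follows. This is a minimal illustration, not the inference code used in the study: the detection list is invented, the group labels for E. maxima are hypothetical, and keeping detections at or above the threshold is an assumption.

```python
from collections import Counter

# Hypothetical detections for one image: (group label, confidence score).
detections = [
    ("maxima_spore", 0.97),
    ("maxima_non", 0.91),
    ("acer_spore", 0.64),
    ("ten_non", 0.55),
    ("acer_non", 0.42),
]

def count_by_group(detections, threshold):
    """Count detections per group whose score is at or above the threshold."""
    return Counter(label for label, score in detections if score >= threshold)

strict = count_by_group(detections, 0.7)   # keeps only high-confidence detections
relaxed = count_by_group(detections, 0.5)  # also keeps mid-confidence detections
```

Lowering the threshold from 0.7 to 0.5 admits more detections for every group, which can reduce under-counting in some groups while increasing over-counting in others.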
Visualizations in
Figure 5 show per-group correlation between manual and automated counting of oocysts for the most accurate multi-species model. The sporulated
E. acervulina (acer_spore) group dominates the upper region of the chart (
Figure 5b). This suggests that the models over-counted this group. On the other hand, the non-sporulated
E. acervulina (acer_non) and the non-sporulated
E. tenella (ten_non) groups dominate the lower region of this chart (
Figure 5c,g), which suggests that the model under-counted these groups. Furthermore, the undercounts became more pronounced as manual counts increased along the x-axis. Notably, the correlations between the manual and automated counts for both the sporulated and non-sporulated
E. maxima (
Figure 5d,e), as well as the sporulated
E. tenella (tene_spore) (
Figure 5f), were better and did not show any obvious trend towards under- or over-counting for these groups.
The Bland–Altman plots (
Figure 6) show the agreement between manual and automatic counts. The solid line indicates the average difference between the two counts; under ideal conditions, this average difference is expected to be 0. The dotted lines mark the upper and lower bounds of the 95% confidence interval for the mean. Data points that lie close to the mean line, or within the 95% confidence interval, indicate agreement between the automated and manual (ground-truth) counts.
Our model showed high agreement (>95%) between the automated and manual counts of individual species. The agreement was 99.5% (729/732 samples within the 95% confidence interval) for E. acervulina oocysts, 99.79% (731/732 samples within the 95% confidence interval) for E. maxima, and 96.18% (728/732 samples within the 95% confidence interval) for E. tenella. For the Eimeria multi-species plot, the agreement between automated and manual counts was moderate (50–94%), at 92.32% (128/1667 samples outside the 95% confidence interval). Furthermore, while each species showed high agreement, regardless of the sporulation status, the model showed poor agreement for the non-sporulated E. acervulina and E. tenella. This lack of agreement is consistent with the lower correlation shown for these two groups.
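The agreement percentages can be related to the Bland–Altman plot with a small sketch. It assumes the conventional construction, in which the bounds are the mean difference plus or minus 1.96 standard deviations of the differences; the bounds in this study may have been computed differently, and the function name and sample counts are hypothetical.

```python
from statistics import mean, stdev

def bland_altman_agreement(manual, automated):
    """Return the mean difference, lower and upper bounds, and percent of samples inside.

    Bounds follow the conventional Bland-Altman form (mean +/- 1.96 SD),
    which is an assumption about how the plotted interval was built.
    """
    differences = [a - m for m, a in zip(manual, automated)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    lower, upper = bias - spread, bias + spread
    inside = sum(lower <= d <= upper for d in differences)
    return bias, lower, upper, 100.0 * inside / len(differences)

# Hypothetical oocyst counts per image.
manual_counts = [10, 20, 30, 40]
automated_counts = [11, 19, 31, 39]
bias, lower, upper, percent_inside = bland_altman_agreement(manual_counts, automated_counts)
```

With this definition, the reported multi-species figure follows directly from the sample counts: 128 of 1667 samples outside the bounds leaves 1539 inside, or 92.32%.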
4. Discussion
The purpose of this study was to compare manual and automated analyses, using a custom Mask-RCNN model, for the enumeration, speciation, and determination of sporulation status of three species of coccidia that infect chickens. We used digitized microscopic images of flotation samples as a validation technique.
Enumeration and speciation of coccidia oocysts are commonly performed in research and even in some clinical investigations. Enumeration and speciation can be laborious because they are routinely performed manually. Consequently, this problem represents a substantial bottleneck for research projects and vaccine evaluations, requiring well-trained and experienced parasitologists. Validated qPCR protocols to quantify samples with mixed
Eimeria populations are available [
11]. However, the practical utility of PCR assays for routine diagnostics is questionable, because many PCR reactions described in the literature target the multi-copy ITS1 region [
7], which makes these protocols unsuitable for the quantification of oocysts in mixed infections. Results for protocols capable of quantifying samples with mixed
Eimeria populations may be delayed because sporulation is recommended prior to performing the assay [
11].
To date, there are a few automated protocols for oocyst enumeration and speciation using flow cytometry [
9] and digital image analysis [
10]. However, we found that the protocols used for sample preparation for these applications are complex, and the oocyst counts may be overestimated. A recent publication describes a fast and automated method for the enumeration of
Eimeria oocysts [
8]. None of the available protocols differentiates sporulated and non-sporulated oocysts. Information about the sporulation status can be important for clinicians and vaccine producers, as only sporulated oocysts can cause infection in poultry [
13].
Recent advances in computer technology have enabled software-based automation of standard laboratory data analysis, including analysis of digital images. Artificial Intelligence represents a promising technology for microscopic parasite examination [
14]. The Convolutional Neural Network (CNN) is the most commonly used artificial neural network to examine visual images [
15]. Initial versions of our pipeline used OpenCV version 4.6 and TensorFlow CNN version 2.13 as foundational software packages to construct a complete working pipeline prototype. These packages were primarily used for image classification, object detection, and image segmentation tasks. We were able to prepare coccidia data by first labeling it, then using OpenCV to augment and expand it. Then we developed custom-trained models using TensorFlow. These models provided enumeration and speciation information for new input images in a process (inference) that again used OpenCV.
Once we had the prototype, we used the Mask-RCNN pipeline for the image analysis. Mask-RCNN is a newer CNN algorithm that makes object detection and its classification more accurate and faster [
16]. Our machine learning pipeline for the identification and enumeration of coccidia parasites consisted of three steps: data preparation, training, and inference. As with any machine learning pipeline, dataset preparation was the key to achieving a high-accuracy model [
17]. When creating such datasets, labeling images is often time-consuming. In this study, data labeling meant identifying the species of each oocyst and whether it was sporulated. For the model to extract the proper information, accurate labeling is essential. For example, the model must learn to discriminate debris from oocysts, determine the species of
Eimeria, and determine whether the oocyst is sporulated. We also needed to label as many oocysts as possible, including partial ones at the margin of the slide and overlapping oocysts, to prevent the algorithm from misclassifying them. Irrelevant features, such as the orientation of the oocysts, can also interfere with feature extraction during training and affect the results. Data augmentation, including horizontal flipping, aims to teach the model to be invariant to such features [
18]. A weakness of our model was the inability to include images with no oocysts, as PixelLib, our training library, rejected these images. This difficulty should be further investigated, and solutions further developed.
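Horizontal flipping has to be applied to the image and to its labels together, otherwise the outlines no longer match the oocysts. The sketch below shows the idea with plain Python lists so that it runs without dependencies; the study used OpenCV for augmentation, and the polygon annotation format and function names here are assumptions.

```python
def flip_image_horizontally(image):
    """Mirror a 2D image, given as a list of pixel rows, left to right."""
    return [row[::-1] for row in image]

def flip_polygon_horizontally(points, width):
    """Mirror polygon vertices (x, y) so they match the flipped image."""
    return [(width - 1 - x, y) for x, y in points]

# Hypothetical 2 x 3 image and a two-vertex outline fragment.
image = [
    [1, 2, 3],
    [4, 5, 6],
]
outline = [(0, 0), (2, 1)]

flipped_image = flip_image_horizontally(image)
flipped_outline = flip_polygon_horizontally(outline, width=len(image[0]))
```

Each flipped image and its flipped annotation form a new file pair, which is one way a small labeled dataset can be expanded.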
Training the application by species was the most time-intensive step. This required a thoughtful approach to (1) obtaining examples of manually labeled oocysts and (2) devising a protocol that would provide adequate data to the learning algorithms without becoming overly burdensome for humans. In other words, our efforts focused on making each manually labeled image as useful as possible in building a generalizable vision model that would perform accurately and reliably on novel samples. For this step, we used 80% of file pairs from the augmented dataset. The other 20% of the file pairs were used for validation and fine tuning of the model. We used novel ways of manipulating the labeled images to improve the accuracy of the computer vision model without needing a large number of labeled images. The original dataset contained 110 file pairs that, after manipulation, were expanded to 2928 file pairs.
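The 80/20 division of file pairs can be sketched as below. How the pairs were actually assigned to each set is not described here, so the seeded random shuffle, the function name, and the file names are assumptions; only the dataset size and the split ratio come from the text.

```python
import random

def split_file_pairs(pairs, train_fraction=0.8, seed=0):
    """Shuffle file pairs reproducibly and split them into training and validation sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names; 2928 is the augmented dataset size reported in the text.
pairs = [(f"image_{i:04d}.jpg", f"image_{i:04d}.json") for i in range(2928)]
train_pairs, val_pairs = split_file_pairs(pairs)
```

With 2928 file pairs, this yields 2342 pairs for training and 586 for validation.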
The Learning Rate and Learning Momentum parameters are important in identifying over-learning and when to stop training [
19]. In our model, a small Learning Rate of 0.01 and a Learning Momentum of 0.95 provided an appropriate combination for prediction performance. The batch size affects the training time, memory usage, and accuracy. There is no single “best” batch size. A small batch size may be slower to train but consumes less memory and can provide more accurate results. For this reason, we selected a batch size of three images. For the number of epochs, we started with 18 epochs, and we increased the number until the model no longer improved.
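What the Learning Rate and Learning Momentum control can be seen on a toy problem. The sketch below applies the classical momentum update to a one-parameter quadratic; it is not the Mask-RCNN training loop, and the choice of classical momentum, the toy objective, and the function name are assumptions. Only the two hyperparameter values come from the text.

```python
LEARNING_RATE = 0.01  # value reported in the text
MOMENTUM = 0.95       # value reported in the text

def sgd_momentum_step(weight, velocity, gradient,
                      lr=LEARNING_RATE, momentum=MOMENTUM):
    """One classical momentum update: reuse part of the previous step, then add the new gradient step."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

# Toy objective f(w) = w**2 with gradient 2*w (for illustration only).
weight, velocity = 1.0, 0.0
for _ in range(500):
    weight, velocity = sgd_momentum_step(weight, velocity, 2.0 * weight)
```

The learning rate scales each gradient step, while the momentum term carries 95% of the previous step forward, which smooths the updates when gradients are noisy, as they are with a batch size of three images.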
Finally, we tested our model with samples prepared from three commercial coccidia vaccines that contained the three Eimeria spp. Our approach consistently achieved good performance for single-species samples. The automated oocyst counts showed high agreement with the counts obtained by the manual method for the individual species models. On the other hand, for the multi-species model, the automated oocyst counts showed moderate agreement compared to the manual method. Across all models for each of the three species, the average deviation from the correct count ranged from an average underestimate of 0.18% to an average overestimate of 0.39%. The poorest agreement was in the non-sporulated E. acervulina and E. tenella oocysts. This is not surprising, as these oocysts are smaller than those of E. maxima, and it may be difficult to identify the sporocysts within the oocysts. Further, our training dataset contained fewer instances of these groups compared to some other groups.
We wanted our model to be capable of differentiating sporulated and non-sporulated oocysts, because only the former are infective. Sporulation occurs in the environment. While sporulated oocysts may survive in the environment for more than a year, non-sporulated oocysts can only survive a short time in the environment [
20]. If environmental conditions prevent the oocysts from sporulating, it is possible that a vaccinated flock does not get good coverage and a coccidia outbreak occurs later in life, when losses can be more severe. An automated model for speciation of
Eimeria sp. oocysts, with an accuracy of 96.9%, has been published recently [
21]. For this validation model, these researchers used a publicly available database (
http://www.coccidia.icb.usp.br/, accessed on 15 January 2023) [
22], which only contains sporulated oocysts. Therefore, while this model may identify coccidia by species, it cannot differentiate sporulation status of the oocysts, which can be important for assessing the management conditions, vaccine cycling, or quality of a vaccine.
To differentiate between sporulated and non-sporulated coccidia, we fine-tuned the parameters of the neural network incorporating the sporulation status as a new group, thus resulting in six different group-labels (one for each of the three species in one of two sporulation states). We experimented by creating single-species models that could differentiate sporulated from non-sporulated oocysts, as well as creating a combined model capable of both speciating and determining sporulation simultaneously.
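The six group labels are the combinations of three species and two sporulation states, which can be enumerated directly. The label strings and class identifiers below are hypothetical and are not necessarily those used in the annotation files; reserving identifier 0 for the background is a common convention in Mask-RCNN implementations and is assumed here.

```python
from itertools import product

SPECIES = ("acervulina", "maxima", "tenella")
SPORULATION_STATES = ("sporulated", "non_sporulated")

# One label per species and sporulation state, six in total.
GROUP_LABELS = [f"{species}_{state}" for species, state in product(SPECIES, SPORULATION_STATES)]

# Class identifiers start at 1; 0 is left for the background (assumed convention).
CLASS_IDS = {label: index for index, label in enumerate(GROUP_LABELS, start=1)}
```

A single-species model would use only the two labels for that species, whereas the combined model uses all six.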
The single-species and multi-species models excelled in identifying
E. maxima oocysts. Furthermore, the models were also successful in differentiating sporulated and non-sporulated oocysts of
E. maxima. This is expected, as the oocysts of this species are the largest [
13] and are easy to resolve visually and by digital computing. On the other hand, the model was least accurate for
E. acervulina, which has the smallest oocysts of the three species investigated. Notably, most of the false positives had a low confidence value (below 0.7). A higher confidence threshold might have produced more accurate speciation results by eliminating these false positive detections. Furthermore, a successful model depends above all on high-resolution digital images that are in focus. Therefore, better imaging solutions that produce higher-resolution images are another potential avenue for improvement.
Important components of successful analytical methods used in routine diagnosis and research include performance, time, and resource-efficiency. Our results show that automated image analysis is promising, and it can drastically reduce the analysis time compared to manual measurements. The model demonstrated good accuracy, with just a few digital images presented. Further validation of the Artificial Intelligence model would enable accurate and rapid analysis of a larger number of samples in a short period of time. This would decrease the risk of bias by the analyst.
This study was a proof of concept to demonstrate we can classify and enumerate coccidia using AI. We worked with the three most common coccidia, which are included in all the commercial vaccines and whose oocyst morphology is relatively easy to separate visually. In the future, we plan to include other
Eimeria spp. in the model. It would be interesting to develop a model that is capable of discriminating oocysts from
Eimeria protozoa that have similar dimensions and sizes, such as
E. tenella vs.
E. brunetti, and
E. acervulina vs.
E. praecox. Similarly, we would like to develop a model for turkey coccidia, whose oocysts are difficult to separate visually and biochemically [
23].