## Abstract


## 1. Introduction

## 2. Material and Methodology of the Proposed Research

#### 2.1. You Only Look Once Series (YOLO Series)

#### 2.2. Rectangular Bounding Box and $\mathcal{L}\mathfrak{o}\mathfrak{s}\mathfrak{s}$ Function

#### 2.3. Content-Aware ReAssembly of Feature: $\mathbb{C}\mathbb{A}\mathbb{R}\mathbb{A}\mathbb{F}\mathbb{E}$

#### 2.4. Image Acquisition

#### 2.5. The Proposed YOLO-AppleScab Model

#### 2.6. Experiment Setup

The local computer was equipped with an 11th Gen Intel® Core™ i5-11400H, 64-bit 2.70 GHz dodeca-core CPU and an NVIDIA GeForce RTX 3050 GPU (NVIDIA, Santa Clara, CA, USA). Table 1 presents the basic configuration of the local computer.

## 3. Results and Discussion

#### 3.1. The Network Visualization

#### 3.2. Performance of the Proposed Model under Different Lighting Conditions

#### 3.3. Comparison of Different State-of-the-Art Algorithms

## 4. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 1.** YOLO model detection. Classification is handled together with bounding-box regression, so each class of interest (Healthy, Scab) is drawn in a different color to illustrate the YOLO detection model.

**Figure 2.** Prediction of the bounding box. YOLOv7 predicts the width and height of the box as offsets from cluster centroids, and the center coordinates of the box relative to the location of the filter application using a sigmoid function. The red dotted box indicates the prior anchor, and the blue square is the prediction.
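The decoding described in the Figure 2 caption can be sketched as follows. This is a minimal illustration that assumes the classic YOLOv2/YOLOv3 parameterization (sigmoid for the center offsets, exponential scaling of the anchor for width and height), which matches the caption's wording; the actual YOLOv7 code may scale these terms differently. The function and argument names (`decode_box`, `cell_x`, `anchor_w`, and so on) are hypothetical and chosen only for this example.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x, t_y, t_w, t_h, cell_x, cell_y, anchor_w, anchor_h):
    """Decode raw network outputs into a box, in grid-cell units.

    Assumed parameterization (YOLOv2/YOLOv3 style, not confirmed for YOLOv7):
      center = grid-cell offset + sigmoid(raw center output)
      size   = prior anchor size * exp(raw size output)
    """
    b_x = cell_x + sigmoid(t_x)      # center x stays inside the cell
    b_y = cell_y + sigmoid(t_y)      # center y stays inside the cell
    b_w = anchor_w * math.exp(t_w)   # width as a scaling of the anchor width
    b_h = anchor_h * math.exp(t_h)   # height as a scaling of the anchor height
    return b_x, b_y, b_w, b_h


# Example: zero raw outputs place the box at the cell center with the anchor's size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cell_x=3, cell_y=4, anchor_w=2.0, anchor_h=5.0)
print(box)
```

With all raw outputs at zero, the predicted box coincides with the prior anchor centered in its grid cell, which corresponds to the red dotted box in the figure.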

**Figure 3.** Apple fruit samples from the AppleScabLDs dataset: (**a**) healthy apple fruit, and (**b**) apple infected by scab disease.

**Figure 4.** An overview of the proposed model, in which the $\mathbb{C}\mathbb{A}\mathbb{R}\mathbb{A}\mathbb{F}\mathbb{E}$ architecture is incorporated into the YOLOv7 network architecture.

**Figure 5.** Feature map activations in the Upsample and $\mathbb{C}\mathbb{A}\mathbb{R}\mathbb{A}\mathbb{F}\mathbb{E}$ layers. (**a**,**d**) YOLOv7 prediction; (**b**,**e**) stage 53 Upsample feature maps; (**c**,**f**) stage 65 Upsample feature maps; (**g**,**j**) YOLO-AppleScab prediction; (**h**,**k**) stage 53 $\mathbb{C}\mathbb{A}\mathbb{R}\mathbb{A}\mathbb{F}\mathbb{E}$ feature maps; (**i**,**l**) stage 65 $\mathbb{C}\mathbb{A}\mathbb{R}\mathbb{A}\mathbb{F}\mathbb{E}$ feature maps. Each color represents a distinct outcome of the convolved feature map at the present layer.

**Table 1.** Basic configuration of the local computer.

Computer Configuration | Specific Parameters |
---|---|
CPU | 11th Gen Intel® Core™ i5-11400H |
GPU | NVIDIA GeForce RTX 3050 |
Operating system | Ubuntu 22.04.1 LTS |
Random Access Memory | 16 GB |

Illumination | Class | GT | Correctly Identified (Amount) | Correctly Identified (Rate) | Falsely Identified (Amount) | Falsely Identified (Rate) | Missed (Amount) | Missed (Rate) |
---|---|---|---|---|---|---|---|---|
Strong Light | Healthy | 20 | 18 | 90% | 1 | 5% | 1 | 5% |
Strong Light | Scab | 28 | 27 | 96.43% | 1 | 0% | 0 | 3.57% |
Soft Light | Healthy | 29 | 22 | 75.86% | 2 | 6.90% | 5 | 17.24% |
Soft Light | Scab | 27 | 23 | 85.18% | 2 | 7.41% | 2 | 7.41% |
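The rate columns in the table above appear to be each amount divided by the ground-truth (GT) count for that class and lighting condition. The sketch below recomputes them under that assumption; the helper name `detection_rates` is hypothetical, and only the counts from the table are used as inputs.

```python
def detection_rates(gt, correct, false, missed):
    """Return (correct, false, missed) rates as percentages of the GT count.

    Assumes rate = 100 * amount / GT, rounded to two decimals.
    """
    if gt <= 0:
        raise ValueError("gt must be a positive count")
    return tuple(round(100.0 * n / gt, 2) for n in (correct, false, missed))


# (GT, correctly identified, falsely identified, missed), copied from the table.
rows = {
    ("Strong Light", "Healthy"): (20, 18, 1, 1),
    ("Soft Light", "Healthy"): (29, 22, 2, 5),
    ("Soft Light", "Scab"): (27, 23, 2, 2),
}

for (illumination, label), counts in rows.items():
    print(illumination, label, detection_rates(*counts))
```

In every row the three amounts sum to the GT count, so the three rates sum to 100% up to rounding.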

**Table 3.** Classification metrics. Comparison of the SOTA models YOLOv3, YOLOv4, and YOLOv7 with the proposed method for the two studied apple classes (healthy and scab). The input image size is 416 $\times$ 416.

Model | Healthy Precision (%) | Healthy Recall (%) | Healthy F1 (%) | Healthy $mAP_{0.5}$ (%) | Healthy $mAP_{0.5-0.95}$ (%) | Scab Precision (%) | Scab Recall (%) | Scab F1 (%) | Scab $mAP_{0.5}$ (%) | Scab $mAP_{0.5-0.95}$ (%) |
---|---|---|---|---|---|---|---|---|---|---|
YOLOv3 | 88.38 | 56.64 | 69.04 | 68.60 | 41.60 | 68.20 | 91.80 | 78.23 | 92.10 | 69.80 |
YOLOv4 | 89.90 | 67.30 | 76.98 | 79.40 | 53.90 | 89.40 | 95.90 | 92.54 | 92.70 | 70.30 |
YOLOv7 | 85.40 | 74.50 | 79.58 | 80.10 | 51.60 | 91.80 | 90.90 | 91.35 | 93.40 | 72.70 |
YOLO-AppleScab | 100 | 74.50 | 85.39 | 83.30 | 54.80 | 89.50 | 95.90 | 92.58 | 94.70 | 73.20 |
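The F1 column in Table 3 can be related to the precision and recall columns through the standard harmonic-mean definition, $F1 = 2PR/(P+R)$. The following sketch recomputes F1 for the Healthy class from the tabulated precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices, not part of the original work.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision %, recall %) for the Healthy class, as listed in Table 3.
healthy = {
    "YOLOv3": (88.38, 56.64),
    "YOLOv4": (89.90, 67.30),
    "YOLOv7": (85.40, 74.50),
    "YOLO-AppleScab": (100.0, 74.50),
}

for model, (p, r) in healthy.items():
    print(f"{model}: F1 = {f1_score(p, r):.2f}%")
```

Because F1 is a harmonic mean, the lower of the two inputs dominates: the proposed model reaches 100% precision on the Healthy class, yet its F1 stays well below that because recall is unchanged from YOLOv7.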

**Table 4.** Classification metrics. The SOTA models YOLOv3, YOLOv4, and YOLOv7, together with Faster R-CNN, are used for benchmarking. The input image size is 416 $\times$ 416. The $mAP_{0.5-0.95}$ values are expressed in percentages. Two classes (healthy and scab) represent the apple condition.

Model | Healthy $mAP_{0.5-0.95}$ (%) | Scab $mAP_{0.5-0.95}$ (%) |
---|---|---|
YOLOv3 | 41.60 | 69.80 |
YOLOv4 | 53.90 | 70.30 |
YOLOv7 | 51.60 | 72.70 |
Faster R-CNN | 47.03 | 59.79 |
YOLO-AppleScab | 54.80 | 73.20 |

