Data Descriptor

RipSetCocoaCNCH12: Labeled Dataset for Ripeness Stage Detection, Semantic and Instance Segmentation of Cocoa Pods

by Juan Felipe Restrepo-Arias *, María Isabel Salinas-Agudelo, María Isabel Hernandez-Pérez, Alejandro Marulanda-Tobón and María Camila Giraldo-Carvajal
Escuela de Ciencias Aplicadas e Ingeniería, Universidad EAFIT, Medellín 050022, Colombia
* Author to whom correspondence should be addressed.
Data 2023, 8(6), 112; https://doi.org/10.3390/data8060112
Submission received: 31 May 2023 / Revised: 9 June 2023 / Accepted: 12 June 2023 / Published: 18 June 2023

Abstract


Fruit counting and ripeness detection are computer vision applications that have gained strength in recent years due to the advancement of new algorithms, especially those based on artificial neural networks (ANNs), better known as deep learning. In agriculture, those algorithms capable of fruit counting, including information about their ripeness, are mainly applied to make production forecasts or plan different activities such as fertilization or crop harvest. This paper presents the RipSetCocoaCNCH12 dataset of cocoa pods labeled at four different ripeness stages: stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), and harvest stage (>6 months). An additional class was also included for pods aborted by plants in the early stage of development. A total of 4116 images were labeled to train algorithms that mainly perform semantic and instance segmentation. The labeling was carried out with CVAT (Computer Vision Annotation Tool). The dataset, therefore, includes labeling in two formats: COCO 1.0 and segmentation mask 1.1. The images were taken with different mobile devices (smartphones), in field conditions, during the harvest season at different times of the day, which could allow the algorithms to be trained with data that includes many variations in lighting, colors, textures, and sizes of the cocoa pods. As far as we know, this is the first openly available dataset for cocoa pod detection with semantic segmentation for five classes, 4116 images, and 7917 instances, comprising RGB images and two different formats for labels. With the publication of this dataset, we expect that researchers in smart farming, especially in cocoa cultivation, can benefit from the quantity and variety of images it contains.

Dataset

Dataset License

Licensed under Creative Commons Attribution 4.0 International.

1. Introduction

The application of precision agriculture strategies in cocoa crops continues to encounter various challenges that need to be addressed. These challenges primarily involve issues related to the poor quality of existing data and the acquisition of new data necessary for the application of advanced precision agriculture techniques [1].
One of the main challenges is to identify different stages of ripeness of the cocoa pods since this type of crop has a wide number of varieties, and all of them can show different textures and color characteristics in their maturation process [2].
Detecting ripeness stages in cocoa pods is critical in determining two relevant factors in any crop: effectively planning the optimal timing of harvest and accurately predicting production volumes. Unfortunately, the adequate maturity to harvest is not always homogeneous, affecting the fermentation process necessary to obtain good chocolate quality [3].
The following techniques have been tested for ripeness-stage detection in cocoa pods:
  • acoustic signals [2,4],
  • determination of metabolic profiles through biochemical markers [5], and
  • laser techniques with backscattered images [6].
However, these techniques are impractical under real field conditions, since capturing sound data, laser images, spectrometry, or biochemical markers requires expensive devices that are beyond the reach of farmers.
On the other hand, artificial intelligence techniques based on artificial neural networks (ANNs), better known as deep learning, are increasingly used [7,8,9].
The precision and robustness of deep learning models depend on the quality and quantity of the training data, which must capture the variability of the phenomenon under study [10].
Moreover, the increasing prevalence of smartphones among farmers for their daily activities simplifies the process of capturing images, eliminating the necessity of investing in costly equipment and specialized management for data capture.
Unfortunately, the community engaged in applied research using deep learning techniques to detect ripeness stages in cocoa pods faces a scarcity of image datasets for most varieties. In addition, the available public datasets offer only a limited number of images for training deep learning models [8,11].
To help the community that performs applied research for developing deep learning solutions to detect ripeness stages in cocoa pods, we propose the RipSetCocoaCNCH12 dataset, which consists of 4116 images taken with different types of smartphones and labeled for semantic segmentation. Having several stages of ripeness will allow researchers to train machine learning algorithms that distinguish more than the two usual classes, mature and immature. These features will allow the scientific community interested in these applications to train more robust and accurate deep learning models.
The RipSetCocoaCNCH12 dataset will be important for the training of machine learning algorithms that seek to detect different ripeness stages in cocoa crops of the CNCH12 variety and to make inventories of pods.

2. RipSetCocoaCNCH12 Dataset

2.1. Description

CACAO CNCH12, developed by “Compañía Nacional de Chocolates”, is the cocoa variety in the dataset. The images were collected at the “Compañía Nacional de Chocolates” farm, located in the municipality of Támesis, department of Antioquia, Colombia (5°43′02″ N, 75°41′25″ W). The average height above sea level at the farm is approximately 1100 m. The dataset was created between 1 December 2022 and 17 February 2023, the primary cocoa harvest season in the study area.
The average ripening period for a cocoa pod typically spans six to seven months, although slight variations may occur based on the specific agronomic and climatic conditions of the crop. The ripeness stages were defined in ranges of two months due to the key physical and chemical differences of the cocoa pods according to the agronomists of the “Compañía Nacional de Chocolates” company. The stages are defined based on the duration in months, starting from pollination of the flowers to the optimal time for harvesting the pod. The sequential progression of cocoa pods during the ripening process, from 0 to 6 months, is illustrated in Figure 1.
The images of cocoa pods were divided into five classes (Table 1). Four classes correspond to the ripeness stage in months: Class 1 (0–2 months), Class 2 (2–4 months), Class 3 (4–6 months), and Class 4 (>6 months) (Figure 2). A fifth class, “abortions” (Class A), does not fall under any of the ripeness stages. Abortions are cocoa pods that start their growth process but die from causes such as pest attacks, diseases, or physiological problems of the plant (Figure 3).
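As an illustration of the class definitions, the following minimal Python sketch maps a pod age in months to one of the four ripeness classes. The function name `ripeness_class` is hypothetical, and the handling of the shared boundaries (2, 4, and 6 months are assigned to the earlier class) is an assumption, since the stated ranges overlap at their endpoints. Class A (abortions) is not defined by age and is therefore not covered.

```python
def ripeness_class(age_months: float) -> str:
    """Map pod age in months to a ripeness class label (C1 to C4).

    Assumption: boundary ages (2, 4, 6 months) fall into the earlier
    class; the source gives the ranges without stating this.
    """
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    if age_months <= 2:
        return "C1"  # stage 1 (0-2 months)
    if age_months <= 4:
        return "C2"  # stage 2 (2-4 months)
    if age_months <= 6:
        return "C3"  # stage 3 (4-6 months)
    return "C4"      # for harvest (>6 months)


if __name__ == "__main__":
    for age in (1, 3, 5.5, 7):
        print(age, ripeness_class(age))
```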
The dataset contains two folders: the first contains the annotations in COCO 1.0 format, and the second contains the images in segmentation mask 1.1 format. In each of these folders, the images are divided into subfolders named with the main class they contain; an image can contain several instances of different classes, but the images in each folder are dominated by one of the classes. The distribution of instances in each folder can be seen below in Figure 4.
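A per-class instance count such as the one reported in Table 1 could be reproduced from a COCO annotation file along the following lines. This is a minimal sketch that relies only on the standard COCO keys `categories` and `annotations`; the helper names and the category names in the in-memory example are placeholders, not the names used in the published files, and any file path passed to `load_coco` is left to the reader.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO annotation file (*.json) into a dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_instances(coco):
    """Count annotated instances per category name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# Tiny in-memory example; the category names are placeholders.
example = {
    "categories": [{"id": 1, "name": "C1"}, {"id": 4, "name": "C4"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 1},
        {"id": 12, "image_id": 2, "category_id": 4},
    ],
}
counts = count_instances(example)
print(dict(counts))
```

Because an image can contain instances of several classes, counting annotations rather than images is what corresponds to the instance totals.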

2.2. Quantitative Measure to Differentiate Cocoa Classes

The ripening process of fruit involves a sequence of physiological changes to become ready for consumption or processing. The fruit grows, accumulating essential nutrients and water, while noticeable transformations in color, texture, and composition signify its ripeness.
A widely used way to measure the state of maturity of a fruit quantitatively at different stages is to calculate the internal sugar content by measuring Brix degrees [12,13,14,15]. To have a quantitative measure that would confirm the difference between ripeness stages, the Brix degrees were measured in more than 35 cocoa pods for each class in the four ripeness stages (C1 to C4). The results are presented in Table 2.
An ANOVA test was performed to check for a significant difference between the different classes, according to their measure of Brix degrees. The results can be seen below in Table 3.
$$\text{Null hypothesis: } \mu_j \text{ are equal} \qquad \text{Alternative hypothesis: } \mu_j \text{ are not equal}$$
According to the F statistic and the p-value, the null hypothesis is rejected. Therefore, there is a significant difference in Brix degrees among classes, which supports dividing cocoa pods into the four proposed classes for the stages of ripeness.
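The F statistic can be checked from the sums of squares and degrees of freedom reported in Table 3. The sketch below only repeats that arithmetic (mean square = sum of squares / degrees of freedom, F = VBG / VWG); the function name `anova_f` is hypothetical, and the p-value is not recomputed here.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between  # variance between groups (VBG)
    ms_within = ss_within / df_within     # variance within groups (VWG)
    return ms_between, ms_within, ms_between / ms_within


# Values taken from Table 3.
ms_b, ms_w, f_stat = anova_f(2955.78, 3, 483.41, 150)
print(f"VBG = {ms_b:.2f}, VWG = {ms_w:.2f}, F = {f_stat:.2f}")
# Prints: VBG = 985.26, VWG = 3.22, F = 305.72
```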
Every image is 3000 × 3000 px in JPEG format, with 8 bits. The image files were named with the date and time of capture. Figure 5 is an example of the images corresponding to the four ripeness stages.
Table 4 below shows a summary of the RipSetCocoaCNCH12 dataset.

3. Methods

Nowadays, smartphones have become ubiquitous. In even the most remote rural areas, smartphones have become the main communication technology due to their low costs and portability. These devices can also give farmers the ability to collect image data. Therefore, in this work, the images were captured with smartphones to have a dataset as similar as possible to real conditions.

3.1. Image Data Acquisition

Five devices from some of the leading manufacturers were selected for this work. Multiple devices were chosen to ensure significant variability in the types of images captured and to enrich the dataset. The technical specifications of the smartphones used can be seen below in Table 5.
The strategy for capturing images involved zigzag paths in the field enabling access to each crop tree. During each pass, a person took images of a single class to allow easier classification in the folders.
Between one and four images of each cocoa pod were taken from different angles to obtain as many samples as possible (Figure 6).
The images were taken between 8:00 a.m. and 4:00 p.m. First, the size format for the capture was adjusted on all smartphones to a 1:1 ratio, and then resizing was applied to them using a script in the Python language with Pillow (Python Imaging Library), giving them a final size of 3000 × 3000 px. The original images had sizes in the range from 3072 × 3072 to 4096 × 4096 px.
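The resizing step could look roughly like the sketch below. It is not the authors' script: the function names, the LANCZOS resampling filter, and the JPEG quality setting are assumptions, since the text states only that Pillow was used and that the final size is 3000 × 3000 px.

```python
TARGET_SIDE = 3000  # final side length in pixels, as stated in the text


def is_square(size):
    """Return True if a (width, height) pair describes a 1:1 frame."""
    width, height = size
    return width == height


def resize_square(src, dst, side=TARGET_SIDE):
    """Resize a 1:1 image to side x side pixels and save it as JPEG.

    Assumptions: LANCZOS resampling and quality=95 are illustrative
    choices; the source does not specify them.
    """
    # Pillow is imported here so that is_square stays usable without it.
    from PIL import Image

    with Image.open(src) as img:
        if not is_square(img.size):
            raise ValueError(f"expected a 1:1 image, got {img.size}")
        resized = img.resize((side, side), Image.LANCZOS)
        resized.save(dst, format="JPEG", quality=95)
```

A call such as `resize_square("in.jpg", "out.jpg")` (hypothetical file names) would downscale an original of 3072 × 3072 to 4096 × 4096 px to the common target size.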

3.2. Brix Degrees Data Acquisition

Some pods were selected to measure the Brix degrees of the internal sugar content, as mentioned in Section 2.2. First, the pods chosen for samples were perforated with a drill. Then, the sample was extracted and placed in a handheld refractometer, and finally, the data were recorded manually. Images of this process can be seen below in Figure 7.

3.3. Data Annotation

The tool used for labeling images was CVAT (Computer Vision Annotation Tool) [16], which allows for different techniques. The technique used for this work was polygon labeling to obtain a semantic segmentation of the classes (Figure 8).
The dataset contains labels in two alternative formats: (1) COCO 1.0, which has files in the format (*.json) for detection using bounding boxes and polygons, and (2) segmentation mask 1.1, which contains separate folders for semantic segmentation and instance segmentation. Examples of these masks can be seen in Figure 9 and Figure 10.
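To illustrate how a polygon label relates to a bounding box, the sketch below takes a polygon in the flat COCO layout `[x1, y1, x2, y2, ...]` and derives the COCO-style box `[x, y, width, height]` together with the enclosed area (shoelace formula). The function name and the example coordinates are hypothetical and are not taken from the dataset files.

```python
def polygon_to_bbox_and_area(flat_polygon):
    """Return ([x, y, width, height], area) for a flat [x1, y1, x2, y2, ...] polygon."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("a polygon needs at least three (x, y) vertices")

    x_min, y_min = min(xs), min(ys)
    bbox = [x_min, y_min, max(xs) - x_min, max(ys) - y_min]

    # Shoelace formula: half the absolute sum of cross products of
    # consecutive vertices, closing the polygon back to the first vertex.
    twice_area = 0.0
    n = len(xs)
    for i in range(n):
        j = (i + 1) % n
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return bbox, abs(twice_area) / 2.0


# Hypothetical rectangular polygon, 300 px wide and 400 px tall.
bbox, area = polygon_to_bbox_and_area([100, 200, 400, 200, 400, 600, 100, 600])
print(bbox, area)
```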

4. Limitations

The RipSetCocoaCNCH12 dataset does not include classes of cocoa pods to discard. In future work, diseased and rotten pods may be included. Additionally, more data should be collected on other cocoa varieties.

Author Contributions

Conceptualization, J.F.R.-A., M.I.H.-P., and A.M.-T.; methodology, J.F.R.-A. and M.I.S.-A.; software, J.F.R.-A.; validation, J.F.R.-A., M.I.H.-P., and A.M.-T.; formal analysis, M.I.S.-A. and J.F.R.-A.; data curation, M.I.S.-A. and M.C.G.-C.; writing—original draft preparation, J.F.R.-A.; writing—review and editing, J.F.R.-A., M.I.H.-P., and A.M.-T.; project administration, J.F.R.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universidad EAFIT, project No. 819422.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Zenodo at https://doi.org/10.5281/zenodo.7968315 (accessed on 24 May 2023).

Acknowledgments

We want to thank the “Compañía Nacional de Chocolates” company for providing access to the farm “La Granja” in the municipality of Támesis to take the images for this work. Thanks for their support and for allowing us to use their facilities. Special thanks to the BIOSUROESTE organization.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bosompem, M. Potential challenges to precision agriculture technologies development in Ghana: Scientists’ and cocoa extension agents’ perspectives. Precis. Agric. 2021, 22, 1578–1600.
  2. Bueno, G.E.; Valenzuela, K.A.; Arboleda, E.R. Maturity classification of cacao through spectrogram and convolutional neural network. J. Teknol. Sist. Komput. 2020, 8, 228–233.
  3. Quezada-Ramón, L.A.; Quevedo-Guerrero, J.N.; García-Batista, R.M. Determinación del efecto del grado de madurez de las mazorcas en la producción y la calidad sensorial de (Theobroma cacao L.). Rev. Científica Agroecosistemas 2017, 5, 36–46. Available online: http://aes.ucf.edu.cu/index.php/aes/index (accessed on 12 May 2023).
  4. Galindo, J.A.M.; Rosal, J.E.C.; Villaverde, J.F. Ripeness Classification of Cacao Using Cepstral-Based Statistical Features and Support Vector Machine. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 13–15 September 2022; pp. 1–5.
  5. Gallego, A.M.; Zambrano, R.A.; Zuluaga, M.; Rodríguez, A.V.C.; Cortés, M.S.C.; Vergel, A.P.R.; Valencia, J.W.A. Analysis of fruit ripening in Theobroma cacao pod husk based on untargeted metabolomics. Phytochemistry 2022, 203, 113412.
  6. Lockman, N.A.; Hashim, N.; Onwude, D.I. Laser-Based imaging for Cocoa Pods Maturity Detection. Food Bioprocess Technol. 2019, 12, 1928–1937.
  7. Veites-Campos, S.A.; Betancour, R.R.; González-Pérez, M. Identification of Cocoa Pods with Image Processing and Artificial Neural Networks. Int. J. Adv. Eng. Manag. Sci. 2018, 4, 510–518.
  8. Heredia-Gómez, J.F.; Rueda-Gómez, J.P.; Talero-Sarmiento, L.H.; Ramírez-Acuña, J.S.; Coronado-Silva, R.A. Cocoa pods ripeness estimation, using convolutional neural networks in an embedded system. Rev. Colomb. Comput. 2020, 21, 42–55.
  9. Baculio, N.G.; Barbosa, J.B. An Objective Classification Approach of Cacao Pods using Local Binary Pattern Features and Artificial Neural Network Architecture (ANN). Indian J. Sci. Technol. 2022, 15, 495–504. Available online: https://indjst.org/articles/an-objective-classification-approach-of-cacao-pods-using-local-binary-pattern-features-and-artificial-neural-network-architecture-ann (accessed on 1 March 2023).
  10. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016.
  11. Ayikpa, K.J.; Mamadou, D.; Ballo, A.B.; Yao, K.; Gouton, P.; Adou, K.J. CocoaMFDB: A dataset of cocoa pod maturity and families in an uncontrolled environment in Côte d’Ivoire. Data Brief 2023, 48, 109196. Available online: https://linkinghub.elsevier.com/retrieve/pii/S2352340923003153 (accessed on 1 March 2023).
  12. Pérez, V.O.; Álvarez-Barreto, C.I.; Matallana, L.G.; Acuña, J.R.; Echeverri, L.F.; Imbachí, L.C. Effect of Prolonged Fermentations of Coffee Mucilage with Different Stages of Maturity on the Quality and Chemical Composition of the Bean. Fermentation 2022, 8, 519.
  13. Darbellay, C.; Luisier, J.-L.; Villettaz, J.-C.; Azodanlou, R. Changes in flavour and texture during the ripening of strawberries. Eur. Food Res. Technol. 2003, 218, 167–172.
  14. Chassagne-Berces, S.; Fonseca, F.; Citeau, M.; Marin, M. Freezing protocol effect on quality properties of fruit tissue according to the fruit, the variety and the stage of maturity. LWT 2010, 43, 1441–1449.
  15. Teka, T.A. Analysis of the effect of maturity stage on the postharvest biochemical quality characteristics of tomato (Lycopersicon esculentum Mill.) fruit. Int. Res. J. Pharm. Appl. Sci. 2013, 3, 180–186. Available online: www.irjpas.com (accessed on 1 March 2023).
  16. CVAT. Available online: https://www.cvat.ai/ (accessed on 21 February 2023).
Figure 1. Ripeness process in a sequence of cocoa pods.
Figure 2. Ripeness stages: (a) 0–2 months (C1); (b) 2–4 months (C2); (c) 4–6 months (C3); (d) >6 months (C4).
Figure 3. Examples of several types of abortions (CA).
Figure 4. Distribution of the instances for each image folder (y-axes differ between the frames).
Figure 5. Dataset examples of the ripeness stages: (a) Class 1; (b) Class 2; (c) Class 3; (d) Class 4.
Figure 6. Image capture process for one cocoa pod from different angles.
Figure 7. Process of sampling to measure Brix degrees: (a) perforation of the cocoa pod with a drill, (b) placement of the sample in a handheld refractometer, (c) reading of the Brix degree measurement, and (d) measurement recording.
Figure 8. Examples of labeling of cocoa pods with CVAT.
Figure 9. Examples of masks for semantic segmentation: (a) C1; (b) C2; (c) C3; (d) C4.
Figure 10. Example of masks for instance segmentation: Class 1.
Table 1. Number and names of instances per class.
| Class | Class Name | Instances |
|---|---|---|
| C1 | Stage 1 (0–2 months) | 3278 |
| C2 | Stage 2 (2–4 months) | 1688 |
| C3 | Stage 3 (4–6 months) | 1519 |
| C4 | For harvest (>6 months) | 1169 |
| CA | Abortions | 263 |
| Total | | 7917 |
Table 2. Number of samples and average Brix degrees for the ripeness stages.
| Class | Number of Samples Selected to Measure Brix Degrees | Average Brix Degrees Measured, $\mu_j$ (°Bx) |
|---|---|---|
| C1 | 39 | 5.3 |
| C2 | 45 | 6.6 |
| C3 | 38 | 8.7 |
| C4 | 40 | 16.6 |
Table 3. ANOVA table for Brix degrees in the four different ripeness stages.
| Source of Variation | Sum of Squares | df * | Mean Square | F ** | p Value |
|---|---|---|---|---|---|
| Between groups | 2955.78 | 3 | 985.26 (VBG) | 305.72 | 1.12 × 10⁻⁶³ |
| Within groups | 483.41 | 150 | 3.22 (VWG) | | |
| Total | 3439.20 | 153 | | | |

* Degrees of freedom. ** F = variance between groups/variance within groups = VBG/VWG.
Table 4. The RipSetCocoaCNCH12 specifications.
| Item | Description |
|---|---|
| Field of application | Object detection, smart farming |
| Data acquisition | Smartphone devices |
| Method of annotation | Manually with CVAT (Computer Vision Annotation Tool) |
| Number of classes | 5: stage 1 (0–2 months), stage 2 (2–4 months), stage 3 (4–6 months), for harvest (>6 months), and abortions |
| Number of images | 4116 |
| Number of instances | 7917 |
| Data collected by | Authors of this paper |
| Years of collection | 2022–2023 |
| Vertical resolution | 96 dpi |
| Horizontal resolution | 96 dpi |
| Dataset size | 27 GB |
| Image format | .JPG |
| Image size | 3000 × 3000 px |
| Annotation formats | COCO 1.0 and segmentation mask 1.1 |
Table 5. Technical specifications of the smartphone cameras used to capture the dataset images.
| Smartphone | Camera Specifications |
|---|---|
| Samsung Galaxy A01 | Dual rear camera consisting of a 13-megapixel f/2.0 main sensor and a 2-megapixel f/2.4 depth sensor. |
| Samsung Galaxy Note 10 | Triple camera composed of an ultra-wide-angle camera: 16 MP, f/2.2, 123°; a wide-angle camera: 12 MP, AF, f/1.5–2.4; and a telephoto camera: 12 MP, f/2.1. |
| iPhone SE 2020 | Single camera. 12 MP wide-angle camera, f/1.8 aperture. |
| LG G5 | Dual camera. 16 MP main camera with f/1.8 aperture. 8 MP secondary super-wide-angle camera with f/2.4 aperture. |
| Motorola G9 plus | Quadruple camera. Main camera: 64 MP sensor, f/1.8 aperture and phase detection focus. Ultra-wide-angle: 8 MP sensor, f/2.2 aperture. Macro: 2 MP sensor and f/2.2 aperture. Depth: 2 MP sensor and f/2.2 aperture. |
