Article

Effect of Bit Depth on Cloud Segmentation of Remote-Sensing Images

1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
2 College of Resource and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(10), 2548; https://doi.org/10.3390/rs15102548
Submission received: 16 February 2023 / Revised: 19 April 2023 / Accepted: 10 May 2023 / Published: 12 May 2023

Abstract

Cloud cover in remote-sensing images attenuates or even obscures ground-object information and simultaneously alters the texture and spectral characteristics of the image. Accurately detecting clouds in remote-sensing images is therefore of great significance to the field of remote sensing. Cloud detection uses semantic segmentation to classify remote-sensing images at the pixel level. However, previous studies have focused on improving algorithm performance, and little attention has been paid to the impact of the bit depth of remote-sensing images on cloud detection. In this paper, the deep semantic-segmentation algorithm UNet is taken as an example, and the widely used cloud-labeling dataset “L8 Biome” is used as validation data to explore the relationship between bit depth and segmentation accuracy over different surface landscapes when the algorithm is used for cloud detection. The results show that when the images are normalized, cloud detection with 16-bit remote-sensing images is slightly better than with 8-bit remote-sensing images; when the images are not normalized, the gap widens. However, training with 16-bit remote-sensing images takes longer. This means that data selection for cloud detection does not always need to follow the highest available bit depth but should instead balance efficiency and accuracy.

1. Introduction

Remote-sensing images provide important data support for various industries, but acquired images are often affected by cloud cover, which seriously hampers the interpretation of ground targets [1,2]. Existing studies show that the mean annual cloud cover in global remote-sensing imagery is about 66% [3]. The presence of clouds hinders the application of remote-sensing images in Earth observation [4,5]. Therefore, accurately detecting clouds in remote-sensing images has become a prominent issue.
Recently, much research has been conducted on computer-based automatic cloud-segmentation techniques as a prerequisite for image analysis [6,7,8]. Existing cloud-detection methods can be divided into two categories: empirical-rule algorithms based on physical features, and machine-learning algorithms [9]. Most rule-based methods consist of a set of thresholds applied to one or more spectral bands of the image, or to extracted features that attempt to enhance the physical properties of clouds [10]. Generally speaking, rule-based methods are straightforward and simple to use, and when the spectral data provided by a satellite are sufficiently rich, they classify clouds effectively. The Fmask algorithm is a classical rule-based method that uses threshold functions to determine whether the spectrum at each pixel corresponds to a cloud [11]. However, this method depends heavily on manually set thresholds, and obtaining accurate values requires a large number of rule-tuning experiments, which is costly. The Sen2Cor algorithm can generate cloud-recognition results with different probabilities through different waveband threshold conditions, but its recognition accuracy is not high, and it easily misidentifies bright surface features as clouds and mountain shadows as cloud shadows [12]. Machine-learning approaches, on the other hand, treat cloud detection as a statistical classification problem, using classifiers such as Random Forest (RF) [13] and Support Vector Machine (SVM) [14]. These methods rely on high-quality remote-sensing images, exploiting their high resolution and distinct spatial features to classify shape, texture, edges, and other characteristics. Since they do not require a large number of empirically set thresholds or band selections, machine-learning methods outperform rule-based methods when the quality of the training data is good enough [15]. However, most of these methods rely heavily on training samples, and their accuracy decreases significantly when the training samples are underrepresented or biased in distribution [16]. Deep neural networks are themselves a class of machine-learning methods. With the booming development of deep-learning technology, image-segmentation techniques from computer vision have gradually been introduced into remote sensing for cloud detection [17,18,19]. Because they can effectively mine multi-level texture features from images without manual feature selection, they achieve higher accuracy than traditional machine-learning classifiers in cloud-detection tasks [20]. Although the emergence of deep learning has elevated cloud detection to a new level [21], allowing the high-dimensional features of remote-sensing images to be more fully exploited and yielding clear improvements in detection accuracy, the effect of the bit depth of remote-sensing images on the accuracy of deep-learning-based cloud-segmentation algorithms has not yet been discussed.
Bit depth, also known as color depth or pixel depth [22], describes the ability of a sensor to discriminate between objects viewed in the same part of the electromagnetic spectrum; it corresponds to the number of distinct data values that each band can hold [23]. In other words, bit depth is the number of bits used to represent the value of each pixel in an image, and it is a critical factor in remote-sensing image analysis and interpretation [24]. Depending on the coding method, remote-sensing images with a bit depth of 10 bits or more are generally defined as high radiometric resolution images. High radiometric resolution remote-sensing images capture the structure and spectral information of ground features in finer detail, enhance the interpretability and reliability of images, and improve the accuracy of remote-sensing analysis [25]. For example, the data captured by the Operational Land Imager (OLI) of Landsat 8 have better radiometric accuracy over a 12-bit dynamic range, which improves the overall signal-to-noise ratio, compared with only 256 gray levels in the 8-bit instruments of Landsat 1–7. The improved signal-to-noise performance results in an improved description of land-cover states and conditions. The product is delivered in 16-bit unsigned integer format [26].
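For reference, the number of representable levels per band grows exponentially with the number of bits:

$$\text{levels} = 2^{n}, \qquad 2^{8} = 256, \quad 2^{12} = 4096, \quad 2^{16} = 65{,}536$$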
In a study devoted to the impact of radiometric resolution on classification accuracy [27], a bagging classification tree was used to carry out the experiments; the authors found that the classification contribution of spectral indices and texture bands is tied more closely to their own radiometric resolution than to that of the original remote-sensing image. In a paper on cloud detection with deep learning, Francis et al. [28] quantized 16-bit Landsat 8 images to 4 bits to verify the robustness of their model; their findings imply that high-precision data are not necessary for their cloud-masking approach to perform well. Ji et al. [29] used a novel fully convolutional network (FCN) to detect clouds and shadows in Landsat 8 images but converted the 16-bit images into 8-bit images by default during data preprocessing. Li et al. [17] utilized 16-bit GF-1 data for cloud detection, cloud removal, and cloud-coverage assessment, but all images were converted to 8-bit RGB images before being fed into the network. Beyond cloud segmentation, in other remote-sensing tasks such as water-body recognition, Song et al. [30] changed the bit depth of WorldView-3 images from 16-bit to 8-bit in the data-preprocessing stage to improve processing speed, and in marine-ranching recognition, Chen et al. [31] likewise converted GF-1 images from 16-bit to 8-bit. In the field of biomedicine, Mahbod et al. [32] investigated the effect of image bit depth on the performance of nucleus instance segmentation. Considering that remote-sensing images are acquired differently from biomedical images, with more complex content and a lower signal-to-noise ratio, the effect of bit depth on the segmentation of remote-sensing images is still worth exploring.
It can be observed from the existing literature that limited attention has been paid to the impact of bit depth on information extraction from remote-sensing images. In contrast, the influence of spatial [33,34] and spectral [35,36,37] resolution on classification accuracy and information-extraction capability has been extensively studied. Yet the results of cloud segmentation are likely to be influenced by the bit depth of remote-sensing images [38]. In addition, prior research on bit depth has primarily focused on a single scenario. This study therefore aims to expand upon the existing literature by examining the influence of bit depth on cloud segmentation in remote-sensing imagery. To achieve this, we assessed its impact across eight distinct landscape classes: barren, forest, grass/crops, shrubland, snow/ice, urban, water, and wetlands.
This study is based on a comprehensive review of the literature and experimental analysis using a representative set of remote-sensing images. The findings provide insights into the optimal bit depth for cloud segmentation and inform best practices for remote-sensing image processing. More specifically, the three main contributions of this study are summarized as follows:
  • Unique focus on the impact of bit depth: While previous studies have largely focused on improving the performance of cloud-detection algorithms, our research specifically addresses the overlooked aspect of bit depth in remote-sensing images. By examining the relationship between bit depth and segmentation accuracy, we provide new insights into the importance of bit depth in cloud-detection tasks.
  • Comparative analysis of 8-bit and 16-bit remote-sensing images: Our study is among the first to systematically compare the performance of cloud detection using 8-bit and 16-bit remote-sensing images. This comparison not only highlights the differences in accuracy between the two types of images but also sheds light on the trade-offs between efficiency and accuracy in cloud-detection tasks.
  • Extensive evaluation across different surface landscapes: To ensure the generalizability of our findings, we have evaluated the performance of the UNet algorithm across different surface landscapes. This comprehensive evaluation helps to highlight the varying impact of bit depth on cloud-segmentation accuracy in diverse contexts.

2. Materials

Training deep-learning models requires a large number of high-quality remote-sensing images and corresponding labels. In this study, we utilize a widely recognized cloud-detection dataset, the Landsat 8 Biome Type Cloud Validation Dataset (L8 Biome) [1], which contains 96 Landsat 8 images sampled from around the world, each 8000 × 8000 pixels at 30 m resolution, together with manually generated cloud masks. The dataset comprises eight distinct cloud underlying surface types: barren, forest, grass/crops, shrubland, snow/ice, urban, water, and wetlands, with 12 images per category. To ensure the heterogeneity and diversity of the data, the dataset includes images from different latitudes and longitudes and with different cloud types (Figure 1).
In this study, the focus was placed on cloud detection. Thus, the cloud masks of the L8 Biome dataset were consolidated into two distinct categories: cloud and non-cloud. The binary cloud mask was generated by assigning a value of “0” to the “cloud shadow” and “clear” categories and a value of “1” to the “thin cloud” and “cloud” categories. Subsequently, the red, green, and blue bands were extracted from the original image to synthesize three-band images. A sketch of this consolidation is shown below.
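The following minimal NumPy sketch illustrates the mask consolidation. The integer class codes here are hypothetical placeholders for illustration; the L8 Biome masks use fixed codes for “clear”, “cloud shadow”, “thin cloud”, and “cloud” documented with the dataset.

```python
import numpy as np

# Hypothetical code mapping; substitute the codes documented with L8 Biome.
CLEAR, SHADOW, THIN_CLOUD, CLOUD = 0, 1, 2, 3

def binarize_mask(qa: np.ndarray) -> np.ndarray:
    """Collapse the four-class L8 Biome mask to 0 = non-cloud, 1 = cloud."""
    return np.isin(qa, [THIN_CLOUD, CLOUD]).astype(np.uint8)
```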
The data-preparation process for the remote-sensing images involved several steps. First, the 16-bit Landsat 8 images provided by L8 Biome were converted to 8-bit images using the raster dataset tool in ArcMap. To compare the performance of the two bit depths, sample datasets were created for both the 8-bit and 16-bit images (Figure 2). Due to limitations in computational resources, complete remote-sensing images were not used for the experiments; instead, small image blocks were cropped from the full images. To minimize boundary issues, following Jeppesen et al. [39], each remote-sensing image was cropped into 512 × 512-pixel sample blocks with 64-pixel buffer zones between adjacent windows, and blocks with more than 20% no-data pixels were discarded (a sketch of this tiling follows below). To further increase the sample number and complexity, data-augmentation strategies, including random flips and rotations, were applied to the processed samples (Figure 3). Finally, we obtained 2248 images of size 512 × 512 for each type. We divided the dataset into training and testing sets at a 4:1 ratio, i.e., 1798 training images and 450 testing images per type. Note that a small part of the training samples was used for validation. The training set was used to train the network model, the validation set was used to adjust the model parameters during training, and the test set was used only to evaluate model performance and did not participate in training.
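A minimal sketch of the tiling step, assuming the image and mask are NumPy arrays and that the 64-pixel buffer means adjacent 512 × 512 windows overlap by 64 pixels (stride 448); the exact windowing scheme is our reading of the text, not the authors' code.

```python
import numpy as np

def crop_patches(image, mask, size=512, overlap=64, max_nodata=0.2, nodata=0):
    """Crop (image, mask) pairs into overlapping square blocks."""
    stride = size - overlap
    patches = []
    h, w = mask.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            img_blk = image[y:y + size, x:x + size]
            # Discard blocks where more than 20% of pixels are no-data.
            if (img_blk == nodata).mean() > max_nodata:
                continue
            patches.append((img_blk, mask[y:y + size, x:x + size]))
    return patches
```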

3. Methods

In this section, we provide a detailed explanation of the methods and technical details employed in our experiment. First, we introduce UNet and its network details. Next, we discuss the design of the experiment, the environment, and the specific parameters. Finally, we describe the evaluation metrics used to compare images of different bit depths. Each step is elaborated in the subsequent subsections.

3.1. UNet Semantic Segmentation Algorithm

UNet [40] is a convolutional neural network (CNN) architecture that was developed from the FCN [41] architecture and first applied to biomedical image segmentation in 2015. It was designed to address the challenge of segmenting objects with high variability in size, shape, and appearance. To achieve this, UNet combines a contracting path, which captures the context of the input image, with a symmetric expanding path that restores the detail of the segmented objects. This symmetric encoding-decoding structure is among the most frequently used in semantic medical-image segmentation. In remote sensing, the UNet architecture has been widely applied to various tasks, including cloud segmentation. Its ability to handle high variability in size, shape, and appearance makes it well suited to cloud segmentation, which often involves separating clouds from the surrounding sky or land. Clouds can exhibit considerable variation in size, shape, and appearance, and UNet's skip connections help to preserve detailed information from the input data, making it well suited to cases where the objects of interest have complex internal structure. Multiple studies have shown the effectiveness of UNet for cloud segmentation [42,43].
The UNet architecture is depicted in Figure 4. The left half of the figure shows the encoding path, and the right half the decoding path. The encoding path comprises blocks with two 3 × 3 convolutions followed by a max-pooling layer; the convolution layers extract features from the input remote-sensing image, and the max-pooling layer halves the size of the feature map. The decoding path consists of blocks with an upsampling step, a skip connection, and two 3 × 3 convolutions; upsampling doubles the spatial size of the feature map. To match the size of the output with that of the input remote-sensing image, we used bilinear interpolation to upsample the small feature maps back to the original size. The skip connections transfer information from the encoding path to the decoding path, enhancing UNet's segmentation ability by compensating for information lost during downsampling. Finally, the segmentation result is obtained through channel reduction with a 1 × 1 convolution. A minimal sketch of this architecture appears below.
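The following PyTorch sketch illustrates the structure just described: two 3 × 3 convolutions per block, max pooling on the way down, bilinear upsampling with skip connections on the way up, and a 1 × 1 output convolution. The channel widths, batch normalization, and four-level depth are illustrative assumptions, not the exact configuration used in this paper.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, as in each UNet encoder/decoder block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    def __init__(self, in_ch=3, n_classes=2, widths=(64, 128, 256, 512)):
        super().__init__()
        self.enc = nn.ModuleList()
        ch = in_ch
        for w in widths:                      # contracting path
            self.enc.append(DoubleConv(ch, w))
            ch = w
        self.pool = nn.MaxPool2d(2)           # halves the feature-map size
        self.bottleneck = DoubleConv(widths[-1], widths[-1] * 2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.ModuleList()
        ch = widths[-1] * 2
        for w in reversed(widths):             # expanding path
            self.dec.append(DoubleConv(ch + w, w))  # + w for the skip connection
            ch = w
        self.head = nn.Conv2d(ch, n_classes, kernel_size=1)  # 1x1 channel reduction

    def forward(self, x):
        skips = []
        for block in self.enc:
            x = block(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for block, skip in zip(self.dec, reversed(skips)):
            x = self.up(x)                                   # bilinear upsampling
            x = block(torch.cat([x, skip], dim=1))           # skip connection
        return self.head(x)

# e.g. logits = UNet()(torch.randn(1, 3, 512, 512))  # -> shape (1, 2, 512, 512)
```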

3.2. Experimental Design

In order to assess the effect of bit depth on cloud-segmentation performance in various scenes, two sets of experiments were conducted. First, datasets consisting of 512 × 512 image sample blocks and corresponding masks were created for the 16-bit and 8-bit images, respectively. A separate network was then trained for each scene type and bit depth, and the resulting 16 cloud-detection networks (eight scene types × two bit depths) were applied to the test set to examine the impact of bit depth on cloud-segmentation performance in different scenes. Prior to network training, the pixels of the input images were normalized to the interval [0, 1] through Equation (1):
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (1)$$
where $X$ denotes the original data, $X_{norm}$ the normalized data, and $X_{max}$, $X_{min}$ the maximum and minimum values of the original data, respectively.
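As a concrete illustration, a minimal NumPy sketch of Equation (1) applied per image; the small epsilon guard against constant images is our addition.

```python
import numpy as np

def minmax_normalize(img: np.ndarray) -> np.ndarray:
    """Scale an 8-bit or 16-bit image to [0, 1] as in Equation (1)."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-8)
```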
In the second set of experiments, with all other conditions kept constant, the images were not normalized. This was done in order to determine whether normalization affects the disparity between the experimental results obtained for different bit depths.

3.3. Experimental Environment

In this experiment, a PyTorch-based implementation of the UNet semantic-segmentation model was run on a server with an RTX TITAN GPU with 24 GB of memory. The server was configured with an Intel Core i7-8700 processor clocked at 3.20 GHz and 16 GB of RAM, running a 64-bit Ubuntu 18.04 operating system. The programming language was Python 3.8, and the key libraries included GDAL, OpenCV, NumPy, Matplotlib, and PIL.

3.4. Experimental Details

During the optimization phase, we employed the Adam optimization algorithm with an initial learning rate of 0.001. To adjust the learning rate dynamically, we employed an exponential learning-rate decay strategy with a decay rate of 0.9; the learning rate per epoch was calculated according to Equation (2):
$$l = l_{initial} \times 0.9^{\,epoch} \qquad (2)$$
where $l$ denotes the learning rate and $l_{initial}$ the initial learning rate.
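In PyTorch, this schedule corresponds directly to ExponentialLR. A minimal sketch, where `model` is assumed to be the UNet of Section 3.1 and `train_one_epoch` is a hypothetical training-loop helper:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(100):
    train_one_epoch(model, optimizer)  # hypothetical: one pass over the training set
    scheduler.step()                   # lr becomes 0.001 * 0.9**(epoch + 1), per Equation (2)
```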
Our study involved training all models for approximately 100 epochs until convergence was achieved. Due to memory constraints on the GPU, the batch size was set to 8, and the same parameter settings were maintained to fairly evaluate the performance of various approaches. The detailed steps of our cloud-detection model training and validation are presented in Algorithm 1.
Algorithm 1 Cloud-detection model of different bit depths.
Input: Remote-sensing image of either 8-bit or 16-bit depth
Output: Cloud mask of the image and evaluation metrics
1: Preprocess the input image to obtain 16-bit and 8-bit images
2: Split the image into overlapping patches
3: For each patch in the image:
4:     Feed the patch to the trained deep-learning model
5:     Obtain the predicted cloud mask of the patch
6: Merge the predicted cloud masks of all patches to obtain the final cloud mask of the entire image
7: Postprocess the final cloud mask to refine the results
8: Evaluate the performance of the deep-learning model using various evaluation metrics, such as accuracy, kappa, etc.
9: Output the final cloud mask of the image and the evaluation metrics
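A minimal Python sketch of the patch-wise inference in steps 2-6 of Algorithm 1, assuming a trained `model` and a normalized H × W × 3 image; for simplicity it uses non-overlapping tiles, whereas the paper's overlapping scheme would blend predictions in the overlap regions.

```python
import numpy as np
import torch

def predict_cloud_mask(model, image: np.ndarray, patch: int = 512) -> np.ndarray:
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    model.eval()
    with torch.no_grad():
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                tile = image[y:y + patch, x:x + patch]                   # step 3: take a patch
                inp = torch.from_numpy(tile).permute(2, 0, 1)[None].float()
                logits = model(inp)                                      # step 4: run the model
                pred = logits.argmax(dim=1)[0].numpy().astype(np.uint8)  # step 5: per-pixel class
                mask[y:y + patch, x:x + patch] = pred                    # step 6: merge patches
    return mask
```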

3.5. Evaluation Metrics

In this paper, objective evaluation of the test results is achieved through the use of a confusion matrix. The matrix, of size M × M, records the number of elements classified into each category against the number of true instances of that category, where M represents the total number of categories. Accuracy and kappa are calculated from the information contained in the confusion matrix, with the calculation formulae expressed as follows:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$

$$Kappa = \frac{N \sum_{i=1}^{M} x_{ii} - \sum_{i=1}^{M} (x_{i+} \cdot x_{+i})}{N^{2} - \sum_{i=1}^{M} (x_{i+} \cdot x_{+i})} \qquad (4)$$

where TP is the number of samples correctly identified as clouds, TN the number correctly classified as non-clouds, FP the non-cloud samples mistakenly categorized as clouds, and FN the cloud samples misclassified as non-clouds. Accuracy is the proportion of correctly predicted pixels among all pixels in the samples. Kappa measures the reduction in error achieved by a classification system relative to random chance; it is calculated from the diagonal elements ($x_{ii}$), the row and column sums ($x_{i+}$ and $x_{+i}$) of the confusion matrix, and the total number of image elements ($N$). The kappa coefficient represents the extent to which the classifier outperforms a purely random classification and thus provides a measure of the effectiveness of the system.
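A minimal sketch of the accuracy and kappa computations from a binary confusion matrix, written directly from the formulas above rather than from the authors' code:

```python
import numpy as np

def accuracy_and_kappa(cm: np.ndarray):
    """cm[i, j] = number of pixels with true class i predicted as class j."""
    n = cm.sum()
    acc = np.trace(cm) / n                 # (TP + TN) / (TP + TN + FP + FN)
    rows, cols = cm.sum(axis=1), cm.sum(axis=0)
    pe = (rows * cols).sum()               # sum over i of x_i+ * x_+i
    kappa = (n * np.trace(cm) - pe) / (n**2 - pe)
    return acc, kappa

# e.g. accuracy_and_kappa(np.array([[900, 50], [30, 1020]]))
```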

4. Results

In this section, we illustrate the quantitative comparisons in Section 4.1, and then Section 4.2 provides the visualization results.

4.1. Quantitative Comparisons

We conducted experiments on different land-cover datasets to explore the performance of cloud segmentation using remote-sensing images with different bit depths. Table 1 shows a quantitative comparison across the datasets in terms of accuracy and the kappa coefficient. The experimental results indicate that cloud segmentation using 16-bit remote-sensing images performs relatively well on most land-cover types, especially urban, grass/crops, and water, where it is superior to using 8-bit remote-sensing images. Figure 5 compares the classification accuracy of the different models; the model trained on normalized 16-bit images achieves a higher overall classification accuracy than the other models. However, different land-cover types exhibit different degrees of complexity, so the models may perform differently on each type. For example, for snow/ice the 16-bit model without normalization achieves only 73.71% classification accuracy, and the corresponding 8-bit model only 71.22%. After the normalization operation, we also observed a slight reduction in the accuracy gap between 16-bit and 8-bit images, likely because normalization also reduces the difference between pixel values in the two kinds of images. For instance, in the barren scenario, the accuracy gap between normalized 16-bit and 8-bit images is only 0.9%, whereas without normalization it increases to 1.16%. Additionally, the kappa coefficient can be used to evaluate classifier performance, as it accounts for the difference between classifier accuracy and random classification. Figure 6 compares the kappa coefficients of the various models and reveals the same trend as Figure 5: training on normalized 16-bit images produces better classification performance than the other configurations.
Table 2 reports the training times of the four model configurations (16BN, 16B, 8BN, 8B) on the eight land-cover types when using 8- or 16-bit remote-sensing images for cloud segmentation. As shown in Figure 7, for most land-cover types, training with 16-bit images took longer than training with 8-bit images. Deep-learning models perform extensive computation for each pixel, including convolutions, pooling, and activation functions, and the per-pixel computational load is higher for 16-bit images than for 8-bit images, leading to longer training times.

4.2. Visualization Results

To visually compare the different methods, we display sample results in Figure 8. The figure contains eight examples of clouds with varying scales and appearances, listed in two rows. The white areas represent clouds, the black areas denote the background, the green regions indicate missed detections, and the red regions signify false detections. Compared with 8-bit remote-sensing images, 16-bit images yield results that are the most consistent with the ground truth and are robust across different complex scenarios, whereas the generalization capacity of models trained on 8-bit images is weaker.
As shown in Figure 8, the use of 16-bit remote-sensing images for cloud segmentation can provide better results in identifying large cloud regions in barren, forest, and grass/crops scenes, while the use of 8-bit images can lead to more fragmented cloud extraction. Moreover, in the shrubland scene, using 8-bit images can lead to false detection, while this phenomenon does not occur when using 16-bit images. In urban, water, and wetlands scenes, using 16-bit images can also provide more detailed extraction of small cloud features. While the extraction results in the snow/ice scene may not be as satisfactory due to similarities between the cloud and background, 16-bit images still provide more informative data. In summary, the use of 16-bit remote-sensing images can offer better cloud extraction results, especially in identifying large cloud regions and small cloud features.

5. Discussion

The primary objective of this study was to assess the impact of different bit depths on cloud-segmentation accuracy through experiments on remote-sensing images of various surface types. The findings indicate that 8- and 16-bit images are similarly competitive in cloud segmentation. Nevertheless, the higher-bit-depth results presented in Figure 8 are notably better, accurately identifying and classifying clouds of various shapes and sizes. Overall, remote-sensing images with higher bit depth provide more stable and higher accuracy and kappa coefficients and perform better in cloud segmentation than lower-bit-depth images. However, higher-bit-depth images do not always yield more accurate results, as accuracy also depends on the selected dataset and classification method; some images are simply easier to segment, resulting in higher segmentation scores, while others are more challenging, resulting in lower scores. The accuracy and kappa coefficients of lower-bit-depth images varied by 2–4% in the snow and ice scenes, resulting in poorer classification. This may be because lower bit depths cause finer elements within the scene to go undetected; the failure to resolve these elements increases intra-class variance, which may result in lower accuracy.
When comparing 16-bit with 8-bit images for the semantic segmentation of clouds, the primary difference in the visualization results is the level of detail and accuracy of the segmented clouds. Owing to their larger dynamic range, 16-bit images capture more detail and variation in the clouds, resulting in more accurate and detailed segmentation, with clear boundaries and a clear distinction between clouds and background. Furthermore, 16-bit images are less sensitive to noise, which reduces the impact of noise on the final segmentation results. In contrast, the limited dynamic range of 8-bit images can mean that detail and variation in the clouds go uncaptured, leading to less accurate and less detailed segmentation, with blurrier boundaries and a weaker distinction between clouds and background. An 8-bit channel is quantized to 256 levels, whereas a 16-bit channel can represent 65,536 levels; this finer quantization helps capture more of the detail and variation within clouds, contributing to more accurate and detailed semantic segmentation. In conclusion, semantic segmentation of clouds using 16-bit images yields more accurate and detailed results, with clearer boundaries and a clearer distinction between clouds and background, than segmentation using 8-bit images, owing to the increased dynamic range and finer quantization of pixel values.
As shown in Table 1 of the results section, omitting the normalization step during training resulted in a decrease in accuracy, further exacerbating the gap in accuracy between 8- and 16-bit images. This highlights the significance of the normalization step in training deep-learning-based algorithms. Normalization is a preprocessing technique that aims to scale pixel values within a specific range, typically [0, 1], which helps the algorithm converge faster and generalize better. It is particularly useful when dealing with images of varying intensities, as it can reduce the effect of contrast differences between images. In the case of 8BN, normalization helps to mitigate some limitations of lower bit depth, such as reduced dynamic range and lower sensitivity to subtle variations in pixel values. On the other hand, increasing the bit depth from 8- to 16-bit results in a higher dynamic range and more precise representation of pixel values, which can improve the segmentation accuracy. In this study, the 16B results without normalization indeed show an improvement over the 8B results. However, it is important to note that the full potential of the 16-bit images might not have been realized due to the absence of normalization. The similar results between 8BN and 16B in Table 1 indicate that normalization can help to bridge the performance gap between lower and higher bit depths to some extent. However, it does not imply that normalization can completely substitute the benefits of using higher bit-depth images. In fact, when both 8- and 16-bit images are normalized (8BN and 16BN), the 16BN results show a higher segmentation accuracy, suggesting that a combination of higher bit depth and normalization can provide even better cloud-detection performance.
While normalizing images can make the weight distribution of a deep-learning model more stable, it may also affect training time. The normalization process requires additional computation, which can consume valuable training time, especially when training large-scale models. In some cases, normalization can also reduce the dynamic range of the pixel values, making it harder for the model to distinguish between land-cover types. Thus, while normalization can improve model performance and reliability, its potential impact on training time should be considered, especially when training on more complex datasets with more iterations; further investigation is needed into how this impact varies with dataset, classification method, and hyperparameters. To further explore the differences in segmentation performance of remote-sensing images with varying bit depths, we analyzed training time. In our study, training on 16-bit images did indeed take slightly longer than on 8-bit images. While the differences in Table 2 may not appear significant, the overall computational resources and time required in large-scale remote-sensing applications should be kept in mind; in such scenarios, even small differences in training time can have a substantial impact on efficiency. The normalization operation also affects efficiency: it reduces the range of pixel values, which can speed up training by accelerating convergence, and it also affects cloud-detection accuracy by helping the algorithm generalize across different image intensities. It is worth noting that in our experiments, using 8-bit instead of 16-bit images did not significantly affect training time or GPU usage, since in both cases the model was trained with the default 32-bit floating-point scheme. Mixed-precision training (using 16- and 32-bit floating point) is possible, but since we only varied the bit depth of the input images (8- and 16-bit), we observed no significant change in training time or GPU usage. Nevertheless, this minor difference cannot be disregarded, as training time is also affected by the selected dataset, classification method, and hyperparameter tuning. With more complex datasets and more iterations, the gap would widen substantially, particularly given that we trained each model for only 100 epochs.
In this study, we focused on evaluating the impact of bit depth on cloud-segmentation accuracy using a single deep-learning classification algorithm. This approach allowed us to isolate the effect of image bit depth on cloud segmentation while minimizing the influence of parameter tuning and training-sampling choices. However, to increase the generalizability and transferability of our findings, future studies could extend this research to other classification algorithms and datasets. It should also be noted that while we focused on cloud segmentation, the potential impact of image bit depth on other remote-sensing analysis tasks, such as water and building extraction, warrants further investigation.

6. Conclusions

Thanks to the continuous development of computer technology in the field of remote sensing, advanced algorithms such as statistical methods, artificial neural networks, support vector machines, and fuzzy-logic algorithms have been developed and widely used, compensating to some extent for the low detection accuracy of traditional cloud-segmentation methods. In this study, we investigated the effect of image bit depth on the performance of DL-based cloud segmentation using different datasets. Our findings indicate that models trained with 8- and 16-bit remote-sensing images differ in accuracy and that cloud-segmentation accuracy is affected by surface type. Although 16-bit images yield more accurate cloud segmentation, training with 8-bit images is more efficient. The impact of image bit depth on other remote-sensing analysis tasks can be explored in future research. Our research offers valuable insights for remote-sensing practitioners, helping them make informed decisions about the optimal bit depth for cloud-detection tasks: by weighing efficiency against accuracy, practitioners can select the most suitable bit depth for their specific application, ultimately improving cloud-detection results.
In conclusion, using a higher bit depth, such as 16-bit images, and normalizing the images before semantic segmentation can significantly improve the accuracy of cloud-layer segmentation in remote-sensing images. The increased dynamic range and resolution in color values of 16-bit images can capture more details and variations within the clouds, which is crucial for accurate semantic segmentation. Normalizing the images can help remove variations in brightness and contrast, making the image more consistent and aiding the model to generalize better. Additionally, normalization can also reduce the impact of outliers or extreme values in the image. By utilizing these techniques, the accuracy and reliability of cloud-layer segmentation in remote-sensing applications can be greatly improved.

Author Contributions

L.L. designed the experiments and wrote the manuscript. W.L. supervised the study and reviewed the draft paper. S.L. revised the manuscript and gave some appropriate suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Chinese Academy of Sciences (Grant Number XDA19010401), and in part by the International Science & Technology Cooperation Program of China (Grant Number 2018YFE0100100).

Data Availability Statement

Not applicable.

Acknowledgments

Thanks to the L8 Biome dataset provided by United States Geological Survey (USGS), which is accessible at https://landsat.usgs.gov/landsat-8-cloud-cover-assessment-validation-data (accessed on 20 October 2022). The authors also thank the anonymous reviewers and the editors for their insightful comments and helpful suggestions to improve our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

L8 Biome   The Landsat 8 Biome Type Cloud Validation Dataset
ML         Machine Learning
DL         Deep Learning
CNN        Convolutional Neural Network
FCN        Fully Convolutional Network
RF         Random Forest
SVM        Support Vector Machine
OLI        Operational Land Imager

References

  1. Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Hughes, M.J.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390. [Google Scholar] [CrossRef]
  2. Sabins, F.F., Jr.; Ellis, J.M. Remote Sensing: Principles, Interpretation, and Applications; Waveland Press: Long Grove, IL, USA, 2020. [Google Scholar]
  3. Zhang, Y.; Rossow, W.B.; Lacis, A.A.; Oinas, V.; Mishchenko, M.I. Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data. J. Geophys. Res. Atmos. 2004, 109, D19. [Google Scholar] [CrossRef]
  4. Gómez-Chova, L.; Camps-Valls, G.; Calpe-Maravilla, J.; Guanter, L.; Moreno, J. Cloud-screening algorithm for ENVISAT/MERIS multispectral images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 4105–4118. [Google Scholar] [CrossRef]
  5. Wang, R.; Pun, M.-O. Robust semisupervised land-use classification using remote sensing data with weak labels. IEEE Access 2021, 10, 43435–43453. [Google Scholar] [CrossRef]
  6. Dey, V.; Zhang, Y.; Zhong, M. A Review on Image Segmentation Techniques with Remote Sensing Perspective. In Proceedings of the ISPRS TC VII Symposium—100 Years ISPR, Vienna, Austria, 5–7 July 2010; Volume 38. [Google Scholar]
  7. Guo, Q.; Tong, L.; Yao, X.; Wu, Y.; Wan, G. Cd_Hiefnet: Cloud Detection Network Using Haze Optimized Transformation Index and Edge Feature for Optical Remote Sensing Imagery. Remote Sens. 2022, 14, 3701. [Google Scholar] [CrossRef]
  8. Yin, M.; Wang, P.; Hao, W.; Ni, C. Cloud detection of high-resolution remote sensing image based on improved U-Net. In Multimedia Tools and Applications; Springer: Cham, Switzerland, 2023. [Google Scholar]
  9. Georgopoulos, N.; Stavrakoudis, D.; Gitas, I.Z. Object-Based Burned Area Mapping Using Sentinel-2 Imagery and Supervised Learning Guided by Empirical Rules. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9980–9983. [Google Scholar]
  10. Mohd, O.; Suryanna, N.; Sahibuddin, S.S.; Abdollah, M.F.; Selamat, S.R. Thresholding and fuzzy rule-based classification approaches in handling mangrove forest mixed pixel problems associated within QuickBird remote sensing image analysis. Int. J. Agric. For. 2012, 2, 300–306. [Google Scholar]
  11. Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277. [Google Scholar] [CrossRef]
  12. Louis, J.; Debaecker, V.; Pflug, B.; Main-Knorn, M.; Bieniarz, J.; Mueller-Wilm, U.; Cadau, E.; Gascon, F. Sentinel-2 Sen2Cor: L2A processor for users. In Proceedings of the Living Planet Symposium 2016, Prague, Czech Republic, 9–13 May 2016; pp. 1–8. [Google Scholar]
  13. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  14. Latry, C.; Panem, C.; Dejean, P. Cloud detection with SVM technique. In Proceedings of the 2007 IEEE International Geoscience and Remote Sensing Symposium, Barcelona, Spain, 23–28 July 2007; pp. 448–451. [Google Scholar]
  15. Wang, L.; Yan, J.; Mu, L.; Huang, L. Knowledge discovery from remote sensing images: A review. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2020, 10, e1371. [Google Scholar] [CrossRef]
  16. Orynbaikyzy, A.; Gessner, U.; Conrad, C. Spatial transferability of random forest models for crop type classification using Sentinel-1 and Sentinel-2. Remote Sens. 2022, 14, 1493. [Google Scholar] [CrossRef]
  17. Li, W.; Zou, Z.; Shi, Z. Deep matting for cloud detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8490–8502. [Google Scholar] [CrossRef]
  18. Du, X.; Wu, H. Feature-aware aggregation network for remote sensing image cloud detection. Int. J. Remote Sens. 2023, 44, 1872–1899. [Google Scholar] [CrossRef]
  19. Guo, Y.; Cao, X.; Liu, B.; Gao, M. Cloud detection for satellite imagery using attention-based U-Net convolutional neural network. Symmetry 2020, 12, 1056. [Google Scholar] [CrossRef]
  20. Wieland, M.; Li, Y.; Martinis, S. Multi-sensor cloud and cloud shadow segmentation with a convolutional neural network. Remote Sens. Environ. 2019, 230, 111203. [Google Scholar] [CrossRef]
  21. Xie, F.; Shi, M.; Shi, Z.; Yin, J.; Zhao, D. Multilevel cloud detection in remote sensing images based on deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3631–3640. [Google Scholar] [CrossRef]
  22. Liu, J.; Wen, X.; Nie, W.; Su, Y.; Jing, P.; Yang, X. Residual-guided multiscale fusion network for bit-depth enhancement. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 2773–2786. [Google Scholar] [CrossRef]
  23. Joseph, G. Fundamentals of Remote Sensing; Universities Press: Hyderabad, India, 2005. [Google Scholar]
  24. Morkel, T.; Eloff, J.H.; Olivier, M.S. An overview of image steganography. In Proceedings of the ISSA, Pretoria, South Africa, 29 June–1 July 2005; pp. 1–11. [Google Scholar]
  25. Navalgund, R.R.; Jayaraman, V.; Roy, P.S. Remote sensing applications: An overview. Curr. Sci. 2007, 93, 1747–1766. [Google Scholar]
  26. Acharya, T.D.; Yang, I. Exploring landsat 8. Int. J. IT Eng. Appl. Sci. Res. 2015, 4, 4–10. [Google Scholar]
  27. Verde, N.; Mallinis, G.; Tsakiri-Strati, M.; Georgiadis, C.; Patias, P. Assessment of radiometric resolution impact on remote sensing data classification accuracy. Remote Sens. 2018, 10, 1267. [Google Scholar] [CrossRef]
  28. Francis, A.; Sidiropoulos, P.; Muller, J.-P. CloudFCN: Accurate and robust cloud detection for satellite imagery with deep learning. Remote Sens. 2019, 11, 2312. [Google Scholar] [CrossRef]
  29. Ji, S.; Dai, P.; Lu, M.; Zhang, Y. Simultaneous cloud detection and removal from bitemporal remote sensing images using cascade convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2020, 59, 732–748. [Google Scholar] [CrossRef]
  30. Song, S.; Liu, J.; Liu, Y.; Feng, G.; Han, H.; Yao, Y.; Du, M. Intelligent object recognition of urban water bodies based on deep learning for multi-source and multi-temporal high spatial resolution remote sensing imagery. Sensors 2020, 20, 397. [Google Scholar] [CrossRef]
  31. Chen, Y.; He, G.; Yin, R.; Zheng, K.; Wang, G. Comparative Study of Marine Ranching Recognition in Multi-Temporal High-Resolution Remote Sensing Images Based on DeepLab-v3+ and U-Net. Remote Sens. 2022, 14, 5654. [Google Scholar] [CrossRef]
  32. Mahbod, A.; Schaefer, G.; Löw, C.; Dorffner, G.; Ecker, R.; Ellinger, I. Investigating the impact of the bit depth of fluorescence-stained images on the performance of deep learning-based nuclei instance segmentation. Diagnostics 2021, 11, 967. [Google Scholar] [CrossRef]
  33. Afrasiabian, Y.; Noory, H.; Mokhtari, A.; Nikoo, M.R.; Pourshakouri, F.; Haghighatmehr, P. Effects of spatial, temporal, and spectral resolutions on the estimation of wheat and barley leaf area index using multi-and hyper-spectral data (case study: Karaj, Iran). Precis. Agric. 2021, 22, 660–688. [Google Scholar] [CrossRef]
  34. Chen, Z.; Ye, F.; Fu, W.; Ke, Y.; Hong, H. The influence of DEM spatial resolution on landslide susceptibility mapping in the Baxie River basin, NW China. Nat. Hazards 2020, 101, 853–877. [Google Scholar] [CrossRef]
  35. Bradter, U.; O’Connell, J.; Kunin, W.E.; Boffey, C.W.; Ellis, R.J.; Benton, T.G. Classifying grass-dominated habitats from remotely sensed data: The influence of spectral resolution, acquisition time and the vegetation classification system on accuracy and thematic resolution. Sci. Total Environ. 2020, 711, 134584. [Google Scholar] [CrossRef] [PubMed]
  36. Yu, X.; Lu, D.; Jiang, X.; Li, G.; Chen, Y.; Li, D.; Chen, E. Examining the roles of spectral, spatial, and topographic features in improving land-cover and forest classifications in a subtropical region. Remote Sens. 2020, 12, 2907. [Google Scholar] [CrossRef]
  37. Gyamfi-Ampadu, E.; Gebreslasie, M.; Mendoza-Ponce, A. Evaluating multi-sensors spectral and spatial resolutions for tree species diversity prediction. Remote Sens. 2021, 13, 1033. [Google Scholar] [CrossRef]
  38. Poli, D.; Remondino, F.; Angiuli, E.; Agugiaro, G. Radiometric and geometric evaluation of GeoEye-1, WorldView-2 and Pléiades-1A stereo images for 3D information extraction. ISPRS J. Photogramm. Remote Sens. 2015, 100, 35–47. [Google Scholar] [CrossRef]
  39. Jeppesen, J.H.; Jacobsen, R.H.; Inceoglu, F.; Toftegaard, T.S. A cloud detection algorithm for satellite imagery based on deep learning. Remote Sens. Environ. 2019, 229, 247–259. [Google Scholar] [CrossRef]
  40. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  41. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  42. Hu, K.; Zhang, D.; Xia, M. CDUNet: Cloud detection UNet for remote sensing imagery. Remote Sens. 2021, 13, 4533. [Google Scholar] [CrossRef]
  43. Ahmed, T.; Sabab, N.H.N. Classification and understanding of cloud structures via satellite images with EfficientUNet. SN Comput. Sci. 2022, 3, 99. [Google Scholar] [CrossRef]
Figure 1. Global distribution of the Landsat 8 Cloud Cover Assessment (CCA) scenes.
Figure 2. Examples of remote-sensing images of different scenes and their pixel distribution maps. Left: original 16-bit remote-sensing images. Right: 8-bit remote-sensing images. (a) Barren; (b) Forest; (c) Grass/crops; (d) Shrubland; (e) Snow/ice; (f) Urban; (g) Water; (h) Wetlands.
Figure 3. Data-augmentation images and corresponding masks: (a) The original image; (b) The flipped images; (c) The rotated image.
Figure 4. UNet architecture.
Figure 5. Statistical chart of the accuracy for different types.
Figure 6. Statistical chart of the kappa for different types.
Figure 7. Statistical chart of training time.
Figure 8. Cloud-segmentation results produced by 16- and 8-bit images. The first column: raw images. The second column: manually generated cloud masks. The third column: result of 16-bit images. The fourth column: result of 8-bit images. The red circle shows the part of the results with large differences. (a) Barren; (b) Forest; (c) Grass/crops; (d) Shrubland; (e) Snow/ice; (f) Urban; (g) Water; (h) Wetlands.
Table 1. The quantitative results for different bit depths.

Metric    Type          16BN     16B      8BN      8B
Accuracy  Barren        94.01%   93.21%   93.11%   92.05%
          Forest        89.10%   88.33%   86.91%   85.19%
          Grass/Crops   96.59%   95.15%   93.96%   92.27%
          Shrubland     92.59%   91.97%   91.62%   90.98%
          Snow/Ice      75.13%   73.71%   74.01%   71.22%
          Urban         97.73%   96.84%   95.74%   94.61%
          Water         94.05%   93.16%   93.04%   90.21%
          Wetlands      92.77%   92.35%   91.45%   89.65%
Kappa     Barren        0.859    0.846    0.840    0.823
          Forest        0.814    0.809    0.793    0.774
          Grass/Crops   0.869    0.850    0.848    0.827
          Shrubland     0.838    0.802    0.813    0.771
          Snow/Ice      0.610    0.591    0.592    0.580
          Urban         0.875    0.861    0.858    0.842
          Water         0.851    0.843    0.831    0.807
          Wetlands      0.826    0.808    0.814    0.781
Notes: 16BN = normalized 16-bit images; 16B = 16-bit images without normalization; 8BN = normalized 8-bit images; 8B = 8-bit images without normalization.
Table 2. Comparison of training time of different methods.

Training time (s)
Type          16BN       16B        8BN        8B
Barren        22,244.0   21,615.6   21,727.2   21,541.8
Forest        22,104.0   21,750.6   21,800.0   21,607.8
Grass/Crops   22,082.4   21,794.8   21,822.0   21,652.2
Shrubland     21,979.2   21,908.4   21,553.8   21,613.2
Snow/Ice      22,000.2   21,978.6   21,971.4   21,634.2
Urban         21,817.2   21,666.0   21,605.4   21,592.2
Water         22,244.4   22,011.6   21,914.4   22,056.0
Wetlands      22,287.0   21,915.6   21,786.0   21,735.0
Notes: 16BN = normalized 16-bit images; 16B = 16-bit images without normalization; 8BN = normalized 8-bit images; 8B = 8-bit images without normalization.