FedUKD: Federated UNet Model with Knowledge Distillation for Land Use Classification from Satellite and Street Views

Kanagavelu, Renuga; Dua, Kinshuk; Garai, Pratik; Thomas, Neha; Elias, Simon; Elias, Susan; Wei, Qingsong; Yong, Liu; Rick, Goh Siow Mong

doi:10.3390/electronics12040896


by Renuga Kanagavelu ¹, Kinshuk Dua ², Pratik Garai ², Neha Thomas ³, Simon Elias ⁴, Susan Elias ⁵,*, Qingsong Wei ¹, Liu Yong ¹ and Goh Siow Mong Rick ¹

¹ Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore
² Centre for Advanced Data Science, Vellore Institute of Technology, Chennai 600127, India
³ College of Engineering, Anna University, Chennai 600025, India
⁴ Measi Academy of Architecture, Chennai 600014, India
⁵ School of Electronics Engineering, Vellore Institute of Technology, Chennai 600127, India
* Author to whom correspondence should be addressed.

Electronics 2023, 12(4), 896; https://doi.org/10.3390/electronics12040896

Submission received: 5 January 2023 / Revised: 27 January 2023 / Accepted: 28 January 2023 / Published: 9 February 2023

(This article belongs to the Special Issue Recent Advances in Blockchain Technology and Distributed AI Applications)


Abstract:

Federated deep learning frameworks can be used strategically to monitor land use locally and infer environmental impacts globally. Distributed data from across the world would be needed to build a global model for land use classification. A federated approach is needed in this application domain to avoid transferring data from distributed locations, which saves network bandwidth and reduces communication costs. We used a federated UNet model for the semantic segmentation of satellite and street view images. The novelty of the proposed architecture is the integration of knowledge distillation to reduce communication costs and response times. The accuracy obtained was above 95%, and we also achieved significant model compression, by over 17 times and 62 times for street-view and satellite images, respectively. Our proposed framework has the potential to significantly improve the efficiency and privacy of real-time tracking of climate change across the planet.

Keywords:

federated learning; knowledge distillation; land use classification; UNet; semantic segmentation

1. Introduction

Monitoring land use and land cover (LULC) in real time to track environmental impacts across the globe is required for the sustainable development of our planet. The speed at which the problem advances, combined with an urgent need to address the ecological impacts of climate change, has spurred research into the field. Climate change is widely regarded as the world’s biggest environmental challenge, and how to combat it efficiently is now a worldwide concern. Around 196 countries entered the Paris climate accord in 2015; this accord is an international agreement that pledges to keep the global average temperature increase below 1.5 °C and reduce greenhouse gas emissions [1]. The challenge is to efficiently monitor compliance and bring in accountability on a global scale. We propose the design of an AI-driven model to monitor the ground truth of land use in order to estimate its impact on the environment in real time.

There are two different ways that LULC can be monitored, i.e., via (i) aerial view images from satellites or drones and (ii) street view images from CCTV cameras [2,3]. From an aerial view, we can observe the overall change in land cover and obtain environmental data across different types of land. The street view analysis provides city-wide insight into energy consumption, population density, etc. The two views combined can provide extensive analysis of land use, for detection and monitoring, from two different perspectives. Automating the surveillance of LULC globally can, therefore, help to monitor climate change that can be inferred from land use.

Satellite views of countries across the world can vary dramatically in their characterization. Even within a country, there could be land categories that range from snow-covered areas to deserts, agricultural areas, and water bodies, besides urban and non-urban residential and commercial areas. Similarly, objects that are featured in street view can drastically change based on the local culture, climate, and economic status of the region. The data will, therefore, be non-IID (not independent and identically distributed), with feature distribution skews, label distribution skews, quantity skews, etc.

We propose a federated learning (FL) approach to train a deep learning model for semantic segmentation. The motivation to use FL is to be able to deal with the non-IID nature of data from across the globe. When models are built using publicly available images, data privacy or security are not issues but the increased communication cost due to the high network bandwidth requirement will be a bottleneck if data need to be available in a centralized location for training. In real time, when the model is created using distributed data from various countries, communication costs need to be minimal in order to claim that the model is efficient. In order to mitigate all of these issues, in this paper, we present a novel federated UNet with knowledge distillation (FedUKD). The main contributions of our work are as follows:

1. To our knowledge, we are the first to propose a federated learning (FL) approach combined with knowledge distillation to train a deep learning model for semantic segmentation of land use data.
2. Our approach can significantly reduce the computation and communication costs by compressing the model by up to 62× while still maintaining the baseline accuracy. To show that the results are not specific to one dataset, we performed an analysis on two datasets: (1) the CityScapes dataset [4] to cover the street view use case of real-time land use and land cover monitoring, and (2) our own Chennai land use dataset for the satellite view use case.
3. We present the novel Chennai land use satellite dataset; it is publicly available for future studies.

2. Related Work in Land Use Classification

Several research works in the past have focused on automating the task of land use and land cover (LULC) detection for real-time monitoring and surveillance [2,3,5,6,7,8,9,10,11]. Machine learning (ML) and deep learning (DL) approaches have been used extensively in this domain and insightful research findings have been reported. Amongst the various state-of-the-art image processing techniques, semantic segmentation has been the most relevant approach for land use and land cover (LULC) change detection. The spatial resolutions of satellite images made available for research have been improving, adding several new dimensions to the field as smaller urban objects are now visible. The research approaches used earlier were spectral image classification, pixel-based image analysis (PBIA), and object-based image analysis (OBIA), but currently, there seems to be a paradigm shift toward pixel-level semantic segmentation [7,11]. A recent research publication [7] has presented a comprehensive review of the advancements in deep learning-based semantic segmentation for urban land use and land cover (LULC) detection. The main task of semantic segmentation is to classify each pixel of an image, using features that are derived from the ground truth indicated using masks in annotated images. DL frameworks use the masks as labels for the automated selection of features that enable the classification of LULC regardless of seasonal, temporal, and spatial changes [8,9,10].

The survey presented in [12] documented details of DL frameworks that are the foundation of many modern image segmentation DL architectures. One popular framework is the encoder–decoder-based model, with UNet being of significant importance. It consists of two functional blocks. The first is a contracting or downsampling block that extracts features using 3 × 3 convolutions. The features are then copied to the second, an expanding or upsampling block, which reduces feature maps and increases dimensions, so as to not lose pattern information. From the feature maps, a segmentation map is then generated [13]. It is worth noting that UNets make use of data augmentation, which increases the number of labeled samples used in training, as the datasets used might not contain enough annotated data required for training the model from scratch. This allows UNets to learn from a few labeled images. In the work presented in this paper, we make use of a UNet-based DL architecture for semantic segmentation. Several existing research works have used image segmentation for land use and land cover (LULC) classification toward the goal of monitoring changes in urban land use through satellite images and street view images, involving several different land use categories. The following subsections summarize these works.

2.1. LULC Detection from Satellite Images

Several existing works [3,14,15,16] illustrate the use of DL techniques in LULC classification, with an emphasis on multi-spectral and hyperspectral images. These types of images contain spatial and spectral data, from which material characteristics and differences in the land use categories are retrieved for classification. Some of the datasets used were extensive, with ample amounts of labeled images, while others contained images that were to be labeled using semi-supervised techniques, as only a few real-world data samples can be provided along with corresponding ground truth information. It was found that aerial remote sensing images had higher spatial quality, so they were preferred, albeit with low temporal resolution. The challenges were in obtaining well-annotated datasets as well as remote sensing technologies that worked with deep learning frameworks. Additionally, for any deep learning model to learn and leverage spatial patterns, combining the mapped images with the ground truth is necessary. This mapping process can be done in several ways [17], most commonly using a three-dimensional grid that conserves the spatial relationships between the different landmarks and locations in the imagery. These two sets of images, the mapped imagery as well as the ground truth, are used as training samples and for automatic feature extraction by the DL model [18]. An interesting work that monitors landscape changes in coastal, agricultural, as well as urban areas is presented in [16]. Satellite images were segmented and classified into categories such as wetland, water, agriculture, and urban/suburban regions. Using the OpenStreetMap (OSM) database, a few existing research works [19,20] have demonstrated the classification of land use categories, such as water, road, and agricultural, with seasonal and climate-related changes.

Time-series analysis is also relevant to aerial/satellite image analysis, with algorithms such as dynamic time warping (DTW) filling in temporal gaps in the remote sensing time-series data [21] and helping to overcome the limitation of irregularly distributed training samples. Deep learning frameworks for classifying crops with cloudy and non-cloudy images using the Sentinel-2 time series dataset have been presented [22,23]. Other works [24,25] have focused on changes in water bodies to monitor and analyze the urban aquatic ecosystem. This was done by using remote-sensing indices, such as the normalized difference water index (NDWI) and the modified normalized difference water index (MNDWI), to separate water bodies from non-water bodies.
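As a point of reference, both water indices are simple band ratios. The sketch below uses the commonly cited definitions (NDWI from green and near-infrared reflectance, MNDWI from green and shortwave-infrared reflectance). The function names, the sample reflectance values, and the zero threshold are illustrative assumptions and are not taken from the cited works.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = band_a + band_b
    return 0.0 if total == 0 else (band_a - band_b) / total


def ndwi(green, nir):
    """Normalized difference water index: (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """Modified NDWI: (Green - SWIR) / (Green + SWIR)."""
    return normalized_difference(green, swir)


def is_water(index_value, threshold=0.0):
    """Assumed decision rule: index values above the threshold count as water."""
    return index_value > threshold


# Hypothetical surface reflectance values for two pixels.
water_pixel = {"green": 0.10, "nir": 0.04, "swir": 0.02}
urban_pixel = {"green": 0.12, "nir": 0.20, "swir": 0.25}

for name, px in (("water", water_pixel), ("urban", urban_pixel)):
    print(name, is_water(ndwi(px["green"], px["nir"])),
          is_water(mndwi(px["green"], px["swir"])))
```

In practice the threshold is usually tuned per scene rather than fixed at zero.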

2.2. Street View Image Analysis

Semantic segmentation on street view images has been an interesting area of research. The most popular models for semantic segmentation are the FCN (fully convolutional network), UNet, and DeepLabV3+. In [26], a comparative analysis of the performances of these models was presented for 11 classes of street objects, such as sky, building, road, tree, car, pavement, pedestrians, etc. The dataset used was CamVid (2007), one of the first labeled datasets for semantic segmentation. The results showed that each of the above three models performed better for some classes in comparison to the others. The work presented in [27] uses the Tencent map dataset. Semantic segmentation of the images was done to detect the percentages of green and blue covers available in street view images. This information was correlated with the mental health data for the corresponding regions. The goal was to analyze the impacts of these outdoor features on geriatric depression.

In [28], Google Street View data were used to develop an automated street sidewalk detection model. The images were translated into graph-based segments and machine learning approaches were used to obtain good accuracy. The goal of the work presented in [29] was to design a model for real-time semantic segmentation of street view images required for autonomous applications. The DSANet model was introduced and found to have a faster response during real-time inference of semantic segmentation on street scenes taken from CamVid and CityScapes datasets [4]. Analyzing the model’s performance on the Mapillary Vistas dataset [27] was considered challenging as the images were harder to segment compared to previously released datasets for semantic segmentation. A variant of the FCN called dilated convolution network was designed to meet this challenge and it was found to have good accuracy for segments occupying large areas. Another challenging set of street view images containing changes in scenes due to seasonal and lighting variations was released in [30] and a model for pixel-wise change detection occurring due to season, lighting, and weather (called deep deconvolutional network) was also presented.

In [31], a global-and-local network architecture (GLNet) has been proposed to incorporate contextual information from a local context and spatial information from a global context for better performance. This was analyzed using the CityScapes dataset for street view images; the improvements over the existing models, including PSPNet and ICNet are presented. A comprehensive review of the various datasets that can be used and research directions in the field are also well documented [32]. However, there has been no attempt to design a federated deep learning architecture for the semantic segmentation of street view images. This is the research gap that is addressed in our paper.

3. Materials and Methods

Federated learning (FL) is an emerging field with many interesting open research problems [33]. It is expected to provide responsible AI solutions by integrating data privacy and security measures. However, even when data are not shared in distributed computing systems, there are several privacy threat models associated with the different stages of ML/DL. The data publishing stage is prone to attribute inference attacks, the model training stage is vulnerable to reconstruction attacks, while the model’s inference stage can be affected by reconstruction attacks, model inversion attacks, and membership inference attacks. Privacy-preserving ML/DL uses defense techniques, such as homomorphic encryption, secure multi-party computation, and differential privacy.

Privacy is an important consideration even when working with seemingly innocent satellite imagery, as it can reveal sensitive information about individuals and organizations. For example, satellite imagery can be used to identify the location and type of infrastructure, such as military bases or power plants, which could be of interest to adversaries. Additionally, high-resolution images can reveal details about individual homes, such as the presence of a swimming pool or solar panels, which could be used to infer information about the occupants’ wealth or lifestyle. Furthermore, the use of satellite imagery combined with other data sources, such as street view images, can be used to identify and track individuals, raising concerns about surveillance and the violation of individual privacy rights. Therefore, it is important to consider privacy implications when working with satellite imagery for monitoring land use and land cover, and to take appropriate measures to protect the sensitive information that may be revealed by the images.

In the context of satellite imagery, federated learning would enable different organizations to train a more accurate land use classification model on their own satellite imagery without sharing the raw data. This would prevent sensitive information from being exposed and enable the organizations to maintain control over their own data.

For the application proposed in this paper, the goal is to adopt a federated learning approach in developing a shared deep learning model for the semantic analysis of satellite and street view images. The federated averaging algorithm (FedAvg) was first employed for distributed machine learning over a horizontally partitioned dataset [34]. In federated averaging, each party independently uploads the clear-text gradient to a coordinator; the coordinator then computes the average of the gradients and updates the model. Finally, the coordinator sends the clear-text updated model back to each party. The federated averaging (FedAvg) algorithm works well for certain non-convex objective functions under independent and identically distributed (IID) settings but may not work well on non-IID datasets [35].
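The averaging step at the coordinator can be sketched in a few lines. This is a minimal illustration, not the implementation used in the paper: updates are represented as flat lists of floats, and weighting each client by its local sample count follows the commonly used formulation of FedAvg, which is an assumption here since the text does not state the weights. The name `fed_avg` is hypothetical.

```python
def fed_avg(client_updates, client_sizes):
    """Average client update vectors, weighted by local sample count.

    client_updates: list of equal-length lists of floats (gradients or weights).
    client_sizes:   number of local training samples per client (assumed weighting).
    """
    if not client_updates or len(client_updates) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    length = len(client_updates[0])
    total = sum(client_sizes)
    averaged = [0.0] * length
    for update, size in zip(client_updates, client_sizes):
        if len(update) != length:
            raise ValueError("all client updates must have the same length")
        weight = size / total
        for k, value in enumerate(update):
            averaged[k] += weight * value
    return averaged


# Three clients; the third holds half of all samples.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [1, 1, 2]
print(fed_avg(updates, sizes))  # [3.5, 4.5]
```

With equal client sizes the function reduces to a plain mean, which matches the unweighted description in the text.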

3.1. Federated UNet for Semantic Segmentation

The motivation to create a federated UNet model for land use classification came from a related project titled UNet for semantic segmentation on unbalanced aerial imagery (https://github.com/amirhosseinh77/UNet-AerialSegmentation, accessed on 3 January 2023). That repository presents a dataset created from the UAE region, with labels intended for semantic segmentation of land use. Land use can vary from country to country based on the geographical terrain. Some countries may have more deserts, forests, snow-covered land, agricultural land, water bodies, etc., besides regular residential and non-urban areas. The same land use can also look very different in aerial view from one country to another; for instance, residential areas could range from chaotic-looking unplanned areas to well-planned urban areas. These real-world variations make land use classification very region-specific. In order to have global models to monitor climate change across the world, we need a federated learning approach to train the models toward automation in land use detection. We first implemented a federated UNet model using the datasets and distributed scenarios described in the following section (see Figure 1). The UNet used was a big model with about 17 million parameters and, hence, communication overhead was high. The accuracy was good and the model was found to be effective for a federated learning framework. Thus, the federated UNet model was used for LULC classification, and it was further enhanced through model optimization; the details are presented in the following subsection.

3.2. FedUKD: Federated UNet with Knowledge Distillation

We successfully implemented a federated UNet-based architecture for semantic segmentation of satellite images and street view images. We used a three-client scenario to demonstrate the model’s performance. However, in reality, the design of a global model for LULC detection will involve contributions by thousands of local client models from participating countries, leading to high communication overheads. To optimize communication costs, we integrated a teacher–student knowledge distillation approach presented in the FedKD model [36]. The reduction in communication costs is achieved by sharing the smaller student model to build the global model instead of sharing large models with the server. The larger teacher model is used locally for reciprocal learning between the teacher and student models. The teacher model is updated and maintained locally at each client while the student model updates are all aggregated by the central server and global updates are distributed to clients to update their student models. This process continues until the student model converges.

In Figure 2, the teacher–student knowledge distillation workflow is illustrated. This is the interaction that happens at every client location. For the teacher model, we used UNet with around 17 million parameters. We implemented the knowledge distillation framework for a larger CityScapes dataset. For the student model, we used an approach to downsize the UNet presented in an existing work [37,38]. We proposed a much smaller FCN network that was derived by taking the architecture of the original UNet and reducing the number of layers and parameters in the structure. The kernel size of the convolutional layer was kept the same, i.e., 3 × 3. Our proposed model has 2 downsample and upsample layers each (vs. 4 of each in the original UNet). The filters in these convolution blocks were reduced to 16, 32, 32, and 16 vs. 64, 128, 256, 512, 512, 256, 128, and 64 of the UNet model. This reduction in the number of layers and the filters in these layers gave us a much smaller model with 118,000 parameters. The performances of the models and related discussions are presented in the following subsections. Algorithm 1 describes the model-building process and details of the communication rounds. The server procedure describes the global update rounds and aggregation process; the UKDLearning procedure explains the teacher–student knowledge distillation process. Algorithm 1 helps one to understand the overall workflow of the proposed FL framework for designing a UNet model through knowledge distillation to save communication costs.
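To see why cutting the filter counts shrinks the model so sharply, recall that a k × k convolution with bias has k·k·C_in·C_out + C_out parameters. The sketch below applies that formula to the two filter sequences quoted above. It is a simplified illustration under stated assumptions: one convolution per stage, three input channels, and no skip connections, upsampling layers, or output head. Its totals therefore do not reproduce the 17 million and 118,000 figures reported for the actual models; the helper names are hypothetical.

```python
def conv2d_params(in_channels, out_channels, kernel_size=3, bias=True):
    """Parameter count of one 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def stack_params(in_channels, filters, kernel_size=3):
    """Total parameters of a plain chain of convolutions (assumed layout)."""
    total = 0
    for out_channels in filters:
        total += conv2d_params(in_channels, out_channels, kernel_size)
        in_channels = out_channels
    return total


# Filter sequences quoted in the text.
teacher_filters = [64, 128, 256, 512, 512, 256, 128, 64]
student_filters = [16, 32, 32, 16]

print(conv2d_params(64, 128))  # 73856 parameters for one 64 -> 128 layer
print(conv2d_params(16, 32))   # 4640 parameters for one 16 -> 32 layer

teacher_total = stack_params(3, teacher_filters)
student_total = stack_params(3, student_filters)
print(teacher_total, student_total, teacher_total / student_total)
```

Because the count scales with the product of input and output channels, reducing both by a factor of four reduces a layer's weights by roughly sixteen, before the removal of layers is taken into account.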

Algorithm 1 Federated UNet with knowledge distillation

Input: pretrained teacher T; number of clients C; number of communication rounds R; number of local training iterations E

Output: globally trained model weights $w_{f}^{r}$ and $w_{g}^{r}$

1: procedure Server
2:   Initialize $w_{f}^{0}$ and $w_{g}^{0}$
3:   for r = 0, 1, …, R − 1 do
4:     for i = 0, 1, …, C in parallel do
5:       ${(w_{f}^{r})}_{i} \leftarrow \mathrm{UKDLearning}(i, T, {(w_{f}^{r})}_{i})$
6:     $(w_{g}^{r + 1}), (w_{f}^{r + 1}) \leftarrow \mathrm{FedAvg}(w_{f}^{r})$
7:     for i = 0, 1, …, C in parallel do
8:       send(${(w_{f}^{r})}_{i}$) to i
9:
10: procedure UKDLearning(i, T, ${(w_{f}^{r})}_{i}$)
11:   Update student based on ${(w_{f}^{r})}_{i}$
12:   for n = 0, 1, …, E do
13:     for d in dataset do
14:       $S \leftarrow \mathrm{student}(d)$
15:       $t \leftarrow T(d)$
16:       $L_{P} \leftarrow \mathrm{pixelwiseloss}(S, t)$
17:       $L_{C} \leftarrow \mathrm{criterion}(S, gt)$
18:       $L_{t} \leftarrow \alpha L_{P} - (\alpha - 1) L_{C}$
19:       Train student to minimize $L_{t}$
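The loss combination in steps 14 to 18 can be illustrated with a small, dependency-free sketch. Note that $\alpha L_{P} - (\alpha - 1) L_{C}$ is the same as $\alpha L_{P} + (1 - \alpha) L_{C}$, a convex mix of the two terms. Several choices below are assumptions, since the algorithm does not specify them: the pixel-wise loss is taken to be the KL divergence between temperature-softened teacher and student class distributions, the criterion is taken to be cross-entropy against the ground truth, and each pixel is represented by a list of class logits. The defaults of 0.3 and 5 echo the values for alpha and temperature reported in Section 5, where alpha is described as a learning rate; reading it as the loss weight is also an assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Criterion L_C for one pixel (assumed: cross-entropy with ground truth)."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.3, temperature=5.0):
    """L_t = alpha * L_P + (1 - alpha) * L_C, averaged over pixels."""
    n = len(labels)
    l_p = sum(
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    l_c = sum(cross_entropy(s, y) for s, y in zip(student_logits, labels)) / n
    return alpha * l_p + (1 - alpha) * l_c


# Two pixels, three classes (hypothetical logits).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
teacher = [[3.0, 0.0, -2.0], [0.0, 0.5, 4.0]]
labels = [0, 2]
print(distillation_loss(student, teacher, labels))
```

Setting `alpha` to 0 recovers ordinary supervised training of the student, while setting it to 1 trains the student only to imitate the teacher.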

4. Experimental

For the research presented in this paper, we used separate datasets to demonstrate federated learning for aerial views and street views. For aerial views, we created our own dataset, and for street views, we used the CityScapes dataset. More details on the datasets are presented in the following subsections:

4.1. Land Use Classification from Satellite Images

4.1.1. Proposed Chennai Land Use (CLU) Dataset

Creating labels for semantic segmentation is a challenging task. In most cases, particularly in the satellite imaging domain, experts have limited time to support research activities. Moreover, labeled segments could vary from person to person and, hence, post-processing techniques are employed to bring about a consensus amongst the labels provided by different experts. Here, we present a novel approach to creating land use image labels that can be generated in a consistent manner, providing reliable and accurate labels for training. For this demonstration, we use satellite images from Chennai, a city in India. The Chennai Metropolitan Development Authority (CMDA) provided the city’s master plan land use maps on their official website (http://cmdalayout.com/landusemaps/landusemaps.aspx, accessed on 3 January 2023). These maps provide accurate ground truth. We used these maps region-wise and extracted the corresponding satellite images from the Google Earth portal. Since the original CMDA maps had legends that were not suitable for semantic segmentation, color coding had to be redone for each region. For illustration, the satellite image and corresponding annotated image of a region called Aminjikarai are presented in Figure 3. The land use dataset we created contains labeled data for 70 such regions in Chennai. The dataset can be downloaded and used for research purposes (the link to download is available at the end of Section 5).

4.1.2. CLU Dataset for Federated Learning

In order to synthetically create a label-skewed distributed scenario for the study, the Chennai land use (CLU) dataset was split into three datasets as follows:

Training: a total of 60 satellite images and corresponding ground truth images.
- Dataset 1 has 29 images. Images with the water body, agricultural, urbanize, and coastal regulation zone (CRZ) legends can be found in this dataset.
- Dataset 2 has 9 images. These images do not have water body, agricultural, urbanize, or CRZ legends, but all of them have the non-urban legend.
- Dataset 3 has 22 images. These images do not have water body, agricultural, urbanize, CRZ, or non-urban legends.
Testing: a total of 10 images.
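One way to read the split above is as a rule that assigns each training image to a client according to the legends it contains. The sketch below is an interpretation, not the authors' procedure: it assumes that any image containing a water body, agricultural, urbanize, or CRZ legend goes to Dataset 1, which the text implies but does not state outright. The region identifiers and the "residential" and "commercial" legends are hypothetical placeholders.

```python
SKEW_LEGENDS = {"water body", "agricultural", "urbanize", "CRZ"}


def assign_client(legends):
    """Pick a client dataset from the set of legends present in one image."""
    present = set(legends)
    if present & SKEW_LEGENDS:
        return "dataset_1"  # assumed: any of the four legends implies Dataset 1
    if "non-urban" in present:
        return "dataset_2"
    return "dataset_3"


def split_by_label(images):
    """images: dict mapping an image id to the legends found in its mask."""
    clients = {"dataset_1": [], "dataset_2": [], "dataset_3": []}
    for image_id, legends in images.items():
        clients[assign_client(legends)].append(image_id)
    return clients


# Hypothetical regions and legends, for illustration only.
example = {
    "region_a": ["residential", "water body"],
    "region_b": ["residential", "non-urban"],
    "region_c": ["residential", "commercial"],
}
print(split_by_label(example))
```

A rule of this kind yields clients whose label sets differ by construction, which is what makes the scenario label-skewed.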

4.2. Land Use Classification from Street View Images

4.2.1. CityScapes Street View (CSP) Dataset

In this research, the dataset used to infer land use from street view images is the CityScapes dataset [4], created primarily to support research related to semantic understanding of urban street scenes. It is a labeled dataset providing polygonal annotations for vehicles, people, and objects in the urban outdoors for about 30 classes. The dataset provides street view images from 50 cities with diverse scenarios of seasons, weather, and time of day, as well as a range of objects with vibrant backgrounds and scene layouts. To create the dataset needed for this research, we used samples from CityScapes fine annotations as segmentation masks for corresponding images from the left 8-bit samples provided for training.

4.2.2. CSP Dataset for Federated Learning

The focus of the research presented in this paper is on the proposed federated learning (FL) framework for semantic analysis of street view images. We consider three different locations in a city to be the three clients in our federated learning framework. The images for training/inference could be obtained from localized CCTV footage or Google Street View images captured in real time. In a smart city deployment, this federated learning model could scale to thousands of client nodes representing all streets of an entire city under surveillance. Sample images from the CityScapes dataset are used for representation in this section. It has 30 labeled classes of outdoor objects, and a subset of the objects is present in each image (for example, see Figure 4). In order to create a label-skewed scenario for the study, we considered the classes vehicles and people and ensured that the datasets created for the three clients did not overlap in these labels. Some of the other classes may or may not be present based on the view.

In order to synthetically create a label-skewed distributed scenario for the study, the CityScapes dataset was split into 3 datasets as follows:

Training: a total of 98 original images and corresponding ground truth images.
- Dataset 1: vehicles with no people, 65 images.
- Dataset 2: people with no vehicles, 22 images.
- Dataset 3: no people and no vehicles, 11 images.
Testing: a total of 20 images.

To analyze the performance of our proposed FedUKD, i.e., the federated UNet model with knowledge distillation, the CSP data were quantity-skewed for the study. Here, the training set considered had 2975 street view samples with corresponding masks and 500 images were available for validation. The motivation of the research is to develop a global model for street view understanding for challenging scenarios where different locations of a city present different subsets of objects to be trained or identified. Currently, privacy-preserving approaches are integral parts of any distributed computing application and, hence, federated learning frameworks for real-time smart city surveillance are being considered by governments globally.

5. Results and Discussion

Model performance analyses were carried out as follows:

Centralized UNet performance on the Chennai land use dataset;
Centralized UNet performance on the CityScapes dataset;
Federated UNet performance on the label-skewed Chennai land use dataset;
Federated UNet performance on the label-skewed CityScapes dataset;
FedUKD model performance on the quantity-skewed CityScapes dataset;
FedUKD model performance on the label-skewed CityScapes dataset.

The Chennai land use (CLU) dataset comprising 60 images was divided among 3 client nodes under a label-skewed scenario created for analyzing the effectiveness of the federated approach of model-building. The results after the third communication round of global updates to the clients are presented in Figure 5a. The performance using a centralized model was also captured in order to compare the performances on the designated test images for both centralized and federated learning approaches. The cross-entropy loss for 50 epochs can be observed for the 3 clients and the centralized approach. The performance was found to be more than 95% in all cases for 10-fold cross-validation. Similarly, the performance on the CityScapes dataset is presented in Figure 5b. A sample of 118 images was used in a label-skewed scenario distributed across 3 clients. For the FedUKD experiment, there were 10 communication rounds, each with two epochs of training. The graph of the results represents cumulative epochs (Figure 6a). Communication and federated averaging happen after every two epochs. For the best results, the temperature was set at 5 and the learning rate alpha to 0.3. Both CityScapes and Chennai land use datasets were pre-processed to be non-IID as described in Section 4.

On the CityScapes dataset, the teacher model was trained on a large dataset (2975 training samples, 500 validation samples) for just two epochs. The accuracy was 72% for training and 66% for validation. The student model was trained on the smaller non-IID dataset consisting of only 98 images. The maximum training accuracy after federated learning was 71% for client 1, while the maximum validation accuracy was 55%. The teacher model had 17 million parameters, which represented 69 MB of data to be transferred over the network. The student model only had 1 million parameters, which only took 4 MB of space. We compressed the data by about 17 times. The results were even better with the Chennai land use dataset (Figure 6b). The accuracy was 97% and we shrank the 17 million model parameters to 270,000, which only represented 1 MB of data to be transferred, an optimization of over 62 times in terms of the number of parameters and 69 times in terms of space. Parameter optimization is the reduction of the number of parameters while space optimization is the reduction in the amount of memory taken by those parameters; for example, we reduced the space requirement from 69 to 1 MB.
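The compression figures quoted above follow from simple ratios, which the short sketch below reproduces. The only assumption is that each parameter is stored as a 4-byte float32 value; under that assumption, the rounded parameter counts give sizes close to the reported 69 MB, 4 MB, and 1 MB. Small gaps (for example 68 MB versus 69 MB) are expected because the parameter counts in the text are rounded.

```python
BYTES_PER_PARAM = 4  # assumption: float32 storage


def size_mb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate payload size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


def compression_ratio(before, after):
    """How many times smaller the compressed quantity is."""
    return before / after


teacher_params = 17_000_000            # teacher UNet (rounded, from the text)
cityscapes_student_params = 1_000_000  # CityScapes student (rounded)
clu_student_params = 270_000           # Chennai land use student

print(size_mb(teacher_params))             # 68.0
print(size_mb(cityscapes_student_params))  # 4.0
print(size_mb(clu_student_params))         # 1.08

print(compression_ratio(teacher_params, cityscapes_student_params))  # 17.0
print(round(compression_ratio(teacher_params, clu_student_params), 2))  # 62.96
print(compression_ratio(69, 4))  # 17.25, using the reported sizes in MB
print(compression_ratio(69, 1))  # 69.0
```

The two ratios for the Chennai model differ (about 63× by parameter count, 69× by size) because the reported 1 MB is itself a rounded figure.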

Comparison

To the best of our knowledge, there are no studies that have implemented both federated learning and knowledge distillation for this particular use case or the chosen datasets. However, we compared our results with those of three state-of-the-art compression methods that use the same (or similar) datasets. For the full comparison summary, see Table 1.

The spirit distillation [39] method for compressing large semantic segmentation network models utilizes knowledge transfer from multiple domains. “Spirit distillation” is based on the idea of distillation, which is a knowledge transfer technique used to transfer the knowledge of a large, pre-trained model to a smaller one. The authors propose to extend this technique by transferring knowledge from multiple domains and using it to train a smaller model. The authors evaluated their method on several benchmarks and tasks, including the CityScapes dataset, and showed that it can achieve similar performance to the large model with 2× compression. They achieved an accuracy of 84% on the CityScapes dataset; however, they only used a very small fraction (64 images) of the total CityScapes dataset, which consisted of around 5000 images.

Compared to spirit distillation, our overall accuracy is lower, but the accuracy drop after federated knowledge distillation is only a few percent, and our method achieves much higher (17×) compression.

Similarly, in ContextNet [40], the authors compressed the model even further using an architecture that builds on factorized convolution, network compression, and pyramid representation. They reduced it to about 0.9 million parameters (a 20× compression); however, the compression came at the steep cost of a 10–20% accuracy loss on the CityScapes dataset, and the model worked with lower-resolution images.

The above two models were tested on street view datasets. For the satellite view, MKANet [41] was only able to compress the model by 2×, on the GID dataset, which is similar to CLU.

We summarize the benefits of using knowledge distillation with our federated UNet below:

1. In urban planning, clients can be located far apart, since they can span entire cities. This makes communication slow and expensive, especially for large models. Our FedUKD model can shrink the data that needs to be transferred by up to 69 times, making communication quick and effective.
2. The student model can achieve an accuracy comparable to that of the teacher model, as demonstrated by our experiment on the Chennai land use dataset.
3. The model might converge faster (in fewer epochs and, thus, less time) because the number of parameters to train is much smaller.
4. If a big dataset is not available and clients do not want to share data, the teacher model can be trained on a bigger public dataset and then be used to train the student model on a smaller non-IID dataset available to the clients.
5. Our approach can also prevent certain clients from overfitting. For example, client 2 had a very small number of samples and, hence, overfitted around epoch 6, but after the federated averaging step, client 2's loss was aligned with the loss of the other clients.
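The averaging step mentioned in the last item can be illustrated with a minimal sketch of federated averaging in plain Python. It follows the usual FedAvg formulation, in which each client's parameters are weighted by its number of training samples; whether the authors' implementation weights clients in exactly this way is an assumption, and the client sizes and parameter values below are made up for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Sample-weighted average of per-client parameter lists (FedAvg-style).

    client_weights: one flat list of floats per client, all the same length.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged

# Hypothetical example: three clients, two parameters each.
# The third client holds twice as many samples as the others,
# so it contributes twice as much to the global model.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [1, 1, 2]
print(federated_average(clients, sizes))  # [3.5, 4.5]
```

Because a client with few samples receives a small weight, its locally overfitted parameters are pulled toward those of the better-supported clients after each round, which is consistent with the behavior described for client 2.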

Code link: https://github.com/kinshukdua/FedUKD/ (accessed on 3 January 2023); link for the Chennai land use (CLU) dataset: https://github.com/simonelias128/Chennai-Land-Use-Dataset/ (accessed on 3 January 2023).

6. Conclusions and Future Work

Remote monitoring of global climate change can help address the growing impacts on the environment and our planet. The 196 countries that agreed to the Paris climate accord in 2015 pledged to keep the global average temperature increase to below 1.5 °C and reduce greenhouse gas emissions. However, how can these promises be measured and monitored for compliance, both locally and globally? In this paper, we propose a novel strategy using a state-of-the-art federated learning approach to design and develop a deep learning model using knowledge distillation for land use classification from satellite and street view images. Using a federated approach for this application can save communication costs by carrying out localized training and avoiding the transfer of data to a centralized location. Federated learning efficiently handles the inherently non-IID data across clients during the distributed model-building process. We created our own dataset for the study and used a benchmarked dataset to evaluate the performance of our model. Since impacts on the environment and climate change can be inferred from land use, robust land use monitoring systems can help bring accountability to climate change management across the globe. The proposed model (with more than 95% accuracy) can be integrated into computer vision-based systems to monitor the global impacts on the environment.

The tests were performed on only three federated clients. We plan to extend the federated learning testbed by using cloud-based (e.g., AWS or Azure) clients to measure the real-world computation and communication costs associated with larger-scale implementations. Our model uses the FedAvg (federated averaging) algorithm for the aggregation stage of federated learning; we also plan to develop a new aggregation algorithm to make the model even more effective. Given its compression power, the FedUKD model could also be explored for applications other than LULC.

Author Contributions

Conceptualization: R.K., S.E. (Susan Elias), Q.W. and L.Y.; Software Implementation: K.D. and P.G.; Investigation: R.K., K.D., P.G., N.T., S.E. (Simon Elias) and S.E. (Susan Elias); Data curation: N.T., S.E. (Simon Elias) and G.S.M.R.; Writing—original draft: R.K., K.D. and S.E. (Susan Elias); Visualization: K.D., P.G. and G.S.M.R.; Supervision: R.K., S.E. (Susan Elias), Q.W. and L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data and code are available and mentioned in the text.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Agreement, P. Paris agreement. In Proceedings of the Report of the Conference of the Parties to the United Nations Framework Convention on Climate Change (21st Session), Paris, France, 11 December 2015; Retrieved December; HeinOnline. 2015. Volume 4, p. 2017.
2. Wang, S.; Chen, W.; Xie, S.M.; Azzari, G.; Lobell, D.B. Weakly supervised deep learning for segmentation of remote sensing imagery. Remote Sens. 2020, 12, 207.
3. Cao, R.; Zhu, J.; Tu, W.; Li, Q.; Cao, J.; Liu, B.; Zhang, Q.; Qiu, G. Integrating aerial and street view images for urban land use classification. Remote Sens. 2018, 10, 1553.
4. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The CityScapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
5. Atik, S.O.; Ipbuker, C. Integrating convolutional neural network and multiresolution segmentation for land cover and land use mapping using satellite imagery. Appl. Sci. 2021, 11, 5551.
6. Liu, T.; Yang, L.; Lunga, D. Change detection using deep learning approach with object-based image analysis. Remote Sens. Environ. 2021, 256, 112308.
7. Neupane, B.; Horanont, T.; Aryal, J. Deep Learning-Based Semantic Segmentation of Urban Features in Satellite Images: A Review and Meta-Analysis. Remote Sens. 2021, 13, 808.
8. Yi, T.J. Semantic Segmentation of Aerial Imagery Using U-Nets. 2020. Available online: https://scholar.afit.edu/etd/3593 (accessed on 3 January 2023).
9. Papadomanolaki, M.; Vakalopoulou, M.; Karantzalos, K. Urban Change Detection Based on Semantic Segmentation and Fully Convolutional LSTM Networks. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, V-2-2020, 541–547.
10. Rousset, G.; Despinoy, M.; Schindler, K.; Mangeas, M. Assessment of Deep Learning Techniques for Land Use Land Cover Classification in Southern New Caledonia. Remote Sens. 2021, 13, 2257.
11. Osco, L.P.; Junior, J.M.; Ramos, A.P.M.; de Castro Jorge, L.A.; Fatholahi, S.N.; de Andrade Silva, J.; Matsubara, E.T.; Pistori, H.; Gonçalves, W.N.; Li, J. A review on deep learning in UAV remote sensing. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102456.
12. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. arXiv 2020, arXiv:2001.05566.
13. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
14. Tewabe, D.; Fentahun, T. Assessing land use and land cover change detection using remote sensing in the Lake Tana Basin, Northwest Ethiopia. Cogent Environ. Sci. 2020, 6, 1778998.
15. Vali, A.; Comai, S.; Matteucci, M. Deep Learning for Land Use and Land Cover Classification Based on Hyperspectral and Multispectral Earth Observation Data: A Review. Remote Sens. 2020, 12, 2495.
16. Fonji, S.; Taff, G.N. Using satellite data to monitor land-use land-cover change in North-eastern Latvia. Springerplus 2014, 3, 61.
17. Kerins, P.; Guzder-Williams, B.; Mackres, E.; Rashid, T.; Pietraszkiewicz, E. Mapping Urban Land Use in India and Mexico using Remote Sensing and Machine Learning; WRI: Washington, DC, USA, 2021.
18. Masolele, R.N.; De Sy, V.; Herold, M.; Marcos, D.; Verbesselt, J.; Gieseke, F.; Mullissa, A.G.; Martius, C. Spatial and temporal deep learning methods for deriving land-use following deforestation: A pan-tropical case study using Landsat time series. Remote Sens. Environ. 2021, 264, 112600.
19. Fyleris, T.; Krišciunas, A.; Gružauskas, V.; Calneryte, D. Deep Learning Application for Urban Change Detection from Aerial Images. In GISTAM 2021: Proceedings of the 7th International Conference on Geographical Information Systems Theory, Applications and Management, Online, 23–25 April 2021; SciTePress: Setubal, Portugal, 2021; pp. 15–24.
20. Srivastava, S.; Lobry, S.; Tuia, D.; Munoz, J.V. Land-use characterisation using Google Street View pictures and OpenStreetMap. In Proceedings of the AGILE 2018, Lund, Sweden, 12–15 June 2018.
21. Viana, C.M.; Girão, I.; Rocha, J. Long-Term Satellite Image Time-Series for Land Use/Land Cover Change Detection Using Refined Open Source Data in a Rural Region. Remote Sens. 2019, 11, 1104.
22. Campos, T.M.; García, H.F.; Martínez, B.; Izquierdo, V.E.; Atzberger, C.; Camps, V.G.; Gilabert, M. Understanding deep learning in land use classification based on Sentinel-2 time series. Sci. Rep. 2020, 10, 17188.
23. Abdi, A.M. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GIScience Remote Sens. 2020, 57, 1–20.
24. Ce, Z.; Isabel, S.; Xin, P.; Huapeng, L.; Andy, G.; Jonathon, H.; Peter, M.A. Joint Deep Learning for land cover and land use classification. Remote Sens. Environ. 2019, 221, 173–187.
25. Ali, M.; Dirawan, G.; Hasim, A.; Abidin, M. Detection of Changes in Surface Water Bodies Urban Area with NDWI and MNDWI Methods. Int. J. Adv. Sci. Eng. Inf. Technol. 2019, 9, 946.
26. Li, T.; Jiang, C.; Bian, Z.; Mingchang, W.; Niu, X. Semantic Segmentation of Urban Street Scene Based on Convolutional Neural Network. J. Phys. Conf. Ser. 2020, 1682, 012077.
27. Helbich, M.; Yao, Y.; Liu, Y.; Zhang, J.; Liu, P.; Wang, R. Using deep learning to examine street view green and blue spaces and their associations with geriatric depression in Beijing, China. Environ. Int. 2019, 126, 107–117.
28. Kang, B.; Lee, S.; Zou, S. Developing Sidewalk Inventory Data Using Street View Images. Sensors 2021, 21, 3300.
29. Elhassan, M.A.; Huang, C.; Yang, C.; Munea, T.L. DSANet: Dilated spatial attention for real-time semantic segmentation in urban street scenes. Expert Syst. Appl. 2021, 183, 115090.
30. Fernández Alcantarilla, P.; Stent, S.; Ros, G.; Arroyo, R.; Gherardi, R. Street-View Change Detection with Deconvolutional Networks. Auton. Robot. 2018, 42, 1301–1322.
31. Lin, C.Y.; Chiu, Y.C.; Ng, H.F.; Shih, T.K.; Lin, K.H. Global-and-Local Context Network for Semantic Segmentation of Street View Images. Sensors 2020, 20, 2907.
32. Nadeem, Q. Semantic Segmentation, Urban Navigation, and Research Directions. 2018. Available online: https://www.cs.princeton.edu/courses/archive/spring18/cos598B/public/projects/LiteratureReview/COS598B_spr2018_SemanticSegmentationNavigation.pdf (accessed on 3 January 2023).
33. Yang, Q.; Liu, Y.; Cheng, Y.; Kang, Y.; Chen, T.; Yu, H. 2019. Available online: https://www.morganclaypoolpublishers.com/catalog_Orig/samples/9781681736983_sample.pdf (accessed on 3 January 2023).
34. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the AISTATS, Fort Lauderdale, FL, USA, 20–22 April 2017.
35. Peter, K. Advances and Open Problems in Federated Learning. 2021. Available online: http://xxx.lanl.gov/abs/1912.04977 (accessed on 3 January 2023).
36. Wu, C.; Wu, F.; Liu, R.; Lyu, L.; Huang, Y.; Xie, X. FedKD: Communication Efficient Federated Learning via Knowledge Distillation. 2021. Available online: http://xxx.lanl.gov/abs/2108.13323 (accessed on 3 January 2023).
37. Trebing, K.; Stanczyk, T.; Mehrkanoon, S. SmaAt-UNet: Precipitation nowcasting using a small attention-UNet architecture. Pattern Recognit. Lett. 2021, 145, 178–186.
38. Prasad, P.J.R.; Elle, O.J.; Lindseth, F.; Albregtsen, F.; Kumar, R.P. Modifying U-Net for small dataset: A simplified U-Net version for liver parenchyma segmentation. In Proceedings of the Medical Imaging 2021: Computer-Aided Diagnosis, Bellingham, WA, USA, 15–19 February 2021; pp. 396–405.
39. Wu, Z.; Jiang, Y.; Zhao, M.; Cui, C.; Yang, Z.; Xue, X.; Qi, H. Spirit Distillation: A Model Compression Method with Multi-domain Knowledge Transfer. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Tokyo, Japan, 14–16 August 2021; pp. 553–565.
40. Poudel, R.P.; Bonde, U.; Liwicki, S.; Zach, C. Contextnet: Exploring context and detail for semantic segmentation in real-time. arXiv 2018, arXiv:1805.04554.
41. Zhang, Z.; Lu, W.; Cao, J.; Xie, G. MKANet: An Efficient Network with Sobel Boundary Loss for Land-Cover Classification of Satellite Remote Sensing Imagery. Remote Sens. 2022, 14, 4514.

Figure 1. FL model for semantic segmentation of satellite images.

Figure 2. Teacher–student federated learning model architecture.

Figure 3. Satellite image and corresponding label—Chennai land use dataset.

Figure 4. Street view with people but no vehicles.

Figure 5. FedUnet model performance. (a) CityScapes dataset; (b) Chennai land use dataset.

Figure 6. Federated UKD model performance. (a) CityScapes dataset; (b) Chennai land use dataset.

Table 1. Model performance: summary of results.

Model	Federated	Dataset	Accuracy	Parameter Optimization	Space Optimization
UNet	No	CSP	72%	1×	1×
FedUNet	Yes	CSP	78%	1×	1×
Spirit Distill	No	CSP (only 64 images)	84.6%	2.48×	-
ContextNet	No	CSP (Lower Resolution)	66.1%	20×	-
FedUKD	Yes	CSP	71%	17×	17.25×
FedUNet	Yes	CLU	95%	1×	1×
UNet	No	CLU	95%	1×	1×
MKANet	No	GID (similar to CLU)	65%	2×	-
FedUKD	Yes	CLU	97%	62×	69×

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

