Review

A Review of Deep Transfer Learning and Recent Advancements

by Mohammadreza Iman 1,*, Hamid Reza Arabnia 1 and Khaled Rasheed 2
1 School of Computing, University of Georgia, Athens, GA 30602, USA
2 Institute for Artificial Intelligence, Franklin College of Arts and Sciences, University of Georgia, Athens, GA 30602, USA
* Author to whom correspondence should be addressed.
Technologies 2023, 11(2), 40; https://doi.org/10.3390/technologies11020040
Submission received: 2 February 2023 / Revised: 6 March 2023 / Accepted: 13 March 2023 / Published: 14 March 2023

Abstract

Deep learning has been the answer to many machine learning problems during the past two decades. However, it comes with two significant constraints: dependency on extensive labeled data and training costs. Transfer learning in deep learning, known as Deep Transfer Learning (DTL), attempts to reduce such dependency and costs by reusing knowledge obtained from a source dataset/task when training on a target dataset/task. Most applied DTL techniques are network/model-based approaches. These methods reduce the dependency of deep learning models on extensive training data and drastically decrease training costs. Moreover, the training cost reduction makes DTL viable on edge devices with limited resources. Like any new advancement, DTL methods have their own limitations, and a successful transfer depends on specific adjustments and strategies for different scenarios. This paper reviews the concept, definition, and taxonomy of deep transfer learning and its well-known methods. It investigates DTL approaches by reviewing DTL techniques applied over the past five years and two experimental analyses of DTL, to identify the best practices for using DTL in different scenarios. Moreover, the limitations of DTL (the catastrophic forgetting dilemma and overly biased pre-trained models) are discussed, along with possible solutions and research trends.

1. Introduction

In recent years, Deep Learning (DL) has successfully addressed many challenging applications, in particular problems involving highly non-linear data. Recent advances in deep learning methods have enabled applications in widely different areas such as image processing, natural language processing (NLP), numerical data analysis and prediction, and voice recognition. However, deep learning comes with restrictions, such as an expensive training process (time and computation) and the requirement for extensive labeled training data [1].
Since the start of the Machine Learning (ML) era, transfer learning has been an appealing line of research. Before the rise of deep learning models, transfer learning was studied mainly as domain adaptation, focusing on homogeneous datasets and how to relate such sets to each other, owing to the nature of traditional ML algorithms [2,3]. Traditional ML models depend less on dataset size, and their training is usually less costly than that of deep learning models since they are mostly designed for linear problems. Therefore, the motivation for using transfer learning in deep learning is higher than ever in the Artificial Intelligence (AI) and ML fields, since it can address the two constraints of extensive training data and training cost.
Recent transfer learning methods for deep learning aim to reduce training time and cost, as well as the need for extensive training datasets, which can be hard to collect in some areas such as medical imaging. In addition, a model pre-trained for a specific job can run on a simple edge device, such as a cellphone, with limited processing capacity and limited training time [4]. Developments in DTL are also opening the door to more intuitive and sophisticated AI systems, since they frame learning as a continual task. A notable example of this idea is Google DeepMind and advancements such as progressive learning [5]. All of this is bringing DTL to the forefront of research in artificial intelligence and machine learning.
This review aims to answer the following research questions: (i) What is DTL, and how does it differ from semi-supervised, multiview, and multitask learning? (ii) What are the different transfer learning methods and their taxonomy? (iii) What are the most applied DTL methods, and how effective are they? (iv) What are the best practices for model-based DTL approaches? (v) What are the limitations of DTL and the possible solutions/research trends?
In this paper, first, the definition of DTL is reviewed, followed by the taxonomy of DTL. Then, the selected recent practical studies of DTL are listed, categorized, and summarized. Moreover, two experimental evaluations of DTL and their conclusions are reviewed. Last but not least, we discuss the limitations of today’s DTL techniques and possible ways to tackle them.

2. Deep Learning

Deep learning (DL), or deep neural networks (DNNs), is a machine learning subcategory that can deal with nonlinear datasets. DNNs consist of layers of stacked nodes with activation functions and associated weights, (fully or partially) connected and usually trained (weight adjustment) by back-propagation and optimization algorithms. During the past two decades, DNNs have developed rapidly and are used in many aspects of our daily lives today. For instance, Convolutional Neural Network (CNN) layers have improved deep learning models for visual tasks since 2011, and as of today, most DL models use CNN layers [1]. For more details about machine learning and deep learning, please refer to [1]; this paper focuses on deep transfer learning, and we assume the reader has a thorough understanding of machine learning and deep learning.
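As a minimal illustration of the structure described above (stacked layers with weights and activation functions, trained by back-propagation and an optimizer), the following sketch builds a small fully connected network and runs one training step. It assumes the PyTorch library; the layer sizes, input data, and labels are arbitrary placeholders.

```python
# Minimal sketch (assumed PyTorch API) of a small fully connected DNN
# trained by back-propagation with an optimizer.
import torch
import torch.nn as nn

model = nn.Sequential(              # stacked layers of nodes
    nn.Linear(20, 64), nn.ReLU(),   # weights + activation function
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 2),               # output layer (e.g., 2 classes)
)

x = torch.randn(128, 20)            # placeholder input batch
y = torch.randint(0, 2, (128,))     # placeholder labels

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

loss = criterion(model(x), y)       # forward pass
optimizer.zero_grad()
loss.backward()                     # back-propagation of gradients
optimizer.step()                    # weight adjustment
```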

3. Deep Transfer Learning (DTL)

Deep transfer learning uses the knowledge obtained from another task and dataset (even one not strongly related to the target task or dataset) to reduce learning costs. In many ML problems, gathering the large amount of labeled data that most DL models require is impossible. For instance, at the beginning of the COVID-19 pandemic, and even a year into it, providing enough labeled chest X-ray data to train a deep learning model was still challenging, whereas with deep transfer learning, models detected the disease with very high accuracy from a limited training set [6,7]. Another application is running machine learning on edge devices, such as phones, for various tasks, taking advantage of deep transfer learning to reduce the need for processing power.
An untrained DL model initializes its node weights randomly, and during the expensive training process, an optimization algorithm adjusts those weights toward optimal values for a specific task (dataset). Remarkably, Ref. [8] showed that initializing those weights from a network trained on even a very distant dataset improves training performance compared to random initialization.
Deep transfer learning differs from semi-supervised learning: in DTL, the source and target datasets can have different distributions and need only be related to each other, while in semi-supervised learning, the source and target data come from the same dataset, except that the target set lacks labels [2]. DTL is also not the same as multiview learning, since multiview learning uses two or more distinct datasets to improve the quality of one task; e.g., a video dataset can be separated into image and audio datasets [2]. Last but not least, DTL differs from multitask learning despite many shared similarities. The most fundamental difference is that in multitask learning, the tasks use their interconnections to boost each other, and knowledge transfer happens concurrently between related tasks. In contrast, in DTL the target domain is the focus, the knowledge has already been obtained from the source data before it is applied to the target data, and the two do not need to be related or learned simultaneously [2].

4. From Transfer Learning to Deep Transfer Learning, Taxonomy

Deep Transfer Learning (DTL) methods can be categorized in different ways by various criteria, similar to transfer learning in general. DTLs can be divided into the two categories of homogeneous and heterogeneous based on the homogeneity of the source and target data [2]. However, this categorization can be conducted differently because it is subjective and relative. For example, a dataset of X-ray photos can be considered heterogeneous to a dataset of tree species photos when the comparison domain is limited to image data only. In contrast, it can be considered homogeneous to the same tree species photo dataset when the domain also includes audio and text datasets.
Moreover, DTLs can be categorized into three groups based on the label setting: (i) transductive, (ii) inductive, and (iii) unsupervised [2]. Briefly, it is transductive when only the source data are labeled; if both the source and target data are labeled, it is inductive; and if none of the data are labeled, it is unsupervised deep transfer learning [2].
Refs. [2,9] define another categorization of DTLs based on the applied approach. They similarly categorize DTLs into four groups: (i) instance-based, (ii) feature-based/mapping-based, (iii) parameter-based/model-based, and (iv) relational-based/adversarial-based approaches. Instance-based transfer learning approaches reuse selected (or all) instances from the source data, applying different weighting strategies when using them with the target data. Feature-based approaches map instances (or some features) from both the source and target data into more homogeneous representations. Further, the survey in [2] divides the feature-based category into asymmetric and symmetric feature-based transfer learning subcategories: “Asymmetric approaches transform the source features to match the target ones. In contrast, symmetric approaches attempt to find a common latent feature space and then transform both the source and the target features into a new feature representation” [2]. Model-based (parameter-based) methods reuse the knowledge captured in the model (network) itself through different combinations of pre-trained layers: freezing some, finetuning some, and/or adding fresh layers. Relational/adversarial-based approaches focus on extracting transferable features from both the source and target data, either using the logical relationships or rules learned in the source domain or by applying methods inspired by generative adversarial networks (GANs) [2,9]. Figure 1 shows the taxonomy of the above-mentioned categories [2].
Other than the model-based and adversarial-based approaches, all of these categories have been explored deeply during the last couple of decades for different ML techniques under the names domain adaptation or transfer learning [2,3]. However, most of those techniques are still applicable to deep transfer learning (DTL) as well. Model-based (parameter-based) approaches are the most applied techniques in DTL, since they can tackle the domain adaptation between the source and target data by adjusting the network (model). In other words, deep transfer learning is mainly focused on model-based approaches. Remarkably, model-based approaches in deep learning models can even tackle the adaptation of very distant source and target data [2,9].
In deep transfer learning (DTL), different techniques are applied for model-based approaches, although generally they are combinations of pre-training, freezing, finetuning, and/or adding fresh layer(s). A deep learning network (DL model) trained on source data is called a pre-trained model, consisting of pre-trained layers. Freezing and finetuning are techniques that reuse some or all layers of a pre-trained model when training on target data. Freezing some layers means their parameters/weights are kept constant at the pre-trained values and do not change during training. Finetuning means the parameters/weights of the whole network, or of selected layers, are initialized with the pre-trained values instead of random initialization and are then updated on the target data. Another recent DTL technique is based on freezing a pre-trained model and adding new layers to that model for training on target data; Google DeepMind introduced this technique in 2016 as progressive learning/progressive neural networks (PNNs) [5,10].
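As an illustration of these freezing/finetuning combinations, the sketch below starts from a pre-trained image model, freezes the earlier layers, replaces the classification head, and finetunes only the remaining parameters. It assumes a recent torchvision/PyTorch API and an ImageNet-pre-trained ResNet-18; the 10-class target task and the choice of which layers to freeze are illustrative placeholders.

```python
# Minimal sketch of model-based DTL: freeze early pre-trained layers,
# replace the head, and finetune the rest on the target task.
# Assumes PyTorch and torchvision (weights are downloaded on first use).
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained model = network already trained on the source data (ImageNet here).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freezing: parameters of the earlier feature-extraction layers stay constant.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Fresh layer: replace the classification head for the target task
# (10 target classes is an arbitrary placeholder).
model.fc = nn.Linear(model.fc.in_features, 10)

# Finetuning: only the unfrozen parameters are updated on the target data.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

Moving the frozen/trainable boundary to different depths reproduces the spectrum from full finetuning (nothing frozen) to the freezing-CNN-layers approach discussed in Section 5.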
The concept of progressive learning mimics human skill learning, in which previously learned skills serve as the foundation for acquiring a new one. For example, a child learns to run after learning to crawl and walk, building on all the skills obtained in the process. Similarly, PNNs prevent the catastrophic forgetting seen in finetuning techniques by freezing the whole pre-trained model and adapting to the new task by training the newly added layers on top of the previously trained ones [5,10].
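The following simplified sketch shows this idea: the entire pre-trained model is frozen and only newly added layers are trained on the target task. It assumes PyTorch; the pre-trained module, feature dimension, and hidden size are placeholders, and a full PNN additionally adds lateral connections between the frozen and new columns [10], which this sketch omits.

```python
# Simplified sketch of the progressive-learning idea: keep the pre-trained
# model frozen and train only freshly added layers on the new task.
# (A full PNN also uses lateral connections between columns [10].)
import torch
import torch.nn as nn

class ProgressiveHead(nn.Module):
    def __init__(self, pretrained: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():   # previously learned "skills" stay intact
            p.requires_grad = False
        self.new_layers = nn.Sequential(         # fresh layers for the new task
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        with torch.no_grad():                    # frozen knowledge is only reused
            feats = self.pretrained(x)
        return self.new_layers(feats)
```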
In deep learning models, the earlier layers usually perform feature extraction at a high level of detail, layers further towards the end extract higher-level information and conceptualize the given data, and the final layers perform the classification or prediction. For instance, in an image model, the earlier CNN layers extract the edges, corners, and tiny patches of a given image, later layers put those details together to detect objects or faces, and the final layers, usually fully connected, perform the classification [11]. Given this process, the most effective and efficient approach for DTL, to our knowledge, is to freeze the earlier and middle layers of a related pre-trained model and finetune the final layers for the new task/dataset [12]. Similarly, in progressive learning, the new layers are added at the end of a pre-trained model.
Nonetheless, some other research in this area uses combined and more sophisticated methods to tackle transfer learning in deep learning, such as ensembled networks and weighting strategies [2]. However, to our knowledge, a search for recent advancements in DTL applied to practical tasks turns up mostly model-based methods and a limited number of adversarial-based approaches.

5. Review of Recent Advancements in DTL

We limited our selection to the last five years of published studies on deep transfer learning for various tasks and data types. Table 1 lists the selected works, drawn from the hundreds of publications we reviewed, sorted by their DTL approach. We used the systematic literature review (SLR) technique [13] to find and select these thirty-eight publications. The inclusion criteria used for our selection process were as follows: (a) published in the past five years, (b) reproducible (detailed implementation and models), (c) applied to practical ML problems, and (d) generalizable. We found that the reviewed studies mostly fall into three categories of model-based approaches, with a few in the adversarial-based category, as explained in the previous section. We name these approaches (i) finetuning: finetuning a pre-trained model on target data; (ii) freezing CNN layers: the earlier CNN layers are frozen, and only the final fully connected layers are finetuned; (iii) progressive learning: some or all layers of a pre-trained model are kept frozen, and fresh layers are added to the model and trained on target data; and (iv) adversarial-based: extracting transferable features from both the source and target data using adversarial or relational methods, as shown in Figure 2.
The most common DTL method is to take a model trained on a dataset highly related to the target data and finetune it on the target data (finetuning). The simplicity of applying this technique makes it the most popular DTL method in our selection; 21 of the 38 selected works have used it. This method can improve training on target data in various ways, such as reducing training costs and tackling the need for an extensive target dataset. However, it is still prone to catastrophic forgetting. It is nonetheless a very effective DTL method for many tasks and datasets in fields such as medicine, mechanics, art, physics, and security. Moreover, it has been applied to both image datasets and tabular (numerical) datasets, as listed in Table 1.
The second most popular approach in DTL is freezing the CNN layers of a pre-trained model and finetuning only the final fully connected layers (freezing CNN layers). The CNN layers extract features from the given data, while the fully connected layers are responsible for classification; in this method, only the latter are finetuned to the new task on the target data.
Refs. [33,34,35,36,37,38,39,40,41,42] are sample publications that have used this method for different data types, such as image and tabular data, as listed in Table 1. This technique is specific to models containing CNN layers; however, it can be extended to other deep learning models by assuming that the earlier and middle layers act similarly to CNN layers for feature extraction.
Using well-known models such as VGG-Net, Alex-Net, and Res-Net, which have already been trained on the ImageNet dataset [50], is a common strategy for both of the techniques mentioned above, since they are easily accessible and are pre-trained to the highest possible accuracy. It is worth mentioning that such pre-training can take days of processing time even with clusters of GPUs/TPUs, and the mentioned methods skip the pre-training step by simply downloading a publicly available pre-trained model.
Refs. [43,44,45,46] are based on the progressive learning method, also known as progressive neural networks (PNNs), described earlier. Ref. [44] evaluates the effectiveness of progressive learning for common natural language processing (NLP) tasks: sequence labeling and text classification. Through the evaluation and comparison of applying PNNs to various models, datasets, and tasks, they show how PNNs improve DL models’ accuracy by avoiding the catastrophic forgetting of finetuning techniques. Refs. [43,45,46] use PNNs for image and audio datasets and similarly find tangible improvements in comparison to other DTL techniques.
Refs. [47,48] are examples of the adversarial-based approaches that we found in the literature. In [47], a conditional generative adversarial network (CGAN) is used to expand the limited target data of chest CT images for a COVID-19 detection DTL model. Ref. [48] applies domain adversarial training to obtain the features shared between multiple source datasets.
Moreover, we found some DTL methods tailored to specific tasks and datasets, such as [49]. The proposed method in [49], as the authors describe it, is based on a “three-layer sparse auto-encoder to extract the features of raw data, and applies the maximum mean discrepancy term to minimizing the discrepancy penalty between the features from training data and target data.” They tailor that method to fault diagnosis problems in smart industry and achieve 99.82% accuracy, better than other approaches such as deep belief networks, sparse filtering, standard deep learning, and support vector machines. Such tailored DTL approaches are usually not easy to generalize to different tasks or datasets. Nonetheless, they can open the door to interesting new techniques in the future of deep transfer learning.

6. Experimental Analyses of Deep Transfer Learning

In this section, we review two remarkable experimental evaluations of DTL techniques. Their test setups, analyses, and conclusions are instructive for applying DTL techniques in different scenarios.
“What is being transferred in transfer learning?” [51] is a recent experimental study that uses a series of tests on visual domains and deep learning models to investigate what makes a transfer successful and which parts of the network are responsible for it. To do so, the authors analyze networks in four different cases: (i) a pre-trained network, (ii) a randomly initialized network, (iii) a network finetuned on the target domain after pre-training on the source domain, and (iv) a network trained on the target domain from random initialization [51]. Moreover, to characterize the role of feature reuse, they use a source (pre-training) domain containing natural images (IMAGENET) and a few target (downstream) domains with decreasing visual similarity to natural images: DOMAINNET real, DOMAINNET clipart, CHEXPERT (medical chest X-rays), and DOMAINNET quickdraw [51].
The study demonstrates that feature reuse plays a key role in deep transfer learning: a model pre-trained on IMAGENET shows the largest performance improvement over randomly initialized models on the real domain, which shares visual features (natural images) with IMAGENET. Moreover, they run a series of experiments in which image blocks of different sizes are shuffled. These experiments confirm that feature reuse plays a very important role in transfer learning, particularly when the target domain shares visual features with the source domain. However, they find that feature reuse is not the only reason for deep transfer learning’s success, since even for distant targets such as CHEXPERT and quickdraw, they still observe performance boosts from deep transfer learning. Additionally, in all cases, pre-trained models converge much faster than randomly initialized models [51].
Further, they manually analyze common and uncommon mistakes made when training randomly initialized versus pre-trained models. They observe that data samples marked incorrect by the pre-trained model and correct by the randomly initialized model are mostly ambiguous samples. On the other hand, the majority of samples that the pre-trained model marked correct and the randomly initialized model marked incorrect are straightforward samples. This suggests that a pre-trained model has a stronger prior, which is harder to adapt to the target domain. Using centered kernel alignment to measure feature similarity, they also conclude that the initialization point drastically impacts feature similarity, and two networks with high accuracy can have different feature spaces. They find similar results for distance in parameter space, where two randomly initialized models end up farther from each other than two pre-trained models [51].
Regarding performance barriers and basins in the loss landscape, they conclude that the network stays in the same basin of solutions when finetuning a pre-trained network. They reached this conclusion by finetuning pre-trained models in two separate runs, training randomly initialized models twice, and comparing the resulting solutions. Even when a randomly initialized model is trained twice from the same random values, the runs end up in different basins [51].
Module criticality is another interesting analysis of deep learning models. Usually, in a deep CNN model, each CNN layer is considered a module, while in some models a larger component of the network can be treated as a module. To measure the criticality of a module, one can take a trained model, re-initialize one module at a time, and compare the resulting drop in model accuracy. Adopting this technique, the authors of [51] discovered that (i) the fully connected layers (near the model output) become critical for the pre-trained model, and (ii) module criticality increases moving from the input side of the model towards the output, which is consistent with earlier layers (near the input) extracting more general features while later layers hold features more specialized for the target domain.
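A minimal sketch of this probe, assuming PyTorch, is shown below; `evaluate` and `test_loader` stand in for whatever evaluation routine and held-out data are available, and treating each direct child of the model as a module is a simplifying assumption.

```python
# Sketch of the module-criticality probe: re-initialize one module of a
# trained model at a time and record the accuracy drop it causes.
# `evaluate(model, loader) -> accuracy` and `test_loader` are assumed helpers.
import copy

def module_criticality(trained_model, evaluate, test_loader):
    base_acc = evaluate(trained_model, test_loader)
    drops = {}
    for name, _ in trained_model.named_children():        # each top-level module
        probe = copy.deepcopy(trained_model)
        for m in dict(probe.named_children())[name].modules():
            if hasattr(m, "reset_parameters"):             # re-initialize its weights
                m.reset_parameters()
        drops[name] = base_acc - evaluate(probe, test_loader)
    return drops                                           # larger drop = more critical module
```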
Ref. [52] is another experimental analysis of transfer learning in visual tasks, titled “Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types”. Three factors of influence are investigated in this study: (i) image domain, the difference in image domain between the source and target tasks; (ii) task type, the difference in task type; and (iii) dataset size, the size of the source and target training sets. They perform over 1200 transfer learning experiments on 20 datasets spanning seven diverse image domains (consumer, driving, aerial, underwater, indoor, synthetic, and close-ups) and four task types (semantic segmentation, object detection, depth estimation, and keypoint detection) [52].
They use data normalization (e.g., illumination normalization) and augmentation techniques to improve the models’ accuracy. They adopt the recent high-resolution backbone HRNetV2, which consists of 69M parameters and is easily adjustable to different datasets by simply replacing its head. To make a fair comparison, they pre-trained the models to be used for transfer learning from scratch and evaluated their performance using top-1 accuracy on the ILSVRC’12 validation set [52].
The transfer learning experiments are divided into two main settings: (i) transfer learning with a small target training set and (ii) transfer learning with the full target training set. The evaluation of transfer learning models is based on the gain obtained by finetuning from a specific source model compared to finetuning from ILSVRC’12 image classification, with the main question of “are additional gains possible, by picking a good source?”. Furthermore, they add a series of multi-source training experiments to investigate the impact of multi-source training on a specific task [52].
Such an exhaustive experimental analysis resulted in the following observations: (i) all experiments showed that transfer learning outperforms training from scratch (random initialization); (ii) for 85% of target tasks there exists a source task that tops ILSVRC’12 pre-training; (iii) the largest transfer gain happens when the source and target tasks are in the same image domain (within-domain), which matters even more than the source size; (iv) positive transfer gain is possible when the source image domain includes the target domain; (v) although multi-source models bring good transfer, they are outperformed by the largest within-domain source; (vi) “for 65% of the targets within the same image domain as the source, cross-task-type transfer results in positive transfer gains”; (vii) as naturally expected, larger datasets transfer positively towards smaller datasets; and (viii) transfer effects are stronger for a small target training set, which helps in choosing the transfer learning model by testing several candidates on a small portion of the target data [52].

7. Discussion

The Deep Transfer Learning (DTL) research field is thriving because of the motivation to handle the limitations of Deep Learning (DL) models, namely the dependency on extensive labeled data and the training costs. The main idea is to use the knowledge obtained from source data in the training process on target data. Another potentially impactful outcome of the DTL research line is achieving continual learning, which brings Artificial General Intelligence [1] a step closer to reality. Continual learning could be achieved simply through a chain of transfer learning processes, provided the final model remains valid on all previous training sources.
As we reviewed in previous sections, model-based approaches are the most commonly used approaches in DTL, since deep learning models have the capacity to be adjusted to transfer knowledge. However, such approaches face two main constraints: the catastrophic forgetting dilemma and an overly biased pre-trained model.
When finetuning a pre-trained model, there is a high chance that the weights change drastically throughout the whole model, resulting in the catastrophic forgetting dilemma. The obtained knowledge could therefore be partially or even completely wiped out, resulting in unsuccessful training and no possibility of continual learning. This constraint limits the success of the finetuning approach to tightly related source and target data. A well-known technique to reduce the forgetting effect is to add a limited number of source samples to the target training data, as sketched below.
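A minimal sketch of this mixing (rehearsal) idea, assuming PyTorch datasets, follows; the dataset objects and the 5% replay fraction are illustrative assumptions.

```python
# Sketch of reducing forgetting by mixing a small number of source samples
# into the target training data before finetuning. Assumes PyTorch Datasets;
# the replay fraction is an arbitrary illustrative value.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def build_rehearsal_loader(source_ds, target_ds, replay_fraction=0.05, batch_size=32):
    n_replay = max(1, int(replay_fraction * len(source_ds)))
    replay_idx = torch.randperm(len(source_ds))[:n_replay].tolist()  # random source subset
    mixed = ConcatDataset([target_ds, Subset(source_ds, replay_idx)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)
```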
The freezing-CNN-layers technique tries to tackle catastrophic forgetting by freezing the knowledge obtained in the earlier layers and finetuning the final fully connected layers to achieve transfer learning on target data. Given that earlier layers in DL models extract detailed features while more abstract knowledge is extracted towards the output [11], freezing the earlier layers limits the model’s ability to learn any new features from the target data, which is known as an overly biased pre-trained model. Having extensive source data, or access to a model pre-trained on a large dataset, is critical for a successful transfer using this technique: in that case, there is a high chance that the pre-trained model has already learned all the relevant detailed features, and simply finetuning the final layers is enough to perform well on the target data. However, even when this first obstacle is overcome, the solution is still imperiled by catastrophic forgetting in the final layers. Despite the limitations mentioned above, this technique is still successful when the source and target data and tasks are related.
Progressive learning tries to find a middle ground between catastrophic forgetting and a biased model by adding new layer(s) to the end of a frozen pre-trained model. This technique is successful for task transfer between related source and target data. It cannot deal with distant source and target data, since the earlier layers are frozen and cannot learn new features; however, the new final layer(s) help the model adjust to a new task.
A possible solution to address both catastrophic forgetting and an overly biased pre-trained model in DTL is to increase the learning capacity of a pre-trained model by expanding it vertically. In another research paper, we propose expanding the model vertically when training on target data, adding new nodes to the frozen pre-trained layers throughout the model instead of adding new layer(s) at the end [53]. The vertical expansion increases the model’s learning capacity while keeping the previously obtained knowledge intact. Therefore, not only do we achieve successful transfer learning, but the final model also remains valid on the source data, opening the door to deep continual learning [53].
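The sketch below illustrates one simplified reading of this idea, assuming PyTorch: a frozen pre-trained linear layer is widened with extra trainable nodes whose outputs are concatenated to the frozen ones. It is not the exact method of [53]; the layer and node counts are placeholders, and downstream layers must be adapted to the wider outputs, which [53] addresses.

```python
# Simplified, illustrative sketch (not the exact method of [53]) of vertical
# expansion: widen a frozen pre-trained layer with new trainable nodes so the
# old knowledge stays intact while learning capacity grows.
import torch
import torch.nn as nn

class ExpandedLinear(nn.Module):
    def __init__(self, pretrained_layer: nn.Linear, extra_nodes: int):
        super().__init__()
        self.frozen = pretrained_layer
        for p in self.frozen.parameters():      # previously obtained knowledge stays intact
            p.requires_grad = False
        self.new = nn.Linear(pretrained_layer.in_features, extra_nodes)  # added capacity

    def forward(self, x):
        # old (frozen) outputs and new (trainable) outputs side by side
        return torch.cat([self.frozen(x), self.new(x)], dim=-1)
```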

8. Conclusions

This paper reviews the taxonomy of deep transfer learning (DTL) and the definitions of its different approaches. Moreover, we review, list, categorize, and analyze over thirty recently applied DTL research studies. We then investigate the methodology and limitations of the three most common model-based deep transfer learning methods: (i) finetuning, (ii) freezing CNN layers, and (iii) progressive learning. These techniques have proven their ability and effectiveness on various machine learning problems. The simplicity of finetuning publicly available models pre-trained on extensive datasets is the reason it is the most common transfer learning technique. Moreover, two thorough experimental studies of DTL are summarized; their discoveries clarify what makes a deep transfer learning approach successful in different scenarios. Last but not least, the limitations of current DTL (the catastrophic forgetting dilemma and overly biased pre-trained models) are discussed, along with possible solutions.

Author Contributions

Writing and original draft preparation, M.I.; supervision, review, and editing, H.R.A. and K.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Iman, M.; Arabnia, H.R.; Branchinst, R.M. Pathways to Artificial General Intelligence: A Brief Overview of Developments and Ethical Issues via Artificial Intelligence, Machine Learning, Deep Learning, and Data Science; Springer: Cham, Switzerland, 2021; pp. 73–87. [Google Scholar]
  2. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  3. Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering; Springer: Cham, Switzerland, 2021; pp. 877–894. [Google Scholar]
  4. Voghoei, S.; Tonekaboni, N.H.; Wallace, J.G.; Arabnia, H.R. Deep learning at the edge. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 12–14 December 2018; pp. 895–901. [Google Scholar]
  5. Chang, H.S.; Fu, M.C.; Hu, J.; Marcus, S.I. Google Deep Mind’s AlphaGo. Or/Ms Today 2016, 43, 24–29. [Google Scholar]
  6. Das, N.N.; Kumar, N.; Kaur, M.; Kumar, V.; Singh, D. Automated Deep Transfer Learning-Based Approach for Detection of COVID-19 Infection in Chest X-rays. Irbm 2022, 43, 114–119. [Google Scholar]
  7. Jaiswal, A.; Gianchandani, N.; Singh, D.; Kumar, V.; Kaur, M. Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. J. Biomol. Struct. Dyn. 2020, 39, 5682–5689. [Google Scholar] [CrossRef] [PubMed]
  8. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst. 2014, 4, 3320–3328. [Google Scholar]
  9. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. Lect. Notes Comput. Sci. (Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform.) 2018, 11141, 270–279. [Google Scholar]
  10. Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive neural networks. arXiv 2016, arXiv:1606.04671. [Google Scholar]
  11. Yosinski, J.; Clune, J.; Nguyen, A.; Fuchs, T.; Lipson, H. Understanding neural networks through deep visualization. arXiv 2015, arXiv:1506.06579. [Google Scholar]
  12. Hariharan, R.; Sudhakar, P.; Venkataramani, R.; Thiruvenkadam, S.; Annangi, P.; Babu, N.; Vaidya, V. Understanding the mechanisms of deep transfer learning for medical images. In Deep Learning and Data Labeling for Medical Applications; Springer: Cham, Switzerland, 2016; pp. 188–196. [Google Scholar]
  13. Kitchenham, B.; Pearlbrereton, O.; Budgen, D.; Turner, M.; Bailey, J.; Linkman, S. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 2009, 51, 7–15. [Google Scholar] [CrossRef]
  14. Wan, L.; Liu, R.; Sun, L.; Nie, H.; Wang, X. UAV swarm based radar signal sorting via multi-source data fusion: A deep transfer learning framework. Inf. Fusion 2022, 78, 90–101. [Google Scholar] [CrossRef]
  15. Albayrak, A. Classification of analyzable metaphase images using transfer learning and fine tuning. Med. Biol. Eng. Comput. 2022, 60, 239–248. [Google Scholar] [CrossRef] [PubMed]
  16. Kumar, S. MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in internet of things. Future Gener. Comput. Syst. 2021, 125, 334–351. [Google Scholar]
  17. Wang, Y.; Feng, Z.; Song, L.; Liu, X.; Liu, S. Multiclassification of endoscopic colonoscopy images based on deep transfer learning. Comput. Math. Methods Med. 2021, 2021, 2485934. [Google Scholar] [CrossRef] [PubMed]
  18. Akh, M.A.H.; Roy, S.; Siddique, N.; Kamal, M.A.S.; Shimamura, T. Facial Emotion Recognition Using Transfer Learning in the Deep CNN. Electronics 2021, 10, 1036. [Google Scholar]
  19. Dipendra, J.; Choudhary, K.; Tavazza, F.; Liao, W.; Choudhary, A.; Campbell, C.; Agrawal, A. Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning. Nat. Commun. 2019, 10, 1–12. [Google Scholar]
  20. Talo, M.; Baloglu, U.B.; Yıldırım, Ö.; Acharya, U.R. Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 2019, 54, 176–188. [Google Scholar] [CrossRef]
  21. Wu, Z.; Jiang, H.; Zhao, K.; Li, X. An adaptive deep transfer learning method for bearing fault diagnosis. Measurement 2020, 151, 107227. [Google Scholar] [CrossRef]
  22. Mao, W.; Ding, L.; Tian, S.; Liang, X. Online detection for bearing incipient fault based on deep transfer learning. Meas. J. Int. Meas. Confed. 2020, 152, 107278. [Google Scholar] [CrossRef]
  23. Huy, P.; Chén, O.Y.; Koch, P.; Lu, Z.; McLoughlin, I.; Mertins, A.; Vos, M.D. Towards more accurate automatic sleep staging via deep transfer learning. IEEE Trans. Biomed. Eng. 2020, 68, 1787–1798. [Google Scholar]
  24. Perera, P.; Patel, V.M. Deep Transfer Learning for Multiple Class Novelty Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11544–11552. [Google Scholar]
  25. Xu, Y.; Sun, Y.; Liu, X.; Zheng, Y. A Digital-Twin-Assisted Fault Diagnosis Using Deep Transfer Learning. IEEE Access 2019, 7, 19990–19999. [Google Scholar] [CrossRef]
  26. Han, K.; Vedaldi, A.; Zisserman, A. Learning to Discover Novel Visual Categories via Deep Transfer Clustering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8401–8409. [Google Scholar]
  27. Geng, M.; Wang, Y.; Xiang, T.; Tian, Y. Deep Transfer Learning for Person Re-identification. arXiv 2016, arXiv:1611.05244. [Google Scholar]
  28. Sabatelli, M.; Kestemont, M.; Daelemans, W.; Geurts, P. Deep Transfer Learning for Art Classification Problems. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
  29. George, D.; Shen, H.; Huerta, E.A. Deep transfer learning: A new deep learning glitch classification method for advanced ligo. arXiv 2017, arXiv:1706.07446. [Google Scholar]
  30. Ding, R.; Li, X.; Nie, L.; Li, J.; Si, X.; Chu, D.; Liu, G.; Zhan, D. Empirical study and improvement on deep transfer learning for human activity recognition. Sensors 2019, 19, 57. [Google Scholar] [CrossRef] [Green Version]
  31. Zeng, M.; Li, M.; Fei, Z.; Yu, Y.; Pan, Y.; Wang, J. Automatic ICD-9 coding via deep transfer learning. Neurocomputing 2019, 324, 43–50. [Google Scholar] [CrossRef]
  32. Kaya, H.; Gürpınar, F.; Salah, A.A. Video-Based emotion recognition in the wild using deep transfer learning and score fusion. Image Vis. Comput. 2017, 65, 66–75. [Google Scholar] [CrossRef]
  33. Ay, B.; Tasar, B.; Utlu, Z.; Ay, K.; Aydin, G. Deep transfer learning-based visual classification of pressure injuries stages. Neural Comput. Appl. 2022, 4, 16157–16168. [Google Scholar] [CrossRef]
  34. Li, P.; Cui, H.; Khan, A.; Raza, U.; Piechocki, R.; Doufexi, A.; Farnham, T.M. Deep transfer learning for WiFi localization. In Proceedings of the 2021 IEEE Radar Conference (RadarConf21), Atlanta, GA, USA, 8–14 May 2021; IEEE: Piscataway, NJ, USA, 2021. [Google Scholar]
  35. Celik, Y.; Talo, M.; Yildirim, O.; Karabatak, M.; Acharya, U.R. Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images. Pattern Recognit. Lett. 2020, 133, 232–239. [Google Scholar] [CrossRef]
  36. Liu, C.; Wei, Z.; Ng, D.W.K.; Yuan, J.; Liang, Y.C. Deep Transfer Learning for Signal Detection in Ambient Backscatter Communications. IEEE Trans. Wirel. Commun. 2020, 20, 1624–1638. [Google Scholar] [CrossRef]
  37. Deepak, S.; Ameer, P.M. Brain tumor classification using deep CNN features via transfer learning. Comput. Biol. Med. 2019, 111, 103345. [Google Scholar] [CrossRef] [PubMed]
  38. Mormont, R.; Geurts, P.; Marée, R. Comparison of Deep Transfer Learning Strategies for Digital Pathology. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2262–2271. [Google Scholar]
  39. Zhi, Y.; Yu, W.; Liang, P.; Guo, H.; Xia, L.; Zhang, F.; Ma, Y.; Ma, J. Deep transfer learning for military object recognition under small training set condition. Neural Comput. Appl. 2019, 31, 6469–6478. [Google Scholar]
  40. Gao, Y.; Mosalam, K.M. Deep Transfer Learning for Image-Based Structural Damage Recognition. Comput. Civ. Infrastruct. Eng. 2018, 33, 748–768. [Google Scholar] [CrossRef]
  41. Yu, Y.; Lin, H.; Meng, J.; Wei, X.; Guo, H.; Zhao, Z. Deep Transfer Learning for Modality Classification of Medical Images. Information 2017, 8, 91. [Google Scholar] [CrossRef] [Green Version]
  42. Wang, S.; Li, Z.; Yu, Y.; Xu, J. Folding Membrane Proteins by Deep Transfer Learning. Cell Syst. 2017, 5, 202–211.e3. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  43. Joshi, D.; Mishra, V.; Srivastav, H.; Goel, D. Progressive Transfer Learning Approach for Identifying the Leaf Type by Optimizing Network Parameters. Neural Process. Lett. 2021, 53, 3653–3676. [Google Scholar] [CrossRef]
  44. Abdul, M.; Hagerer, G.; Dugar, S.; Gupta, S.; Ghosh, M.; Danner, H.; Mitevski, O.; Nawroth, A.; Groh, G. An evaluation of progressive neural networksfor transfer learning in natural language processing. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 1376–1381. [Google Scholar]
  45. Gu, Y.; Ge, Z.; Bonnington, C.P.; Zhou, J. Progressive Transfer Learning and Adversarial Domain Adaptation for Cross-Domain Skin Disease Classification. IEEE Biomed. Health Informa. 2020, 24, 1379–1393. [Google Scholar] [CrossRef] [PubMed]
  46. Gideon, J.; Khorram, S.; Aldeneh, Z.; Dimitriadis, D.; Provost, E.M. Progressive Neural Networks for Transfer Learning in Emotion Recognition. Proc. Annu. Conf. Int. Speech Commun. Assoc. Interspeech 2017, 2017, 1098–1102. [Google Scholar]
  47. Loey, M.; Manogaran, G.; Khalifa, N.E.M. A deep transfer learning model with classical data augmentation and CGAN to detect COVID-19 from chest CT radiography digital images. Neural Comput. Appl. 2020, 1–13. [Google Scholar] [CrossRef]
  48. Li, X.; Zhang, W.; Ding, Q.; Li, X. Diagnosing Rotating Machines with Weakly Supervised Data Using Deep Transfer Learning. IEEE Trans. Ind. Inform. 2020, 16, 1688–1697. [Google Scholar] [CrossRef]
  49. Wen, L.; Gao, L.; Li, X. A new deep transfer learning based on sparse auto-encoder for fault diagnosis. IEEE Trans. Syst. Man, Cybern. Syst. 2019, 49, 136–144. [Google Scholar] [CrossRef]
  50. Simon, M.; Rodner, E.; Denzler, J. ImageNet Pre-Trained Models with Batch Normalization. arXiv 2016, arXiv:1612.01452. [Google Scholar]
  51. Neyshabur, B.; Sedghi, H.; Zhang, C. What is being transferred in transfer learning? Adv. Neural Inf. Process. Syst. 2020, 33, 512–523. [Google Scholar]
  52. Mensink, T.; Uijlings, J.; Kuznetsova, A.; Gygli, M.; Ferrari, V. Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 9298–9314. [Google Scholar] [CrossRef] [PubMed]
  53. Iman, M.; Miller, J.A.; Rasheed, K.; Branch, R.M.; Arabnia, H.R. EXPANSE: A Continual and Progressive Learning System for Deep Transfer Learning. In Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 14–16 December 2022; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
Figure 1. Taxonomy of Transfer Learning, which is extendable to Deep Transfer Learning as well.
Figure 2. Most common Deep Transfer Learning approaches.
Table 1. List of selected recent deep transfer learning (DTL) publications.

Ref. | Year | Title | Data Type | Time Series | Approach | CNN | Known Models Used | Dataset Field
[14] | 2022 | UAV swarm based radar signal sorting via multi-source data fusion: A deep transfer learning framework | Image | No | Finetuning | Yes | Yolo, Faster-RCNN, and Cascade-RCNN | Radar image
[15] | 2022 | Classification of analyzable metaphase images using transfer learning and fine tuning | Image | No | Finetuning | Yes | VGG16, Inception V3 | Medical image
[16] | 2021 | MCFT-CNN: Malware classification with fine-tune convolution neural networks using traditional and transfer learning in Internet of Things | Image | No | Finetuning | Yes | Res-Net50 | Malware classification
[17] | 2021 | Multiclassification of Endoscopic Colonoscopy Images Based on Deep Transfer Learning | Image | No | Finetuning | Yes | AlexNet, VGG, and Res-Net | Medical image
[18] | 2021 | Facial Emotion Recognition Using Transfer Learning in the Deep CNN | Image | No | Finetuning | Yes | VGGs, Res-Nets, Inception-v3, DenseNet-161 | Facial emotion recognition (FER)
[6] | 2020 | Automated Deep Transfer Learning-Based Approach for Detection of COVID-19 Infection in Chest X-rays | Image | No | Finetuning | Yes | Inception-Xception | Medical image
[7] | 2020 | Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning | Image | No | Finetuning | Yes | ImageNet, Dense-Net | Medical image
[19] | 2019 | Enhancing materials property prediction by leveraging computational and experimental data using deep transfer learning | Tabular/bigdata | Yes | Finetuning | No | None | Quantum mechanics
[20] | 2019 | Application of deep transfer learning for automated brain abnormality classification using MR images | Image | No | Finetuning | Yes | Res-Net | Medical image
[21] | 2019 | An adaptive deep transfer learning method for bearing fault diagnosis | Tabular/bigdata | Yes | Finetuning | No | LSTM RNN | Mechanic
[22] | 2019 | Online detection for bearing incipient fault based on deep transfer learning | Image | Yes | Finetuning | Yes | VGG-16 | Mechanic
[23] | 2019 | Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning | Tabular/bigdata | Yes | Finetuning | Yes | None | Medical data
[24] | 2019 | Deep Transfer Learning for Multiple Class Novelty Detection | Image | No | Finetuning | Yes | Alex-Net, VGG-Net | Vision
[25] | 2019 | A Digital-Twin-Assisted Fault Diagnosis Using Deep Transfer Learning | Tabular/bigdata | No | Finetuning | No | None | Mechanic
[26] | 2019 | Learning to Discover Novel Visual Categories via Deep Transfer Clustering | Image | No | Finetuning | Yes | None | Vision
[27] | 2018 | Deep Transfer Learning for Person Re-identification | Image | No | Finetuning | Yes | None | Identification/security
[28] | 2018 | Deep Transfer Learning for Art Classification Problems | Image | No | Finetuning | Yes | None | Art
[29] | 2018 | Classification and unsupervised clustering of LIGO data with Deep Transfer Learning | Image | No | Finetuning | Yes | None | Physics/Astrophysics
[30] | 2018 | Empirical Study and Improvement on Deep Transfer Learning for Human Activity Recognition | Tabular/bigdata | Yes | Finetuning | Yes | None | Human Activity Recognition
[31] | 2018 | Automatic ICD-9 coding via deep transfer learning | Tabular/bigdata | No | Finetuning | Yes | None | Medical
[32] | 2017 | Video-based emotion recognition in the wild using deep transfer learning and score fusion | Video (audio and visual) | Yes | Finetuning | Yes | VGG-Face | Human science/psychology
[33] | 2022 | Deep transfer learning-based visual classification of pressure injuries stages | Image | No | Freezing CNN layers | Yes | Dense-Net 121, Inception V3, MobilNet V2, Res-Nets, VGG16 | Medical image
[34] | 2021 | Deep Transfer Learning for WiFi Localization | Tabular/bigdata | No | Freezing CNN layers | Yes | None | WiFi localization
[35] | 2020 | Automated invasive ductal carcinoma detection based using deep transfer learning with whole-slide images | Image | No | Freezing CNN layers | Yes | Res-Net, Dense-Net | Medical image
[36] | 2019 | Deep Transfer Learning for Signal Detection in Ambient Backscatter Communications | Tabular/bigdata | No | Freezing CNN layers | Yes | None | Telecommunication
[37] | 2019 | Brain tumor classification using deep CNN features via transfer learning | Image | No | Freezing CNN layers | Yes | Google-Net | Medical image
[38] | 2018 | Comparison of Deep Transfer Learning Strategies for Digital Pathology | Image | No | Freezing CNN layers | Yes | None | Medical image
[39] | 2018 | Deep transfer learning for military object recognition under small training set condition | Image | No | Freezing CNN layers | Yes | None | Military
[40] | 2018 | Deep Transfer Learning for Image-Based Structural Damage Recognition | Image | No | Freezing CNN layers | Yes | VGG-Net | Civil engineering
[41] | 2017 | Deep Transfer Learning for Modality Classification of Medical Images | Image | No | Freezing CNN layers | Yes | VGG-Net, Res-Net | Medical image
[42] | 2017 | Folding Membrane Proteins by Deep Transfer Learning | Tabular/bigdata | No | Freezing CNN layers | Yes | Res-Net | Chemistry
[43] | 2021 | Progressive Transfer Learning Approach for Identifying the Leaf Type by Optimizing Network Parameters | Image | No | Progressive learning | Yes | Res-Net50 | Plant science
[44] | 2020 | An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing | NLP/text | No | Progressive learning | No | None | NLP
[45] | 2020 | Progressive Transfer Learning and Adversarial Domain Adaptation for Cross-Domain Skin Disease Classification | Image | No | Progressive learning | Yes | None | Medical image
[46] | 2017 | Progressive Neural Networks for Transfer Learning in Emotion Recognition | Image and audio | Yes | Progressive learning | No | None | Para-linguistic
[47] | 2020 | A deep transfer learning model with classical data augmentation and CGAN to detect COVID-19 from chest CT radiography digital images | Image | No | Adversarial-based | Yes | Alex-Net, VGG-Net16, VGG-Net19, Google-Net, Res-Net50 | Medical image
[48] | 2019 | Diagnosing Rotating Machines with Weakly Supervised Data Using Deep Transfer Learning | Tabular/bigdata | Yes | Adversarial-based | Yes | None | Mechanic
[49] | 2017 | A New Deep Transfer Learning Based on Sparse Auto-Encoder for Fault Diagnosis | Tabular/bigdata | Yes | Sparse Auto-Encoder | No | None | Mechanic
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
