Article
Peer-Review Record

Evidence-Based Regularization for Neural Networks

Mach. Learn. Knowl. Extr. 2022, 4(4), 1011-1023; https://doi.org/10.3390/make4040051
by Giuseppe Nuti 1,*,†, Andreea-Ingrid Cross 2,† and Philipp Rindler 3,†
Submission received: 14 September 2022 / Revised: 26 October 2022 / Accepted: 2 November 2022 / Published: 15 November 2022
(This article belongs to the Section Network)

Round 1

Reviewer 1 Report

Overall, the paper is quite well written and it flows nicely.

As the authors themselves admit, the proposed regularization method is simple. There is nothing wrong with that per se, but, to be fair, the observed performance improvements are very modest. While the use of benchmark problems such as Fashion-MNIST may be okay to test the viability of a learning paradigm, it is difficult to make a compelling case for a newly proposed method based on such problems. In the case of a method that aims and claims to improve on existing regularization methods, it would have been far more interesting to compare the performance on tough problems (e.g. international competition data sets). The authors mention financial applications; now that would be a far more interesting and challenging problem on which to demonstrate the superiority of the proposed regularization method. I would like to see the method applied to tougher problems.

A minor technical point: the proposed method aims to more evenly 'distribute' the neuron activations. What implications does that have for other regularization methods, e.g. weight decay or weight elimination? Is the objective of the latter that different, particularly when one considers a continuous cycle of weight pruning and retraining?

In closing, I would be happy for this work to be published here, but I would like for its story to be more compelling.

Author Response

Thank you for the comments and feedback on our paper.

We added additional results for CIFAR10 (with clean labels as well as noisy labels) to demonstrate the validity of the proposed evidence-based regularizer for more challenging data.

Any particular regularizer in a neural network will invariably interact with many other choices, such as the architecture, the optimization algorithm, and the properties of the data. It is impossible to investigate the possible interactions with all of these elements. We do not mean to argue that a particular regularizer is superior to any other, as different regularization strategies reflect different intuitions, purposes, and inductive biases.

The results we present in the paper show that there is a reduction in overfitting in several settings and provide empirical support for the intuition behind what the evidence-based regularizer is meant to achieve. We therefore consider that a regularization that pushes nodes towards the evidence in the data has merit alongside bulk shrinking of weights (L1/L2) or turning random nodes off (dropout).
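For illustration, below is a minimal PyTorch-style sketch of the two baselines mentioned above (L2 weight decay and dropout), together with a hypothetical activation-usage penalty. The penalty is only meant to convey the idea of regularizing node activations rather than weights; it is not the evidence-based regularizer defined in the paper, and the layer sizes and hyperparameters are arbitrary.

import torch
import torch.nn as nn

# Dropout: randomly turns nodes off during training.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 10),
)

# L1/L2-style regularization: bulk shrinking of weights (here via weight decay).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Hypothetical penalty on node activations (illustrative only): penalize
# hidden-layer activations whose average usage deviates from a uniform spread.
def activation_usage_penalty(activations: torch.Tensor) -> torch.Tensor:
    usage = activations.abs().mean(dim=0)                  # mean usage per node over the batch
    usage = usage / (usage.sum() + 1e-8)                   # normalize to a distribution over nodes
    uniform = torch.full_like(usage, 1.0 / usage.numel())  # uniform reference distribution
    return ((usage - uniform) ** 2).sum()                  # deviation from uniform node usage

# In training, such a term would be added to the task loss, e.g.
#   loss = cross_entropy + lambda_reg * activation_usage_penalty(hidden_activations)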

Reviewer 2 Report

This is an interesting approach towards a very critical issue and is worthy of being published. I have a few minor comments below. My major issue is that the results are based on just one dataset, Fashion-MNIST. The authors should expand the results to include some other datasets in order to get a better feel for the general applicability of this approach. I would suggest this be done and the paper resubmitted; then I would be supportive of publishing.

One minor issue: the authors should discuss the fact that it is sometimes beneficial for a neural network to ignore some hidden nodes, especially in shallow networks where more than enough hidden nodes are supplied. EBR might not allow that, and a discussion of whether or not this is an issue would be welcome.

Following are some minor fixes, with manuscript line numbers:

· Line 8: MNIST, not MINST
· Line 106: '10 threshold' is awkwardly worded
· Line 111: distorting
· Line 149: initialized

Author Response

Thank you for the comments and feedback on our paper.

We added additional results for CIFAR10 (with clean labels as well as noisy labels) to demonstrate the validity of the proposed evidence-based regularizer for more challenging data. We agree that additional datasets can provide further confidence. However, any particular regularizer in a neural network will invariably interact with many other choices, such as the architecture, the optimization algorithm, and the properties of the data. It is impossible to investigate the possible interactions with all of these elements.

The results we present in the paper show that there is a reduction in overfitting in several settings and provide empirical support for the intuition behind what the evidence-based regularizer is meant to achieve. We therefore consider that a regularization that pushes nodes towards the evidence in the data has merit alongside bulk shrinking of weights (L1/L2) or turning random nodes off (dropout).

Round 2

Reviewer 1 Report

The authors tested their method on a more difficult data set; the results support their assertion that it has merit.

There is a typo on line 203: 'regularitzationa' should read 'regularization'

Reviewer 2 Report

Adding the CIFAR tests helps.
