3.1. Performance
A grayscale image of 1024 pixels is used as input; the image content is obtained by sequentially generating the background and the object. Every pixel in the background shares the same grayscale value, which is generated randomly. The object consists of several contiguous pixels at random positions, all sharing a random grayscale value. The object is placed at a random position on top of the background and then moves in one of eight directions (↑, ↗, →, ↘, ↓, ↙, ←, ↖); two successive frames of the motion are extracted and input to AVS. Across many instances, the actual motion direction of the object is compared with the AVS detection result, and the resulting detection accuracy is used to measure effectiveness. Since object size may affect detection performance, motion instances of objects at 10 different sizes (1, 2, 4, 8, 16, 32, 64, 128, 256, and 512 pixels) are run in each of the 8 directions; each size is tested 125 times per direction, giving 10 sizes × 8 directions × 125 trials = 10,000 test experiments in total.
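The generation procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 32×32 layout (1024 pixels), the choice of gray levels, and the random-walk object growth are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# The eight motion directions (row offset, column offset):
# up, upper-right, right, lower-right, down, lower-left, left, upper-left
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def make_frame_pair(side=32, obj_size=16, direction=4):
    """Build two successive 32x32 (1024-pixel) frames: a uniform random-gray
    background with a connected random-gray object that moves one step."""
    bg = rng.integers(0, 256)          # one gray level for the whole background
    fg = (bg + 128) % 256              # object gray level, distinct from background
    frame1 = np.full((side, side), bg, dtype=np.uint8)

    # Grow a connected object of obj_size pixels by a random walk from a seed,
    # kept one pixel away from the border so a one-step move stays in bounds.
    r, c = side // 2, side // 2
    obj = {(r, c)}
    while len(obj) < obj_size:
        dr, dc = DIRECTIONS[rng.integers(0, 8)]
        r = min(max(r + dr, 1), side - 2)
        c = min(max(c + dc, 1), side - 2)
        obj.add((r, c))
    for r, c in obj:
        frame1[r, c] = fg

    # Second frame: same background, object shifted one pixel along `direction`.
    dr, dc = DIRECTIONS[direction]
    frame2 = np.full((side, side), bg, dtype=np.uint8)
    for r, c in obj:
        frame2[r + dr, c + dc] = fg
    return frame1, frame2

f1, f2 = make_frame_pair(direction=4)   # object moves down
```

A full test set would repeat this for each size in {1, 2, …, 512} and each of the eight directions.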
The experimental results of using AVS to detect the motion direction of objects in the different instances are reported in Table 1. The results show that AVS achieves 100% detection accuracy for the motion of all 10 object sizes in grayscale images in all directions. This indicates that AVS can effectively detect the direction of object motion from successive grayscale frames of a video, regardless of the size, shape, or position of the objects in the image.
Figure 8a shows an instance of AVS motion direction detection under the theoretical image condition: two successive grayscale frames of a randomly shaped object consisting of 16 pixels moving downward in an image field of 1024 pixels.
Figure 8b depicts the number of times the eight LMDN classes were activated, using the one-dimensional statistics common in neuroscience studies. It has been shown in the mammalian visual cortex that, for a motion direction occurring in the global receptive field, neurons likewise exhibit direction sensitivity. The global motion direction corresponds to the GMDN with the highest activation intensity, which is highly similar to the phenomena in the primary cortex shown in
Figure 3. The LMDNs were activated 12, 9, 11, 6, 12, 7, 21, and 7 times, respectively, which gives the activation intensity of the corresponding GMDNs over the frame interval. The neuron sensitive to downward motion was activated 21 times, and the density of its activation bar is significantly higher than that of any other; this direction is consistent with the actual motion direction of the object in the image, so the detection is successful.
Figure 8c shows, as a bar chart, the total number of activations of each class of LMDNs, i.e., the response strength of the corresponding GMDNs.
Figure 8d marks in blue the pixel locations that caused the LMDNs to activate during the frame interval; this region clearly depicts the general outline of the moving object, owing to the particular sensitivity of the LMDNs to the edges of moving objects within the local receptive field. Thus, the model can also perform edge tracking of objects in motion, dynamically displaying in real time where a moving object is located in the global receptive field. The LMDNs naturally exhibit the edge sensitivity of animal vision together with local motion direction selectivity, in which the response to non-preferred directions is null. Furthermore, the LMDNs correctly extract directional information from object motion at arbitrary grayscale, which is consistent with Barlow's basic experimental conclusion that changes in light do not interfere with directional selectivity. The result also removes the significant obstacle to interpreting the motion detection mechanism posed by the opposite functions of the ON and OFF response regions within the same local receptive field. In conclusion, the properties of AVS correspond closely to the core physiological experimental findings.
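One plausible reading of this detection scheme can be sketched as follows. The actual neuron model is defined earlier in the paper, so the firing rule used here is only an illustrative assumption: the LMDN preferring direction d fires at a pixel whose gray value changed between frames and whose new value matches the value one step back along d in the previous frame; each GMDN sums the firings of its LMDNs, and the most active GMDN gives the global direction.

```python
import numpy as np

DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
NAMES = ["up", "upper-right", "right", "lower-right",
         "down", "lower-left", "left", "upper-left"]

def detect_direction(frame1, frame2):
    """Return (direction name, per-direction activation counts).

    For each pixel whose gray value changed between the frames, the LMDN
    preferring direction d activates if the new value equals the old value
    of the pixel one step back along d; each GMDN sums the activations of
    its LMDNs, and the most active GMDN gives the global direction."""
    h, w = frame1.shape
    counts = [0] * 8
    for d, (dr, dc) in enumerate(DIRECTIONS):
        for r in range(h):
            for c in range(w):
                pr, pc = r - dr, c - dc          # source pixel one step back along d
                if not (0 <= pr < h and 0 <= pc < w):
                    continue
                if frame2[r, c] != frame1[r, c] and frame2[r, c] == frame1[pr, pc]:
                    counts[d] += 1
    return NAMES[int(np.argmax(counts))], counts
```

For example, a 2×2 block of gray level 200 on a zero background, shifted one row down between two 8×8 frames, yields the highest count for the "down" GMDN.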
3.3. Performance in Complex Environments
Although AVS achieves error-free detection under theoretical conditions, random and unknown interference may affect its performance in complex environments. To demonstrate the performance of AVS in complex environments more comprehensively, we experiment with static and dynamic noise separately. In addition, we further distinguish noise that is confined to the background from noise overlaid on top of the whole image. This yields four noise types: static background noise, dynamic background noise, static full-image noise, and dynamic full-image noise. In the experiments, the grayscale value of each noise pixel is given randomly. By controlling the number of noise pixels added, we test 3 progressive levels of complexity for each noise environment: the number of noise pixels is set to 10%, 20%, and 30% of the total number of image pixels, respectively. The purpose is to examine the noise-immunity trend of AVS in each type of complex environment.
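The four noise types can be sketched with a single helper. This is an illustrative reconstruction; in particular, whether static noise reuses the same gray values in both frames is an assumption, not something the text states.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_pair(frame1, frame2, ratio, background_only, dynamic, bg_value):
    """Return copies of two successive frames with `ratio` of the pixels
    replaced by random gray levels, covering the four noise types:
    (background_only, dynamic) = (True, False)  -> static background noise
                                 (True, True)   -> dynamic background noise
                                 (False, False) -> static full-image noise
                                 (False, True)  -> dynamic full-image noise"""
    out1, out2 = frame1.copy(), frame2.copy()
    n = int(ratio * frame1.size)

    def pick(mask):
        # Sample n distinct pixel positions from the allowed region.
        return rng.choice(np.flatnonzero(mask), size=n, replace=False)

    if dynamic:   # fresh noise positions and values in each frame
        for out, src in ((out1, frame1), (out2, frame2)):
            mask = (src == bg_value) if background_only else np.ones_like(src, dtype=bool)
            out.ravel()[pick(mask)] = rng.integers(0, 256, size=n)
    else:         # same positions (and, by assumption, same values) in both frames
        mask = ((frame1 == bg_value) & (frame2 == bg_value)) if background_only \
               else np.ones_like(frame1, dtype=bool)
        idx = pick(mask)
        vals = rng.integers(0, 256, size=n)
        out1.ravel()[idx] = vals
        out2.ravel()[idx] = vals
    return out1, out2
```

For static background noise the candidate positions are restricted to pixels that are background in both frames, so the moving object is never covered; full-image noise draws from every pixel and can occlude the object.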
The first type of complex environment is static background noise. From the background of each theoretical image, a number of pixels at the same positions are selected and replaced with an equal number of noise pixels. This noise makes the static background of the moving object more cluttered, thus increasing the complexity of the environment.
Figure 11a shows a detection example of AVS under static background noise: two successive images of a randomly shaped object consisting of 128 pixels moving downward in an image of 1024 pixels. In these two images, some pixels in the noiseless background are replaced by new random grayscale pixels at fixed positions.
Figure 11b records the responses of the LMDNs using the intuitive one-dimensional statistical method common in neuroscience studies; the sum for each class is the activation intensity of the corresponding GMDN. The LMDNs were activated 47, 32, 35, 38, 44, 35, 83, and 40 times, respectively, over the frame interval. The LMDN preferring downward motion was activated 83 times, and its bar density is significantly higher than that of any other neuron, so downward is inferred as the global motion direction; this is consistent with the actual motion direction of the object, so the detection is successful. The histogram in
Figure 11c records the response strength of the GMDNs. The pixels activated by any neuron over the frame interval are highlighted in blue in Figure 11d.
The second type of complex environment is dynamic background noise. From the background of each theoretical image, a number of pixels at different positions are independently selected and replaced with an equal number of noise pixels. This noise makes the background of the moving object more cluttered while also changing in real time, thus further increasing the complexity of the environment.
Figure 12a shows a detection example under dynamic background noise: two successive frames of a randomly shaped object consisting of 64 pixels moving toward the upper left in an image of 1024 pixels. In these two frames, some pixels at different locations in the background are replaced by random grayscale noise points. As the figure shows, the noise points are present only in the background and change dynamically during the motion of the object.
Figure 12b depicts the number of activations of the LMDNs, i.e., the activation intensity of each corresponding GMDN. The LMDNs were activated 81, 88, 90, 116, 89, 92, 81, and 84 times, respectively, over the frame interval. The LMDN preferring upper-left motion was activated 116 times, and the density of its activation bars is significantly higher than that of any other; this is consistent with the actual motion direction of the object in the image, so the detection is successful. The bar graph in
Figure 12c records the response strength of the GMDNs. The pixels activated by any neuron over the frame interval are highlighted in blue in Figure 12d.
The third type of complex environment is static full-image noise. From each theoretical image, a number of pixels at the same positions are selected and replaced with an equal number of noise pixels. This type of noise not only has the characteristics of static background noise, but can also obscure object pixels, causing clutter everywhere in the image.
Figure 13a shows a detection example under static full-image noise: two successive frames of a randomly shaped object consisting of 32 pixels moving toward the lower right in an image of 1024 pixels. Pixels at the same locations in both images are covered by random grayscale noise points; the noise acts on the uppermost layer of the image, and the noise positions do not change during the object's motion.
Figure 13b depicts the number of times the LMDNs were activated, i.e., the response strength of each corresponding GMDN. The LMDNs were activated 12, 6, 9, 6, 7, 14, 11, and 24 times over the frame interval. The LMDN preferring lower-right motion was activated 24 times, with a significantly higher activation bar density than any other; this is consistent with the actual motion direction of the object, so the detection is successful. The bar graph in
Figure 13c records the response intensity of the GMDNs. The pixels activated by any neuron over the frame interval are highlighted in blue in Figure 13d.
The fourth type of complex environment is dynamic full-image noise. From each theoretical image, a number of pixels at different positions are independently selected and replaced with an equal number of noise pixels. This noise not only has the characteristics of static full-image noise, but also changes in real time; it combines the characteristics of the three aforementioned noise types and is therefore the most complex.
Figure 14a shows a detection example under dynamic full-image noise: two successive frames of a randomly shaped object consisting of 256 pixels moving toward the lower left in an image of 1024 pixels. Some pixels at different locations in the two images are covered by random grayscale noise points; the noise acts on the uppermost layer of the images, and the noise positions change dynamically during the object's motion.
Figure 14b depicts the number of activations of the LMDNs, i.e., the response strength of each corresponding GMDN. The LMDNs were activated 113, 116, 118, 117, 119, 201, 138, and 127 times, respectively, over the frame interval; the LMDN preferring lower-left motion was activated 201 times, and the density of its activation bars is significantly higher than that of any other. This is consistent with the actual motion direction of the object, so the detection is successful. The bar graph in
Figure 14c records the response intensity of the GMDNs. The pixels activated by any neuron over the frame interval are highlighted in blue in Figure 14d.
The above four types of noise are added to the noise-free object motion examples to generate four noise test sets for examining the noise immunity of AVS. The ratio of noise points to the total number of image pixels is set to 10%, 20%, and 30%, and the detection accuracy is recorded. To further verify the noise immunity of AVS, a CNN is also employed to complete the direction detection task in the same four noisy environments for comparison.
For each of the above noise types, 10,000 sets of motion instances are generated by computer simulation. All other experimental conditions are the same as in the noise-free CNN comparison experiments. The detection accuracies for objects of different sizes in the four noise environments are calculated separately. The experimental results are shown in Table 3, Table 4, Table 5 and Table 6.
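The accuracy bookkeeping behind these tables can be sketched as follows; `run_trial` is a hypothetical placeholder for the full generate, add-noise, and detect pipeline, stubbed here as a perfect detector.

```python
SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
NOISE_LEVELS = [0.10, 0.20, 0.30]
TRIALS = 125   # per (size, direction); 10 sizes x 8 directions x 125 = 10,000 trials

def run_trial(size, direction, noise_ratio):
    """Hypothetical placeholder for one generate -> add-noise -> detect run.
    Stubbed to always return the true direction; the real pipeline returns
    the direction inferred by the detector under test (AVS or CNN)."""
    return direction

def accuracy_by_size(noise_ratio):
    """One table column group: detection accuracy per object size."""
    table = {}
    for size in SIZES:
        correct = sum(run_trial(size, d, noise_ratio) == d
                      for d in range(8) for _ in range(TRIALS))
        table[size] = correct / (8 * TRIALS)
    return table

tab = accuracy_by_size(0.10)
```

Each table in the text corresponds to one noise type, with one such per-size accuracy column per noise level.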
Table 3 shows the detection accuracies of AVS and CNN under static background noise. In the 10% noise environment, AVS achieves 100% detection accuracy for all objects larger than 4 pixels; a small number of detection errors occur for objects smaller than 8 pixels. This is because, for a fixed amount of noise, the smaller the object, the greater the relative interference of the noise. The same trend holds for CNN detection, i.e., accuracy is higher for larger objects. However, the CNN achieves only 69.70% for the largest 512-pixel objects in the test set, which is below even the AVS accuracy for the smallest 1-pixel objects, 92.5%. As object size decreases, the accuracy of the CNN drops rapidly, reaching only 13.18% for 1-pixel objects. When the noise ratio increases to 20%, the accuracy of both methods decreases relative to the 10% environment: AVS retains 100% error-free detection for all objects larger than 8 pixels, while its accuracy on 8-pixel objects falls from 100% to 99.5% and decreases by 3.5%, 3%, and 3.5% for 1-, 2-, and 4-pixel objects, respectively; the CNN shows a significant reduction at every object size except 1 and 2 pixels. In the 30% noise condition, AVS still maintains 100% error-free detection for objects larger than 8 pixels, while its accuracy decreases by 7%, 5%, and 0.5% for 1-, 2-, and 8-pixel objects, respectively, relative to the 20% noise environment.
Although the accuracy on the 4-pixel object task is higher here than in the 20% noise environment, this increase should be regarded as an outlier due to experimental randomness, because the overall trend shows that the error-free detection range of AVS narrows as the background noise proportion increases and that accuracy worsens overall in the tasks with detection errors. The CNN likewise worsens at all scales except for 8-pixel objects; its small accuracy increase there should also be regarded as an outlier due to experimental randomness. Across the three concentrations of static background noise, AVS maintains 100% accuracy for all objects larger than 16 pixels. The mean accuracies of AVS are 98.45%, 97.4%, and 96.4%, far higher than those of the CNN at 30.75%, 25.55%, and 23.04%, respectively. It is obvious from the line graph of the mean values in
Figure 15a that the accuracy of both methods decreases as the noise percentage grows, but AVS significantly outperforms the CNN in all cases.
Table 4 shows the detection accuracies of AVS and CNN under dynamic background noise. In the 10% noise environment, AVS achieves 100% accuracy for all objects larger than 32 pixels; detection errors occur for objects smaller than 64 pixels. The CNN likewise recognizes object motion more easily at larger scales. Both methods have their lowest accuracy on 1-pixel objects, 29.00% and 12.68%, respectively; the mean accuracies over the 10 object sizes are 79.80% and 27.85%, respectively, both lower than under static background noise, but AVS still significantly outperforms the CNN. When the noise ratio increases to 20%, the detection accuracy of both methods decreases relative to the 10% environment, but AVS still maintains 100% error-free detection for objects larger than 32 pixels; for smaller objects, the accuracy naturally decreases. The CNN, on the other hand, decreases at all object scales. In the 30% noise condition, AVS maintains 100% error-free performance for objects larger than 128 pixels, while its accuracy decreases significantly for smaller objects relative to the 20% environment, except for 1-pixel objects (an outlier due to experimental randomness); the CNN similarly decreases at all object scales except for 1-pixel objects (likewise an outlier). Across the three concentrations of dynamic background noise, AVS maintains 100% accuracy for all objects larger than 128 pixels. The mean accuracies of AVS are 79.80%, 69.65%, and 65.50%, far higher than those of the CNN at 27.85%, 23.33%, and 20.72%. It is obvious from
Figure 15b that the accuracy of both methods decreases as the noise percentage grows, but AVS significantly outperforms the CNN in all cases. Relative to static background noise, dynamic background noise produces stronger interference in the motion direction recognition task.
Table 5 shows the detection accuracies of AVS and CNN under static full-image noise. In the 10% noise environment, AVS achieves 100% detection accuracy for all objects larger than 8 pixels; a small number of detection errors occur for objects smaller than 16 pixels. The CNN obtains its highest accuracy for the largest 512-pixel objects in the test set, but it is only 55.15%, far below even the AVS accuracy for 1-pixel objects, 77.5%. Both methods have their lowest accuracy on 1-pixel objects, 77.5% and 13.22%, respectively; the mean accuracies over the 10 object sizes are 94.4% and 27.29%, respectively, with AVS significantly outperforming the CNN. When the noise percentage increases to 20%, the accuracy of both methods decreases relative to the 10% environment, but AVS still maintains 100% error-free detection for objects larger than 16 pixels; accuracy decreases for smaller sizes. In the 30% noise condition, AVS maintains 100% error-free performance for objects larger than 64 pixels, while its accuracy for smaller objects decreases compared to the 20% environment, and the CNN also decreases at all object scales. Across the three concentrations of static full-image noise, AVS maintains 100% accuracy for all objects larger than 64 pixels. The mean accuracies of AVS are 94.4%, 88.4%, and 83.35%, far higher than those of the CNN at 27.29%, 21.35%, and 17.79%. It is obvious from
Figure 15c that the accuracy of both methods decreases as the noise percentage grows, but AVS is significantly better than the CNN in all cases. Static full-image noise is a stronger noise than static background noise.
Table 6 shows the detection accuracies of AVS and CNN under dynamic full-image noise. The CNN again recognizes object motion more easily at larger scales. Both methods have their lowest accuracy on 1-pixel objects, 23% and 12.87%, respectively; the mean accuracies over the 10 object sizes are 77.6% and 24.23%, respectively, lower than for the three noise types above, but AVS still significantly outperforms the CNN. When the noise ratio increases to 20%, the accuracy of both methods decreases relative to the 10% environment, but AVS still maintains 100% error-free detection for objects larger than 32 pixels; accuracy decreases for smaller objects, while the CNN decreases at every object size. In the 30% noise condition, AVS maintains 100% error-free performance for objects larger than 128 pixels, while its accuracy for smaller objects decreases significantly compared to the 20% environment, and the CNN also decreases at all object sizes except for 2-pixel objects (an outlier due to experimental randomness). Across the three concentrations of dynamic full-image noise, AVS maintains 100% accuracy for objects larger than 128 pixels. The mean accuracies of AVS are 77.6%, 66.55%, and 59%, far higher than those of the CNN at 24.23%, 19.14%, and 16.45%. It is obvious from the mean value line graph in
Figure 15d that the accuracy of both methods decreases as the noise percentage grows, but AVS significantly outperforms the CNN in all cases. Relative to the first three types of noise, dynamic full-image noise produces the strongest interference in the motion direction recognition task. Across the different concentrations of the four noise types, the accuracy of both detection methods decreases gradually as object size decreases.
Figure 15 depicts, with line graphs, the relationship between noise ratio and the detection accuracy of the two methods under the four noise environments. The figure shows that, for both methods and under otherwise identical noise conditions, static noise increases detection difficulty less than dynamic noise, and background noise less than full-image noise. For every noise type, however, AVS demonstrates superior noise immunity compared to the CNN, as well as significantly higher noise-free detection capability.
The experimental results show that, compared with the CNN method, AVS achieves very high detection accuracy and robustness both in noise-free conditions and under the four types of noise. At the same time, because AVS is designed from the real function of each corresponding retinal cell so as to reproduce the physiological collaboration of the relevant direction-sensitive pathway, it involves no learning optimization process, no training data, and no tunable parameters. Compared with CNNs, which require optimizing a huge number of model parameters over large amounts of data during training, AVS has the inherent advantage of being an extremely efficient motion direction detector even though the learning process is skipped entirely, thus saving significant computational resources. Moreover, AVS has a strong physiological grounding that makes it fully interpretable, whereas the CNN is widely regarded as a black-box optimizer lacking interpretability. In terms of hardware implementation, AVS is based entirely on the physiological computational properties of the relevant neurons and requires only three simple operation modes: comparison, summation, and logic operations; it is therefore extremely easy to implement in hardware, allowing very high computational speed and greatly broadening its range of application scenarios. In summary, AVS simultaneously offers four important characteristics: effectiveness, robustness, efficiency, and rationality.