The aim of this article is to develop efficient methods of representing multilevel structured information from various modalities (images, speech, and text) that naturally reproduce the structure of information as it occurs in the human brain. Achieving this goal requires resolving a number of theoretical and practical issues, including the creation of a mathematical model with a stability point, an algorithm, and a software implementation for offline information processing; the neural network representation of the information; and long-term synchronization of the different modalities. An artificial neural network (ANN) of the Cohen–Grossberg type was used to accomplish these objectives. The research techniques reported herein are based on the theory of pattern recognition, as well as on speech, text, and image processing algorithms.
Some problems in the field of pattern recognition have been solved successfully, and commercial systems for speech recognition, image recognition, and automatic text analysis are available. The degree of success in solving these problems depends on how formally the subject area can be described. Face image recognition has been solved as an isolated task. Identifying grammar and syntax errors is a more complex task. Recognition of images and scenes, dictation of texts from a microphone, and automatic text classification remain unsolved; existing systems only demonstrate how complex these tasks are. The difficulties that arise in solving these problems lie in synchronizing the analyzed information, which leads to the formation of a large number of hypotheses. When a large amount of information must be processed and synchronized, verifying these hypotheses becomes a non-trivial task that remains unsolved within the framework of the applied methods. Currently, the complexity of the methods for representing semantic and mathematical information by both metalinguistic and figurative means practically precludes their effective use in solving these problems. Within artificial intelligence research, numerous attempts have been, and are being, made to use semantic and pragmatic information, mainly to solve the problem of human–machine communication in natural language. The works of Alkon, Alwang, and Bengio [2,3,4] are widely known. Their success is due to the fact that the semantic picture is replaced by the rigid structure of a relational database, from which natural language interpretations are produced and statements are interpreted in terms of concepts. However, the inaccuracy of language models makes these interpretations highly ambiguous, and such a model cannot be formed automatically from texts alone. The ways in which semantic information is used in image recognition are less well known.
In image recognition, Quasi-Zo [4,5] has been used as a world model to analyze scenes in which individual objects are represented by generalized geometric shapes, such as balls and cylinders. With the help of this model, the objects in a scene are represented, segmented, and identified, and then described in metalinguistic terms, together with the relations between them and their dynamics. However, all of these steps are performed separately.
Developing methods for representing information at the semantic and pragmatic levels, equally suited to linguistic and image recognition tasks, is key to improving both the quality and the functionality of these systems and to the transition to the next stage in the development of intelligent systems (IS): the creation of integrated multimodal systems for information processing and storage. These tasks call for new approaches to representing and processing information from different modalities (verbal, visual, and supermodal, i.e., semantic and pragmatic) in synchrony.
Introducing knowledge into artificial IS is effective not through modeling individual intellectual functions but through modeling the computing environment in which entire tasks are solved. Intelligent systems are those that perform intellectual functions within the framework of cognitive behavior: perception, learning, formation of thinking patterns (using a pattern to solve current problems), problem solving, prediction, decision making, linguistic behavior, etc. Accordingly, IS include natural language processing systems, word processing systems, and automated systems. Existing systems can be divided into two classes: single-level systems that recognize speech events using one-way or modified Bayesian rules (implemented on a neural network) and synchronous processing systems that use empirical linguistic rules.
For example, after an acoustic speech signal is fed into the system, it is digitized, cleaned, normalized in amplitude, and freed of accompanying information. Its segments are then compared with the reference templates for each level that were created during the training stage.
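The preprocessing and matching steps described above can be sketched as follows. This is a minimal, single-level illustration under stated assumptions, not the pipeline of any particular system: peak amplitude normalization stands in for the normalization step, frame root-mean-square energy is a deliberately crude stand-in for real acoustic features, cleaning and removal of accompanying information are omitted, and matching uses the nearest template by Euclidean distance. The function names, templates, and signal are hypothetical.

```python
import math


def normalize_amplitude(signal):
    """Scale a digitized signal so that its peak absolute amplitude is 1."""
    peak = max(abs(x) for x in signal)
    if peak == 0:
        return list(signal)
    return [x / peak for x in signal]


def frame_features(signal, frame_len):
    """Split the signal into non-overlapping frames and return the
    root-mean-square energy of each complete frame (a crude feature)."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        features.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return features


def nearest_reference(features, references):
    """Return the label of the reference template (built during training)
    that is closest to the feature vector in Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(references, key=lambda label: distance(features, references[label]))


# Hypothetical templates and input signal.
references = {"loud-quiet": [0.7, 0.25], "quiet-loud": [0.25, 0.7]}
signal = [0.0, 2.0, -2.0, 0.0, 0.5, -0.5, 0.5, -0.5]
features = frame_features(normalize_amplitude(signal), frame_len=4)
print(nearest_reference(features, references))  # loud-quiet
```

A multilevel system would repeat the comparison at each level (for example, phonemes, then words) with level-specific templates.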
For simple recognition problems, such as commands with a limited vocabulary, single-level statistical approaches are most often used. More complicated problems, such as keyword spotting in continuous speech, require a structural approach that uses information from all levels of language, from morphology to syntax, as well as extralinguistic information, such as semantics and pragmatics. Building speech recognition systems is complex because a large amount of information with different internal structures, processed by different algorithms, must be combined into a single whole. In addition, practical speech recognition solutions face a psychological barrier: people expect the same communicative capabilities from a speech recognition system as from another person. Meeting this expectation involves reproducing, as far as possible, the way a person processes and presents the information at their disposal. This means that, in addition to integrating linguistic and extralinguistic knowledge sources at different levels, it is necessary to integrate information processing subsystems for other modalities, primarily vision. Effective synchronization and integration of a large amount of disparate information become possible once three problems are solved. First, the same algorithms must be used to process information with different structures. Second, these algorithms should preferably be implemented on specialized hardware designed for them rather than on general-purpose processors. Third, an associative means of synchronizing information must be implemented.
Analysis of existing systems showed that, as in speech recognition, two main approaches are used in image recognition: geometric and linguistic. Image recognition has its own difficulties in sorting through large quantities of information because it is computationally intensive. Effective IS can be built only if they are synchronized and highly robust. Their knowledge can be represented as a set of rules or declaratively, in the form of a database. Solving the problem of integrating information from different modalities would make it possible to escape this vicious circle [8,9,10].
The aim of this work is to identify effective ways of synchronizing multilevel structured information from different modalities (images, speech, and text) that allow the structure of information to be reproduced naturally, as it occurs in the human brain. Processing optimization methods should make it possible to model a stable process. For this purpose, neural network representation and processing of the different modalities can be used. It is also necessary to develop a method and an algorithm for training neural networks for robustness and synchronization.
The paper contains five sections. We describe the use of Cohen–Grossberg neural network (CGNN) training to analyze time sequences in order to represent speech as textual information, resulting in h-stability for dynamic information generation. We then describe the algorithm and present an example implementation of a stability model in a CGNN.
2. Cohen–Grossberg Network Training for Different Modalities
An analysis of existing ANN solutions showed that they can be divided into two types: static and dynamic systems. Classical networks with neuron-like elements can solve the problem of recognizing spatial images and speech characteristics. Dynamic images and speech can also be recognized using networks with delay elements and dynamic neural networks, in which special techniques account for the temporal organization of information.
An ANN that takes dynamic, time-based information into account, such as a Cohen–Grossberg network, can be used to analyze temporal sequences to which the representation of speech and of visual/textual information is reduced.
Cohen and Grossberg presented a variant of an ANN that exhibits self-organization, competition, etc. Grossberg designed a continuous-time competitive network based on the human visual system. His work is characterized by the use of nonlinear mathematics to model specific functions. Their papers address specific areas, such as how competitive networks can enhance recognizable information in vision, and are characterized by a high level of mathematical complexity [12,13].
To take the temporal structure of the information into account, a special technique is used: delayed copies of the input are fed to additional network inputs, and the neuron whose stored pattern best matches the input is expected to produce the highest output. The network thus begins to take the temporal context of the input into account, and dynamic patterns are formed automatically.
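Feeding delayed copies of the input to additional network inputs is commonly implemented as a tapped delay line. The sketch below is a minimal illustration, not the authors' implementation: it turns a scalar time sequence into input vectors [x(t), x(t-1), ..., x(t-D)]. The function name and the zero padding before the start of the sequence are assumptions.

```python
def tapped_delay_inputs(sequence, delays):
    """Build network input vectors from a scalar time sequence.

    The vector for time t holds the current sample and the samples delayed
    by 1..delays steps: [x(t), x(t-1), ..., x(t-delays)]. Time steps before
    the start of the sequence are padded with zeros (an assumption).
    """
    vectors = []
    for t in range(len(sequence)):
        vectors.append([sequence[t - d] if t - d >= 0 else 0.0
                        for d in range(delays + 1)])
    return vectors


print(tapped_delay_inputs([1.0, 2.0, 3.0, 4.0], delays=2))
# [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 2.0, 1.0], [4.0, 3.0, 2.0]]
```

Each vector exposes a short window of the past to the network at once, which is what lets a static input layer see temporal context.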
We propose using the learning law for the adaptive weights $\mathbf{W}^2$ of the Grossberg network, which Grossberg calls long-term memory (LTM) because the rows of $\mathbf{W}^2$ represent patterns that have been stored and can be recognized by the network. The stored pattern closest to the input produces the highest output in the second layer.
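One learning law for $\mathbf{W}^2$ is given by:
$$\frac{dw^2_{i,j}(t)}{dt} = \alpha\left\{-w^2_{i,j}(t) + n^2_i(t)\,n^1_j(t)\right\} \qquad (1)$$
where $\alpha$ is the learning rate coefficient, $w^2_{i,j}$ is the element of $\mathbf{W}^2$ connecting the jth neuron of the first layer to the ith neuron of the second layer, $\mathbf{n}^1$ and $\mathbf{n}^2$ are the outputs of the first and second layers, and t is the time variable. The first term in the braces on the right is a passive decay term, and the second is a Hebbian-like learning term. Combined, these terms produce a decay Hebb rule.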
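The behavior of the decay Hebb rule can be checked numerically. The following sketch integrates dw_ij/dt = α(−w_ij + n²_i n¹_j) for one row of the weight matrix with the forward Euler method; the step size, the learning rate, and the constant layer outputs are hypothetical values chosen for illustration. With constant outputs, the row converges to n²_i n¹; once n²_i is zero, only the passive decay term remains, and the stored weights are gradually forgotten.

```python
def decay_hebb_step(w_row, n1, n2_i, alpha, dt):
    """One forward-Euler step of dw_ij/dt = alpha * (-w_ij + n2_i * n1_j)
    for row i of the second-layer weight matrix."""
    return [w + dt * alpha * (-w + n2_i * x) for w, x in zip(w_row, n1)]


def run(w_row, n1, n2_i, alpha=1.0, dt=0.01, steps=2000):
    """Integrate the decay Hebb rule for a fixed number of steps."""
    for _ in range(steps):
        w_row = decay_hebb_step(w_row, n1, n2_i, alpha, dt)
    return w_row


n1 = [0.2, 0.8]                           # hypothetical first-layer output
learned = run([0.0, 0.0], n1, n2_i=0.5)   # learning with an active neuron
forgotten = run(learned, n1, n2_i=0.0)    # neuron inactive: decay only
print(learned)    # close to [0.1, 0.4] = n2_i * n1
print(forgotten)  # close to [0.0, 0.0]
```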
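The first layer normalizes the strength of the input pattern after receiving external inputs. The input vector p is used to compute both the excitatory and the inhibitory inputs. The first-layer equation takes the form:
$$\varepsilon\frac{d\mathbf{n}^1(t)}{dt} = -\mathbf{n}^1(t) + \left({}^{+}\mathbf{b}^1 - \mathbf{n}^1(t)\right)\left[{}^{+}\mathbf{W}^1\right]\mathbf{p} - \left(\mathbf{n}^1(t) + {}^{-}\mathbf{b}^1\right)\left[{}^{-}\mathbf{W}^1\right]\mathbf{p} \qquad (2)$$
where $\varepsilon$ determines the speed of response, p is the input vector, t is the time variable, ${}^{+}\mathbf{b}^1$ is the excitatory bias, and ${}^{-}\mathbf{b}^1$ is the inhibitory bias; the products of the bias terms with the bracketed inputs are taken element by element.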
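Equation (2) is a shunting model with excitatory input $\left[{}^{+}\mathbf{W}^1\right]\mathbf{p}$, where
$${}^{+}\mathbf{W}^1 = \mathbf{I},$$
so the excitatory input to the ith neuron is the ith element of the input vector, and with inhibitory input $\left[{}^{-}\mathbf{W}^1\right]\mathbf{p}$, where
$${}^{-}\mathbf{W}^1 = \begin{bmatrix} 0 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 0 \end{bmatrix}.$$
The inhibitory input to the ith neuron is therefore the sum of all elements of the input vector except the ith element.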
The two matrices ${}^{+}\mathbf{W}^1$ and ${}^{-}\mathbf{W}^1$ create an on-center/off-surround pattern: the excitatory input, which turns the neuron on, comes from the element of the input vector centered at the same location (the ith element), while the inhibitory input, which turns the neuron off, comes from the surrounding locations. This connection pattern normalizes the input pattern.
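The on-center/off-surround connection pattern of the first layer can be written down directly. The following sketch, with hypothetical function and variable names, builds +W¹ (the identity matrix) and −W¹ (zeros on the diagonal, ones elsewhere) and computes the excitatory and inhibitory inputs [+W¹]p and [−W¹]p for a sample input vector p.

```python
def layer1_weight_matrices(size):
    """Return (+W^1, -W^1): the identity matrix and its off-diagonal complement."""
    w_plus = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    w_minus = [[0.0 if i == j else 1.0 for j in range(size)] for i in range(size)]
    return w_plus, w_minus


def mat_vec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


p = [1.0, 2.0, 3.0]                  # hypothetical input vector
w_plus, w_minus = layer1_weight_matrices(len(p))
excitatory = mat_vec(w_plus, p)      # [1.0, 2.0, 3.0]: each neuron sees its own element
inhibitory = mat_vec(w_minus, p)     # [5.0, 4.0, 3.0]: sum of all other elements
print(excitatory, inhibitory)
```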
For simplicity, the inhibitory bias ${}^{-}\mathbf{b}^1$ is set to zero, which sets the lower bound of the shunting model to zero. In addition, all elements of the excitatory bias are set to the same value:
$${}^{+}b^1_i = {}^{+}b^1 \quad \text{for all } i,$$
so that the upper bound is the same for every neuron. To examine the normalizing effect of the first layer, consider the response of the ith neuron:
$$\varepsilon\frac{dn^1_i(t)}{dt} = -n^1_i(t) + \left({}^{+}b^1 - n^1_i(t)\right)p_i - n^1_i(t)\sum_{j\neq i}p_j.$$
At steady state, $dn^1_i(t)/dt = 0$, which gives:
$$0 = -n^1_i + \left({}^{+}b^1 - n^1_i\right)p_i - n^1_i\sum_{j\neq i}p_j.$$
Solving for the steady-state output of the neuron gives:
$$n^1_i = \frac{{}^{+}b^1\,p_i}{1 + P}, \qquad P = \sum_j p_j.$$
The relative intensity of the ith input is defined as:
$$\bar{p}_i = \frac{p_i}{P}.$$
The steady-state activity of the neurons then takes the form:
$$n^1_i = \left(\frac{{}^{+}b^1 P}{1 + P}\right)\bar{p}_i.$$
Hence, regardless of the magnitude of the total input P, $n^1_i$ is always proportional to the relative intensity $\bar{p}_i$. Moreover, the total activity of the layer is bounded:
$$\sum_j n^1_j = \sum_j \left(\frac{{}^{+}b^1 P}{1 + P}\right)\bar{p}_j = \frac{{}^{+}b^1 P}{1 + P} < {}^{+}b^1.$$
The input vector is thus normalized: the relative intensities of its components are preserved, while the total activity is kept below ${}^{+}b^1$. As a result, the outputs of the first layer, $n^1_i$, encode the relative input intensities $\bar{p}_i$ rather than the instantaneous fluctuations in the total input activity P. This result follows from the nonlinear gain control of the shunting model and the on-center/off-surround coupling of the inputs.
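The normalization result can be verified numerically. The sketch below integrates the ith-neuron equation of the first layer (with the inhibitory bias set to zero, as in the text) by the forward Euler method and compares the result with the closed-form steady state n¹_i = +b¹ p_i / (1 + P). The values of ε, the step size, and the two input vectors, which have the same relative intensities but total inputs that differ by a factor of 10, are hypothetical.

```python
def layer1_simulate(p, b_plus=1.0, eps=0.1, dt=1e-4, steps=5000):
    """Forward-Euler integration of
    eps * dn_i/dt = -n_i + (b_plus - n_i) * p_i - n_i * sum_{j != i} p_j."""
    total = sum(p)
    n = [0.0] * len(p)
    for _ in range(steps):
        n = [ni + (dt / eps) * (-ni + (b_plus - ni) * pi - ni * (total - pi))
             for ni, pi in zip(n, p)]
    return n


def layer1_steady_state(p, b_plus=1.0):
    """Closed-form steady state n_i = b_plus * p_i / (1 + P)."""
    total = sum(p)
    return [b_plus * pi / (1.0 + total) for pi in p]


for p in ([2.0, 8.0], [20.0, 80.0]):
    n = layer1_simulate(p)
    relative = [ni / sum(n) for ni in n]
    # Both inputs give relative outputs of about [0.2, 0.8]; the total
    # activity (10/11, then 100/101) stays below b_plus = 1.
    print(n, layer1_steady_state(p), relative, sum(n))
```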
The first layer of the Grossberg network thus explains the constancy of processed information and related dynamic properties of the visual system: the network responds to relative, not absolute, image intensities.
The second layer of the Grossberg network, a continuous-time layer, serves several purposes. First, as in the first layer, the total activity in the layer is normalized. Second, the layer contrast-enhances its pattern, so that the neuron receiving the largest input tends to produce the strongest response. Finally, it stores the enhanced pattern, acting as short-term memory (STM).
The main difference between the two layers is the feedback in the second layer. It enables the network to store a pattern even after the input has been removed. The layer also performs competition, which enhances the recognizable information in the pattern.
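The equation for the second layer takes the following form:
$$\varepsilon\frac{d\mathbf{n}^2(t)}{dt} = -\mathbf{n}^2(t) + \left({}^{+}\mathbf{b}^2 - \mathbf{n}^2(t)\right)\left\{\left[{}^{+}\mathbf{W}^2\right]\mathbf{f}^2\!\left(\mathbf{n}^2(t)\right) + \mathbf{W}^2\mathbf{a}^1\right\} - \left(\mathbf{n}^2(t) + {}^{-}\mathbf{b}^2\right)\left[{}^{-}\mathbf{W}^2\right]\mathbf{f}^2\!\left(\mathbf{n}^2(t)\right)$$
This is a shunting model with excitatory input $\left[{}^{+}\mathbf{W}^2\right]\mathbf{f}^2(\mathbf{n}^2(t)) + \mathbf{W}^2\mathbf{a}^1$, where ${}^{+}\mathbf{W}^2 = {}^{+}\mathbf{W}^1$ provides on-center feedback and $\mathbf{W}^2$ consists of adaptive weights similar to those in a Kohonen network. After training, the rows of $\mathbf{W}^2$ represent prototype patterns. The inhibitory input to the shunting model is the off-surround feedback $\left[{}^{-}\mathbf{W}^2\right]\mathbf{f}^2(\mathbf{n}^2(t))$, where ${}^{-}\mathbf{W}^2 = {}^{-}\mathbf{W}^1$.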
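To demonstrate the effect of the second layer of a Grossberg network, consider a network with two neurons. The layer equations are:
$$\varepsilon\frac{dn^2_1(t)}{dt} = -n^2_1(t) + \left({}^{+}b^2 - n^2_1(t)\right)\left\{f^2\!\left(n^2_1(t)\right) + \left({}_1\mathbf{w}^2\right)^{T}\mathbf{a}^1\right\} - \left(n^2_1(t) + {}^{-}b^2\right)f^2\!\left(n^2_2(t)\right)$$
$$\varepsilon\frac{dn^2_2(t)}{dt} = -n^2_2(t) + \left({}^{+}b^2 - n^2_2(t)\right)\left\{f^2\!\left(n^2_2(t)\right) + \left({}_2\mathbf{w}^2\right)^{T}\mathbf{a}^1\right\} - \left(n^2_2(t) + {}^{-}b^2\right)f^2\!\left(n^2_1(t)\right)$$
where ${}_i\mathbf{w}^2$ is the ith row of $\mathbf{W}^2$ and $\mathbf{a}^1$ is the output of the first layer.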
The inputs to the second layer are the inner products of the prototype patterns (rows of the weight matrix $\mathbf{W}^2$) with the output of the first layer (the normalized input pattern). The prototype pattern closest to the input pattern produces the largest inner product. The second layer then performs a competition among its neurons, which tends to contrast-enhance the output pattern by maintaining large outputs while attenuating small ones. Competition in a Grossberg network preserves large values and reduces small values, but it does not necessarily drive all small values to zero. The transfer function $f^2$ determines how strongly the recognizable information is enhanced.
The response has two key characteristics. First, the pattern is contrast-enhanced even before the input is removed. The inputs to the second layer are
$$\left({}_1\mathbf{w}^2\right)^{T}\mathbf{a}^1 \quad \text{and} \quad \left({}_2\mathbf{w}^2\right)^{T}\mathbf{a}^1,$$
and the input to the second neuron is 1.5 times the input to the first neuron. However, after 0.25 s, the output of the second neuron is 6.34 times the output of the first neuron.
The second characteristic of the response is that, once the input is set to zero, the network further enhances the pattern and stores it: the output persists after the input is removed. Grossberg refers to this behavior as reverberation. The nonlinear feedback enables the network to store the pattern, and the on-center/off-surround connection pattern, determined by ${}^{+}\mathbf{W}^2$ and ${}^{-}\mathbf{W}^2$, produces the contrast enhancement.
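Both characteristics of the response, contrast enhancement while the input is present and reverberation after it is removed, can be reproduced with a short simulation of the two-neuron equations. The sketch below is illustrative only: the transfer function f²(n) = 10n²/(1 + n²), ε = 0.1, +b² = 1, −b² = 0, the prototype rows of W², and the first-layer output a¹ are hypothetical values, chosen so that the input to the second neuron is 1.5 times the input to the first. The input is switched off at t = 0.25 s, and the system is integrated with the forward Euler method.

```python
def f2(n):
    """Hypothetical layer-2 transfer function, faster than linear for small n."""
    return 10.0 * n * n / (1.0 + n * n)


def simulate_two_neurons(u, eps=0.1, b_plus=1.0, dt=1e-4, t_off=0.25, t_end=0.5):
    """Forward-Euler integration of the two-neuron second layer (-b^2 = 0).
    u = (u1, u2) are the layer-2 inputs, applied until t_off and then set to zero.
    Returns the outputs at t_off and at t_end."""
    n1 = n2 = 0.0
    steps_on = int(round(t_off / dt))
    steps_total = int(round(t_end / dt))
    at_off = None
    for k in range(steps_total):
        u1, u2 = u if k < steps_on else (0.0, 0.0)
        d1 = -n1 + (b_plus - n1) * (f2(n1) + u1) - n1 * f2(n2)
        d2 = -n2 + (b_plus - n2) * (f2(n2) + u2) - n2 * f2(n1)
        n1, n2 = n1 + dt / eps * d1, n2 + dt / eps * d2
        if k == steps_on - 1:
            at_off = (n1, n2)  # state at the moment the input is removed
    return at_off, (n1, n2)


W2 = [[0.9, 0.45], [0.45, 0.9]]  # hypothetical prototype patterns (rows)
a1 = [0.2, 0.8]                  # hypothetical normalized first-layer output
u = tuple(sum(w * a for w, a in zip(row, a1)) for row in W2)  # inner products
at_off, at_end = simulate_two_neurons(u)
print(u[1] / u[0])             # input ratio, approximately 1.5
print(at_off[1] / at_off[0])   # output ratio at t = 0.25 s, larger than 1.5
print(at_end)                  # second neuron stays active after the input is removed
```

The exact output ratio depends on the assumed parameters; the qualitative behavior (a ratio above the input ratio, and a stored pattern after the input is removed) is what the sketch is meant to show.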
Both layers of the Grossberg network use an on-center/off-surround structure. Other connection patterns can be used in various applications. The directed receptive field has been proposed as a structure for implementing this technique. In this structure, the “on” (excitatory) connections originate from one side of the field, whereas the “off” (inhibitory) connections originate from the other side.
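In some applications it is useful to turn learning (and forgetting) off when $n^2_i(t)$ is not active. The learning equation then takes the form:
$$\frac{dw^2_{i,j}(t)}{dt} = \alpha\, n^2_i(t)\left\{-w^2_{i,j}(t) + n^1_j(t)\right\},$$
which can be written in vector form as
$$\frac{d\left[{}_i\mathbf{w}^2(t)\right]}{dt} = \alpha\, n^2_i(t)\left\{-{}_i\mathbf{w}^2(t) + \mathbf{n}^1(t)\right\},$$
where ${}_i\mathbf{w}^2(t)$ is the vector made up of the elements of the ith row of $\mathbf{W}^2$.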
Learning therefore takes place only when $n^2_i(t)$ is nonzero, because the entire right-hand side of this learning equation is multiplied by $n^2_i(t)$. This is a continuous-time implementation of the instar learning rule. Because of this learning law for the adaptive weights, the topology and structure of the transformed data are preserved: similar fragments are transformed along the same trajectory, whereas distinct fragments follow different paths. In this scenario, the network begins to take the temporal context of the input into account, and dynamic image standards can then be created automatically [12,13].
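The gated learning law can be sketched in the same way. In the following minimal example, which uses hypothetical values, the forward Euler method is applied to d[ᵢw²]/dt = α n²ᵢ (−ᵢw² + n¹) for every row of W². Only the row of the active second-layer neuron moves toward the first-layer output; the row of the inactive neuron neither learns nor forgets.

```python
def gated_learning_step(W, n1, n2, alpha, dt):
    """One forward-Euler step of d(iw)/dt = alpha * n2_i * (-iw + n1) for each row i.
    A row whose neuron is inactive (n2_i == 0) is left unchanged."""
    return [[w + dt * alpha * n2_i * (-w + x) for w, x in zip(row, n1)]
            for row, n2_i in zip(W, n2)]


W = [[0.5, 0.5], [0.3, 0.7]]   # hypothetical initial rows of W^2
n1 = [0.2, 0.8]                # hypothetical first-layer output
n2 = [0.0, 1.0]                # only the second neuron of layer 2 is active
for _ in range(3000):
    W = gated_learning_step(W, n1, n2, alpha=1.0, dt=0.01)
print(W)  # first row unchanged, second row close to n1
```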
3. H-Stability Results
The recurrent ANN performs well on temporal information; the difficulty lies in manually transforming the structure into recognizable data. To resolve this issue, a neural network that performs multilevel information processing must use the concept of stability with respect to manifolds and the corresponding stability criteria. ANNs are useful for studying temporal sequences in which the presentation of speech, visual, and textual information is condensed. In these networks, impulsive events occur at given moments, and their effect can be analyzed using the H-stability result presented below. The main results on the h-stability of the equilibrium state of model (20) are considered. The corresponding lemma and theorem, by the authors, can be found in [16,17,18].
Equation (20) is taken from the work that presents the theoretical model used in Theorem 1.
Let us assume that:
There exists a positive number (μ), and
The functions and are such that
where , , , , ;
Such a function exists (), where the following inequalities hold [19,20,21]:
where exists for each .
Then, the equilibrium of the impulsive CGNN (20) with bidirectional associative memory and time-varying delays is globally exponentially stable with respect to the function h [17,22,23].
The final estimate establishes the global exponential stability of the equilibrium state of (20) with respect to the function h.
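The hypotheses of Theorem 1 cannot be checked here without the full model (20), but the kind of estimate the theorem delivers can be illustrated. The following sketch uses a hypothetical two-neuron Cohen–Grossberg network without delays, x′ᵢ = −aᵢ(xᵢ)[dᵢxᵢ − Σⱼ cᵢⱼ tanh(xⱼ)], with impulsive perturbations x(tₖ⁺) = γx(tₖ) at tₖ = 1, 2, ..., and takes h(t, x) = x − x* with x* = 0, so that h-stability reduces to ordinary exponential stability. All coefficients are assumptions. Because every aᵢ ≥ 1, dᵢ = 3, and each row satisfies Σⱼ |cᵢⱼ| ≤ 1, the maximum norm decays at least like e^(−2t), so the bound with M = 1 and μ = 1 should hold, while a much faster rate should not.

```python
import math

# Hypothetical coefficients: d_i = 3 and each row of C has sum |c_ij| = 1.
D = [3.0, 3.0]
C = [[0.5, -0.5], [0.4, 0.6]]


def amplification(x):
    """Positive amplification function a_i(x) with values in [1, 2]."""
    return 1.5 + 0.5 * math.cos(x)


def rhs(x):
    """Right-hand side -a_i(x_i) * (d_i * x_i - sum_j c_ij * tanh(x_j))."""
    return [-amplification(xi)
            * (D[i] * xi - sum(C[i][j] * math.tanh(x[j]) for j in range(len(x))))
            for i, xi in enumerate(x)]


def simulate(x0, t_end=5.0, dt=1e-3, gamma=0.8):
    """Forward Euler with impulses x -> gamma * x at t = 1, 2, ...
    Returns samples (t, ||h(t, x(t))||) with h(t, x) = x and the max norm."""
    x = list(x0)
    steps = int(round(t_end / dt))
    steps_per_impulse = int(round(1.0 / dt))
    history = [(0.0, max(abs(v) for v in x))]
    for k in range(1, steps + 1):
        x = [xi + dt * dxi for xi, dxi in zip(x, rhs(x))]
        if k % steps_per_impulse == 0:
            x = [gamma * xi for xi in x]  # impulsive perturbation
        history.append((k * dt, max(abs(v) for v in x)))
    return history


def satisfies_exponential_bound(history, M, mu):
    """True if ||h(t)|| <= M * ||h(t0)|| * exp(-mu * (t - t0)) at every sample."""
    t0, h0 = history[0]
    return all(h <= M * h0 * math.exp(-mu * (t - t0)) for t, h in history)


history = simulate([1.0, -2.0])
print(satisfies_exponential_bound(history, M=1.0, mu=1.0))   # True
print(satisfies_exponential_bound(history, M=1.0, mu=10.0))  # False: decay is not that fast
```

For a real model, h would be the manifold function of the theorem and the trajectory would come from the delayed system (20); only the form of the check carries over.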
The presented neural network technology for processing unstructured data from various modalities consists of associative reproduction and dynamic information generation. It is based on neural elements in a steady state. Such an associative memory consists of a collection of neuron-like elements that are connected in parallel, share an input and an output, and differ from one another in the order of the signals of the synaptic connections on a generalized dendrite. Each connection has unit weight.
The transformed information is represented by a sequence along a trajectory in a multidimensional signal space [21,24,25].
4. Algorithms of a Stability Model in Cohen–Grossberg-Type Neural Networks
The software implementation of the mathematical model was developed in the C programming language. We used OpenMPI technology on a cluster of eight machines, each equipped with four Intel® Xeon® processors.
A neural network based on the generalized CGNN model was used for the software implementation. The implemented model has the following form:
with impulsive perturbations of the following type:
The arrays have fixed values only in the procedural implementation, as presented in (25). In the implementations that use a parallel technique, the initial values of the arrays are set to a minimum value that is passed to the program as a parameter [26,29].
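The text does not detail how the parameter sweep is divided among the cluster processes. One common choice, shown here only as a hypothetical sketch, is a contiguous block distribution: each of the P processes (for example, 8 machines × 4 processors = 32, assuming one process per processor) receives an index range, and the parameter values are generated from the minimum value passed to the program. In a C/MPI program, the rank and the number of processes would come from MPI_Comm_rank and MPI_Comm_size; here they are plain arguments, and the function names and step size are assumptions.

```python
def block_range(n_items, n_procs, rank):
    """Contiguous [start, stop) range of sweep indices for process `rank`.
    The first (n_items % n_procs) ranks receive one extra index."""
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def sweep_values(min_value, step, start, stop):
    """Hypothetical parameter grid: min_value + step * i for i in [start, stop)."""
    return [min_value + step * i for i in range(start, stop)]


n_procs = 8 * 4  # assumption: one process per processor (8 machines x 4 processors)
for rank in (0, 1, n_procs - 1):
    start, stop = block_range(100, n_procs, rank)
    print(rank, start, stop, sweep_values(0.5, 0.25, start, stop))
```

Contiguous blocks keep each process's values in one range, so per-process results can be merged in rank order.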
All conditions of Theorem 1, i.e., hypotheses 1 to 5, are assumed to be satisfied [18,19,24]. The corresponding constants are set as follows:
A stable point is sought such that there exists a positive number () that satisfies Equation (21). For each value of the four arrays (), a cyclic calculation is performed for the value of , as well as a check for the fulfillment of Equation (21).
All input parameters are entered and saved in a structure for faster access. Depending on the result obtained, records are written to different files. All calculated values of the function under investigation are saved in a common file [25,30]. A separate file stores only the values for which the system is stable according to the theorem, i.e., those with positive values of the computed quantity. The algorithm used to validate the stability point is presented in Figure 1.
In a third file, the maximum values of the increasing stability function are recorded, together with the corresponding values of the given arrays. For this purpose, the last highest computed value is kept in a structure during the calculations and compared with the current value. If the current value is higher, it is recorded and becomes the new reference value. The data are summarized graphically in Figure 2 and Figure 3.
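The bookkeeping described above (one record for every computed value, a second record only for stable cases, and a third for successive maxima tracked against a running reference value) can be sketched as follows. The stability check used here, μ = a·d − c·L > 0, is a hypothetical diagonal-dominance margin standing in for Equation (21), not the condition of Theorem 1; the four parameter arrays and their names are likewise assumptions, and Python lists stand in for the three output files.

```python
import itertools


def stability_margin(a, d, c, lip):
    """Hypothetical stand-in for the Equation (21) check: a * d - c * lip.
    A positive value is treated as 'stable'. Replace with the real condition."""
    return a * d - c * lip


def sweep(a_values, d_values, c_values, lip_values):
    all_results, stable, maxima = [], [], []
    best = None  # last highest value, kept as the reference for comparison
    for a, d, c, lip in itertools.product(a_values, d_values, c_values, lip_values):
        mu = stability_margin(a, d, c, lip)
        record = (a, d, c, lip, mu)
        all_results.append(record)     # first file: every computed value
        if mu > 0:
            stable.append(record)      # second file: stable cases only
        if best is None or mu > best:
            best = mu
            maxima.append(record)      # third file: each new maximum
    return all_results, stable, maxima


all_results, stable, maxima = sweep([1.0, 2.0], [1.0], [0.5, 3.0], [1.0])
print(len(all_results), len(stable), maxima[-1])  # 4 2 (2.0, 1.0, 0.5, 1.0, 1.5)
```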
Based on the obtained results, we built an ANN with sixteen input parameters and sixty-four neurons in the hidden layer. The parameter values supplied to the network were determined according to Equation (25). The stability results are shown in Figure 4.
The example in Figure 4 shows how impulsive perturbations can be used to influence the stability behavior of CGNNs and demonstrates the usefulness of the proposed theoretical results. The blue and red lines represent the responses of the first and second layers of the neural network over a period of 8. After this period, the neural network stabilizes.
Based on an analysis of existing intelligent systems, we suggest the use of a Cohen–Grossberg-type ANN to model stable processing of any information in a multidimensional space with a multilevel time structure, using dynamic networks with neuron-like elements and a stable signal amount. A homogeneous neural network representation of different modalities makes it easy to integrate information at all levels of the decision-making process. The qualitative properties and global exponential stability of the solutions with respect to the manifold defined by a function h were investigated for the bidirectional associative memory neural network with time-varying delays. A procedural implementation was applied to demonstrate the validity of the obtained criteria for the h-stability of the equilibrium state of the model. The software implementation of the mathematical model was developed in the C language. Because the volume of data was large, the results were examined on a representative sample. Values that meet the requirements of the H-stability theorem, namely positive values of the computed quantity, were used; together with a nonlinear transformation, this provides a mechanism for accounting for the statistical properties of the information and allows recovery. Changes in the input parameters of the model, with all network training, were considered in order to find a stable point. The detailed constraints are specified in Section 4. Using a dictionary of elements of the internal structure of the information sequence, an ANN that includes this dictionary was formed. In future work, we will explore multimodality synchronization by including video and audio data.
Conceptualization, E.G. and I.T.; methodology, E.G. and I.T.; software, I.T.; validation, E.G. and I.T.; formal analysis, E.G. and I.T.; investigation, E.G. and I.T.; resources, E.G. and I.T.; data curation, E.G. and I.T.; writing—original draft preparation, E.G. and I.T.; writing—review and editing, E.G. and I.T.; visualization, E.G. and I.T.; funding acquisition, E.G. All authors have read and agreed to the published version of the manuscript.
This research was funded by the Research and Development Sector of the Technical University of Sofia.
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Data sharing not applicable.
The authors would like to thank the Research and Development Sector of the Technical University of Sofia for the financial support.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
The following abbreviations are used in this manuscript:
ANN: Artificial neural network
CGNN: Cohen–Grossberg neural network
OpenMPI: Open message-passing interface
Ravindran, K.; Rabby, M. Software cybernetics to infuse adaptation intelligence in networked systems. In Proceedings of the 2013 Fourth International Conference on the Network of the Future (NoF), Pohang, Republic of Korea, 23–25 October 2013; pp. 1–6.
Rumelhart, D.; Zipser, D. Parallel distributed processing: Explorations in the microstructure of cognition. In Chapter Feature Discovery by Competitive Learning; The MIT Press: Cambridge, MA, USA, 1986; Volume 1.
Schwenk, H. Efficient training of large neural networks for language modeling. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004; Volume 4, pp. 3059–3064.
Grossberg, S. Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol. Cybern. 1976, 23, 121–134.
Grossberg, S. Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control; Boston Studies in the Philosophy and History of Science; Springer: Dordrecht, The Netherlands, 1982.
Anderson, J.A. A simple neural network generating an interactive memory. Math. Biosci. 1972, 14, 197–220.
Grossberg, S.; Mingolla, E.; Todorovic, D. A neural network architecture for preattentive vision. IEEE Trans. Biomed. Eng. 1989, 36, 65–84.
Stamov, G.; Simeonov, S.; Torlakov, I. Visualization on Stability of Impulsive Cohen-Grossberg Neural Networks with Time-Varying Delays. In Contemporary Methods in Bioinformatics and Biomedicine and Their Applications; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; pp. 195–201.
Stamov, G.; Stamova, I.; Simeonov, S.; Torlakov, I. On the Stability with Respect to H-Manifolds for Cohen–Grossberg-Type Bidirectional Associative Memory Neural Networks with Variable Impulsive Perturbations and Time-Varying Delays. Mathematics 2020, 8, 335.
Stamov, G.T.; Stamova, I.M. Integral manifolds for uncertain impulsive differential–difference equations with variable impulsive perturbations. Chaos Solitons Fractals 2014, 65, 90–96.
Stamova, I.M. Impulsive control for stability of n-species Lotka–Volterra cooperation models with finite delays. Appl. Math. Lett. 2010, 23, 1003–1007.
Stamova, I.M.; Stamov, G.T. Impulsive control on global asymptotic stability for a class of impulsive bidirectional associative memory neural networks with distributed delays. Math. Comput. Model. 2011, 53, 824–831.
Stamova, I.M.; Stamov, T.; Simeonova, N. Impulsive control on global exponential stability for cellular neural networks with supremums. J. Vib. Control 2013, 19, 483–490.
Wang, Y.; Feng, B.; Ding, Y. QGTC. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, 2–6 April 2022.
Song, Q.; Yang, X.; Li, C.; Huang, T.; Chen, X. Stability analysis of nonlinear fractional-order systems with variable-time impulses. J. Frankl. Inst. 2017, 354, 2959–2978.
Open MPI, Software in the Public Interest. Open MPI: Open Source High Performance Computing. Available online: https://www.open-mpi.org/ (accessed on 17 November 2022).
Cohen, M.A.; Grossberg, S. Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst. Man Cybern. 1983, SMC-13, 815–826.
Stamova, I.; Stamov, G. Applied Impulsive Mathematical Models; CMS Books in Mathematics; Springer International Publishing: Berlin/Heidelberg, Germany, 2016.
Figure 1. Algorithm for validation of the stability point.
Figure 2. Positive values for C.
Figure 3. Positive values for a D array.
Figure 4. Stability result for and .
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.