1. Introduction
As information technology has advanced, graph networks have become commonplace in daily life. Because graph networks naturally describe connections among items, mining and analyzing them yields a deeper understanding of the underlying systems. Graph data exist widely in various scenarios, and they often feature a large scale, a complex structure, and rich heterogeneous information [1]. The diverse entities and inter-entity associations in these data constitute a series of different information networks [2,3,4]. For example, in social-networking platforms, social networks are formed by friendship or follower relations among users; citation networks are formed between papers on academic websites; and the World Wide Web is formed by links among Web pages. In addition, network models have also contributed to the epidemiological study of the global COVID-19 pandemic [5]. Common network analysis tasks include social recommendation, anomaly detection, node classification, node clustering, and community discovery [6].
In the era of fast-paced information development, real-life networks are often very large, with numerous nodes exhibiting complex attributes. Traditional network analysis algorithms are therefore inadequate for deployment in such colossal networks. Consequently, efficiently mining crucial knowledge from these information networks has emerged as a research hotspot and critical research direction within the artificial intelligence and data-mining domains, delivering significant societal value. Most current deep-learning-based techniques learn latent graph representations by fusing node attributes and graph topology. For example, GNN-based models [4], which have excelled in graph embedding, fuse topological and feature information effectively. Gated graph sequence neural networks (GGNN) [7] optimize earlier graph neural networks by introducing the gated recurrent unit (GRU) for neural network encoding. Since GNNs are by nature vulnerable to adversarial attacks, i.e., small intentional perturbations of their inputs, many studies also introduce adversarial methods into graph data learning [8], using adversarial training based on generators and discriminators to improve model robustness. The graph autoencoder [9] is widely used in unsupervised network representation learning. Its basic idea is to learn low-dimensional node representations by taking the adjacency matrix or a variant of it as the original node features and using an autoencoder for dimension reduction. Both the encoder and the decoder in the model are multi-layer perceptrons with multiple hidden layers; that is, the model compresses the graph structure information into low-dimensional vectors and then reconstructs the original structural features. Deepening a deep-learning network allows multi-order neighbor information to be learned, but it often leads to over-smoothing [10], making features less distinguishable between nodes and producing the opposite effect.
To address the issues listed above more effectively, we introduce centrality encoding into the node attributes and incorporate it into the attention mechanism to better distinguish the importance of neighboring nodes, and we add random walk regularization so that neighbors satisfying specific conditions can be sampled each time to learn latent node representations. These improvements enable the model to better capture neighborhood information in the attributed network and learn a stronger graph-embedding representation. We employ the features learned by the model for node-clustering and link prediction tasks in order to further demonstrate the efficacy of our proposed methodology. Experimental results demonstrate that our technique performs better than other baseline methods, which supports both our hypothesis and the validity of the model. The following three main points summarize our contributions in this work:
We design an attention-based convolutional layer with centrality encoding. To effectively aggregate neighborhood information and identify the significance and influence of different neighbor nodes, we apply a novel attention technique that integrates multi-hop neighborhood information;
We propose a novel attributed graph-embedding approach called RCAGE. An attention mechanism based on centrality encoding is employed for node attributes and graph structure information, while random walk regularization is introduced to learn the latent representation;
On various datasets, we perform node-clustering and link prediction tasks utilizing the features learned by RCAGE. The experimental results indicate that the model achieves good performance on these tasks, demonstrating its effectiveness.
2. Related Works
The ubiquitous attributed graph data are usually nonlinear, sparse, dynamic, and heterogeneous, which brings many challenges to attributed-network analysis. The aim of network representation learning is to obtain a reduced-dimensional vector representation in which nodes with analogous structures in the network acquire comparable representations. Owing to the excellent capability of deep-learning techniques for learning low-dimensional representations from data, the representation learning of attributed networks has recently attracted fresh attention.
Early network-representation-learning techniques concentrate on dimensionality reduction by calculating the eigenvectors of network connection matrices, such as the adjacency matrix and the Laplacian matrix [11]. Typical spectral techniques include Laplacian eigenmaps (LE) [12], locally linear embedding (LLE) [13], etc. It is challenging to apply such techniques to larger networks because the computational cost of the eigendecomposition grows nonlinearly with the size of the matrix. Later, Perozzi et al. [14] proposed the DeepWalk algorithm, which treats the sequences of nodes obtained by random walks as sentences in natural language processing and then performs representation learning of the nodes in the network by applying the SkipGram [15,16,17] model. Subsequent researchers discovered that employing different random walk strategies yields different node representations, which led to the creation of classical models for learning graph structure information, such as Node2vec [18] and Struc2vec [19]. LINE [20] compensates for the sparse first-order proximity problem by defining first-order and second-order proximity among the nodes and modeling their probabilities separately. DNGR [6] learns low-dimensional node representations with stacked denoising autoencoders. SDNE [21] uses a deep autoencoder to model the similarity between nodes. The approaches described above only take the network's structural information into account, whereas real networks typically also contain a significant quantity of attribute information. To preserve the information in the network more effectively, both the attribute information and the structural information of nodes must be learned, which remains an active research direction.
In recent years, with the development of deep learning, the emergence of graph neural networks in particular has efficiently addressed the above problems. Initially, some researchers [22,23] applied CNNs to graph-structured data, employing the Fourier transform to decompose the graph's Laplacian matrix and then using graph convolution to extract features. Subsequently, Kipf et al. [4] simplified this approach by proposing the graph convolutional network (GCN). The graph attention network (GAT) [24] extends the GCN with the attention mechanism [25,26], utilizing a masked self-attention layer to assign different weights to different nodes based on the features of their neighborhoods. SANE [27] uses the attention mechanism and the CBOW [16] model to weight the interaction strength between nodes while capturing the similarity of the network topology and attribute information. DANE [28] employs two deep models to capture and maintain a high level of nonlinearity as well as multiple similarities in the topology and node attributes. There is also the increasingly common graph adversarial learning, whose methods are based on generators and discriminators that improve the model through adversarial training. The discriminator in GraphGAN [29] makes node pairs from the original network graph more similar, while keeping node pairs produced by the generator less similar. ANE [30] applies generative adversarial networks as an additional regularization to existing network-representation-learning methods by treating a prior distribution as real data and node vectors as generated samples. GraphSGAN [31] designs a new optimization objective with multiple loss terms via semi-supervised learning to ensure that, when the generator is at equilibrium, samples are generated in low-density regions between clusters. NetGAN [8] instead treats graph generation as learning the distribution of biased random walks and proposes a generative adversarial framework that generates and discriminates random walks using LSTMs.
The autoencoder [32] is an unsupervised neural network model with two stages, encoding and decoding, and it typically employs a deep neural network. The graph autoencoder (GAE) [9] adopts the idea of the autoencoder, using a GCN in the encoding phase and an inner product in the decoding phase, which makes it suitable for unsupervised learning. The GAE aims to learn a condensed graph representation by minimizing the difference between the reconstructed adjacency matrix and the original matrix, which serves as the loss function for training the model and learning node features. The variational graph autoencoder (VGAE) [9] introduces a Gaussian distribution to constrain the distribution of the low-dimensional vectors in the GAE, and by sampling from this low-dimensional distribution, it can obtain approximately real samples. DNENC [33] employs a neighbor-aware GAE and an end-to-end learning approach to gather neighbor information. Building on these models, numerous subsequent encoders have been developed by incorporating regularization, higher-order neighbor information, and so on; common approaches include ARGA [34], DAEGC [35], AGC [36], and GEC-CSD [37]. In addition, several optimization objectives, such as the reconstruction loss [38] and modularity [39], have been proposed. Motivated by these methodologies, we propose our approach in this paper.
3. Methodology
In this section, we focus on the model framework designed for attributed graph embedding. The overall architecture of RCAGE is shown in Figure 1.
3.1. Problem Description
We define an attributed graph as $G = (V, E, X)$. $V$ represents the nodes in the graph $G$, which can be expressed as $V = \{v_1, v_2, \ldots, v_n\}$ ($n$ is the number of nodes). $E$ is a set of edges, where $e_{ij}$ denotes the edge between node $v_i$ and node $v_j$. $X = \{x_1, x_2, \ldots, x_n\}$ is the set of features of the nodes in the graph $G$, where $x_i$ represents the feature of node $v_i$. We use an adjacency matrix $A \in \mathbb{R}^{n \times n}$ to represent the edges in order to better express the graph topology, where $A_{ij} = 1$ if $e_{ij} \in E$; otherwise, $A_{ij} = 0$.
We want to obtain a $d$-dimensional vector for each node in the attributed graph by training a function $F$. This process can be expressed as $F(A, X) \rightarrow Z$, where $Z \in \mathbb{R}^{n \times d}$ ($d \ll n$) is the final learned embedding matrix. We want to retain as much information as possible about node attributes and graph topology in order to obtain better performance in downstream tasks.
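To make the notation concrete, the following minimal sketch (illustrative only; the helper name `build_adjacency` is ours, not from the paper) constructs $A$ from an edge list:

```python
import numpy as np

def build_adjacency(n, edges):
    """Build a symmetric adjacency matrix A for an undirected graph.

    n:     number of nodes
    edges: iterable of (i, j) node index pairs, one per undirected edge
    """
    A = np.zeros((n, n), dtype=np.float32)
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0  # undirected graph: mirror each edge
    return A

# Toy graph with n = 4 nodes; X holds 2-dimensional node features.
A = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
X = np.random.rand(4, 2).astype(np.float32)
# The embedding function F maps (A, X) to Z in R^{n x d}, with d << n.
```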
In this paper, we pick node clustering and link prediction as the graph downstream tasks. The purpose of the node-clustering task is to partition all nodes into different classes so that the similarity of node features within the same class is as large as possible. The link prediction task determines whether a link exists between two nodes based on their characteristics.
3.2. Graph Autoencoder
3.2.1. Centrality Encoding
Different nodes in a network may have varying degrees of significance. However, the self-attention module, which primarily uses node semantic features to determine similarity, does not take this information into account. Node centrality, which gauges a node's importance in the network, is often a powerful signal for graph comprehension [40]. For example, celebrities with enormous followings are a key component in anticipating social-networking trends. Such information should be a useful signal for graph learning, yet it is ignored in the standard attention computation. In this paper, we employ degree centrality, one of the accepted centrality metrics, as an extra signal for the neural network. We apply the centrality encoding to each node and add it to the input node attributes:
$$h_i^{(0)} = x_i + z^{-}_{\deg^{-}(v_i)} + z^{+}_{\deg^{+}(v_i)} \qquad (1)$$

In Equation (1), the learnable embedding vectors $z^{-}_{\deg^{-}(v_i)}$ and $z^{+}_{\deg^{+}(v_i)}$ are determined by the indegree $\deg^{-}(v_i)$ and the outdegree $\deg^{+}(v_i)$. For undirected graphs, the above two can be unified as $z_{\deg(v_i)}$. In this way, the model can better capture node importance during training with the attention mechanism.
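As a rough illustration (our own sketch, not the authors' implementation), the centrality encoding for an undirected graph reduces to a degree-indexed embedding lookup added to the raw features:

```python
import numpy as np

def centrality_encode(X, A, Z_deg):
    """Add a learnable degree embedding to each node's raw features.

    X:     (n, d) node feature matrix
    A:     (n, n) adjacency matrix of an undirected graph
    Z_deg: (max_degree + 1, d) table of learnable embedding vectors,
           one row per possible degree value
    """
    deg = A.sum(axis=1).astype(int)   # node degrees
    return X + Z_deg[deg]             # h_i^(0) = x_i + z_{deg(v_i)}

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=np.float32)
X = rng.normal(size=(3, 4)).astype(np.float32)
Z_deg = rng.normal(size=(3, 4)).astype(np.float32)  # degrees 0..2
H0 = centrality_encode(X, A, Z_deg)  # input to the attention encoder
```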
3.2.2. Graph Attentional Encoder
In this paper, we design a variation of the graph attention network as an encoder to capture both node attributes and graph structure in a consolidated framework. Depending on the node’s degree in a real graph network, the neighbors’ level of contribution to the central node will vary. By introducing an attention mechanism, we may increase the weights of neighbor nodes that are more pertinent to the central node when learning the node representation in order to gauge the significance of various neighbors. The expression is as follows:
$$z_i^{(l+1)} = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W z_j^{(l)}\right) \qquad (2)$$

In Equation (2), $z_i^{(l+1)}$ depicts the output representation for node $v_i$, with $N_i$ representing its neighbor set. The attention coefficient $\alpha_{ij}$ is used to measure the significance of adjacent node $v_j$ to node $v_i$, $W$ is a learnable weight matrix, and $\sigma$ is a nonlinear function.
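As a rough NumPy sketch (ours, not the authors' code), one aggregation step of Equation (2) can be written as:

```python
import numpy as np

def attention_aggregate(Z, alpha, W):
    """One attention aggregation step: z_i' = sigma(sum_j alpha_ij * W z_j).

    Z:     (n, d)  current node representations
    alpha: (n, n)  attention coefficients; alpha[i, j] = 0 if j is not
                   a neighbor of i, and each row sums to 1
    W:     (d, dp) learnable weight matrix
    """
    relu = lambda x: np.maximum(x, 0.0)  # sigma: nonlinear activation
    return relu(alpha @ (Z @ W))         # each row mixes neighbor messages
```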
For the node attributes, a single-layer fully connected network is used to calculate the similarity coefficients, $c_{ij} = \vec{a}^{T}\left[W x_i \,\|\, W x_j\right]$, where the weight vector is denoted as $\vec{a}$ and $\|$ is the concatenation operation.
In terms of graph topology, the impact of neighbor nodes of various orders must be considered. We cannot take into consideration merely 1-hop neighbor information as in the GAT model, due to the complexity of the graph structure relationship. Here, by setting a parameter $t$, we give consideration to multi-order neighbor information:

$$M = \frac{B + B^2 + \cdots + B^t}{t}$$

where $B$ is the transition matrix, with $B_{ij} = 1/\deg^{+}(v_i)$ if $e_{ij} \in E$, and $B_{ij} = 0$ otherwise. $\deg^{+}(v_i)$ and $\deg^{-}(v_i)$ mean the outdegree and indegree of node $v_i$, respectively. $M_{ij}$ stands for the topological correlation between nodes $v_i$ and $v_j$ up to $t$-hops. $t$ is a parameter that can be set to a suitable value for different datasets. For undirected graphs, $\deg^{+}(v_i)$ and $\deg^{-}(v_i)$ need not be distinguished and are uniformly defined as $\deg(v_i)$.
To make the attention coefficients easily comparable across nodes, they are normalized over the neighbor set $N_i$ with the softmax function. The formula is as follows, where $r \in N_i$:

$$\alpha_{ij} = \operatorname{softmax}_j\left(c_{ij}\right) = \frac{\exp\left(c_{ij}\right)}{\sum_{r \in N_i} \exp\left(c_{ir}\right)}$$
The following equation can be used to express the attention coefficient once the activation function LeakyReLU, the topological weight $M_{ij}$, and the centrality-encoded input have been introduced on this basis:

$$\alpha_{ij} = \frac{\exp\left(\operatorname{LeakyReLU}\left(M_{ij}\, \vec{a}^{T}\left[W h_i \,\|\, W h_j\right]\right)\right)}{\sum_{r \in N_i} \exp\left(\operatorname{LeakyReLU}\left(M_{ir}\, \vec{a}^{T}\left[W h_i \,\|\, W h_r\right]\right)\right)}$$

where $h^{(0)}$, the centrality-encoded attribute matrix from Equation (1), is the input to the model. It is then passed through two stacked graph attention layers to integrate node attributes and graph structure, and the embedding results $Z$ are finally output.
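Putting the pieces together, the following NumPy sketch computes the coefficients as we read the equations above; it is illustrative only, and the masking convention and LeakyReLU slope are our assumptions:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_coefficients(H, M, W, a):
    """alpha_ij = softmax_j(LeakyReLU(M_ij * a^T [W h_i || W h_j])).

    H: (n, d)    centrality-encoded node attributes (Equation (1))
    M: (n, n)    multi-hop proximity matrix; M[i, j] = 0 for non-neighbors
    W: (d, dp)   shared linear transform
    a: (2 * dp,) attention weight vector
    Assumes every node has at least one neighbor (e.g., via self-loops).
    """
    WH = H @ W
    dp = WH.shape[1]
    c = (WH @ a[:dp])[:, None] + (WH @ a[dp:])[None, :]  # a^T [Wh_i || Wh_j]
    e = np.where(M > 0, leaky_relu(M * c), -np.inf)      # mask non-neighbors
    e -= e.max(axis=1, keepdims=True)                    # numerical stability
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)
```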
3.3. Random Walk Regularization
In this section, drawing inspiration from DeepWalk and Word2vec, we apply random walk regularization to improve the learning of latent node representation information. We use random walk with restarts to sample each node $v_i$ together with neighbor nodes satisfying certain conditions, and apply the SkipGram idea to learn the latent representations of nodes.
The random walk with restarts [41] algorithm is an enhancement of the random walk algorithm. Beginning from node $v_i$ in the graph, our approach presents two possibilities at each step: selecting a neighboring node at random, or going back to the origin. The parameter $p$ governs the likelihood of restarting from the original node, whereas $1 - p$ controls the chance of shifting to an adjacent node. With this method, we obtain a set of context nodes $C(v_i)$, which captures the multifaceted relationship between two nodes and the overall structural information of the graph.
Analogous to NLP tasks, we consider the sampled set $C(v_i)$ as a sentence, and we aim to maximize the co-occurrence probability of node $v_i$ with the other nodes in this window. This can be expressed as the following equation:

$$\mathcal{L}_{rw} = -\sum_{v_i \in V} \sum_{v_j \in C(v_i)} \log \frac{\exp\left(z_i^{T} z_j\right)}{\sum_{v_r \in V} \exp\left(z_i^{T} z_r\right)} \qquad (10)$$

In Equation (10), $z_i$ and $z_j$ denote the latent representations of nodes $v_i$ and $v_j$ obtained with the encoder.
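The sampling step can be sketched as follows (our illustration; the function name `rwr_context` and the walk length are our own choices, with `p` the restart probability described above):

```python
import random

def rwr_context(neighbors, start, p=0.15, walk_len=10):
    """Sample a context set C(start) by random walk with restarts.

    neighbors: dict mapping each node to a list of its neighbors
    start:     the origin node v_i
    p:         restart probability (return to the origin)
    walk_len:  number of steps to take
    """
    context, current = [], start
    for _ in range(walk_len):
        if random.random() < p or not neighbors[current]:
            current = start                              # restart at the origin
        else:
            current = random.choice(neighbors[current])  # move to a neighbor
        if current != start:
            context.append(current)
    return context

# Toy usage on a path graph 0-1-2-3.
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(rwr_context(nbrs, start=1))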
3.4. Decoder
Existing graph decoders mainly fall into three types: reconstructing attributes, reconstructing graph topology, or both. In this paper, the embedding matrix we finally obtain already encodes both node attributes and graph topology information, so we directly adopt the inner product decoder:

$$\hat{A} = \operatorname{sigmoid}\left(Z Z^{T}\right)$$
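A minimal sketch of the inner product decoder (assuming the sigmoid form given above):

```python
import numpy as np

def inner_product_decode(Z):
    """Reconstruct the adjacency matrix: A_hat = sigmoid(Z Z^T)."""
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))
```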
3.5. Reconstruction Loss
We use the loss of the decoder in reconstructing attributes and graph topology as the reconstruction loss, which is a flexible and efficient method. The specific formula can be expressed as follows:

$$\mathcal{L}_{re} = \sum_{i,j} \operatorname{loss}\left(A_{ij}, \hat{A}_{ij}\right)$$

where $\operatorname{loss}(\cdot, \cdot)$ measures the discrepancy between the original and reconstructed adjacency entries (e.g., binary cross-entropy).
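For concreteness, here is a sketch under a binary cross-entropy reading of $\operatorname{loss}(\cdot, \cdot)$; this reading is an assumption on our part, and the paper may weight the terms differently:

```python
import numpy as np

def reconstruction_loss(A, A_hat, eps=1e-10):
    """Binary cross-entropy between original and reconstructed adjacency.

    A:     (n, n) original adjacency matrix with entries in {0, 1}
    A_hat: (n, n) reconstructed matrix with entries in (0, 1)
    """
    A_hat = np.clip(A_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(A * np.log(A_hat) + (1 - A) * np.log(1 - A_hat))
```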
5. Conclusions
In this paper, we propose the RCAGE model and apply it to graph representation learning. We employ centrality encoding to quantify the significance of each node in the network; this information, together with the raw features and graph structure, is fed into the model. To better combine node attributes and graph topology information, we adopt an attention mechanism that takes the effect of the node degree into account. We also use random walk with restarts to sample node neighbors and use it as a regularization to learn latent representations of nodes. The final experimental results show that our model performs well on unsupervised learning tasks such as node clustering and link prediction.
In the future, we plan to investigate extending the framework to more sophisticated and time-varying graphs, as well as further learning of edge attributes and global positional information, in order to obtain more accurate graph-embedding representations.