1. Introduction
Given a graph
$G(\tilde{\mathcal{N}},E)$, where
$\tilde{\mathcal{N}}$ is the set of vertices of cardinality
$|\tilde{\mathcal{N}}|=N$, and
E is the set of edges of cardinality
$\leftE\right=M$, finding the maximum set of vertices wherein no two of which are adjacent is a very difficult task. This problem is known as the maximum independent set problem (MIS). The maximum independent set problem can be visualized as a quest to find the largest group of nonneighboring vertices within a graph. Imagine a party where guests represent vertices and the friendships between them represent edges. The MIS is akin to inviting the maximum number of guests such that no two of them are friends, thus ensuring no prior friendships exist within this subset of guests. In graph theoretic terms, it seeks to identify the largest subset of vertices in which no two vertices share an edge. This problem has broad implications and applications, ranging from network design, scheduling, and even in areas such as biology, where one may wish to determine the maximum set of species in a habitat without competition. It was shown to be NPhard, and no known polynomial algorithm can be guaranteed to solve it [
1]. In other words, finding a set
$\mathcal{I}$ of vertices, with the maximum cardinality, such that for every two vertices
$i,j\in \mathcal{I}$, there is no edge connecting the two, i.e.,
$(i,j)\notin E$, requires superpolynomial time if P ≠ NP.
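While finding a maximum independent set is hard, verifying the defining property of a candidate set takes only linear time in the number of edges. A minimal sketch in Python (the function name and edge-list representation are illustrative, not from the paper):

```python
def is_independent(vertices, edges):
    """Check that no edge of the graph joins two vertices of the candidate set."""
    s = set(vertices)
    return all(not (i in s and j in s) for i, j in edges)

# A 4-cycle 0-1-2-3-0: opposite corners form an independent set.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_independent([0, 2], edges))  # True
print(is_independent([0, 1], edges))  # False
```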
For example, the first nontrivial exact algorithm for the MIS was due to Tarjan and Trojanowski’s
$O\left({2}^{N/3}\right)\sim O\left({1.2599}^{N}\right)$ algorithm in 1977 [
2]. Since then, many improvements have been obtained. Today, the best algorithm that can solve the MIS exactly needs a time
$O\left({1.1996}^{N}\right)$ [
Those results are bounds obtained in the worst-case scenario [
4]. We direct the interested reader to [
3], and the references therein, for a complete discussion on exact algorithms.
The MIS is important for applications in computer science, operations research, and engineering, with uses such as graph coloring, assigning channels to radio stations, register allocation in compilers, artificial intelligence, etc. [
5,
6,
7,
8].
In addition to having several direct applications [
9], the MIS is closely related to another well-known optimization problem, the maximum clique problem [
10,
11]. In order to find the maximum clique (the largest complete subgraph) of a graph
$G(\tilde{\mathcal{N}},E)$, it suffices to search for the maximum independent set of the complement of
$G(\tilde{\mathcal{N}},E)$.
The MIS has been studied on many different random structures, in particular on Erdős–Rényi graphs (ER) and random d-regular graphs (RRG). An Erdős–Rényi graph ${G}_{ER}(N,p)$ is a graph selected from the distribution of all graphs of order N, in which each pair of distinct vertices is connected with probability p. A random d-regular graph is a graph selected from the distribution of all d-regular graphs on N vertices, with $Nd$ being even. A d-regular graph is defined as a graph where each vertex has the same number of neighbors, i.e., d.
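The Erdős–Rényi construction just described translates directly into code: each of the $\binom{N}{2}$ possible edges is included independently with probability p. A minimal sketch (function name and representation are illustrative):

```python
import random

def erdos_renyi(n, p, seed=None):
    """G(n, p): include each of the n*(n-1)/2 possible edges
    independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

# p = 1 yields the complete graph; p = 0 yields the empty graph.
print(len(erdos_renyi(5, 1.0)))  # 10
print(len(erdos_renyi(5, 0.0)))  # 0
```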
For the ErdősRényi class, with
$p=0.5$, known local search algorithms can find independent sets only up to half the size of the maximum one, which is
$\sim 2{\log}_{1/(1-p)}N$ [
12] in the limit
$N\to \infty $.
This behavior also appears for random d-regular graphs ${G}_{d}\left(N\right)$.
1.1. Related Works
For example, Gamarnik and Sudan [
13] showed that, for a sufficiently large value of
d, local algorithms cannot find the size of the largest independent set in a
d-regular graph of large girth with an arbitrarily small multiplicative error.
The result of Gamarnik and Sudan [
13] was improved by Rahman and Virág [
14], who analyzed the intersection densities of many independent sets in random
d-regular graphs. They proved that for any
$\epsilon>0$, local algorithms cannot find independent sets in random
d-regular graphs with an independence ratio larger than
$(1+\epsilon)\frac{\ln d}{d}$ if
d is sufficiently large. The independence ratio is defined as the density of the independent set; thus,
$\alpha =|\mathcal{I}|/|\tilde{\mathcal{N}}|$. Recently, the exact value of the independence ratio for all sufficiently large
d values was given by Ding et al. [
15].
However, these results say nothing about small and fixed d values. When d is small and fixed, e.g., $d=3$ or $d=30$, only lower and upper bounds, expressed in terms of the independence ratio, are known.
Lower bounds on the independent sets’ size identify sets that an efficient algorithm can find, while upper bounds concern the actual maximum independent set, not just the size an algorithm can find.
The first upper bound for such a problem was given in 1981 by Bollobás [
16]. He showed that the supremum of the independence ratio of 3-regular graphs with large girth was less than
$6/13\sim 0.461538$ in the limit of
$N\to \infty $.
McKay, in 1987, improved and generalized this result to
d-regular graphs with large girth, for different values of
d [
17], by using the same technique and a much more careful calculation. For example, for the cubic graph (the 3-regular graph), he was able to push Bollobás’ upper bound down to
$0.455370$. Since then, however, the upper bound has been improved only for cubic graphs, by Balogh et al. [
18], namely, to
$0.454$. Cavity methods suggest a slightly lower upper bound and, thus, a smaller gap at small values of
d [
19]. For example, the upper bound given in [
19] for
$d=3$ was
$0.4509$, while for
$d=4$, it was
$0.4112$. In [
15], it was shown that this approach can be rigorously proven, but again, only for large
d values. Recently, however, this approach has been proven for
$d\le 19$ in [
20].
Remarkable results for lower bounds were first obtained by Wormald in 1995 [
21]. He considered processes in which random graphs are labeled as they are generated and derived the conditions under which the parameters of the process concentrate around the values that come from the solution of an associated system of differential equations, which are equations for the populations of various configurations in the graph as it is grown. By solving the differential equations, he computed the lower bounds for any fixed
d returned by a prioritized algorithm, thereby improving the values of the bounds given by Shearer [
22].
This algorithm is called prioritized, because there is a priority in choosing vertices added to the independent set [
23]. It follows the procedure of choosing vertices in the independent set
$\mathcal{I}$ one by one, with the condition that the next vertex is chosen randomly from those with the maximum number of neighbors adjacent to the vertices already in
$\mathcal{I}$. After each new vertex in
$\mathcal{I}$ is chosen (or labeled with an
I), we must complete all of its remaining connections and label the neighbors, which are identified as members of the set
$\mathcal{V}$ (for vertex cover). Although each vertex in
$\mathcal{I}$ can be chosen according to its priority, the covering vertices that complete its unfilled connections must then be chosen at random among the remaining connections to satisfy Bollobás’ configuration model [
21].
This priority is a simple way to minimize the size of the set of covered vertices and maximize the number of sites remaining as candidates for the set $\mathcal{I}$. More precisely, we are given a random d-regular graph ${G}_{d}\left(N\right)$, and we randomly choose a site i from the set of vertices $\tilde{\mathcal{N}}$. We set i into $\mathcal{I}$, and we set all of the vertices neighboring i into a set $\mathcal{V}$. We label the elements of $\mathcal{I}$ with the letter I, while the elements of $\mathcal{V}$ are labeled with the letter V. Then, from the subset of vertices in $\tilde{\mathcal{N}}$ that are neighbors of vertices in $\mathcal{V}$, but are not yet labeled I or V, we randomly choose the element k that has the maximum number of connections with sites in $\mathcal{V}$. We set it into $\mathcal{I}$. The vertices neighboring k, which are not in $\mathcal{V}$, are added to the set $\mathcal{V}$. This rule is repeated until $|\mathcal{I}|+|\mathcal{V}|=N$. Along with this algorithm, one can consider an associated algorithm that simultaneously generates the random d-regular graph ${G}_{d}\left(N\right)$ and labels vertices with the letter I or V. This associated algorithm, which will be described in detail in the next sections, allowed Wormald to build up the system of differential equations used for computing lower bounds for the MIS.
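The prioritized rule above can be sketched as follows. This is an illustrative reimplementation on a fixed graph given as an adjacency list, not the original version that labels the graph while generating it; with the priority defined as the number of neighbors already labeled V, the maximum is automatically attained at a neighbor of $\mathcal{V}$ whenever one exists:

```python
import random

def prioritized_independent_set(adj, seed=None):
    """Prioritized greedy heuristic: repeatedly add to I an unlabeled vertex
    with the maximum number of neighbors already labeled V, then label its
    unlabeled neighbors V, until every vertex is labeled I or V."""
    rng = random.Random(seed)
    n = len(adj)
    label = {}                       # vertex -> 'I' or 'V'
    while len(label) < n:
        unlabeled = [v for v in range(n) if v not in label]
        rng.shuffle(unlabeled)       # break ties randomly
        k = max(unlabeled,
                key=lambda v: sum(label.get(u) == 'V' for u in adj[v]))
        label[k] = 'I'
        for u in adj[k]:
            if u not in label:
                label[u] = 'V'
    return {v for v, s in label.items() if s == 'I'}
```

By construction no two I-labeled vertices are ever adjacent, since adding a vertex to $\mathcal{I}$ immediately pushes all of its unlabeled neighbors into $\mathcal{V}$.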
Improvements on this algorithm were achieved by Duckworth et al. [
24]. These improvements were obtained by observing, broadly speaking, that the size of the structure produced by the algorithm is almost the same for
d-regular graphs of very large girth as it is for a random
d-regular graph. However, since then, new lower bounds have been achieved only at small values of
d, e.g.,
$d=3$ and
$d=4$. Interesting results at
$d=3$ have been achieved by Csóka, Gerencsér, Harangi, and Virág [
25]. They were able to find an independent set of cardinality of up to
$0.436194\,N$ using invariant Gaussian processes on the infinite
d-regular tree. This result was once again improved by Csóka [
26] alone, who was able to increase the cardinality of the independent set on large-girth 3-regular graphs up to
$0.445327N$ and on large-girth 4-regular graphs up to
$0.404070N$, by numerically solving the associated system of differential equations.
These improvements were obtained by deferring the decision as to whether a site $i\in \tilde{\mathcal{N}}$ must be labeled with the letter I or V. More precisely, this requires that the sites for which a decision is deferred need additional (temporary) labels. This means that counting the evolution of their populations, either through a differential equation or with an experiment, becomes more complicated.
Csóka [
26] was able to improve the lower bounds for
$d=3$ and
$d=4$, but his method was not applicable to values of
$d>4$. These bounds cannot be considered fully rigorous, as they require some kind of computer simulation or estimation [
20].
1.2. Main Results
This paper aims to compute the independent set density for any
$d\ge 5$ using an experimental approach, i.e., algorithms that run in time linear in
N.
Table 1 presents the best upper and lower bounds for
$d\in [5,100]$ in the first and second columns, respectively. Recently, the authors in [
27] presented a Monte Carlo method that can experimentally outperform any algorithm in finding a large independent set in random
d-regular graphs, in a (using the words of the authors) “
running time growing more than linearly in N” [
27]. These authors conjectured lower bound improvements only for
$d=20$ and
$d=100$, but with experimental results obtained on random
d-regular graphs of the order
$N=5\cdot{10}^{4}$. However, in this work, we are interested in comparing our results with the ones given by the family of prioritized algorithms, because we believe that a rigorous analysis of the computational complexity should be performed on these types of algorithms.
In this paper, as stated above, we present the experimental results of a greedy algorithm, i.e., a deferred decision algorithm built upon existing heuristic strategies, which leads to improvements on the known lower bounds of a large independent set in random
d-regular graphs
$\forall d\in [5,100]$ [
21,
24,
28]. These bounds cannot be considered fully rigorous, as they require computer simulations and estimations. However, the main contribution of this manuscript is to present a new method that is able to return, on average, independent sets for random
d-regular graphs that were not attainable before. This new algorithm runs in linear time
$O\left(N\right)$ and melds Wormald’s, Duckworth and Zito’s, and Csóka’s ideas of prioritized algorithms [
21,
24,
26,
28]. The results obtained here are conjectured new lower bounds for a large independent set in random
d-regular graphs. They were obtained by inferring the asymptotic values that our algorithm can reach when
$N\to \infty $ and by averaging sufficient simulations to achieve confidence intervals at
$99\%$. These results led to improvements on the known lower bounds
$\forall d\in [5,100]$ that, as far as we know, have not been reached by any other greedy algorithm (see the fourth column of
Table 1). Although the gap with respect to the upper bounds is still present, these improvements may imply new rigorous results for finding a large independent set in random
d-regular graphs.
1.3. Paper Structure
The paper is structured as follows: in
Section 2, we define our deferred decision algorithm, and we introduce a site labeling, which will identify those sites for which we defer the
$I/V$ labeling decisions. In
Section 3, we present the deferred decision algorithm for
$d=3$, and we introduce the experimental results obtained on random 3-regular graphs of sizes up to
${10}^{9}$ as a sanity check of our experimental results. We recall that the order of a graph
$G(\tilde{\mathcal{N}},E)$ is the cardinality of its vertex set
$\tilde{\mathcal{N}}$, while the size of a graph
$G(\tilde{\mathcal{N}},E)$ is the cardinality of its edge set
E. In
Section 4, we present our deferred decision algorithm for
$d\ge 5$, and the experimental results associated with it, using extrapolation on random
d-regular graphs with sizes of up to
${10}^{9}$.
2. Notation and the General Operations of the Deferred Decision Algorithm
In this section, we define the notation used throughout this manuscript, and we define all of the operations needed to understand the deferred decision algorithm. As a starting point, and also for the following section, we define the set $\mathcal{N}=\tilde{\mathcal{N}}$ as the set of unlabeled nodes. We start by recalling that we deal with random d-regular graphs ${G}_{d}\left(N\right)$, where d is the degree of each vertex $i\in \mathcal{N}$, where $\mathcal{N}$ is the set of vertices, and $N=|\mathcal{N}|$. All vertices $i\in \mathcal{N}$ are unlabeled.
In order to build a random
d-regular graph, we used the method described in [
21] and introduced in [
16].
Definition 1 (Generator of Random d-Regular Graphs Algorithm). We take $dN$ points, with $dN$ being even, and distribute them in N urns that are labeled $1,2,\cdots ,N$, with d points in each urn. We choose a random pairing $P={p}_{1},\cdots ,{p}_{dN/2}$ of the points such that $|{p}_{i}|=2\;\forall i$. Each urn identifies a site in $\mathcal{N}$. Each point is in only one pair ${p}_{i}$, and no pair contains two points in the same urn. No two pairs contain four points from just two urns. In order to build a d-regular graph ${G}_{d}\left(N\right)$, we then connect two distinct vertices i and j if some pair has a point in urn i and one in urn j. The conditions on the pairing prevent the formation of loops and multiple edges.
The pairing must be chosen uniformly at random, subject to the constraints given. This can be done by repeatedly choosing an unpaired point and then choosing a partner for this point to create a new pair. As long as the partner is chosen uniformly at random from the remaining unpaired points, and as long as the process is restarted if a loop or multiple edge is created, the result is a random pairing of the required type [
21].
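The pairing procedure of Definition 1 can be sketched as follows, with the restart-on-failure strategy described above implemented as rejection sampling over whole pairings (names and representation are illustrative):

```python
import random

def random_regular_graph(n, d, seed=None):
    """Pairing (configuration) model: distribute d points in each of n urns,
    pair all points uniformly at random, and restart whenever the pairing
    would create a self-loop or a multiple edge."""
    assert (n * d) % 2 == 0, "n*d must be even"
    rng = random.Random(seed)
    while True:
        stubs = [v for v in range(n) for _ in range(d)]  # d points per urn
        rng.shuffle(stubs)
        edges = set()
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a == b or (min(a, b), max(a, b)) in edges:
                break  # loop or multiple edge: restart the whole pairing
            edges.add((min(a, b), max(a, b)))
        else:
            return edges
```

Restarting the entire pairing (rather than re-drawing a single pair) keeps the sampling distribution uniform over simple d-regular graphs, at the cost of extra attempts.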
In this paper, we use the method described above so that, while we generate the random d-regular graph ${G}_{d}\left(N\right)$, we concurrently label sites as the new links are created.
The graphs built using the
Generator of Random d-Regular Graphs Algorithm prevent the formation of loops and multiple edges without biasing the distribution from which we sample the graphs [
21,
23].
We define two separate sets $\mathcal{I}$ and $\mathcal{V}$ for independent and vertex cover sites, respectively. $\mathcal{I}$ identifies the set of graph nodes that satisfies the property wherein no two nodes are adjacent, and $\mathcal{V}$ is its complement. A site $i\in \mathcal{I}$ is labeled with the letter I, while a site $j\in \mathcal{V}$ is labeled with the letter V.
We define ${\Delta}_{i}$ to be the degree of a vertex i, i.e., the number of links already attached to it, while ${\overline{\Delta}}_{i}$ is the antidegree of a vertex i, i.e., the number of free connections that i still needs to complete during the graph-building process. Of course, the constraint ${\Delta}_{i}+{\overline{\Delta}}_{i}=d$ is always preserved $\forall i\in \mathcal{N}$. At the beginning of the graph-building process, all $i\in \mathcal{N}$ have ${\Delta}_{i}=0,\,{\overline{\Delta}}_{i}=d$. At the end of the graph-building process, all graph nodes will have ${\Delta}_{i}=d,\,{\overline{\Delta}}_{i}=0$. We define $\partial i$ to be the set that contains all of the neighbors of i.
For the sake of clarity, we define a simple subroutine on a single site
$i\in \mathcal{N}$ of the
Generator of Random d-Regular Graphs Algorithm (Subroutine GA(
i,
${\overline{\Delta}}_{i}$)) that will be useful for describing the algorithm presented in the next sections. The Subroutine GA(
i,
${\overline{\Delta}}_{i}$) generates the remaining
${\overline{\Delta}}_{i}$ connections of site
i. It keeps the supporting data to reflect the evolution of the network growth.
Algorithm 1: Subroutine GA(i, ${\overline{\Delta}}_{i}$).
Input: $i\in \mathcal{N}$, ${\overline{\Delta}}_{i}$;
Output: i connected with d sites;
1. Using the rules in Definition 1, i is connected randomly with ${\overline{\Delta}}_{i}$ sites;
2. ${\overline{\Delta}}_{i}=0$;
3. ${\Delta}_{i}=d$;
4. return i connected with d sites;

The sites that we choose following some priority, either the one we describe or any other scheme, will be called P sites. The sites that are found by following links from the P sites (or by randomly generating connections from the P sites) are called C sites. More precisely, each site $j\in \mathcal{N}$ that we choose that is not yet labeled with any letter (C or P), s.t. ${\overline{\Delta}}_{j}\le 2$ and the random connection(s) present on j is(are) on site(s) in $\mathcal{V}$, is a P site. The set $\mathcal{P}$ defines the set of P sites. The set $\mathcal{P}$ is kept in ascending order with respect to the antidegree ${\overline{\Delta}}_{i}$ of each site $i\in \mathcal{P}$.
In general, a site
i that is labeled as
P will be surrounded by two sites that are labeled as
C. Because the labeling of those sites is deferred, we call those
$CPC$ structures
virtual sites. A single virtual site,
$\tilde{v}$, has an antidegree
${\overline{\Delta}}_{\tilde{v}}$ equal to the sum of all of the antidegrees of sites
l that compose site
$\tilde{v}$, i.e.,
${\overline{\Delta}}_{\tilde{v}}={\sum}_{l\in \tilde{v}}{\overline{\Delta}}_{l}$. The number of sites
$l\in \tilde{v}$ is equal to the cardinality of
$\tilde{v}$. The degree of
$\tilde{v}$ is
${\Delta}_{\tilde{v}}=d\,|\tilde{v}|-{\overline{\Delta}}_{\tilde{v}}$. As an example, we show in
Figure 1 the operation of how a virtual site is created from a site
$s\in \mathcal{P}$ with
${\overline{\Delta}}_{s}=2$ and
${\Delta}_{s}=1$, as well as two sites
$j,k\in \mathcal{N}$ with
${\overline{\Delta}}_{j}=3$,
${\overline{\Delta}}_{k}=3$,
${\Delta}_{j}=0$, and
${\Delta}_{k}=0$. Let us assume that a site
$s\in \mathcal{N}$ s.t.
${\overline{\Delta}}_{s}=2$ exists. This is possible because a site
$l\in \mathcal{V}$ is connected with it. This means that
s must be labeled as
P and put into
$\mathcal{P}$. Let us run Subroutine GA(
s,
${\overline{\Delta}}_{s}$) on
s and assume that the
s connects with two neighbors
$j,k\in \mathcal{N}$. Given that
$j,k\in \mathcal{N}$ are connected to a
P site, they are labeled as
C values. We then define
$\tilde{v}=\{s,j,k\}$. This set is a virtual node
$\tilde{v}$ with
${\overline{\Delta}}_{\tilde{v}}=4$.
We define
$\mathcal{A}$ to be the set of virtual sites. The set
$\mathcal{A}$ is kept in ascending order with respect to the
antidegree
$\overline{\Delta}$ of each virtual site
$\tilde{v}\in \mathcal{A}$. Virtual sites can be created—as described above—expanded, or merged together (creating a new virtual site
$\tilde{\theta}={\cup}_{i}{\tilde{v}}_{i}$). Two examples are shown in
Figure 2 and
Figure 3.
Figure 2 shows how to expand a virtual site
${\tilde{v}}_{1}$. Let us imagine that a site
$m\in \mathcal{P}$, with antidegree
${\overline{\Delta}}_{m}=2$, is chosen. Let us run Subroutine GA(
m,
${\overline{\Delta}}_{m}$) on
m. Assume that
m connects with
$n\in \mathcal{N}$ (
${\overline{\Delta}}_{n}=3$) and with
${\tilde{v}}_{1}\in \mathcal{A}$ (
${\overline{\Delta}}_{{\tilde{v}}_{1}}=4$). In this case,
${\tilde{v}}_{1}\in \mathcal{A}$ expands itself, thereby swallowing sites
m and
n and having a
${\overline{\Delta}}_{{\tilde{v}}_{1}}=5$.
Figure 3 shows how two virtual sites merge together. Let us imagine that a site
$p\in \mathcal{P}$, with antidegree
${\overline{\Delta}}_{p}=2$, is chosen during the graphbuilding process. Let us run Subroutine GA(
p,
${\overline{\Delta}}_{p}$) on
p. Assume that
p connects with two virtual sites
${\tilde{v}}_{1}\in \mathcal{A}$ and
${\tilde{v}}_{2}\in \mathcal{A}$, with
${\overline{\Delta}}_{{\tilde{v}}_{1}}=4$ and
${\overline{\Delta}}_{{\tilde{v}}_{2}}=4$. The new structure is a virtual site
$\tilde{\theta}\in \mathcal{A}$ with
${\overline{\Delta}}_{\tilde{\theta}}=6$.
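The antidegree bookkeeping behind these operations can be sketched with a few lines of Python. The snippet below reproduces the creation example of Figure 1 for $d=3$ (site names and the `connect` helper are illustrative): each completed edge consumes one free stub on both endpoints, and the virtual site's antidegree and degree follow from the definitions above.

```python
d = 3
antideg = {'s': 2, 'j': 3, 'k': 3}  # state before Subroutine GA(s, 2) runs

def connect(a, b):
    """Completing an edge consumes one free stub on each endpoint."""
    antideg[a] -= 1
    antideg[b] -= 1

# GA(s, 2): s completes its two remaining connections, hitting j and k,
# which become C sites; {s, j, k} is the new virtual (C-P-C) site.
connect('s', 'j')
connect('s', 'k')

v = {'s', 'j', 'k'}
antideg_v = sum(antideg[x] for x in v)  # sum of members' antidegrees -> 4
deg_v = d * len(v) - antideg_v          # degree of the virtual site -> 5
```

Merging two virtual sites is the same computation over the union of their member sets, which is why ${\overline{\Delta}}_{\tilde{\theta}}$ is simply additive (minus the stubs consumed by the connecting edges).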
We define in the following a list of operations that will be useful for understanding the algorithm presented in the next sections.
Definition 2 ($O{P}_{move}^{i}(i,\mathcal{X},\mathcal{Y})$). Let $\mathcal{X}$ and $\mathcal{Y}$ be two sets. Let $i\in \mathcal{X}$ and $i\notin \mathcal{Y}$. We define $O{P}_{move}^{i}(i,\mathcal{X},\mathcal{Y})$ to be the operation that moves the site $i\in \mathcal{X}$ from the set $\mathcal{X}$ to $\mathcal{Y}$, i.e., $i\in \mathcal{Y}$ and $i\notin \mathcal{X}$.
For example, $O{P}_{move}^{i}(i,\mathcal{N},\mathcal{V})$ moves $i\in \mathcal{N}$ from the set $\mathcal{N}$ to $\mathcal{V}$, i.e., $i\in \mathcal{V}$ and $i\notin \mathcal{N}$. Instead, the operation $O{P}_{move}^{i}(i,\mathcal{N},\mathcal{I})$ moves $i\in \mathcal{N}$ from the set $\mathcal{N}$ to $\mathcal{I}$, i.e., $i\in \mathcal{I}$ and $i\notin \mathcal{N}$. We recall that when a site is set into $\mathcal{I}$, it is labeled with I, while when a site is set into $\mathcal{V}$, it is labeled with V.
Definition 3 ($O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$). Let $\mathcal{A}$ be the set that contains the virtual nodes $\tilde{v}$. We define $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$ to be the operation that deletes the site $\tilde{v}\in \mathcal{A}$ from the set $\mathcal{A}$, i.e., the element $\tilde{v}\notin \mathcal{A}$ anymore, and it applies the operation $O{P}_{move}^{i}(i,\mathcal{X},\mathcal{Y})$ on each site $i\in \tilde{v}$ through the following rule:
if $i\in \tilde{v}$ is labeled with the letter P, then $\mathcal{X}=\mathcal{N}$ and $\mathcal{Y}=\mathcal{I}$;
if $i\in \tilde{v}$ is labeled with the letter C, then $\mathcal{X}=\mathcal{N}$ and $\mathcal{Y}=\mathcal{V}$.
Definition 4 ($SWAPOP\left(\tilde{v}\right)$). Let $\tilde{v}\in \mathcal{A}$. We define $SWAPOP\left(\tilde{v}\right)$ to be the operation such that $\forall i\in \tilde{v}$:
if i is labeled as P, then the label P swaps to C;
if i is labeled as C, then the label C swaps to P.
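The effect of $SWAPOP\left(\tilde{v}\right)$ on the member labels can be sketched directly (the dictionary representation is illustrative):

```python
def swap_op(labels):
    """SWAP-OP: exchange the P and C labels of every member of a virtual site."""
    flip = {'P': 'C', 'C': 'P'}
    return {site: flip[lab] for site, lab in labels.items()}

# A C-P-C virtual site: after the swap, the two outer sites become P
# (candidates for I) and the middle one becomes C (destined for V).
site = {'s': 'P', 'j': 'C', 'k': 'C'}
swapped = swap_op(site)  # {'s': 'C', 'j': 'P', 'k': 'P'}
```

On an isolated C-P-C structure the swap trades one future I site for two, which is exactly why the algorithm prioritizes virtual sites with small antidegree.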
Figure 4 shows how $SWAPOP\left(\tilde{v}\right)$ acts on a virtual site $\tilde{v}$.
3. The Deferred Decision Algorithm for $\mathit{d}=\mathbf{3}$
In this section, we present our algorithm, which is simpler and slightly different from the one in [
26], but it is based on the same idea for determining a large independent set in random
d-regular graphs with
$d=3$, i.e.,
${G}_{3}\left(N\right)$. The algorithm in [
26] works only for
$d=3$ and
$d=4$. This exercise is performed as a sanity check for the algorithm. This algorithm will also be at the core of the algorithm developed in
Section 4.
As mentioned above, the algorithm discussed in this paper is a prioritized algorithm, i.e., an algorithm that makes local choices in which there is a priority in selecting a certain site.
We start the discussion on the local algorithm for $d=3$ by giving the pseudocode of the algorithm in Algorithm 2.
The algorithm starts by building up a set of sites $\mathcal{N}$ of cardinality $|\mathcal{N}|=N$, and from this set it randomly picks a site i. Then, the algorithm completes the connections of site i in a random way by following the method described in Algorithm 1. Once all of its connections are completed, site i has ${\Delta}_{i}=3$ and ${\overline{\Delta}}_{i}=0$. It is labeled with the letter V, erased from $\mathcal{N}$, and set into $\mathcal{V}$. In other words, the operation $O{P}_{move}^{i}(i,\mathcal{N},\mathcal{V})$ is applied to it. Each neighbor of i, i.e., $j\in \partial i$, has degree ${\Delta}_{j}=1$ and antidegree ${\overline{\Delta}}_{j}=2$. Therefore, these neighbors are set into $\mathcal{P}$ and thus labeled as P.
The algorithm picks a site k from $\mathcal{P}$ with the minimum number of remaining connections. In general, if k has ${\overline{\Delta}}_{k}\ne 0$, the algorithm completes all of its connections and removes it from $\mathcal{P}$. Each site connected with a P site is automatically labeled with the letter C. If a site $k\in \mathcal{P}$ connects to another site $j\in \mathcal{P}$, with $j\ne k$, then j is removed from $\mathcal{P}$ and labeled as C.
If $k\in \mathcal{P}$ has ${\overline{\Delta}}_{k}=0$, the site k is set into $\mathcal{I}$, and it is removed from $\mathcal{N}$ and $\mathcal{P}$, i.e., the algorithm applies the operation $O{P}_{move}^{k}(k,\mathcal{N},\mathcal{I})$.
As defined in
Section 2, a
$CPC$ structure is equivalent to a single virtual site,
$\tilde{v}$, which has an antidegree
${\overline{\Delta}}_{\tilde{v}}$. Each virtual site
$\tilde{v}$ is created with
${\overline{\Delta}}_{\tilde{v}}>2$, and it is inserted into the set
$\mathcal{A}$.
Once the set $\mathcal{P}$ is empty, and only if no virtual sites with antidegree less than or equal to 2 are present in $\mathcal{A}$, the algorithm selects a site $\tilde{v}\in \mathcal{A}$ with the largest antidegree ${\overline{\Delta}}_{\tilde{v}}$, and it applies the operation $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$ after having completed, using Algorithm 1, all of the connections of each $i\in \tilde{v}$ with ${\overline{\Delta}}_{i}\ne 0$.
We apply operation $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$ on virtual sites $\tilde{v}\in \mathcal{A}$ with the largest antidegree, because we hope the random connections outgoing from those sites will reduce the antidegrees of the existing virtual sites in $\mathcal{A}$ in such a way that the probability of having virtual nodes with antidegrees of ${\overline{\Delta}}_{\tilde{v}}\le 2$ increases. In other words, we want to create islands of virtual sites that are surrounded by a sea of V sites in order to apply the $SWAPOP\left(\tilde{v}\right)$ on those nodes. This protocol, indeed, makes it possible to increase the independent set cardinality and decrease the vertex cover set cardinality.
Algorithm 2: local algorithm for d = 3. 

For this reason, if virtual nodes with antidegrees of ${\overline{\Delta}}_{\tilde{v}}\le 2$ exist in $\mathcal{A}$, those sites have the highest priority in being selected. More precisely, the algorithm follows the priority rule:
$\forall \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}=0$, the algorithm sequentially applies the operation $SWAPOP\left(\tilde{v}\right)$ and then the operation $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$.
If no virtual sites $\tilde{v}\in \mathcal{A}$ with ${\overline{\Delta}}_{\tilde{v}}=0$ are present, then the algorithm looks for those that have ${\overline{\Delta}}_{\tilde{v}}=1$. $\forall \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}=1$, it applies the operation $SWAPOP\left(\tilde{v}\right)$, completes the last connection of the site $i\in \tilde{v}$ with ${\overline{\Delta}}_{i}=1$, applies $O{P}_{move}^{j}(j,\mathcal{N},\mathcal{V})$ to the last neighbor j added to i, and then applies $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$.
If no virtual sites $\tilde{v}\in \mathcal{A}$ with ${\overline{\Delta}}_{\tilde{v}}=0$ or ${\overline{\Delta}}_{\tilde{v}}=1$ are present, then the algorithm looks for those that have ${\overline{\Delta}}_{\tilde{v}}=2$. $\forall \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}=2$, it applies the operation $SWAPOP\left(\tilde{v}\right)$, completes the last connections of the sites $i\in \tilde{v}$ with ${\overline{\Delta}}_{i}\ne 0$, labels the newly added sites with the letter C, and updates the degree and the antidegree of the virtual node $\tilde{v}$.
The algorithm proceeds by selecting virtual nodes and creating sites labeled as P until $\mathcal{N}=\varnothing $. Once $\mathcal{N}=\varnothing $, it returns the set $\mathcal{I}$, which is the set of independent sites. The code of the algorithm is available upon request.
Our numerical results for the independence ratio agree with the theoretical ones, at least up to the fifth digit. To verify this, we performed an accurate analysis on random 3-regular graphs, starting from those of order ${10}^{6}$ and pushing the order up to $5\cdot{10}^{8}$.
This analysis aimed to compute the sample mean of the independence ratio
$\alpha \left(N\right)$ outputted by our algorithm. Each average was obtained in the following manner: for graphs of the order $N={10}^{6}$, we averaged over a sample of ${10}^{4}$ graphs; for $N=2.5\cdot{10}^{6}$, over $7.5\cdot{10}^{3}$ graphs; for $N=5\cdot{10}^{6}$, over $5\cdot{10}^{3}$ graphs; for $N={10}^{7}$, over ${10}^{3}$ graphs; for $N=2.5\cdot{10}^{7}$, over $7.5\cdot{10}^{2}$ graphs; for $N=5\cdot{10}^{7}$, over $5\cdot{10}^{2}$ graphs; for $N={10}^{8}$, over ${10}^{2}$ graphs; for $N=2.5\cdot{10}^{8}$, over 50 graphs; and for $N=5\cdot{10}^{8}$, over 10 graphs. The mean and the standard deviation for each analyzed sample are reported in
Table 2. Upon observing that the sample means of the independence ratio approached an asymptotic value, we performed a linear regression on the model
$f\left(N\right)=(a/\ln N)+{\alpha}_{\infty}$ in order to estimate the parameter
${\alpha}_{\infty}$ (blue line in
Figure 5). When
$N\to \infty $, the first term of the regression, i.e.,
$(a/\ln N)$, went to 0, thereby leaving the value of
${\alpha}_{\infty}$ that describes the asymptotic value of the independence ratio that our algorithm can reach. After adopting the model
$f\left(N\right)=(a/\ln N)+{\alpha}_{\infty}$, one might naturally question the rationale behind the specific choice of
$\frac{1}{\ln N}$ as a regressor. This decision is grounded in the intrinsic nature of large graphs. In numerous instances, as seen in small-world networks, the properties of these networks scale logarithmically with their size. The logarithmic factor effectively translates the expansive range of data values into a scale that is more analytically tractable. By employing
$\frac{1}{\ln N}$ as our regressor, we are capturing the progressively diminishing influence of augmenting
N on our parameter,
${\alpha}_{\infty}$. With each incremental increase in
N, the relative change it induces becomes less significant. This logarithmic term encapsulates this tapering sensitivity, thereby making it a fitting choice for our regression model. Using the numerical standard errors obtained from each sample, we applied a generalized least squares (GLS) method [
29] in order to infer the values of the parameters
${\alpha}_{\infty}$, thereby averaging sufficient simulations to achieve a confidence interval of
$99\%$ on them. The value of
${\alpha}_{\infty}$ is the most important, because it is the asymptotic value that our algorithm can reach when
$N\to \infty $. From the theory of GLS, we know that the estimator of the parameter
${\alpha}_{\infty}$ is unbiased, consistent, and efficient, and a confidence interval on this parameter is justified. The analysis, performed on data reported in
Table 2, shows that the independent set ratio reached the asymptotic value
${\alpha}_{\infty}=0.445330\left(3\right)$. This value agrees with the theoretical value proposed in [
26].
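The extrapolation step can be sketched as a linear fit on the regressor $1/\ln N$, whose intercept is the asymptotic ratio. The sketch below uses synthetic stand-in data (the parameter values `a_true` and `alpha_true` are illustrative assumptions, not the paper's measurements) and plain least squares in place of the full GLS machinery:

```python
import numpy as np

# Synthetic sample means of the independence ratio at increasing orders N
# (stand-in values for illustration, not the measured data of Table 2).
Ns = np.array([1e6, 5e6, 1e7, 5e7, 1e8, 5e8])
a_true, alpha_true = 0.05, 0.445330           # assumed parameters for the demo
ratios = a_true / np.log(Ns) + alpha_true

# Regress the ratio on x = 1/ln N for the model f(N) = a/ln N + alpha_inf;
# the intercept alpha_inf is the value recovered as N -> infinity (x -> 0).
x = 1.0 / np.log(Ns)
A = np.column_stack([x, np.ones_like(x)])
(a_fit, alpha_inf), *_ = np.linalg.lstsq(A, ratios, rcond=None)
print(round(alpha_inf, 6))                    # prints 0.44533
```

In the actual analysis, the GLS weights come from the per-sample standard errors, which down-weight the noisier small-sample points at large N.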
4. The Deferred Decision Algorithm for $\mathit{d}\ge \mathbf{5}$
In this section, we present how to generalize the prioritized algorithm for all
$d\ge 5$. As with the algorithm previously described in
Section 3, it builds the random regular graph and, at the same time, tries to maximize the independent set cardinality
$\left|\mathcal{I}\right|$. The main idea that we propose is to merge two existing algorithms, namely, the one in [
21] and the one described above, into a new prioritized algorithm, which is able to maximize the independent set cardinality, thereby providing improved estimates of the lower bounds. The new conjectured lower bounds come from extrapolation on random
d-regular graphs of sizes up to
${10}^{9}$.
Before introducing the algorithm, we present a new operation that will allow us to simplify the discussion.
Definition 5 ($O{P}_{builddel}^{i}(i,\mathcal{N},\mathcal{I},\mathcal{V})$). Let $i\in \mathcal{N}$. We define $O{P}_{builddel}^{i}(i,\mathcal{N},\mathcal{I},\mathcal{V})$ as the operation that connects i to ${\overline{\Delta}}_{i}$ sites following the Algorithm 1 rules, applies $O{P}_{move}^{i}(i,\mathcal{N},\mathcal{I})$, and, $\forall j\in \partial i$, sequentially runs Algorithm 1 and applies the operation $O{P}_{move}^{j}(j,\mathcal{N},\mathcal{V})$.
The pseudocode of the last operation is described in Algorithm 3.
Algorithm 3: ${OP}_{\mathit{build}\mathit{del}}^{i}(i,\mathcal{N},\mathcal{I},\mathcal{V})$. 

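As a rough structural sketch, the bookkeeping of Definition 5 can be written as follows. The callback `connect`, the adjacency map `adj`, and the antidegree map `antideg` are our own hypothetical names; the actual wiring rules belong to Algorithm 1 and are not reproduced here:

```python
def op_build_del(i, N, I, V, adj, connect, antideg):
    """Hypothetical sketch of OP_build-del (Definition 5): complete i's missing
    connections, move i into the independent set I, then complete and move each
    neighbour j of i into the vertex cover V. `connect(k)` stands in for the
    Algorithm 1 connection rules."""
    for _ in range(antideg.get(i, 0)):
        connect(i)                      # connect i to its remaining sites
    N.discard(i)
    I.add(i)                            # OP_move^i: i joins the independent set
    for j in adj[i]:
        for _ in range(antideg.get(j, 0)):
            connect(j)                  # complete j's connections (Algorithm 1)
        N.discard(j)
        V.add(j)                        # OP_move^j: j joins the vertex cover
```

The sketch only tracks set membership; degree updates and random edge placement happen inside `connect`.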
We start the discussion on the local algorithm for $d\ge 5$ by giving the pseudocode of the algorithm in Algorithm 4.
The algorithm starts by randomly selecting a site
z from the set of all nodes
$\mathcal{N}$, i.e.,
$z\in \mathcal{N}$. It then applies
$O{P}_{builddel}^{z}(z,\mathcal{N},\mathcal{I},\mathcal{V})$ on the site
z (see
Figure 6). This operation creates nodes with different degrees and antidegrees. The algorithm then chooses the node
m with the minimum antidegree
${\overline{\Delta}}_{m}$. If the node
m has
${\overline{\Delta}}_{m}>2$, the algorithm applies the operation
$O{P}_{builddel}^{m}(m,\mathcal{N},\mathcal{I},\mathcal{V})$ on site
m. In other words, we are using the algorithm developed in [
21] until a site
$m\in \mathcal{N}$ with
${\overline{\Delta}}_{m}\le 2$ appears. When this happens, we label it as a
P site and we move it into the set
$\mathcal{P}$.
As described in
Section 3, once the set
$\mathcal{P}$ is not empty, the sites in the set
$\mathcal{P}$ have the highest priority in being processed for creating virtual nodes.
Algorithm 4: local algorithm for d $\ge $ 5. 

Until the set $\mathcal{P}$ is empty, the algorithm builds virtual sites, which are placed into $\mathcal{A}$.
Once the set $\mathcal{P}$ is empty, the highest priority in being processed is placed on the virtual sites contained in $\mathcal{A}$. Again, we want the random connections outgoing from the virtual sites to reduce the antidegrees of the other existing virtual sites in $\mathcal{A}$ in such a way that the probability of having virtual sites with antidegrees ${\overline{\Delta}}_{\tilde{v}}\le 2$ increases. To achieve this, we list below the priority order that the algorithm follows for processing the virtual sites contained in $\mathcal{A}$:
$\forall \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}=0\vee {\overline{\Delta}}_{\tilde{v}}=1$, the algorithm sequentially applies the operation $SWAPOP\left(\tilde{v}\right)$ and the operation $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$. (In the case that ${\overline{\Delta}}_{\tilde{v}}=1$, the algorithm applies the operation $O{P}_{move}^{j}(j,\mathcal{N},\mathcal{V})$ to the last added site $j\in \partial i$ before it completes the missing connection for the site $i\in \tilde{v}$ with ${\overline{\Delta}}_{i}=1$. Then, on the virtual site $\tilde{v}$, it sequentially applies the operations $SWAPOP\left(\tilde{v}\right)$ and $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$.)
If $\exists \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}=2\wedge \nexists \tilde{q}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{q}}=0\vee {\overline{\Delta}}_{\tilde{q}}=1$, the algorithm chooses, with the highest priority, the site with ${\overline{\Delta}}_{\tilde{v}}=2$. Then, it applies the operation $SWAPOP\left(\tilde{v}\right)$ on $\tilde{v}$, runs the Subroutine $GA(i,{\overline{\Delta}}_{i})$ $\forall i\in \tilde{v}$ with ${\overline{\Delta}}_{i}\ne 0$, and labels each neighbor of i with the letter C.
If $\exists \tilde{v}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{v}}>2\wedge \nexists \tilde{p}\in \mathcal{A}$ s.t. ${\overline{\Delta}}_{\tilde{p}}\le 2$, the algorithm chooses a site $\tilde{v}\in \mathcal{A}$ with the maximum ${\overline{\Delta}}_{\tilde{v}}$, and it applies $O{P}_{del}^{\tilde{v}}(\tilde{v},\mathcal{A})$ to it after having run the Subroutine $GA(i,{\overline{\Delta}}_{i})$ on each $i\in \tilde{v}$ such that ${\overline{\Delta}}_{i}\ne 0$.
In the case that $\mathcal{P}=\varnothing \wedge \mathcal{A}=\varnothing \wedge \mathcal{N}\ne \varnothing $, the algorithm takes a site $t\in \mathcal{N}$ with a minimum ${\overline{\Delta}}_{t}$, and it applies the operation $O{P}_{builddel}^{t}(t,\mathcal{N},\mathcal{I},\mathcal{V})$.
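Taken together, the priority rules above amount to a simple dispatch over the three sets. The sketch below is our own condensation (function and variable names are assumptions, and the per-case operations are elided):

```python
def select_next(P, A_antideg, N_antideg):
    """Return which site the algorithm processes next, following the priority
    order described above. P is the list of P-labelled sites; A_antideg and
    N_antideg map virtual sites in A and ordinary nodes in N to their
    antidegrees. Names are illustrative, not from the paper."""
    if P:                                    # P sites always come first
        return ('P', P[-1])
    if A_antideg:                            # then the virtual sites in A
        small = [v for v, d in A_antideg.items() if d <= 2]
        if small:                            # antidegree 0/1 before 2
            return ('A', min(small, key=A_antideg.get))
        return ('A', max(A_antideg, key=A_antideg.get))  # all > 2: take the max
    if N_antideg:                            # finally, a node of minimum antidegree
        return ('N', min(N_antideg, key=N_antideg.get))
    return None                              # everything empty: terminate
```

Which operation is then applied to the chosen site depends on the case matched, as detailed in the list above.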
The algorithm works until the following condition is true: $\mathcal{N}=\varnothing \wedge \mathcal{P}=\varnothing \wedge \mathcal{A}=\varnothing $. Then, it checks that all sites in $\mathcal{I}$ are covered only by sites in $\mathcal{V}$, and it checks that no site in $\mathcal{I}$ connects to any other site in $\mathcal{I}$.
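The final check is equivalent to verifying that $\mathcal{I}$ is a genuine independent set and that every neighbor of a site in $\mathcal{I}$ lies in $\mathcal{V}$. A minimal sketch, assuming an adjacency-dictionary representation of the finished graph:

```python
def check_solution(adj, I, V):
    """True iff no two sites of I are adjacent and every neighbour of a site
    in I belongs to the cover V. `adj` maps each site to its neighbour set;
    the representation is an assumption for illustration."""
    no_internal_edges = all(adj[i].isdisjoint(I) for i in I)
    covered_only_by_V = all(adj[i] <= V for i in I)
    return no_internal_edges and covered_only_by_V
```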
The results obtained by the algorithm for different values of
d, and different orders
N, are presented in
Table 3,
Table 4 and
Table 5. The confidence intervals of the asymptotic independent set ratio values, obtained using the extrapolation described in the previous section, are presented in
Table 1. In other words, we performed simulations for each value of
d by computing the sample mean and the standard error of the independence ratio for some values of
N. Then, we used GLS methods in order to extrapolate the values of
${\alpha}_{\infty}$ and build up its confidence interval.
From our analysis, we observed that, $\forall d>4$, our results, as far as we know, exceed the best theoretical lower bounds given by greedy algorithms. These improvements were obtained because we allowed the virtual nodes to increase and decrease their antidegrees. In other words, this process transforms the random d-regular graph into a sparse random graph, wherein it is much easier to make local rearrangements (our $SWAPOP(\cdot)$ move) that enlarge the independent set. More precisely, the creation of virtual nodes that increase or decrease their antidegrees allows us to deal with a graph that is no longer d-regular but has average connectivity $\langle d\rangle $.
However, this improvement decreases as
d becomes large,
$\sim 1/d$, and disappears when
$d\to \infty $ (see
Figure 7, bottom panel). Indeed, the number of
P-labeled sites decreased during the graph-building process (see
Figure 7, top panel), thus preventing the creation of the virtual nodes that are at the core of our algorithm. This means that, for
d such that
$d\to \infty $, our algorithm will reach the same asymptotic independent set ratio values obtained by the algorithm in [
21].
In conclusion, for any fixed and small d, we have that the two algorithms are distinct, and our algorithm produces better results without increasing the computational complexity.
5. Conclusions
This manuscript presents a new local prioritized algorithm for finding a large independent set in a random d-regular graph at fixed connectivity. The algorithm defers the decision of whether a site must be placed into the independent set or into the vertex cover set. This deferred strategy can be seen as a depth-first search delayed in time, without backtracking, and it yields very good results.
For all
$d\in [5,100]$, we conjecture new lower bounds for this problem, all of which improve upon the best previous bounds. They were obtained by extrapolation on samples of random
d-regular graphs of sizes up to
${10}^{9}$. For random 3regular graphs, our algorithm is able to reach, when
$N\to \infty $, the asymptotic value presented in [
26]. We recall that the algorithm in [
26] cannot be used for any
$d>4$. However, we think that this approach can also provide conjectured lower bounds for
$d>100$. Indeed, as shown in
Figure 7, there is still space to use our algorithm for computing independent sets. Moreover, new strategies could be implemented for improving our conjectured bounds.
The improvements upon the best bounds are due to reducing the density of the graph, thereby introducing regions in which virtual sites replace multiple original nodes and optimal labelings can be identified. The creation of virtual sites makes it possible to group together nodes of the graph and label each one at a different instant with respect to its creation. These blobs of nodes transform the random d-regular graph into a sparse graph, where the search for a large independent set is simpler.
Undoubtedly, more complex virtual nodes can be defined, and additional optimizations can be identified. This will be addressed in a future manuscript.