A Study on Graph Centrality Measures of Different Diseases Due to DNA Sequencing

Muhiuddin, Ghulam; Samanta, Sovan; Aljohani, Abdulrahman F.; Alkhaibari, Abeer M.

doi:10.3390/math11143166

¹ Department of Mathematics, Faculty of Science, University of Tabuk, Tabuk 71491, Saudi Arabia
² Department of Mathematics, Tamralipta Mahavidyalaya, West Bengal 721636, India
³ Department of Mathematical Sciences, George Mason University, Fairfax, VA 22030, USA
⁴ Department of Biology, Faculty of Science, University of Tabuk, Tabuk 71491, Saudi Arabia
* Author to whom correspondence should be addressed.

Mathematics 2023, 11(14), 3166; https://doi.org/10.3390/math11143166

Submission received: 27 May 2023 / Revised: 3 July 2023 / Accepted: 8 July 2023 / Published: 19 July 2023

(This article belongs to the Special Issue Algorithms and Models for Bioinformatics and Biomedical Applications)

Abstract

Rare genetic diseases are often caused by single-gene defects that affect various biological processes across different scales. However, it is challenging to identify the causal genes and understand the molecular mechanisms of these diseases. In this paper, we present a multiplex network approach to study the relationship between human diseases and genes. We construct a human disease network (HDN) and a human genome network (HGN) based on genotype–phenotype associations and gene interactions, respectively. We analyze 3771 rare diseases and find distinct phenotypic modules within each dimension that reflect the functional effects of gene mutations. These modules can also be used to predict novel gene candidates for unsolved rare diseases and to explore the cross-scale impact of gene perturbations. We compute various centrality measures for both networks and compare them. Our main finding is that diseases are weakly connected in the HDN, while genes are strongly connected in the HGN. This implies that diseases are relatively isolated from each other, while genes are involved in multiple biological processes. This result has implications for understanding the transmission of infectious diseases and the development of therapeutic interventions. We also show that not all diseases have the same potential to spread infections to other parts of the body, depending on their centrality in the HDN. Our results show that the phenotypic module formalism can capture the complexity of rare diseases beyond simple physical interaction networks and can be applied to study diseases arising from DNA (Deoxyribonucleic Acid) sequencing errors. This study provides a novel network-based framework for integrating multi-scale data and advancing the understanding and diagnosis of rare genetic diseases.

Keywords: graph theory; DNA sequencing; genetic diseases; centrality theory

MSC: 05C90; 05C99

1. Introduction

Rapid developments in DNA sequencing technology over the last two decades have led to the identification of the genetic causes of more than 6000 uncommon disorders [1,2,3]. Rare illnesses are commonly traced back to a single genetic lesion, as opposed to the many genetic and environmental variables that normally contribute to the development of more common diseases. Hence, rare disorders provide exceptional prospects for mechanistic dissection of the association between genetic abnormalities and their phenotypic implications, thereby informing tailored treatment options. The promise of such molecularly based, individualized medicine has already been realized for certain rare illnesses, such as rare immunodeficiencies [4,5,6], neurodevelopmental disorders [7,8], and metabolic disorders. However, the high costs and lengthy timescales of these individual endeavors are a drawback, which highlights the need for creative, systematic techniques to study the enormous number of rare illnesses that remain uncharacterized. Many theoretical and operational obstacles must be overcome to achieve this goal. DNA sequencing [9,10,11,12,13] is a technique that determines the order of nucleotides in a DNA molecule. DNA sequencing can reveal the genetic information of an organism, such as its traits, functions, and susceptibility to diseases. It can also help to identify the genes that are involved in the development or progression of various diseases, such as cancer, diabetes, and Alzheimer’s disease.

One way to analyze the genes associated with diseases is to use graph theory models. Graph theory models represent biological systems as networks of nodes and edges, where nodes represent genes or proteins, and edges represent interactions or relationships between them. Graph theory models can help to uncover the structure and dynamics of biological networks and can also help to find the key nodes or genes that play important roles in the network.
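
As a minimal sketch of the node-and-edge representation described above, the following Python snippet builds a small disease network from (disease, gene) association pairs. The linking rule (two diseases are connected when they share at least one gene), the function name `project_disease_network`, and the toy associations are assumptions made for illustration only; they are not data or code from this study.

```python
from collections import defaultdict
from itertools import combinations


def project_disease_network(associations):
    """Link two diseases when they share at least one associated gene.

    This linking rule is an illustrative assumption.
    """
    genes_by_disease = defaultdict(set)
    for disease, gene in associations:
        genes_by_disease[disease].add(gene)
    # Every disease becomes a node, even if it ends up with no neighbours.
    network = {disease: set() for disease in genes_by_disease}
    for a, b in combinations(sorted(genes_by_disease), 2):
        if genes_by_disease[a] & genes_by_disease[b]:
            network[a].add(b)
            network[b].add(a)
    return network


# Hypothetical toy associations (disease, gene); not data from the study.
toy_associations = [
    ("disease_A", "GENE1"),
    ("disease_A", "GENE2"),
    ("disease_B", "GENE2"),
    ("disease_C", "GENE3"),
]
toy_hdn = project_disease_network(toy_associations)
```

In this toy input, disease_A and disease_B become neighbours through GENE2, while disease_C remains an isolated node.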

Graph centrality measures [14,15,16] are numerical values that quantify the importance or influence of a node in a network. Graph centrality measures can be used to rank the nodes according to their topological position or function in the network. Graph centrality measures can also be used to identify the genes that are critical or essential for a biological process or a disease.

There are different types of graph centrality measures [17,18], such as degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, etc. Each type of graph centrality measure captures a different aspect of the node’s importance in the network. For example, degree centrality measures the number of edges connected to a node, closeness centrality measures the average distance from a node to all other nodes, betweenness centrality measures the number of shortest paths that pass through a node, and eigenvector centrality measures the influence of a node’s neighbors.

However, not all graph centrality measures are suitable for biological networks [19,20,21,22]. Biological networks are often complex, heterogeneous, and directed, meaning that they have many nodes and edges with different types and directions. Therefore, some graph centrality measures may not capture the true importance of a node in a biological network. For example, the degree centrality may not reflect the functional relevance of a node, closeness centrality may not account for the direction of edges, betweenness centrality may not consider the strength or weight of edges, and eigenvector centrality may not handle disconnected or cyclic networks.

Therefore, it is important to choose appropriate graph centrality measures for biological networks and diseases. Some studies have proposed novel or modified graph centrality measures that are tailored for biological networks and diseases. For example:

Feng et al. [23] proposed a hypergraph model of biological networks to identify genes critical to the pathogenic viral response. A hypergraph is a generalization of a graph that allows edges to connect more than two nodes. A hypergraph can capture the multi-way relationships among genes and proteins in biological systems. The authors defined a hypergraph betweenness centrality measure that considers both the strength and direction of hyperedges. They showed that hypergraph betweenness centrality can identify genes that are more important for viral response than standard graph centrality measures.

Naderi Yeganeh et al. [24] revisited the use of graph centrality models in biological pathway analysis. They argued that standard directed graph centralities attribute significant importance to upstream elements and evaluate downstream elements as having no importance in pathways. They proposed a directed graph framework called Source/Sink Centrality (SSC) that separately measures the importance of a node in the upstream and downstream of a pathway and combines them for evaluating the centrality. They showed that SSC-derived centralities can associate higher positional importance to cancer genes and mouse lethal genes than standard graph centralities.

Lee et al. [25,26] identified disease–gene associations using a convolutional neural network-based knowledge graph-embedding model (KGED). A knowledge graph is a type of graph that represents entities and their semantic relationships as nodes and edges with labels and attributes. A knowledge graph-embedding model is a technique that learns low-dimensional vector representations or embeddings for nodes and edges in a knowledge graph. The authors used a biological knowledge graph with gene–disease associations derived from DNA sequencing data and other sources. They used a convolutional neural network to learn embeddings for nodes and edges that capture both local and global information in the knowledge graph. They showed that KGED can predict novel gene–disease associations with high accuracy.

The major contributions are as follows:

We perform a comprehensive comparison of different graph centrality measures for both HDN and HGN, highlighting their strengths and limitations for biological network analysis.
We reveal interesting patterns and properties of HDN and HGN, such as the weak connectivity of diseases, the strong connectivity of genes, and the heterogeneity of disease centrality.
We discuss the implications of our findings for infectious disease transmission and therapeutic intervention, suggesting potential applications and directions for future research.

2. Materials and Methods

2.1. The Literature of Disease Network

To begin, a broad range of phenotypes are associated with uncommon illness events, from those that are extremely cell-type or organ-specific to those that are more generalized, such as those related to heterogeneous syndromic disorders. Very little is known about how a genetic aberration influences several levels of biological structure between the genotype and the clinical manifestation. Second, significant technological obstacles are posed by the immense complexity within and across various organizational scales, such as the transcriptome, the proteome, and intra- or intercellular communication. Where can we find and combine the most important information? The third issue is that there is often a lack of data on illnesses with single-gene causes since they are so uncommon. The study of uncommon diseases has often used the “one gene, one route, one disease” paradigm. There is currently a lack of a standardized method for learning about the similarities and differences of uncommon illnesses and applying that information to other cases.

A network-based framework for systematically investigating rare diseases is designed to overcome these obstacles. In turn, we use the many rare diseases with a clear genetic basis to expand our knowledge of how disease is associated with disruptions in molecular networks. Here, we describe a multiplex network strategy for integrating networks at many levels, each of which stands for a distinct level of biological organization (from the genome to the transcriptome to the phenome). Thorough analysis of the network signatures of all rare diseases with known genetic origins yields connection patterns that indicate the relevance of a certain scale of biological organization for a given rare illness. Finally, these systems-level insights might be translated into clinically actionable tools for the genetic diagnosis of rare disease patients with unknown gene defects, with the goals of better contextualizing individual genetic lesions and elucidating the impact of disease heterogeneity.

Genes linked to illness may help pinpoint certain biological processes. Many diseases have been linked to mutations in relatively few genes, and it has been shown that the protein products of these genes all play a role in the same cellular pathway, molecular complex, or functional module. In the case of Fanconi anemia, for example, mutations in a group of genes encoding proteins involved in DNA repair cause the disease. Yet, it is still unknown whether most diseases and disorder groups correspond to separate functional modules in the cellular network. If the genes connected by disorder associations encode proteins that interact in functionally distinct modules, then proteins inside a disease module should be more likely to interact with one another than with proteins outside of the module. To test this hypothesis, we overlaid the DGN on a network of physical protein–protein interactions gleaned through high-quality systematic interactome mapping and literature curation. We observed 290 overlapping interactions between the two networks, a 10-fold increase over what would be expected by chance alone.

The Gene Ontology (GO) annotations show that disorder-related genes have similar cellular and functional properties. If the HDN is modular, then genes that are all linked to the same disease should have comparable cellular and functional properties, as identified in GO. To test this hypothesis, we calculated the percentage of overlapping terms between disorders (see SI Text) across three different GO categories (biological process, molecular function, and cellular component) and found that the percentage of overlapping terms was significantly higher than in random controls.

There should be a general trend towards tissue specificity in the expression of disease genes expressing proteins that interact within shared functional modules. We defined the tissue homogeneity coefficient of a disease as the highest percentage of disorder-shared genes expressed in each tissue using a microarray data set of 10,594 genes from 36 normal human tissues [27]. It is observed that 68 percent of diseases showed near-perfect tissue homogeneity, whereas only 51 percent would be anticipated by chance.
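
The tissue homogeneity coefficient defined above can be sketched in a few lines of Python. This is an illustrative reading of the definition (the largest fraction of a disease's genes that are expressed in any single tissue); the function name `tissue_homogeneity` and the toy expression calls are hypothetical and are not taken from the microarray data set.

```python
def tissue_homogeneity(disease_genes, expressed_in):
    """Return the largest fraction of a disease's genes expressed in one tissue.

    expressed_in maps a tissue name to the set of genes expressed there.
    """
    genes = set(disease_genes)
    if not genes or not expressed_in:
        return 0.0
    return max(
        len(genes & expressed) / len(genes)
        for expressed in expressed_in.values()
    )


# Hypothetical expression calls; not taken from the microarray data set.
toy_expression = {
    "liver": {"GENE1", "GENE2", "GENE4"},
    "muscle": {"GENE2", "GENE3"},
}
toy_coefficient = tissue_homogeneity(["GENE1", "GENE2", "GENE3"], toy_expression)
```

A coefficient of 1.0 would correspond to perfect tissue homogeneity, that is, all genes of the disease expressed together in at least one tissue.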

Lastly, a significant expression profiling correlation should be seen between illness genes that share a similar functional module [28,29]. Pearson correlation coefficients (PCCs) for co-expression patterns of gene pairs associated with the same illness were skewed towards higher values compared to a random control, suggesting that these disorders are causally linked. Similarly, a considerable deviation from the random reference is shown in the average PCC across all pairings of genes within a specific disorder with a minor but clearly visible peak in the distribution at PCC 0.75. Heinz body anemia, Bethlem myopathy, and spherocytosis are only a few of the 33 illnesses associated with an average PCC of 0.6, in which all genes are strongly co-expressed in most tissues.
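
To make the co-expression statistic concrete, the sketch below computes the Pearson correlation coefficient for two expression profiles and averages it over all gene pairs of one disorder. The helper names `pearson` and `mean_pairwise_pcc` and the toy profiles are assumptions introduced for illustration; the study's own preprocessing of the expression data is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_pcc(profiles):
    """Average PCC over all pairs of genes in {gene: expression profile}."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical expression profiles across four tissues.
toy_profiles = {
    "GENE1": [1.0, 2.0, 3.0, 4.0],
    "GENE2": [2.0, 4.0, 6.0, 8.0],
    "GENE3": [4.0, 3.0, 2.0, 1.0],
}
toy_mean_pcc = mean_pairwise_pcc(toy_profiles)
```

In the toy input, GENE1 and GENE2 are perfectly co-expressed while GENE3 is anticorrelated with both, so the average is pulled below zero.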

To sum up, genes that share a common disorder are more likely to (i) have products that interact with each other through protein–protein interactions, (ii) be expressed together in specific tissues, (iii) show high co-expression levels, (iv) exhibit synchronized expression as a group, and (v) share GO terms. Together, these results provide a network-based model of disease and lend credence to the idea that disease genes and their products have a global functional relatedness. Modularity is also a hallmark of cellular networks, which are composed of clusters of proteins that are intricately linked and are each responsible for a distinct cellular process [30,31]. A disorder is thus characterized by the disturbance or failure of a particular functional module, caused by a change in one or more of its components, that leads to detectable mental and/or physical abnormalities.

In this paradigm, a phenotype is typically correlated with the incapacity of a certain functional module to carry out its fundamental responsibilities, providing a network-based explanation for the formation of complex or polygenic illnesses. Mutations in separate genes will seem to lead to the same condition since numerous possible combinations of disrupted genes might render an expanded module inoperable. In addition to shedding light on the relationship between disease and specific cellular functions, the identification of genes that contribute to the same cellular function or network module using the correlation between disease and functional modules can improve our understanding of cellular networks.

One possible real-time example of using a disease network for outbreak analysis is the study in [32]. The authors used data from digital disease surveillance tools such as ProMED and HealthMap to construct a human disease network (HDN) based on the geographic locations and dates of reported cases of infectious diseases. They developed a statistical model to quantify the spatial heterogeneity in the risk of disease spread and to forecast short-term incidence trends. They applied their model retrospectively to data collected by ProMED and HealthMap during the 2013–2016 West African Ebola epidemic and compared it with WHO data. They showed that their model was able to robustly quantify the risk of disease spread 1–4 weeks in advance, and for countries at risk of case importations, quantify where this risk comes from. They also showed that their model could capture the impact of public health interventions such as border closures and travel restrictions on disease transmission. Their study highlights that ProMED and HealthMap data could be used in real time to quantify the spatial heterogeneity in the risk of spread of an outbreak.

Another possible real-time example of using a disease network for disease–gene association prediction is the study by Lee et al. [25,26]. The authors used a biological knowledge graph with gene–disease associations derived from DNA sequencing data and other sources. They used a convolutional neural network to learn embeddings for nodes and edges that capture both local and global information in the knowledge graph. They showed that their knowledge graph-embedding model (KGED) can predict novel gene–disease associations with high accuracy. They also showed that KGED can identify genes that are associated with multiple diseases, suggesting potential pleiotropic effects or common pathways. Their study demonstrates the potential of using knowledge graph-embedding models for integrating multi-sourced data and discovering novel disease–gene associations.

2.2. The Literature on Centrality Measurement

A network consists of a collection of nodes linked by a common set of ties. Offline networks include those formed by members of a family (family network), a group of farmers in a hamlet (farmer network), or a group of businessmen (business network). Online, millions of individuals have access to smartphones and find it convenient to use social apps as a means of communication and dissemination of information. Finding the most important node in such a network is a prerequisite for many applications. The centrality of a node in a network reflects its significance and importance. As a result, determining centrality is a crucial part of any social network analysis. Consider a club with 100 members: the club president naturally plays a vital role. Similarly, a bank may have many branches around the nation, but its main office is probably the most central. The role of class monitor often makes a student the most prominent learner in the classroom, and a college principal is seen as pivotal in the academic community.

In the real world, networks with high temporal complexity and difficult-to-assess centrality may produce a tremendous amount of data. It is also possible to use centrality metrics to sift through a big dataset in search of a single, highly relevant piece of information. One typical challenge in these networks is pinpointing the most important nodes. There are various uses for networks, whether online or offline, including but not limited to modeling the dissemination of viruses, diseases, information, and news. It is crucial in network analysis to isolate key nodes and edges. Not every scenario calls for the use of every centrality metric. The intricacy of the network in terms of time is also crucial. Over time, many centrality metrics have been created and used in appropriate settings based on varied conceptions of the significance of vertices or edges.

Several new centrality metrics are being proposed and developed for use with appropriate challenges in the real world. A measure of centrality for connected graphs was initially described by Bavelas [33], who also advocated its use in the analysis of communication networks. The concept of stress centrality, which uses the shortest paths between pairs of nodes, was first proposed by Shimbel [34]. Katz centrality was first established in 1953 [35] as a way to quantify a node’s importance inside a network. Beauchamp [36] criticized the Bavelas definition of centrality and presented an enhanced index that included points and graphs to increase the measure’s analytic power. Sabidussi [37] made a case for Beauchamp’s enhancement of the centrality index by providing a clear definition and evaluating the performance of existing indices in meeting that criterion. With minor tweaks to Sabidussi’s axioms, Nieminen [38] presented a vertex-degree-based centrality measure for undirected graphs.

For each concept, Freeman [39] created three types of centrality measures: an absolute centrality measure, a comparative centrality measure for places inside a network, and a network-wide centrality measure. To better understand the experimental culture of smaller groups, these metrics were analyzed. Information centrality was first proposed by Stephenson and Zelen [40] and is based on the notion that any two nodes in a network may exchange data with one another. An innovative method for determining network centrality based on the concept of flows was developed by Freeman et al. [39]. This was similar to Freeman’s metric, albeit not identical. Specifically, White and Borgatti [41] extended Freeman’s geodesic centrality measurements for betweenness in undirected networks to more complex directed graphs. Everett and Borgatti [42] expanded the three centrality measures to include not just people but also classes and groupings.

Until recently, Freeman’s centrality measurements were limited to binary networks, and considerable effort has since gone into extending them to weighted networks. Brandes [43] presented a more efficient algorithm for calculating betweenness centrality, cutting down on both the time and space required for the computation. All centrality metrics were shown to be stable when the network was sampled, as discussed by Costenbader and Valente [44]. To overcome the restrictions set out by Costenbader and Valente [44] and Borgatti [45], a computational technique was used and robustness was analyzed for a large sample of graphs. Subgraph centrality was first proposed by Estrada and Rodriguez-Velazquez [46] as a way to quantify how often a given node appears in distinct subgraphs of a given network. Subgraph centrality was extended as functional centrality by Rodriguez et al. [47] based on the premise that closed walks should be weighted such that their effect on the centrality diminishes as the length of the walk grows. The special properties of eigenvector centrality were explored by Bonacich [48]. Opsahl et al. [49] enhanced centrality measures for weighted networks. Leverage centrality was first proposed by Joyce et al. [50] and then used to examine a network model of the human brain.

Using k-shell decomposition, Kitsak et al. [51] argued that key influencers tend to be located in the core of a network. Zeng and Zhang [52] presented a mixed-degree decomposition approach that considers both the residual and the exhausted degree. Liu et al. [53,54] proposed an enhanced strategy that better distinguishes nodes in the ranking list; it calculates the shortest path from a target node to the group of nodes with the greatest k-shell values, that is, the network core. Bae and Kim [55] introduced neighborhood coreness centrality, a measure of a node’s position in the network based on the coreness of its neighbors. Liu et al. [56] suggested using an nth-step neighborhood centrality to identify key nodes in large networks. To better identify the most important nodes, Wang et al. [57] advocated using weighted neighborhood centrality.
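
Because several of the measures in this paragraph build on k-shell decomposition, a minimal sketch of the standard pruning procedure may help. It assumes an undirected graph given as a symmetric adjacency dictionary; the function name `k_shell` and the toy graph are hypothetical, and the refinements of [52,53,54,55,56,57] are not implemented.

```python
def k_shell(adjacency):
    """Return {node: shell index} by repeatedly pruning low-degree nodes.

    adjacency must be symmetric: if b is listed under a, a is listed under b.
    """
    remaining = {node: set(nbrs) for node, nbrs in adjacency.items()}
    shell = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k belong to the k-shell.
        peel = [node for node, nbrs in remaining.items() if len(nbrs) <= k]
        if not peel:
            k += 1
            continue
        for node in peel:
            shell[node] = k
            for nbr in remaining.pop(node):
                remaining[nbr].discard(node)
    return shell


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
toy_shells = k_shell(toy_graph)
```

Here the pendant node falls into the 1-shell, while the three triangle nodes form the 2-shell, which is the core of this toy network.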

The literature survey has reviewed various centrality measures and their applications in biological network analysis, especially in the disease–genome network. However, the literature survey has not addressed the following research gaps:

How can different centrality measures be compared and evaluated for the disease–genome network in terms of their accuracy, robustness, and interpretability?
How can multiple sources and modalities of data, such as DNA sequencing, gene expression, protein–protein interaction, pathway analysis, etc., be integrated to construct a comprehensive and reliable disease–genome network?
How can domain knowledge and expert feedback be incorporated into centrality measures to improve their biological relevance and validity?

These research gaps motivate the need for developing novel and improved centrality measures for the disease–genome network that can address the challenges and opportunities in this field.

3. The Motivation for This Study

The biological network is very complex in nature. This study analyses the biological network, particularly the disease–genome network. Finding the center of the genome network is the main objective of this study. Central nodes of any disease network are important in preventing diseases and preventing further spreading. The disease/genome with the most prominent values/centrality measures are examined using various methods. The methods have some limitations in particular networks. To overcome such limitations, we develop an algorithm to measure such central nodes. This will help us to understand the connections and spreading paths of diseases.

3.1. The Basics of Centrality Measurements

The degree centrality measure uses a node’s connection count as the single criterion for its relevance: how many direct, “one hop” connections does each node have to other nodes in the network? It is used to identify people with extensive social networks, high levels of popularity, a wealth of relevant knowledge, or speedy access to the larger network. Degree centrality is the simplest way to evaluate the interconnections between nodes. When analyzing financial data or account activity, for example, it can be helpful to separate in-degree (the number of incoming links) and out-degree (the number of outgoing links) into two separate metrics.
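
As a small illustration of the in-degree and out-degree split, the following sketch counts both for a list of directed edges. The function name `in_out_degree` and the toy edge list are assumptions made for this example.

```python
from collections import Counter


def in_out_degree(edges):
    """Return (in_degree, out_degree) counters for edges (source, target)."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    return in_degree, out_degree


# Hypothetical directed edges.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
toy_in, toy_out = in_out_degree(toy_edges)
```

For an undirected network, the degree of a node is simply the sum of both counts once every edge is listed in one direction only.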

Betweenness centrality measures how often a given node lies on the shortest path connecting two other nodes. This metric reveals the nodes that act as bridges between other nodes in the network. To do this, it finds all the shortest paths and counts the number of times each node lies on one, which is useful for pinpointing the key players that shape the flow through a system. One should use caution when using betweenness to study the dynamics of communication. A high betweenness score might mean that a node is influential across many clusters in a network, or it could simply mean that the node lies on the periphery of two clusters.
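
The shortest-path counting described above can be sketched with Brandes' algorithm [43] for an unweighted, undirected graph. This is an illustrative standard-library implementation, not the software used in this study; the function name and the toy path graph are assumptions.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in adjacency[node]:
                if distance[nbr] < 0:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
                if distance[nbr] == distance[node] + 1:
                    path_count[nbr] += path_count[node]
                    predecessors[nbr].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each unordered pair of endpoints was counted from both sides.
    return {node: value / 2.0 for node, value in score.items()}


# Hypothetical path graph a - b - c - d.
toy_path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_betweenness = betweenness_centrality(toy_path)
```

The scores are unnormalized counts of node pairs; dividing by the number of possible pairs would give values between 0 and 1.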

Closeness centrality scores each node by its “closeness” to every other node in the network. The shortest paths between all pairs of nodes are calculated, and each node is then given a score based on the sum of its shortest-path distances to the other nodes. The measure identifies the nodes best placed to influence the whole network most quickly.
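
A minimal sketch of closeness centrality follows, assuming an unweighted graph and the common normalization of reachable nodes divided by the sum of shortest-path lengths. The normalization choice, the function name, and the toy star graph are assumptions for illustration.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness as (reachable nodes) / (sum of shortest-path lengths)."""
    result = {}
    for source in adjacency:
        # Breadth-first search gives shortest-path lengths from the source.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in distance:
                    distance[nbr] = distance[node] + 1
                    queue.append(nbr)
        total = sum(distance.values())
        # Isolated nodes get 0; only the reachable component is considered.
        result[source] = (len(distance) - 1) / total if total else 0.0
    return result


# Hypothetical star graph: hub h connected to leaves x, y, z.
toy_star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
toy_closeness = closeness_centrality(toy_star)
```

The hub reaches every other node in one step and therefore receives the highest score, whereas each leaf needs two steps to reach the other leaves.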

EigenCentrality, like degree centrality, is a measure of a node’s impact in a network that considers the number of edges connecting it to every other node in the network. The EigenCentrality measure goes beyond this by additionally considering a node’s degree of connectivity, the number of interconnections between it and other nodes, and so on. EigenCentrality is able to determine whether nodes significantly impact the whole network as opposed to simply the nodes immediately linked to them by computing the node’s extended connections. As an excellent ‘all-around’ SNA score, EigenCentrality is useful for studying human social networks and investigating the spread of viruses. Our software determines the EigenCentrality of each node by iteratively converging on an eigenvector.
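
The iterative convergence on an eigenvector mentioned above can be illustrated with a plain power iteration. The sketch assumes an undirected, connected graph and is not the software referred to in the text; the function name, the tolerance, and the toy star graph are hypothetical choices.

```python
from math import sqrt


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-10):
    """Power iteration on an undirected graph given as {node: neighbours}."""
    if not adjacency:
        return {}
    score = {node: 1.0 for node in adjacency}
    for _ in range(max_iter):
        # Adding the node's own score iterates on A + I, which has the same
        # leading eigenvector as A but also converges on bipartite graphs.
        updated = {
            node: score[node] + sum(score[nbr] for nbr in adjacency[node])
            for node in adjacency
        }
        norm = sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        if max(abs(updated[node] - score[node]) for node in adjacency) < tol:
            return updated
        score = updated
    raise RuntimeError("power iteration did not converge")


# Hypothetical star graph: hub h connected to leaves x, y, z.
toy_star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
toy_eigen = eigenvector_centrality(toy_star)
```

The scores are scaled to unit Euclidean length, so only their relative sizes are meaningful: a node scores highly when its neighbors score highly.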

Like EigenCentrality, PageRank assigns a score to each node in a network based on its connections and the connections of its neighbors. However, PageRank also considers the direction and weight of links, which means that links pass influence in one direction only and may pass different amounts of influence. This metric identifies nodes whose influence extends beyond their direct connections. By considering link direction and weight, PageRank can shed light on citations and authority. PageRank was one of the algorithms powering the first iteration of Google’s search engine (the “Page” in PageRank refers to Google co-founder Larry Page, who also created the algorithm).
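
The role of link direction and weight can be sketched with a small PageRank power iteration. The damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing links, the function name, and the toy graph are assumptions of this example rather than settings reported in the text.

```python
def pagerank(out_links, damping=0.85, max_iter=1000, tol=1e-10):
    """PageRank for {node: {target: weight}}; every target must be a key."""
    nodes = list(out_links)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        updated = {node: base for node in nodes}
        for node in nodes:
            total_weight = sum(out_links[node].values())
            for target, weight in out_links[node].items():
                # Influence flows only along the link direction, split by weight.
                updated[target] += damping * rank[node] * weight / total_weight
        if sum(abs(updated[node] - rank[node]) for node in nodes) < tol:
            return updated
        rank = updated
    raise RuntimeError("PageRank iteration did not converge")


# Hypothetical directed graph: a and b point to c, and c points back to a.
toy_links = {"a": {"c": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
toy_rank = pagerank(toy_links)
```

In the toy graph, node b receives no incoming links and keeps only the baseline share, while c collects rank from both a and b.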

3.2. Perspectives from the Center and the Edges

Discovering that densely linked proteins, or “hubs”, are more often encoded by important genes in Saccharomyces cerevisiae provided an early hint of the relationship between the topology of a cellular network and its functional features [58,59]. Because of this, many new studies by Newman [60] and later Sharma et al. [61] have proposed that human illness genes should also prefer encoding hubs. While prior assessments showed a slight association between illness genes and hubs, the relevance of the cellular network in human disorders remains unclear. What percentage of illness genes code for nodes or hubs in the cell’s communication infrastructure?

We found that disease-related proteins have a 32% larger number of interactions with other proteins (average degree) than the non-disease proteins and that high-degree proteins are more likely to be encoded by genes associated with diseases than proteins with few interactions, suggesting that disease genes, given their impact on the organism, display a tendency to encode hubs in the interactome [60]. Nevertheless, it is demonstrated that there are substantial variances across different disease genes underneath this apparent association between illnesses and hubs.

However, when investigating whether disease genes encode hubs, the authors of other earlier studies overlooked the fact that some human genes are essential in early development and that functional changes in these genes contribute to the high rate of first-trimester spontaneous abortions (which may account for as much as 20% of recognized pregnancies). One way to gain insight into this phenomenon is to study the mouse orthologs of human genes, in which the effects of disruption by homologous recombination can be observed (Mouse Genome Informatics; www.informatics.jax.org (accessed on 30 May 2023)). In total, 1267 such mouse lethal orthologs of human genes were identified, of which 398 are related to human disorders, or 22% of all known human disease genes. Removing these 398 “essential disease genes” from the entire list of 1777 Online Mendelian Inheritance in Man (OMIM) disease genes leaves 1379 “non-essential disease genes”, which can be contrasted with the 1267 “essential genes” to show the distinct roles played by these two groups of genes in the human interactome.
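
The gene counts quoted in this paragraph are mutually consistent, as the short check below shows. Only the three counts from the text are used; the variable names are illustrative.

```python
# Counts quoted in the text.
disease_genes = 1777            # OMIM disease genes
essential_genes = 1267          # mouse lethal orthologs of human genes
essential_disease_genes = 398   # genes in both groups

# Disease genes that are not essential.
non_essential_disease_genes = disease_genes - essential_disease_genes

# Share of disease genes that are also essential.
essential_share = essential_disease_genes / disease_genes
```

The difference gives the 1379 non-essential disease genes, and the share rounds to the 22% stated above.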

To begin, essential proteins have a larger propensity than disease proteins as a whole to be associated with hubs. This raises a crucial question: might the fact that a minority (22%) of disease genes are also essential be the sole cause of the observed association between disease genes and hubs? To answer this question, we examined the degree dependence of the non-essential disease proteins. Strikingly, once the essential genes are excluded, there is no longer any link between hubs and disease-related proteins. Consequently, the majority of disease genes (78%) do not exhibit a propensity to encode hubs, suggesting that the observed weak association between hubs and disease genes was entirely due to the few essential genes within the disease gene class.

To perform even the most fundamental tasks, the cell must maintain the coordinated activity of key functional modules by driving the expression patterns of its most critical genes in a generally synchronized fashion. It stands to reason, therefore, that a large fraction of genes will be involved in coordinating the expression pattern of both essential and pathogenic genes. To test this, we used microarray data from healthy human tissues to calculate the average gene co-expression coefficient (the mean of the Pearson correlation coefficients PCCij) between a disease-causing gene i and all other genes j in the cell [25]. Consistent with this hypothesis, we find that genes with high average co-expression with all other genes are more likely to be essential than genes with little or negative co-expression. Non-essential disease genes, however, show the opposite effect: they are underrepresented among the highly synchronized genes and are associated with genes whose expression pattern is anticorrelated or not correlated with other genes. Hence, non-essential disease-related genes seem to have an expression pattern independent of the cell as a whole, whereas essential genes tend to have expression patterns tightly tied to the rest of the cell.
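The average co-expression coefficient described above can be illustrated with a minimal sketch. The gene names and expression values below are hypothetical and serve only to show the calculation; they are not taken from the microarray data used in the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_coexpression(expression, gene):
    """Mean PCC between `gene` and every other gene in `expression`."""
    others = [g for g in expression if g != gene]
    return sum(pearson(expression[gene], expression[g]) for g in others) / len(others)

# Hypothetical expression levels across four tissues (illustrative values only).
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],  # rises with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],  # falls as gene_a rises
}

avg_a = average_coexpression(expression, "gene_a")
avg_c = average_coexpression(expression, "gene_c")
```

In this toy data, `gene_c` is anticorrelated with both other genes, so its average co-expression is negative, which is the kind of profile the text associates with non-essential disease genes.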

Lastly, we asked whether disease-causing mutations are likely to occur in housekeeping genes, which are ubiquitously expressed. We find that the likelihood of a gene being essential increases with the number of tissues in which its expression is detected. Conversely, non-essential disease genes tend to be expressed in only a few tissues. Similarly, only 9.9% of housekeeping genes are also disease genes, whereas this proportion is 13.5% for non-housekeeping genes. By comparison, 59.8% of the housekeeping genes annotated with a mouse phenotype were essential, versus only 40.5% of the non-housekeeping genes.

These findings lend credence to the counterintuitive conclusion that non-essential disease genes tend to be expressed in only a few tissues and show a lower correlation in their expression pattern with the rest of the genes in the cell than would be expected by chance. Hence, contrary to our assumptions and previous ideas, most non-essential disease genes occupy relatively peripheral positions in the cellular network. Conversely, essential genes tend to be overrepresented among housekeeping genes, to have expression that is well synchronized with the rest of the genes, and to be expressed in most tissues.

4. Constructing the Human Disease Network (HDN)

Since the beginning of medicine, doctors and scientists have focused on select groups of diseases believed to share common genetic or environmental roots. Recent advances in genetics and genomics have allowed us to understand the role that mutated genes play in practically every illness and, for the first time, to examine a wide range of human diseases simultaneously [4,30]. This approach may allow the identification of general principles of human disease that are not evident when diseases are studied in isolation.

The Human Disease Network (HDN) is a valuable resource in this effort since it provides a genomic roadmap for investigations into causal links between diseases. Physicians, genetic counsellors, and biomedical researchers may all benefit from the global perspective provided by the accompanying comprehensive diseasome map, which displays all ailments and the genes linked with various conditions.

To test whether the conclusions obtained here are robust to the incompleteness of the OMIM coverage, we included, in addition to the genes with identified mutations linked to a specific disease phenotype, genes that satisfy the less stringent criterion that the phenotype has not been mapped to a specific locus. This raises the number of disease-associated genes from 1777 to 2765, but the looser association between many of the newly added genes and disorders introduces noise into the data. Nonetheless, the structure of the revised diseasome map is mostly unchanged, and the patterns we have identified are unaffected by this growth, providing additional evidence that our results are robust to further enlargement of the OMIM database. As the HDN represents the underlying cellular network-based relationship between genes and functional modules, it will remain largely unchanged even if the displayed maps (Table 1, Figure 1) undergo inevitable local alterations with the discovery of new disease genes.

4.1. Central Values of the HDN

We consider the PageRank, Katz, HITS, eigenvector, degree, betweenness, and closeness centralities of the selected nodes in the HDN (Table 1). The centrality values of the selected nodes are listed in Table 2 and compared in Figure 2.
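As an illustration of one of these measures, PageRank can be sketched as a power iteration over the adjacency lists of an undirected graph. This is a minimal sketch, not the procedure used for Table 2 (which was produced with Mathematica 12): the damping factor of 0.85, the number of iterations, the toy graph, and the rescaling by the maximum are all assumptions.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as adjacency lists.

    Assumes every node has at least one neighbour (no dangling nodes).
    """
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        # Every node receives the same teleportation share ...
        nxt = {node: (1.0 - damping) / n for node in adj}
        # ... plus an equal split of each neighbour's current rank.
        for node, neighbours in adj.items():
            share = damping * rank[node] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank

# Hypothetical toy graph: one hub connected to three leaves.
toy_graph = {
    "hub": ["leaf1", "leaf2", "leaf3"],
    "leaf1": ["hub"],
    "leaf2": ["hub"],
    "leaf3": ["hub"],
}

rank = pagerank(toy_graph)
top = max(rank.values())
scaled = {node: value / top for node, value in rank.items()}  # maximum rescaled to 1
```

Rescaling by the maximum mirrors the fact that each column of Table 2 has a largest value of 1.000, although the text does not state how the values were normalized.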

Spinal muscular atrophy has the maximum values for the PageRank and betweenness centralities, while breast cancer has the highest values for the Katz, HITS, eigenvector, degree, and closeness centralities (Table 2). Thus, these two diseases are highly connected to other diseases in the network. Because the individual measures disagree on which node is the most central, an appropriate single formulation is needed to identify it. The average values of all the mentioned centralities are shown in Table 3.
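The averaging step can be reproduced from Table 2. The sketch below copies four rows of Table 2 (in the column order PageRank, Katz, HITS, eigenvector, degree, betweenness, closeness) and takes the arithmetic mean of the seven values. Because Table 2 is rounded to three decimals, the results agree with Table 3 only to about three decimal places.

```python
# Rows copied from Table 2: PageRank, Katz, HITS, eigenvector, degree,
# betweenness, closeness (values rounded to three decimals in the source).
centralities = {
    "Breast cancer":             [0.957, 1.000, 1.000, 1.000, 1.000, 0.811, 1.000],
    "Spinal muscular atrophy":   [1.000, 0.844, 0.690, 0.690, 0.818, 1.000, 0.963],
    "Prostate cancer":           [0.865, 0.862, 0.859, 0.859, 0.727, 0.319, 0.897],
    "Spastic ataxia/paraplegia": [0.765, 0.353, 0.024, 0.024, 0.091, 0.000, 0.441],
}

# Arithmetic mean of the seven centralities for each disease.
proposed = {disease: sum(values) / len(values)
            for disease, values in centralities.items()}

# Diseases ordered from most to least central by the averaged value.
ranking = sorted(proposed, key=proposed.get, reverse=True)

for position, disease in enumerate(ranking, start=1):
    print(f"{position}. {disease}: {proposed[disease]:.4f}")
```

For these four rows, the ordering matches the relative order of the same diseases in Table 3.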

4.2. Central Values in the Genome Network

We have designed a genome network based on existing studies (Figure 3). LMNA (the gene that encodes lamin A and lamin C) has the highest value for Katz centrality (see Table 4). The genome node AR has the maximum values for all the other centralities listed in Table 4.

In Table 4, we use the following terms:

- HEXB is a gene that encodes Hexosaminidase Subunit Beta, an enzyme responsible for the breakdown of complex molecules in the body.
- LMNA is a gene that encodes the protein Lamin A/C, which is essential for maintaining the structure and function of the cell nucleus.
- ALS2 is a gene associated with Amyotrophic Lateral Sclerosis 2, a progressive neurodegenerative disease that affects nerve cells in the brain and spinal cord.
- BSCL2 is a gene associated with Berardinelli-Seip Congenital Lipodystrophy 2, a rare disorder characterized by the loss or absence of adipose tissue throughout the body.
- VAPB is a gene that encodes Vesicle-Associated Membrane Protein-Associated Protein B, a protein that regulates membrane trafficking and lipid metabolism.
- GARS is a gene that encodes Glycyl-tRNA Synthetase, an enzyme involved in protein synthesis.
- AR is the gene that encodes the Androgen Receptor protein, which is indispensable for the development and function of male reproductive tissues.
- ATM is a gene that encodes the Ataxia Telangiectasia Mutated protein, which plays an important role in repairing damaged DNA.
- BRIP1 is a gene that encodes the BRCA1-Interacting Protein C-Terminal Helicase 1 protein, which is involved in DNA repair and genome stability maintenance.
- BRCA2 is a gene linked to Breast Cancer 2, an inherited form of breast cancer.
- BRCA1 is a gene that encodes the Breast Cancer 1 protein, which is essential for DNA repair and cell cycle regulation.
- KRAS is a gene that encodes the Kirsten Rat Sarcoma Viral Oncogene Homolog, which regulates cell division and development.
- RAD54L is a gene closely related to the DNA repair protein RAD54. Several forms of cancer, including breast and ovarian cancer, have been linked to mutations in RAD54L.
- TP53 is the gene that encodes Tumor Protein 53, also known as the “guardian of the genome”.
- MAD1L1 is a gene that encodes the protein Mitotic Arrest Deficient 1-Like 1, which is essential for cell division and chromosome segregation.
- CHEK2 is the gene that encodes Checkpoint Kinase 2, which is involved in DNA damage response and cell cycle regulation.
- PIK3CA is a gene that encodes the catalytic subunit alpha of Phosphatidylinositol-4,5-Bisphosphate 3-Kinase, an enzyme implicated in cell growth and survival signaling pathways.
- CDH1 is the gene that encodes the Cadherin-1 protein, which plays a role in cell adhesion and tissue organization.
- MSH2 is a gene that encodes the MutS Homolog 2 protein, which is essential for DNA mismatch repair. Mutations in MSH2 can increase the risk of developing hereditary nonpolyposis colorectal cancer as well as other forms of cancer.

5. Analysis and Discussion

The genome network is not as complex as the HDN (see Figure 4, Figure 5 and Figure 6). We can see that the centralities of the later nodes (after node no. 7) are higher and almost identical to one another. It can also be seen from Figure 5 that the nodes to the right of node 7 (AR) form a clique, i.e., every pair of these nodes is connected. Thus, there is a high chance of the disease spreading if one of these nodes is infected.
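Whether a set of nodes forms a clique can be checked by testing every pair for an edge. The sketch below is illustrative only: the edge list is a hypothetical fragment built from four of the gene names in Table 4, not the edge list of Figure 3, which is not given in the text.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """Return True if every pair of the given nodes is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))

# Hypothetical edges (assumption): four mutually connected genes plus one
# gene attached only to AR. This is NOT the edge list of Figure 3.
edges = [
    ("AR", "ATM"), ("AR", "BRCA1"), ("AR", "TP53"),
    ("ATM", "BRCA1"), ("ATM", "TP53"), ("BRCA1", "TP53"),
    ("AR", "VAPB"),
]

# Build an undirected adjacency map from the edge list.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

core_is_clique = is_clique(adj, ["AR", "ATM", "BRCA1", "TP53"])
with_vapb_is_clique = is_clique(adj, ["AR", "ATM", "VAPB"])
```

Here the four mutually connected genes pass the check, while adding `VAPB` fails it because `VAPB` is joined only to `AR` in this toy edge list.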

6. Conclusions

In this paper, we present a multiplex network approach to study the relationship between human diseases and genes. We construct a human disease network (HDN) and a human genome network (HGN) based on genotype–phenotype associations and gene interactions, respectively. We compute various centrality measures for both networks and compare them. Our main finding is that diseases are weakly connected in the HDN, while genes are strongly connected in the HGN. This implies that diseases are relatively isolated from each other, while genes are involved in multiple biological processes. This result has implications for understanding the transmission of infectious diseases and the development of therapeutic interventions. We also show that not all diseases have the same potential to spread infections to other parts of the body, depending on their centralities in the HDN.

Some possible directions for future research include exploring the dynamics and evolution of the multiplex network over time, identifying the key factors and mechanisms that influence the network structure and function, and developing novel methods and tools for network visualization and analysis. Furthermore, the multiplex network approach can be extended to other types of genetic disorders, such as complex diseases, and to other organisms, such as model species or pathogens. The multiplex network framework can also facilitate the integration of other types of data, such as epigenetic, transcriptomic, proteomic, or metabolomic data, to provide a more comprehensive and holistic view of the genotype–phenotype relationship.

Author Contributions

Conceptualization, G.M. and S.S.; methodology, G.M., S.S. and A.F.A.; validation, A.F.A. and A.M.A.; formal analysis, G.M. and S.S.; investigation, G.M., S.S., A.F.A. and A.M.A.; writing—original draft preparation, G.M.; writing—review and editing, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at University of Tabuk for funding this work through Research no. S-0136-1443.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to express their sincere thanks to the anonymous reviewers for their valuable comments and helpful suggestions, which greatly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

# Input: a database of disease–disease associations, given as rows with the
#        keys "disease1", "disease2", and "measure"
# Output: a disease network
# Note: the networkx library is assumed here for the Graph class.
import networkx as nx

def build_disease_network(database):
    # Create an empty unipartite graph
    unipartite_graph = nx.Graph()
    # Loop through the database
    for row in database:
        # Get the names of the two diseases and their association measure
        disease1 = row["disease1"]
        disease2 = row["disease2"]
        measure = row["measure"]
        # Add the disease nodes to the unipartite graph if they are not already there
        if not unipartite_graph.has_node(disease1):
            unipartite_graph.add_node(disease1)
        if not unipartite_graph.has_node(disease2):
            unipartite_graph.add_node(disease2)
        # Add an edge between the two disease nodes with a weight equal to the
        # association measure (we have taken unit weight for all edges)
        unipartite_graph.add_edge(disease1, disease2, weight=measure)
    return unipartite_graph

# Visualize the unipartite graph using some network analysis tools, e.g.
# nx.draw(build_disease_network(database), with_labels=True)  # needs matplotlib

Appendix B

**Algorithm** GenerateTable2

**Input:** A network graph G with nodes and edges

**Output:** A table with centrality measures for each node

1. Initialize an empty table T with columns: node, degree, closeness, betweenness, and eigenvector.

2. For each node n in G:

- Calculate the degree centrality of n by counting the number of edges in G that are incident to n.

- Calculate the closeness centrality of n as the reciprocal of the average shortest path length from n to all other nodes in G.

- Calculate the betweenness centrality of n by finding the fraction of shortest paths between pairs of other nodes in G that pass through n.

- Calculate the eigenvector centrality of n by finding the principal eigenvector of the adjacency matrix of G and taking its value at n.

- Add a new row to T with the values of n and its centrality measures.

3. Return T.
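The steps of this algorithm can be sketched in plain Python for an unweighted, undirected graph. This is an illustrative sketch, not the Mathematica 12 code used for Table 2. The toy graph, the use of Brandes' accumulation for betweenness, the power iteration on A + I for the eigenvector, and the rescaling of each column by its maximum (suggested by the 1.000 maxima in Table 2) are all assumptions.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    return {n: len(adj[n]) for n in adj}

def closeness_centrality(adj):
    # Reciprocal of the mean shortest-path length to the other reachable nodes.
    result = {}
    for n in adj:
        dist = bfs_distances(adj, n)
        total = sum(dist.values())
        result[n] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(adj):
    # Brandes' dependency accumulation for unweighted, undirected graphs.
    bc = {n: 0.0 for n in adj}
    for s in adj:
        order, preds = [], {n: [] for n in adj}
        sigma = {n: 0 for n in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {n: b / 2 for n, b in bc.items()}  # each undirected pair is counted twice

def eigenvector_centrality(adj, iterations=200):
    # Power iteration on A + I (same eigenvectors as A, avoids oscillation).
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        top = max(nxt.values())
        x = {n: v / top for n, v in nxt.items()}
    return x

def normalize_by_max(scores):
    top = max(scores.values())
    return {n: (v / top if top else 0.0) for n, v in scores.items()}

def centrality_table(adj):
    """Rows of [degree, closeness, betweenness, eigenvector], each scaled to max 1."""
    columns = [degree_centrality(adj), closeness_centrality(adj),
               betweenness_centrality(adj), eigenvector_centrality(adj)]
    scaled = [normalize_by_max(c) for c in columns]
    return {n: [round(c[n], 3) for c in scaled] for n in adj}

# Hypothetical toy graph: A is joined to B, C and D; B and C are also joined.
adj = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
table = centrality_table(adj)
```

In this toy graph, node `A` lies on every shortest path between `D` and the other nodes, so it takes the maximum in all four columns.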

References

Nguengang Wakap, S.; Lambert, D.M.; Olry, A.; Rodwell, C.; Gueydan, C.; Lanneau, V.; Murphy, D.; Le Cam, Y.; Rath, A. Estimating cumulative point prevalence of rare diseases: Analysis of the Orphanet database. Eur. J. Hum. Genet. 2020, 28, 165–173.
Boycott, K.M.; Vanstone, M.R.; Bulman, D.E.; MacKenzie, A.E. Rare-disease genetics in the era of next-generation sequencing: Discovery to translation. Nat. Rev. Genet. 2013, 14, 681–691.
Fernandez-Marmiesse, A.; Gouveia, S.; Couce, M.L. NGS technologies as a turning point in rare disease research, diagnosis and treatment. Curr. Med. Chem. 2018, 25, 404–432.
Ozen, A.; Comrie, W.A.; Ardy, R.C.; Conde, C.D.; Dalgic, B.; Beser, F.; Morawski, A.R.; Karakoc-Aydiner, E.; Tutar, E.; Baris, S.; et al. CD55 Deficiency, Early-Onset Protein-Losing Enteropathy, and Thrombosis. N. Engl. J. Med. 2017, 377, 52–61.
Dobbs, K.; Domínguez Conde, C.; Zhang, S.Y.; Parolini, S.; Audry, M.; Chou, J.; Haapaniemi, E.; Keles, S.; Bilic, I.; Okada, S.; et al. Inherited DOCK2 Deficiency in Patients with Early-Onset Invasive Infections. N. Engl. J. Med. 2015, 372, 2409–2422.
Salzer, E.; Cagdas, D.; Hons, M.; Mace, E.M.; Garncarz, W.; Petronczki, Y.; Platzer, R.; Pfajfer, L.; Bilic, I.; Ban, S.A.; et al. RASGRP1 deficiency causes immunodeficiency with impaired cytoskeletal dynamics. Nat. Immunol. 2016, 17, 1352–1360.
Nagy, V.; Hollstein, R.; Pai, T.P.; Herde, M.K.; Buphamalai, P.; Moeseneder, P.; Lenartowicz, E.; Kavirayani, A.; Korenke, G.C.; Kozieradzki, I.; et al. HACE1 deficiency leads to structural and functional neurodevelopmental defects. Neurol. Genet. 2019, 5, e330.
Kochinke, K.; Zweier, C.; Nijhof, B.; Fenckova, M.; Cizek, P.; Honti, F.; Keerthikumar, S.; Oortveld, M.A.; Kleefstra, T.; Kramer, J.M.; et al. Systematic phenomics analysis deconvolutes genes mutated in intellectual disability into biologically coherent modules. Am. J. Hum. Genet. 2016, 98, 149–164.
Anikster, Y.; Haack, T.B.; Vilboux, T.; Pode-Shakked, B.; Thöny, B.; Shen, N.; Guarani, V.; Meissner, T.; Mayatepek, E.; Trefz, F.K.; et al. Biallelic mutations in DNAJC12 cause hyperphenylalaninemia, dystonia, and intellectual disability. Am. J. Hum. Genet. 2017, 100, 257–266.
Tarailo-Graovac, M.; Shyr, C.; Ross, C.J.; Horvath, G.A.; Salvarinova, R.; Ye, X.C.; Zhang, L.-H.; Bhavsar, A.P.; Lee, J.J.; Drögemöller, B.I.; et al. Exome sequencing and the management of neurometabolic disorders. N. Engl. J. Med. 2016, 374, 2246–2255.
Costanzo, M.; Kuzmin, E.; van Leeuwen, J.; Mair, B.; Moffat, J.; Boone, C.; Andrews, B. Global genetic networks and the genotype-to-phenotype relationship. Cell 2019, 177, 85–100.e14.
Velimezi, G.; Robinson-Garcia, L.; Muñoz-Martínez, F.; Wiegant, W.W.; Ferreira da Silva, J.; Owusu, M.; Moder, M.; Wiedner, M.; Rosenthal, S.B.; Fisch, K.M.; et al. Map of synthetic rescue interactions for the Fanconi anemia DNA repair pathway identifies USP48. Nat. Commun. 2018, 9, 2280.
Luck, K.; Kim, D.-K.; Lambourne, L.; Spirohn, K.; Begg, B.E.; Bian, W.; Brignall, R.; Cafarelli, T.; Campos-Laborie, F.J.; Charloteaux, B.; et al. A reference map of the human binary protein interactome. Nature 2020, 580, 402–408.
Das, K.; Samanta, S.; Pal, M. Study on centrality measures in social networks: A survey. Soc. Netw. Anal. Min. 2018, 8, 13.
Samanta, S.; Dubey, V.K.; Sarkar, B. Measure of influences in social networks. Appl. Soft Comput. 2021, 99, 106858.
Pandey, S.D.; Ranadive, A.S.; Samanta, S.; Sarkar, B. Bipolar-valued fuzzy social network and centrality measures. Discret. Dyn. Nat. Soc. 2022, 2022, 9713575.
Alanazi, A.M.; Muhiuddin, G.; Al-Balawi, D.A.; Samanta, S. Different DNA sequencing using DNA graphs: A study. Appl. Sci. 2022, 12, 5414.
Samanta, S.; Dubey, V.K.; Das, K. Coopetition bunch graphs: Competition and cooperation on COVID-19 research. Inf. Sci. 2022, 589, 1–33.
Samanta, S.; Muhiuddin, G.; Alanazi, A.M.; Das, K. A mathematical approach on representation of competitions: Competition cluster hypergraphs. Math. Probl. Eng. 2020, 2020, 2517415.
Samanta, S.; Pal, M. Fuzzy planar graphs. IEEE Trans. Fuzzy Syst. 2015, 23, 1936–1942.
Samanta, S.; Pal, M. Fuzzy k-competition graphs and p-competition fuzzy graphs. Fuzzy Inf. Eng. 2013, 5, 191–204.
Samanta, S.; Akram, M.; Pal, M. m-Step fuzzy competition graphs. J. Appl. Math. Comput. 2015, 47, 461–472.
Feng, S.; Heath, E.; Jefferson, B.; Joslyn, C.; Kvinge, H.; Mitchell, H.D.; Praggastis, B.; Eisfeld, A.J.; Sims, A.C.; Thackray, L.B.; et al. Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinform. 2021, 22, 287.
Naderi Yeganeh, P.; Richardson, C.; Saule, E.; Loraine, A.; Mostafavi, M.T. Revisiting the use of graph centrality models in biological pathway analysis. BioData Min. 2020, 13, 5.
Lee, J.; Kim, J.; Kim, S.; Lee, D. Identifying disease-gene associations using a convolutional neural network-based knowledge graph-embedding model. PLoS ONE 2021, 16, e0258626.
Lee, J.H.; Kim, S.Y.; Kim, H.J.; Kim, H.J.; Lee, J.H.; Kim, S.Y.; Kim, H.J.; Kim, H.J. Disease-gene association prediction using a convolutional neural network-based knowledge graph-embedding model. Sci. Rep. 2021, 11, 1–13.
Köhler, S.; Doelken, S.C.; Mungall, C.J.; Bauer, S.; Firth, H.V.; Bailleul-Forestier, I.; Black, G.C.M.; Brown, D.L.; Brudno, M.; Campbell, J.; et al. The Human Phenotype Ontology project: Linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014, 42, D966–D974.
Menche, J.; Sharma, A.; Kitsak, M.; Ghiassian, S.D.; Vidal, M.; Loscalzo, J.; Barabási, A.L. Uncovering disease-disease relationships through the incomplete interactome. Science 2015, 347, 1257601.
Zhou, X.; Menche, J.; Barabási, A.-L.; Sharma, A. Human symptoms-disease network. Nat. Commun. 2014, 5, 4212.
Alanis-Lobato, G.; Andrade-Navarro, M.A.; Schaefer, M.H. HIPPIE v2.0: Enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res. 2017, 45, D408–D414.
Croft, D.; O’Kelly, G.; Wu, G.; Haw, R.; Gillespie, M.; Matthews, L.; Caudy, M.; Garapati, P.; Gopinath, G.; Jassal, B.; et al. Reactome: A database of reactions, pathways and biological processes. Nucleic Acids Res. 2011, 39, D691–D697.
Bhatia, S.; Lassmann, B.; Cohn, E.; Desai, A.N.; Carrion, M.; Kraemer, M.U.G.; Herringer, M.; Brownstein, J.; Madoff, L.; Cori, A.; et al. Using digital surveillance tools for near real-time mapping of the risk of infectious disease spread. npj Digit. Med. 2021, 4, 73.
Bavelas, A. Communication patterns in task-oriented groups. J. Acoust. Soc. Am. 1950, 22, 725–730.
Shimbel, A. Structural parameters of communication networks. Bull. Math. Biophys. 1953, 15, 501–507.
Katz, L. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–43.
Beauchamp, M.A. An improved index of centrality. Behav. Sci. 1965, 10, 161–163.
Sabidussi, G. The centrality index of a graph. Psychometrika 1966, 31, 581–603.
Tandonnet, C.; Burle, B.; Hasbroucq, T.; Vidal, F. Spatial enhancement of EEG traces by surface Laplacian estimation: Comparison between local and global methods. Clin. Neurophysiol. 2005, 116, 18–24.
Freeman, L.C.; Borgatti, S.P.; White, D.R. Centrality in valued graphs: A measure of betweenness based on network flow. Soc. Netw. 1991, 13, 141–154.
Stephenson, K.; Zelen, M. Rethinking centrality: Methods and examples. Soc. Netw. 1989, 11, 1–37.
White, D.R.; Borgatti, S.P. Betweenness centrality measures for directed graphs. Soc. Netw. 1994, 16, 335–346.
Everett, M.G.; Borgatti, S.P. The centrality of groups and classes. J. Math. Sociol. 1999, 23, 181–201.
Brandes, U. A faster algorithm for betweenness centrality. J. Math. Sociol. 2001, 25, 163–177.
Costenbader, E.; Valente, T.W. The stability of centrality measures when networks are sampled. Soc. Netw. 2003, 25, 283–307.
Borgatti, S.P. Identifying sets of key players in a social network. Comput. Math. Organ. Theory 2006, 12, 21–34.
Estrada, E.; Rodriguez-Velazquez, J.A. Subgraph centrality in complex networks. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 2005, 71, 056103.
Rodriguez, J.A.; Medina, R.; Estrada, E. Functional centrality: Detecting lethality of proteins in protein interaction networks by means of the Dirac operator. Phys. A Stat. Mech. Its Appl. 2006, 373, 651–664.
Bonacich, P. Some unique properties of eigenvector centrality. Soc. Netw. 2007, 29, 555–564.
Opsahl, T.; Agneessens, F.; Skvoretz, J. Node centrality in weighted networks: Generalizing degree and shortest paths. Soc. Netw. 2010, 32, 245–251.
Joyce, K.E.; Laurienti, P.J.; Burdette, J.H.; Hayasaka, S. A new measure of centrality for brain networks. PLoS ONE 2010, 5, e12200.
Kitsak, M.; Gallos, L.K.; Havlin, S.; Liljeros, F.; Muchnik, L.; Stanley, H.E.; Makse, H.A. Identification of influential spreaders in complex networks. Nat. Phys. 2010, 6, 888–893.
Zeng, A.; Zhang, C.J. Ranking spreaders by decomposing complex networks. Phys. Lett. A 2013, 377, 1031–1035.
Liu, Y.Y.; Slotine, J.J.; Barabási, A.L. Controllability of complex networks. Nature 2011, 473, 167–173.
Liu, Y.Y.; Slotine, J.J.; Barabási, A.L. Observability of complex systems. Proc. Natl. Acad. Sci. USA 2013, 110, 2460–2465.
Bae, J.; Kim, S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Phys. A Stat. Mech. Appl. 2014, 395, 549–559.
Liu, J.G.; Ren, Z.M.; Guo, Q. Ranking the spreading influence in complex networks. Phys. A Stat. Mech. Appl. 2013, 392, 4154–4159.
Wang, J.; Li, M.; Chen, Y.; Wang, F.Y. Weighted-neighbourhood-centrality-based identification of influential nodes in complex networks. Phys. A Stat. Mech. Appl. 2017, 479, 1–9.
Pierson, E.; the GTEx Consortium; Koller, D.; Battle, A.; Mostafavi, S. Sharing and specificity of co-expression networks across 35 human tissues. PLoS Comput. Biol. 2015, 11, e1004220.
Saha, A.; Kim, Y.; Gewirtz, A.D.; Jo, B.; Gao, C.; McDowell, I.C.; The GTEx Consortium; Engelhardt, B.E.; Battle, A. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res. 2017, 27, 1843–1858.
Newman, M.E.J. Mixing patterns in networks. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2003, 67, 026126.
Sharma, A.; Menche, J.; Huang, C.C.; Ort, T.; Zhou, X.; Kitsak, M.; Sahni, N.; Thibault, D.; Voung, L.; Guo, F.; et al. A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma. Hum. Mol. Genet. 2015, 24, 3005–3020.

Figure 1. Disease network (the nodes are listed in Table 1; see Appendix A for the pseudocode).

Figure 2. Comparison of the centralities among selected diseases in the constructed network (Figure 1).

Figure 3. Disease genome network (the serial numbers in Table 4 indicate the nodes of this network).

Figure 4. Genome bar chart.

Figure 5. Bridge design of Figure 3.

Figure 6. Comparison of genome network centralities.

Table 1. The selected diseases.

Sr. No.	Diseases	Sr. No.	Diseases
1	Charcot-Marie-Tooth disease	11	Lymphoma
2	Lipodystrophy	12	Wilms tumor
3	Silver spastic paraplegia syndrome	13	Breast cancer
4	Spastic ataxia/paraplegia	14	Ovarian cancer
5	Sandhoff disease	15	Pancreatic cancer
6	Amyotrophic lateral sclerosis	16	Papillary serous carcinoma
7	Spinal muscular atrophy	17	Fanconi anemia
8	Androgen insensitivity	18	T-cell lymphoblastic leukemia
9	Prostate cancer	19	Ataxia-telangiectasia
10	Perineal hypospadias

Table 2. Centralities of the nodes of the disease network (we have used Mathematica 12 software; see Appendix B).

Sr. No.	Diseases	PageRank	Katz	HITS	Eigenvector	Degree	Betweenness	Closeness
1	Charcot-Marie-Tooth disease	0.762	0.442	0.161	0.161	0.182	0.000	0.605
2	Lipodystrophy	0.812	0.482	0.185	0.185	0.273	0.006	0.619
3	Silver spastic paraplegia syndrome	0.762	0.442	0.161	0.161	0.182	0.000	0.605
4	Spastic ataxia/paraplegia	0.765	0.353	0.024	0.024	0.091	0.000	0.441
5	Sandhoff disease	0.735	0.394	0.127	0.127	0.091	0.000	0.591
6	Amyotrophic lateral sclerosis	0.812	0.429	0.131	0.131	0.182	0.204	0.619
7	Spinal muscular atrophy	1.000	0.844	0.690	0.690	0.818	1.000	0.963
8	Androgen insensitivity	0.774	0.645	0.572	0.572	0.364	0.000	0.788
9	Prostate cancer	0.865	0.862	0.859	0.859	0.727	0.319	0.897
10	Perineal hypospadias	0.774	0.645	0.572	0.572	0.364	0.000	0.788
11	Lymphoma	0.822	0.653	0.526	0.526	0.455	0.039	0.684
12	Wilms tumor	0.775	0.517	0.338	0.338	0.273	0.000	0.591
13	Breast cancer	0.957	1.000	1.000	1.000	1.000	0.811	1.000
14	Ovarian cancer	0.787	0.521	0.324	0.324	0.273	0.006	0.634
15	Pancreatic cancer	0.789	0.608	0.493	0.493	0.364	0.017	0.667
16	Papillary serous carcinoma	0.759	0.462	0.243	0.243	0.182	0.000	0.619
17	Fanconi anemia	0.789	0.608	0.493	0.493	0.364	0.017	0.667
18	T-cell lymphoblastic leukemia	0.775	0.528	0.342	0.342	0.273	0.000	0.634
19	Ataxia-telangiectasia	0.775	0.528	0.342	0.342	0.273	0.000	0.634

HITS = Hyperlink-Induced Topic Search. Katz centrality is alternatively known as alpha centrality.

Table 3. Ranking of the diseases in the HDN by the proposed value (the average of the centralities in Table 2).

Diseases	Proposed Values	Rank
Breast cancer	0.96687811	1
Spinal muscular atrophy	0.857941979	2
Prostate cancer	0.769502145	3
Androgen insensitivity	0.530645272	4
Perineal hypospadias	0.530645272	5
Lymphoma	0.529148135	6
Pancreatic cancer	0.490140949	7
Fanconi anemia	0.490140949	8
T-cell lymphoblastic leukemia	0.413526332	9
Ataxia-telangiectasia	0.413526332	10
Ovarian cancer	0.409950819	11
Wilms tumor	0.404552176	12
Lipodystrophy	0.366094845	13
Amyotrophic lateral sclerosis	0.358218184	14
Papillary serous carcinoma	0.358183726	15
Charcot-Marie-Tooth disease	0.330321295	16
Silver spastic paraplegia syndrome	0.330321295	17
Sandhoff disease	0.294905922	18
Spastic ataxia/paraplegia	0.242440338	19

Table 4. Centrality values of the genome nodes in Figure 3.

Sr. No.	Disease Genome	PageRank Centrality	Katz Centrality	HITS Centrality	Eigenvector Centrality	Degree Centrality	Betweenness Centrality
1	HEXB	0.892	0.663	0.111	0.111	0.250	0.000
2	LMNA	0.866	1.000	0.019	0.019	0.125	0.000
3	ALS2	0.848	0.924	0.009	0.009	0.063	0.000
4	BSCL2	0.939	0.754	0.113	0.113	0.313	0.111
5	VAPB	0.980	0.747	0.112	0.112	0.313	0.236
6	GARS	0.939	0.754	0.113	0.113	0.313	0.111
7	AR	1.000	−4.113	1.000	1.000	1.000	1.000
8	ATM	0.919	−4.378	0.966	0.966	0.750	0.000
9	BRIP1	0.919	−4.378	0.966	0.966	0.750	0.000
10	BRCA2	0.919	−4.378	0.966	0.966	0.750	0.000
11	BRCA1	0.919	−4.378	0.966	0.966	0.750	0.000
12	KRAS	0.919	−4.378	0.966	0.966	0.750	0.000
13	RAD54L	0.919	−4.378	0.966	0.966	0.750	0.000
14	TP53	0.919	−4.378	0.966	0.966	0.750	0.000
15	MAD1L1	0.919	−4.378	0.966	0.966	0.750	0.000
16	CHEK2	0.919	−4.378	0.966	0.966	0.750	0.000
17	PIK3CA	0.919	−4.378	0.966	0.966	0.750	0.000
18	CDH1	0.919	−4.378	0.966	0.966	0.750	0.000
19	MSH2	0.919	−4.378	0.966	0.966	0.750	0.000


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
