by Neha Kanneganti
In broad terms, graph theory is the study of graphs. While graph theory is not new, its use in machine learning is growing rapidly. For example, many researchers are using graph concepts and algorithms to predict a gene’s function and traits. This makes it more important than ever to understand the foundational concepts of graph theory. This blog post will introduce you to the basics of graph theory!
What are Networks?
A network, or a graph, is a collection of objects that are interconnected. The objects are known as nodes, while the relationships among them are known as edges.
To start off, a common example of a network is a social network. Consider the picture of the social network below. It consists of a set of individuals and their connections. What are the nodes and edges?
Each node represents an individual, and each edge represents a connection between two individuals.
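To make the idea concrete, here is a minimal Python sketch of a small social network. The names and friendships are hypothetical and are not taken from the picture; the graph is stored as an adjacency list, which is one common representation among several.

```python
# Hypothetical people (nodes) and friendships (edges), for illustration only.
nodes = {"Alice", "Bob", "Carol", "Dave"}
edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"), ("Carol", "Dave")]

# Build an adjacency list: each node maps to the set of its neighbors.
adjacency = {node: set() for node in nodes}
for u, v in edges:
    adjacency[u].add(v)  # a friendship goes both ways,
    adjacency[v].add(u)  # so record it for both people

print(sorted(adjacency["Carol"]))  # ['Alice', 'Bob', 'Dave']
```

Looking up a person in `adjacency` returns everyone they are directly connected to.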
Similarly, biological networks are also composed of nodes and edges. But in biological networks, the nodes can represent any biological entity such as genes and proteins. The edges between these nodes can represent functional, physical, and chemical interactions.
An example of a biological network is a protein-protein interaction (PPI) network. Proteins do not act alone. Instead, they interact with other proteins to perform a specific function. So through PPI networks, we are able to predict the function, disease relevance, and drug targets of different proteins.
Thanks to the massive amount of experimental PPI data, there are now repositories dedicated to it. Some widely used examples include BioGRID, STRING, and InBioMap. In particular, BioGRID contains over a million genetic and protein interactions. These come from experimental data reported in peer-reviewed publications.
Besides biological networks, there are many other kinds of networks. For example, Wikipedia and Blog Category are also networks. Blog Category has a social relationship network of bloggers on its website. It allows you to connect with bloggers who share similar interests. So the nodes represent the bloggers, while the edges represent social relationships. The bloggers’ interests are used as labels.
Basic Graph Properties
Undirected vs. Directed Graphs
There are two main types of edges: undirected and directed. In a directed graph, each edge has a direction, so we can only travel from the origin to the destination. In an undirected graph, edges do not have a direction, so we can travel both ways.
In the undirected graph above, the edge connecting A and C can be referred to as A-C or C-A. But in the directed graph, the edge connecting A and C can only be referred to as A-C.
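The difference can be sketched in a few lines of Python. The helper `build_adjacency` and the edge list are hypothetical, chosen only for illustration, and each pair is assumed to be written as (origin, destination).

```python
def build_adjacency(edges, directed):
    """Return a dict mapping each node to the set of nodes reachable in one step."""
    adjacency = {}
    for origin, destination in edges:
        adjacency.setdefault(origin, set()).add(destination)
        adjacency.setdefault(destination, set())
        if not directed:
            # An undirected edge can be traveled in both directions.
            adjacency[destination].add(origin)
    return adjacency

# Hypothetical edges; in the directed case each pair reads (origin, destination).
edges = [("A", "B"), ("A", "C"), ("B", "C")]
undirected = build_adjacency(edges, directed=False)
directed = build_adjacency(edges, directed=True)

print("A" in undirected["C"])  # True: C-A is the same edge as A-C
print("A" in directed["C"])    # False: the edge only runs from A to C
```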
Unweighted vs. Weighted Graphs
A weighted edge is an edge with an associated numeric value, known as its weight. The weight can represent, for example, the length of a route between two locations. An unweighted edge, on the other hand, has no associated value.
In the weighted graph above, A-C has a value of 3. In the unweighted graph, there are no values.
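As a rough sketch, a weighted graph can be stored by attaching a number to each edge. Only the A-C weight of 3 comes from the example above; the other weights and the helper names `edge_weight` and `path_length` are assumptions made for illustration.

```python
# Edge weights keyed by node pair. A-C = 3 matches the example above;
# the other two values are made up for illustration.
weights = {("A", "B"): 7, ("A", "C"): 3, ("B", "C"): 2}

def edge_weight(u, v):
    """Look up an undirected edge's weight regardless of endpoint order."""
    if (u, v) in weights:
        return weights[(u, v)]
    return weights.get((v, u))  # None if the edge does not exist

def path_length(path):
    """Sum the weights along consecutive nodes in a path."""
    return sum(edge_weight(u, v) for u, v in zip(path, path[1:]))

print(edge_weight("C", "A"))         # 3
print(path_length(["A", "C", "B"]))  # 5
```

If the weights are route lengths, going from A to B through C (3 + 2 = 5) is shorter here than taking the direct A-B edge (7).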
Node Degree
Node degree is simply the number of connections that a node has to the rest of the network. A node with many connections is often considered influential. In the earlier example, node B has a degree of 3.
Node degree tells us about each node's connectivity, but it doesn't consider the node's position in the whole network. Therefore, node degree is known as a local measure, since it is based only on connections with adjacent nodes.
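Counting degrees takes only a few lines of Python. The edge list below is hypothetical and is not the graph from the figure; it is simply arranged so that node B ends up with a degree of 3.

```python
# Hypothetical undirected edges, chosen so that B has three connections.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]

degree = {}
for u, v in edges:
    # Each undirected edge adds one connection to both of its endpoints.
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

print(degree)  # {'A': 2, 'B': 3, 'C': 2, 'D': 1}
```

Each count depends only on the edges touching that node, which is why degree is a local measure.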
Hopefully, you are now familiar with the basics of graph theory such as node degree, weights, and directed graphs. I encourage you to learn about different graph algorithms and apply them to real-world problems.