XGI in 15 minutes

Hello! If you are new to XGI you might want to check out the XGI in 1 minute or the XGI in 5 minutes tutorials for a quick introduction.

The starting point is always to import our Python library and other standard libraries:

[1]:
import matplotlib.pyplot as plt

import xgi

Loading a dataset

In this tutorial we will construct a hypergraph describing real-world data! With XGI we provide a companion data repository, xgi-data, from which you can easily load several datasets in a standard format:

[2]:
H_enron = xgi.load_xgi_data("email-enron")

The ‘email-enron’ dataset, for example, has a corresponding datasheet explaining its characteristics. The nodes (individuals) in this dataset contain associated email addresses and the edges (emails) contain associated timestamps. These attributes can be accessed by typing H.nodes[id] or H.edges[id] respectively, where H is the hypergraph (here, H_enron).

[3]:
print(f"The hypergraph has {H_enron.num_nodes} nodes and {H_enron.num_edges} edges")
The hypergraph has 148 nodes and 10885 edges

We can also print a summary of the hypergraph:

[4]:
print(H_enron)
Hypergraph named email-Enron with 148 nodes and 10885 hyperedges

The dataset comes fully formatted. You can access nodes, edges, and their attributes in a very simple way:

[5]:
print("The first 10 node IDs are:")
print(list(H_enron.nodes)[:10])
print("The first 10 edge IDs are:")
print(list(H_enron.edges)[:10])
print("The attributes of node '4' are")
print(H_enron.nodes["4"])
print("The attributes of edge '6' are")
print(H_enron.edges["6"])
The first 10 node IDs are:
['4', '1', '117', '129', '51', '41', '65', '107', '122', '29']
The first 10 edge IDs are:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
The attributes of node '4' are
{'name': 'robert.badeer@enron.com'}
The attributes of edge '6' are
{'timestamp': '2000-02-22T08:07:00'}

It is also possible to retrieve the members of the edges as a particular data type. For example, we can create a dictionary that maps each edge of our hypergraph to its members:

[6]:
edges_dictionary = H_enron.edges.members(dtype=dict)
print(list(edges_dictionary.items())[:5])
[('0', {'4', '1'}), ('1', {'129', '1', '117'}), ('2', {'1', '51'}), ('3', {'1', '51'}), ('4', {'1', '41'})]

Cleaning up a hypergraph dataset

You can check if your hypergraph is connected using the function:

[7]:
xgi.is_connected(H_enron)
[7]:
False

We can count the number of isolated nodes and multi-edges in the following way:

[8]:
isolated_nodes = H_enron.nodes.isolates()
print("Number of isolated nodes: ", len(isolated_nodes))
duplicated_edges = H_enron.edges.duplicates()
print("Number of duplicated edges: ", len(duplicated_edges))
Number of isolated nodes:  5
Number of duplicated edges:  9371

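We can clean up this dataset to remove isolated nodes and multi-edges, and replace all IDs with integer IDs using the cleanup function. The multiedges, singletons, and isolates flags each state whether that feature is allowed to remain, so setting them to False removes it: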

[9]:
H_enron_cleaned = H_enron.cleanup(
    multiedges=False, singletons=False, isolates=False, relabel=True, in_place=False
)

print(H_enron_cleaned)
Hypergraph named email-Enron with 143 nodes and 1459 hyperedges

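Comparing the two summaries, the node count dropped by 5, which matches the number of isolated nodes, while the edge count dropped from 10885 to 1459. That is 9426 edges, more than the 9371 duplicates, which is plausible since the call also sets singletons=False and can therefore drop edges other than duplicates. We can check both counts: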

[10]:
len(H_enron.nodes) == len(H_enron_cleaned.nodes) + len(isolated_nodes)
[10]:
True
[11]:
len(H_enron.edges) == len(H_enron_cleaned.edges) + len(duplicated_edges)
[11]:
False

We can check that the hypergraph is now connected:

[12]:
xgi.is_connected(H_enron_cleaned)
[12]:
True

Drawing

Visualization is crucial for understanding complex data structures. You can use the default drawing function:

[13]:
xgi.draw(H_enron_cleaned);
../../_images/api_tutorials_XGI_in_15_minutes_25_0.png

When dealing with large structures like this e-mail dataset, the visualization can be cumbersome to interpret. To help with that, XGI provides options for plotting hypergraphs using the features of nodes and edges, for example:

[14]:
fig, ax = plt.subplots(figsize=(10, 10))
xgi.draw(
    H_enron_cleaned,
    node_size=H_enron_cleaned.nodes.degree,
    node_lw=H_enron_cleaned.nodes.average_neighbor_degree,
    node_fc=H_enron_cleaned.nodes.degree,
    ax=ax,
);
../../_images/api_tutorials_XGI_in_15_minutes_27_0.png

In this case we are plotting the hypergraph with the size and color of the nodes depending on their degrees and the line width of the node markers depending on their average neighbor degree.

Histograms of edge sizes and node degrees

For a first analysis of your dataset, it might be useful to plot some histograms representing relevant features of your higher-order structure. For example, if you want to plot a histogram of the edge sizes:

[15]:
list_of_edges_sizes = H_enron_cleaned.edges.size.aslist()
ax = plt.subplot(111)
ax.hist(
    list_of_edges_sizes,
    # max + 2 so that the largest size gets its own bin instead of sharing the last one
    bins=range(min(list_of_edges_sizes), max(list_of_edges_sizes) + 2, 1),
)
ax.set_xlabel("Edge size")
ax.set_ylabel("Frequency");
../../_images/api_tutorials_XGI_in_15_minutes_30_0.png

Or you can plot a histogram of the node degrees (the degree of a node is the number of edges it belongs to):

[16]:
list_of_nodes_degrees = H_enron_cleaned.nodes.degree.aslist()
ax = plt.subplot(111)
ax.hist(
    list_of_nodes_degrees,
    # max + 2 so that the largest degree gets its own bin instead of sharing the last one
    bins=range(min(list_of_nodes_degrees), max(list_of_nodes_degrees) + 2, 1),
)
ax.set_xlabel("Degree")
ax.set_ylabel("Frequency");
../../_images/api_tutorials_XGI_in_15_minutes_32_0.png
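
To make the two quantities concrete, here is a minimal sketch in plain Python (no XGI calls) that computes edge sizes and node degrees for a small hypothetical edge list, directly from the definitions above. The names `toy_edges`, `edge_sizes`, and `node_degrees` are illustrative only.

```python
from collections import Counter

# Hypothetical hyperedges, each given as a set of node IDs.
toy_edges = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {3, 6}]

# Size of an edge = number of nodes it contains.
edge_sizes = [len(edge) for edge in toy_edges]

# Degree of a node = number of edges it belongs to.
node_degrees = Counter(node for edge in toy_edges for node in edge)

# Frequency of each value, i.e. what a histogram with one bin per integer displays.
size_histogram = Counter(edge_sizes)
degree_histogram = Counter(node_degrees.values())

print(edge_sizes)
print(dict(node_degrees))
print(dict(size_histogram), dict(degree_histogram))
```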

Incidence and Adjacency Matrices

Any hypergraph can be expressed as an \(N \times M\) incidence matrix, \(I\), where \(N\) is the number of nodes and \(M\) is the number of edges. Rows indicate the node ID and the columns indicate the edge ID. \(I_{i,j}=1\) if node \(i\) is a member of edge \(j\) and zero otherwise. XGI allows you to access the incidence matrix in the following way:

[17]:
I = xgi.incidence_matrix(H_enron_cleaned, sparse=False)

Then you can visualize it:

[18]:
plt.spy(I, aspect="auto")
plt.xlabel("Hyperedges")
plt.ylabel("Nodes")
plt.show()
../../_images/api_tutorials_XGI_in_15_minutes_36_0.png
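
For intuition, the incidence matrix can also be built by hand from its definition. The sketch below uses only NumPy and a small hypothetical hypergraph; it is not how XGI computes the matrix, and the node labels are assumed to be the integers 0 to N-1 so that they can serve as row indices.

```python
import numpy as np

# Hypothetical hypergraph: nodes 0..3 and three hyperedges.
toy_edges = [{0, 1, 2}, {2, 3}, {0, 3}]
num_nodes = 4

# I[i, j] = 1 if node i is a member of edge j, and 0 otherwise.
I_toy = np.zeros((num_nodes, len(toy_edges)), dtype=int)
for j, edge in enumerate(toy_edges):
    for i in edge:
        I_toy[i, j] = 1

# Row sums give the node degrees; column sums give the edge sizes.
degrees = I_toy.sum(axis=1)
sizes = I_toy.sum(axis=0)

print(I_toy)
print(degrees, sizes)
```

Summing along rows or columns recovers the degrees and edge sizes used in the histograms.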

We can represent a hypergraph with an \(N\times N\) adjacency matrix, \(A\), where \(N\) is the number of nodes. Notice that the adjacency matrix is a lossy format: different hypergraphs can create the same adjacency matrix. \(A_{i,j} = 1\) if there is at least one hyperedge containing both nodes \(i\) and \(j\), and zero otherwise. XGI allows you to access the adjacency matrix and visualize it in the following way:

[19]:
A = xgi.adjacency_matrix(H_enron_cleaned, sparse=False)
plt.spy(A);
../../_images/api_tutorials_XGI_in_15_minutes_38_0.png
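
The lossy nature of the adjacency matrix can be seen on a tiny example. The sketch below uses only NumPy and a hypothetical helper, `adjacency_from_edges`, written from the definition above (it is not XGI's implementation), to compare a single hyperedge of size three with three pairwise edges on the same nodes.

```python
import numpy as np


def adjacency_from_edges(edges, num_nodes):
    """Binary adjacency matrix built from the definition (hypothetical helper)."""
    A = np.zeros((num_nodes, num_nodes), dtype=int)
    for edge in edges:
        for i in edge:
            for j in edge:
                if i != j:  # no self-loops on the diagonal
                    A[i, j] = 1
    return A


# Two different hypothetical hypergraphs on nodes 0, 1, 2.
one_triple = [{0, 1, 2}]
three_pairs = [{0, 1}, {1, 2}, {0, 2}]

A_triple = adjacency_from_edges(one_triple, 3)
A_pairs = adjacency_from_edges(three_pairs, 3)

# Both hypergraphs connect every pair of distinct nodes, so the matrices coincide.
print(np.array_equal(A_triple, A_pairs))
```

Both hypergraphs yield the same matrix, so the original hyperedges cannot be recovered from it.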

If you are interested in other hypergraph matrices, such as Laplacians, you can check the documentation of the linear algebra package.

Algorithms

The algorithms package contains different algorithms you can run on your higher-order structure. For example you can compute the density and degree assortativity of your structure:

[20]:
print("The density of the hypergraph is:", xgi.density(H_enron_cleaned))
print(
    "The assortativity of the hypergraph is:", xgi.degree_assortativity(H_enron_cleaned)
)
The density of the hypergraph is: 1.3084764540479412e-40
The assortativity of the hypergraph is: 0.2250663686125537
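
The density printed above is extremely small. A quick back-of-the-envelope check, sketched below in plain Python, suggests why: the value is numerically consistent with the number of hyperedges divided by the number of non-empty subsets of the 143 nodes. That every non-empty subset counts as a possible hyperedge is an assumption made here for illustration, not a statement about how `xgi.density` is implemented.

```python
num_nodes = 143   # nodes in the cleaned hypergraph, from the summary above
num_edges = 1459  # hyperedges in the cleaned hypergraph

# Assumption: every non-empty subset of nodes counts as a possible hyperedge.
num_possible_edges = 2**num_nodes - 1

density_estimate = num_edges / num_possible_edges
print(density_estimate)
```

Because the number of possible hyperedges grows exponentially with the number of nodes, even a hypergraph with thousands of edges looks very sparse by this measure.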

Or you can access a dictionary containing the local clustering coefficient of each node in your structure. This coefficient measures the overlap of the edges connected to a given node, normalized by the size of the node’s neighborhood (for more details, see this paper):

[21]:
local_clustering_dict = xgi.local_clustering_coefficient(H_enron_cleaned)
print(local_clustering_dict)
{0: 0.7804612787100978, 1: 0.6780441301803274, 2: 0.7689183265963335, 3: 0.791038445681303, 4: 0.7856925737994167, 5: 0.6965494874182317, 6: 0.7072322879888286, 7: 0.7730900667477023, 8: 0.7118552061973111, 9: 0.7683540442301209, 10: 0.8031870323887131, 11: 0.76833559781224, 12: 0.7490487746981251, 13: 0.7755923252981957, 14: 0.6848194711637989, 15: 0.7573675775256802, 16: 0.4584199134199135, 17: 0.638446173924115, 18: 0.7017206124203775, 19: 0.8031870323887131, 20: 0.7009358405995354, 21: 0.738118399364662, 22: 0.7075910814158463, 23: 0.7223895333094579, 24: 0.739291988588228, 25: 0.7710850845455389, 26: 0.7821012321012321, 27: 0.768337509507563, 28: 0.7700134952824707, 29: 0.7490487746981251, 30: 0.7856925737994167, 31: 0.7659625084060587, 32: 0.7786708147819262, 33: 0.7038081247195713, 34: 0.7727459637785723, 35: 0.7573675775256802, 36: 0.7861972454329856, 37: 0.7797302518270263, 38: 0.7759247429139076, 39: 0.7659625084060587, 40: 0.7009358405995354, 41: 0.7757340067340067, 42: 0.7573675775256802, 43: 0.6975586879901238, 44: 0.7573675775256802, 45: 0.7757340067340067, 46: 0.7759756479670126, 47: 0.7288875146018, 48: 0.7812697583387238, 49: 0.7727459637785723, 50: 0.6644212904016823, 51: 0.5541149591149591, 52: 0.49949772449772456, 53: 0.49949772449772456, 54: 0.638446173924115, 55: 0.7683540442301209, 56: 0.7104077955538087, 57: 0.7725113110513565, 58: 0.7118552061973111, 59: 0.7887064044826144, 60: 0.7812697583387238, 61: 0.7804612787100978, 62: 0.42930819180819185, 63: 0.7727459637785723, 64: 0.7757340067340067, 65: 0.7866768592959072, 66: 0.4117462894248609, 67: 0.4727321920503739, 68: 0.4584199134199135, 69: 0.4258333333333334, 70: 0.5541149591149591, 71: 0.7821012321012321, 72: 0.7725113110513565, 73: 0.3619137806637807, 74: 0.4584199134199135, 75: 0.7887064044826144, 76: 0.0, 77: 0.638446173924115, 78: 0.4727321920503739, 79: 0.4798340548340549, 80: 0.782342244842245, 81: 0.7821012321012321, 82: 0.7894392381608293, 83: 0.4727321920503739, 84: 
0.7786708147819262, 85: 0.638446173924115, 86: 0.0, 87: 0.4584199134199135, 88: 0.4798340548340549, 89: 0.3619137806637807, 90: 0.7786708147819262, 91: 0.7490487746981251, 92: 0.4258333333333334, 93: 0.0, 94: 0.7827884699639654, 95: 0.7680722410445265, 96: 0.7490487746981251, 97: 0.7690633411984246, 98: 0.6644212904016823, 99: 0.7680722410445265, 100: 0.7856925737994167, 101: 0.7683540442301209, 102: 0.7725113110513565, 103: 0.7894392381608293, 104: 0.7659625084060587, 105: 0.7866768592959072, 106: 0.7490487746981251, 107: 0.7752549928556385, 108: 0.7035118435926735, 109: 0.7288875146018, 110: 0.7288875146018, 111: 0.7866768592959072, 112: 0.5978351972101971, 113: 0.7812697583387238, 114: 0.4584199134199135, 115: 0.4117462894248609, 116: 0.0, 117: 0.4727321920503739, 118: 0.4798340548340549, 119: 0.5978351972101971, 120: 0.7730900667477023, 121: 0.3619137806637807, 122: 0.7730900667477023, 123: 0.7118552061973111, 124: 0.7573675775256802, 125: 0.5978351972101971, 126: 0.7752549928556385, 127: 0.5978351972101971, 128: 0.4117462894248609, 129: 0.502430145611964, 130: 0.4117462894248609, 131: 0.685872720521843, 132: 0.638446173924115, 133: 0.7821012321012321, 134: 0.49949772449772456, 135: 0.5978351972101971, 136: 0.40510101010101013, 137: 0.791038445681303, 138: 0.4117462894248609, 139: 0.40510101010101013, 140: 0.16666666666666666, 141: 0.40510101010101013, 142: 0.40510101010101013}

Stats

The stats package is one of the features that sets XGI apart from other libraries. It provides a common interface to all statistics that can be computed from a network, its nodes, or edges. This package allows you, for example, to filter the nodes of a hypergraph with a certain degree:

[22]:
nodes_degree_20 = H_enron_cleaned.nodes.filterby("degree", 20)
print(nodes_degree_20)
[8, 58, 123]

Or you can perform more complex tasks such as creating a dataframe containing different statistics:

[23]:
df = H_enron_cleaned.nodes.multi(["degree", "clustering_coefficient"]).aspandas()
print(df)
     degree  clustering_coefficient
0        44                0.548792
1       101                0.452685
2        57                0.529268
3        36                0.606272
4        50                0.569712
..      ...                     ...
138       8                0.535714
139       6                0.333333
140       4                1.000000
141       6                1.000000
142       6                1.000000

[143 rows x 2 columns]

You can learn more about the stats package in the focus tutorial on statistics or by checking the documentation.

Wrapping Up

Well done! 👏 You’ve covered a lot in just 15 minutes with XGI. We hope you enjoyed this tutorial, and there’s much more to explore! Check out other tutorials here!