Graph databases are excellent at modeling interactions between people. This quality also makes graph databases a particularly effective tool for tracing the physical pathways along which viruses like the one that causes Covid-19 travel through populations.

Alex Woodie filed this report for Datanami:

While computer technology alone can’t stop a microorganism like the coronavirus that emerged from China in late December, it can help to give decision-makers better information, which could potentially slow its spread and give governments precious time to prepare their responses.

In particular, graph databases are proving to be valuable tools in modeling the spread of coronavirus. Graph databases were created to mirror complex networks of entities efficiently. In these specialized databases, people, places, and things are treated as “nodes” and the connections between them are called “edges.”
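To make the node-and-edge idea concrete, here is a minimal sketch in plain Python rather than in a real graph database. The `Graph` class, the relationship labels (`VISITED`, `KNOWS`), and the people and places are all invented for illustration; they are not taken from the Datanami report or from Neo4j.

```python
from collections import defaultdict


class Graph:
    """A toy in-memory property graph: nodes with properties, labeled edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> list of (label, neighbor id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, label, b):
        # Edges are stored in both directions so they can be followed either way.
        self.edges[a].append((label, b))
        self.edges[b].append((label, a))

    def neighbors(self, node_id, label=None):
        # Return connected nodes, optionally restricted to one edge label.
        return [n for (lbl, n) in self.edges[node_id] if label is None or lbl == label]


# Hypothetical data: two people who know each other and visited the same cafe.
g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("cafe", kind="place")
g.add_edge("alice", "VISITED", "cafe")
g.add_edge("bob", "VISITED", "cafe")
g.add_edge("alice", "KNOWS", "bob")

print(g.neighbors("cafe"))            # everyone connected to the cafe
print(g.neighbors("alice", "KNOWS"))  # only alice's social connections
```

A production graph database adds indexing, persistence, and a query language on top of this basic shape, but the underlying model is the same: things are nodes, and relationships between them are edges.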

Entity graph databases are excellent at modeling how people interact and influence each other, and how ideas and behaviors travel along social pathways. Once you have modeled those interactions in the database, then you can use it to make predictions.

As it turns out, graph databases are also great for modeling the physical pathways along which infectious agents like SARS-CoV-2 travel through populations (or networks) of people. And once those real-world connections are populated with real-world data, the graph database can be used to reconstruct how a virus likely traveled, which can be a useful tool for contact tracing.
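The contact-tracing idea can be sketched as a breadth-first traversal over a graph of people and the places they visited. The example below is an illustration only: the `visits` data, the `trace_contacts` function, and the hop limit are assumptions made up for this sketch, and it ignores timing (whether two people were in a place at the same moment), which any real tracing effort would need to account for.

```python
from collections import deque

# Hypothetical visit records: person -> set of places or vehicles they used.
visits = {
    "case_1": {"market", "train_A"},
    "p2": {"train_A", "office"},
    "p3": {"office"},
    "p4": {"park"},
}


def build_adjacency(visits):
    """Build an undirected person<->place graph from the visit records."""
    adj = {}
    for person, places in visits.items():
        for place in places:
            adj.setdefault(person, set()).add(place)
            adj.setdefault(place, set()).add(person)
    return adj


def trace_contacts(visits, source, max_person_hops):
    """Return {person: hops} for people linked to `source` via shared places.

    One person-to-person hop is two edges (person -> place -> person).
    Assumes person names and place names never collide.
    """
    adj = build_adjacency(visits)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] >= 2 * max_person_hops:
            continue  # do not expand beyond the requested number of hops
        for neighbor in sorted(adj.get(node, ())):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return {n: d // 2 for n, d in dist.items() if n in visits and n != source}


print(trace_contacts(visits, "case_1", max_person_hops=1))
print(trace_contacts(visits, "case_1", max_person_hops=2))
```

With one hop, only people who shared a place with the known case are returned; with two hops, the traversal also reaches people who shared a place with those first contacts.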

That is essentially what one group of Chinese researchers did with Neo4j, an open source graph database. According to a February blog post, a Neo4j partner in China called We-Yun has built an application atop the Neo4j database that allows Chinese citizens to do a “self assessment” by checking to see if they came in contact with a known carrier of the virus that causes Covid-19.


The application that We-Yun built, appropriately called “Epidemic Search,” allows users to enter the name of a place, a flight, a train, or a license plate. The application then returns all known cases connected with the name, Zhang writes. Users can also enter two names, such as Wuhan and Beijing, and the application will show all “edges” that connect them, that is, the flights, trains, and vehicle license plates that traveled between the two cities. The data came from a public source of coronavirus pneumonia data, and the application was written in Cypher, the Neo4j query language.
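The two kinds of lookup described here can be approximated in a few lines of Python. This is not the We-Yun application, which runs Cypher queries against Neo4j; it is a rough stand-in, and the `Trip` record, the function names, and every flight number, train number, and case ID below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    kind: str        # "flight", "train", or "vehicle"
    name: str        # flight number, train number, or license plate (invented)
    origin: str
    destination: str
    case_ids: tuple  # known cases associated with this trip (invented)


# Hypothetical records; not real epidemiological data.
TRIPS = [
    Trip("flight", "XX100", "Wuhan", "Beijing", ("case-1",)),
    Trip("train", "T200", "Wuhan", "Beijing", ("case-2", "case-3")),
    Trip("train", "T300", "Wuhan", "Shanghai", ("case-4",)),
]


def cases_for(term):
    """Single-name search: all cases linked to a place, flight, train, or plate."""
    found = set()
    for trip in TRIPS:
        if term in (trip.name, trip.origin, trip.destination):
            found.update(trip.case_ids)
    return sorted(found)


def trips_between(a, b):
    """Two-name search: every trip (edge) connecting the two places, either way."""
    return [t for t in TRIPS if {t.origin, t.destination} == {a, b}]


print(cases_for("T200"))
print([t.name for t in trips_between("Wuhan", "Beijing")])
```

In a graph database the second lookup becomes a pattern match on the edges between two city nodes, which is the kind of query a language like Cypher is designed to express.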