-
Notifications
You must be signed in to change notification settings - Fork 23
Indexed relationships
When building large scale applications in Neo4J a naive use of relationships can at times create performance problems when too many relationships are created to or from one particular node.
A common pattern among Neo4J applications is the use of category or class nodes, where several nodes are said to be an instance of a particular class or belong to a particular category.
Suppose we have the following database:
- There are many (possibly millions) of nodes representing individual topics.
- There is a node that represents a topic-class, such that all individual-topic-nodes have an IS_INSTANCE relationship to the topic class node.
- There is a node that represents the class-of-all-classes, such that the topic-class-node has an IS_INSTANCE relationship to the class-of-all-classes-node.
Given that we know the topic-class-node and we want to know what class it is an instance of, we call getRelationship(DynamicRelationshipType.withName("IS_INSTANCE"), Direction.OUTGOING)
on the topic-class-node and retrieve the end node of that relationship.
As long as there are relatively few individual-topic-nodes this works well, but when there are a huge number (hundreds of thousands or more) individual-topic-nodes, the performance of the getRelationship method degrades.
In-graph indexed relationships are a solution to this problem.
Instead of directly storing relationships from the start-node to the end-node, indexed relationships are stored in an in-graph Btree.
In our example, the topic-class-node when using direct relationships will have an incoming relationship for each individual-topic. When using indexed relationships the topic-class-node will have one outgoing relationship to the root of the index tree and all individual-topic-nodes will have one incoming relationship from a node within the index tree. This reduces the number of relationships per node and makes the call to retrieve the class of the topic-class-node fast again.
Indexed relationships are supported by means of two classes: IndexedRelationship and IndexedRelationshipExpander.
The IndexedRelationship class represents all relationships of a particular relationship type and with a particular direction. The class implements the Iterable<Relationship>
interface to make it easy to iterate over the relationships stored in the index.
The IndexedRelationshipExpander class makes the retrieval of direct and indexed relationships on a node transparent.
An indexed relationship is set up as follows:
First we need to create a comparator that, given two nodes, decides whether one is smaller or larger than the other, or if they are equal.
class IdComparator implements java.util.Comparator<Node>{
public int compare(Node n1, Node n2){
long l1 = Long.reverse(n1.getId());
long l2 = Long.reverse(n2.getId());
if(l1 == l2) return 0;
else if(l1 < l2) return -1;
else return 1;
}
}
In this case we choose to store the nodes sorted by the reverse of the node-id's. Node-id's are in practice monotonically increasing (except when id's are being reused). When storing nodes sorted by node-id, the index tree would need many balancing actions, because all nodes would be added to one side of the tree. Using the reverse of the node-id gives a much more distributed key for the index tree, leading to fewer balancing actions.
Of course we could have chosen any comparator, as long as there is a wide distribution of values. Using a comparator that distinguishes between very few values, will lead to densely populated nodes at the leaves of the index tree.
Using node-id's or their reverse has an additional advantage: the index can be made unique, since no two nodes have the same id (or reverse-id).
We can now set up the index.
Given a node nodeToIndex, we call the constructor on IndexedRelationship:
IndexedRelationship ir = new IndexedRelationship(DynamicRelationshipType.withName("IS_INSTANCE"), Direction.INCOMING, new IdComparator(), true, nodeToIndex, graphDb());
This creates the index and attaches it to nodeToIndex.
The constructor takes the following arguments:
- the type of the relationship
- the directions of the relationship
- the comparator to be used by the index
- a flag indicating if the index is unique or not
- the node that will contain the relationship index
- an instance of the database
The index can be populated by calling createRelationshipTo on the IndexedRelationship.
Node n1 = graphDb().createNode();
Node n2 = graphDb().createNode();
ir.createRelationshipTo(n1);
ir.createRelationshipTo(n2);
All relationships can be iterated:
for(Relationship rel: ir){
System.out.println(rel.getStartNode());
}
**Note: ** The relationships returned here implement the relationship interface, but are different from "normal" Neo4j relationships. As a result the getId method will throw an UnsupportedOperationException.
Once the indexed relationship is set up, the IndexedRelationshipExpander can be used to retrieve relationships whether they are direct relationships or indexed relationships.
IndexedRelationshipExpander re1 = new IndexedRelationshipExpander(graphDb(), Direction.INCOMING, DynamicRelationshipType.withName("IS_INSTANCE"));
IndexedRelationshipExpander re2 = new IndexedRelationshipExpander(graphDb(), Direction.OUTGOING, DynamicRelationshipType.withName("SOME_DIRECT_RELATIONSHIP"));
Two relationship expander instances are created for the two relationships we are interested in. One (IS_INSTANCE) is stored in the relationship index, the other is directly stored between start and end node.
The expanders can be used to iterate over all relationships with the given direction and relationship type.
for(Relationship rel: re1.expand(indexedNode)){
System.out.println(rel.getStartNode());
}
for(Relationship rel: re2.expand(indexedNode)){
System.out.println(rel.getEndNode());
}
The IndexedRelationshipExpander will recognize which relationship type is indexed and which is stored directly and will return the correct relationships irrespective of the way they are stored.
Indexed relationships are not only a solution to the problem of densely populated nodes, but can also be used to traverse relationships in a particular order.
Suppose we want to list the 50 most recent entries of the individual topics from our previous example. Instead of the comparator that uses the reverse node-id, we can create a comparator that retrieves a timestamp (a property set on each topic node) and compares that to the timestamp stored on another node in the index tree.
The index can be set up non-unique, because two topic-nodes may incidentally have the same timestamp.
With this timestamp-based index we can iterate over the relationships, retrieving the topic-nodes in descending timestamp order and stop after we have found the first 50 entries.
Another use of indexed relationships is to guarantee unicity. Suppose the topic-nodes should all have a unique property "topic_name". We can now create a comparator that compares the "topic_name" property of the topic-nodes.
The index needs to be set up as unique.
As soon as a node is added to the index that has a "topic_name" property equal to one already stored in the index, an exception will be raised.
Indexed relationships can be useful when populating large databases, avoiding the dense-relationship-problem. This problem may be solved in upcoming releases of Neo4j, by changing the layout of the relationship store.
Even when the dense-relationship-problem is solved in the Neo4j kernel, indexed relationships may still be useful when particular sort-orders are required, or when unicity constraints need to be guarded.