In the Visualization team of the Bahnknowhow project, we encountered a potential performance issue in one of our search functionalities. Since we are using Neo4j (Version: 4.0.6) as our DBMS, we suspect that our current queries or logic, which is composed of various Cypher statements and Java code (Spring Data Neo4j), could be optimized with further know-how about the graph database.
In the following section we try to show you our current problem.
The search functionality mentioned earlier is to return "hierarchical knowledge entries" based on a search term. We solved the matching of the search term with the titles of the knowledge entries via a fulltext index and with Lucene queries directly in Neo4j (no third-party tool like Elasticsearch is currently involved). We want to put all knowledge entries returned by a full-text search query into a hierarchical structure. In our current data model, knowledge entries can have child knowledge entries of any number of levels. Knowledge entries returned by the search, of which a parent knowledge entry was also returned by the search, should be subordinate to it at the appropriate hierarchy level.
Example:
If we have the following structure in the database:
(parent)<-[:IS_PART]-(child)<-[:IS_PART]-(grandchild)
the following result should be returned:
{
"name": "parent",
"children": [
{
"name": "child",
"children": [
{
"name": "grandchild",
"children": [ ]
}
]
}
]
}
In addition, there is the further condition that no more than three hierarchy levels should be returned in the result. This means that all knowledge entries that have hierarchy level 3 or higher (level 4, 5, 6, ...) in the database must be at level 3 in the result.
Example:
If we have the following structure in the database:
(parent)<-[:IS_PART]-(child)<-[:IS_PART]-(grandchild)<-[:IS_PART]-(grandgrandchild)
should return the following result:
{
"name": "parent",
"children": [
{
"name": "child",
"children": [
{
"name": "grandchild",
"children": [ ]
},
{
"name": "grandgrandchild",
"children": [ ]
}
]
}
]
}
If in a certain hierarchy level no knowledge entry has been matched by the search, but in the following levels, then the result should "skip" one hierarchy level and let for example level 3 become level 2 in the result.
Example:
If we have the following structure in the database:
(parent)<-[:IS_PART]-(child)<-[:IS_PART]-(grandchild)<-[:IS_PART]-(grandgrandchild)
and child was not matched by the search, but parent, grandchild and grandgrandchild were, then the following result should be returned:
{
"name": "parent",
"children" :[
{
"name": "grandchild",
"children" :[
{
"name": "grandgrandchild",
"children" :[ ]
}
]
}
]
}
Our current implementation of the solution:
In a first step (query) after the full-text search we separate the knowledge entries for which there is no parent knowledge entry in the search result (first level) from the knowledge entries that have a parent knowledge entry (lower levels).
However, since the child knowledge entries must be distributed to the first-level knowledge entries in a hierarchically correct manner, we wrote a recursive Java method that distributes child knowledge entries to a knowledge entry according to the conditions listed above. Each time this method is called, a Cypher statement is executed again to find out which knowledge entry in the hierarchy level must be directly subordinated to the currently given knowledge entry, based on a list of IDs of the remaining unassigned child knowledge entries.
Since this leads to a lot of queries for many search results of the full-text search, we fear strong performance problems. Therefore, if possible, we would like to adapt the implementation in such a way that fewer queries have to be made and the time complexity of our entire search is reduced.
How can the amount of executed queries on the database be reduced in our use case?
How can creating a hierarchical structure as in our use case be solved by a Cypher statement instead of a recursive method call in Java?
Yours sincerely, Visualization Team