Disappointing performance after migration from SDN5 to SDN6

christian3
Node

Hi,

I have a large project based on SDN5/OGM and just spent about a week trying to migrate to SDN6. We have complex queries sometimes returning thousands of nodes, and while most queries are fast, a lot of performance is lost in the OGM layer building entities. I was kind of hoping SDN6 would be more efficient in this regard. I have been looking for some performance numbers/comparisons but could not find any.
After some trial and error I got to the point where we could do some tests ourselves. Unfortunately performance is currently (a lot) worse.
The difference is especially noticeable in large queries returning whole paths, like
MATCH p=(n)-[*0..]->() RETURN collect(p)
where we basically want to return a whole subtree of a node. As SDN6 doesn't know how to map paths (why?), I had to replace this with collect(nodes(p)) and collect(relationships(p)) everywhere.
I don't know what exactly is happening but SDN6 spends a huge amount of time trying to map returned records. Queries that took seconds now take minutes. To the point where it's unusable.

I wonder if anyone is aware of performance issues, or if comparisons have been done with large datasets?

gerrit_meier
Neo4j

I am sorry to hear that you have this bad experience.
Regarding "Spring Data Neo4j 6 cannot map paths (anymore)": this is not true. Have a look at spring-data-neo4j/AdvancedMappingIT.java at 44185a4150b6a2682fcde122adc3ed1ea55a875b · spring-projec... . This is a simple path return.
Performance-wise, it is hard to tell what might lead to the problem you are facing. Can you give us a little more insight into your domain and the data, such as the number of nodes/relationships and connected objects?
Be it the path return or the list returns, both will create load in the mapping logic if you have a lot of relationships in there and/or possible related nodes.

christian3
Node

Hi,

Thanks for your response.
Concerning the paths, I think it might work for single paths, but not for collects...

From the docs:
https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#custom-queries.for-relationshi...

(listing 74)
"This will result in multiple paths that are not merged within one record. It is possible to call collect(p) but Spring Data Neo4j does not understand the concept of paths in the mapping process. Thus, nodes and relationships needs to get extracted for the result."

Maybe that would need some more documentation.

About our data, we have about 6 million nodes and 13 million relationships.
This specific query involves querying assets in pages, where each asset must include a subtree of properties and definitions. It works for small pages of 10-20 assets, from 50 on it basically takes forever. It actually looks like the time taken grows exponentially.
(Neo4j OGM takes about 10 seconds to map a 1000 asset page)

I think I'll need to dig a bit deeper in the spring data code to figure out what's going on...

christian3
Node

@gerrit.meier Hi Gerrit,

I made a little sample project to demonstrate the performance regression for some type of queries.
As mentioned before, SDN6 struggles when large amounts of nodes and relationships are returned.
Please have a look here:

There's one unit test, PerfTest.java, which sets up a Neo4j Docker instance, loads some sample data, and has two test methods: testSDN and testOGM.
Both will use the exact same query to load nodes and their subgraph.
Note that it is certainly possible to do this more efficiently e.g. by using APOC (see the testSDNUsingApoc method).
The goal here however is to demonstrate how slow SDN6 is in mapping this particular response.

The model is quite simple: Movies have Actors and Actors have Hobbies. Actors and Hobbies are shared between Movies.

The default setup will create about 200 movies, which means a graph of about 80000 nodes in total.
On my machine, OGM will fetch and map the whole thing in about 1 sec.
SDN6 takes 60 seconds, a factor of 60 slower...
When increasing the numMovies property, you'll see that SDN execution time seems to increase exponentially. The code seems to recurse into DefaultNeo4jEntityConverter.createInstanceOfRelationships() forever...

We have such large queries, and even after making several optimisations like using APOC, SDN is still considerably slower than OGM.

I hope this can help you uncover some issues!
Christian

gerrit_meier
Neo4j

Thanks for reporting this in such detail. In the test I get a ratio of about 1 (Neo4j-OGM) to 11 (SDN6), but I will of course investigate further how we could improve this.

gerrit_meier
Neo4j

Quick update:
From

2021-10-01 16:37:18,774  INFO                com.example.demo.PerfTest: 128 - Start fetching movies
2021-10-01 16:37:30,632  INFO                com.example.demo.PerfTest: 130 - Done fetching movies

To

2021-10-01 16:38:19,778  INFO                com.example.demo.PerfTest: 128 - Start fetching movies
2021-10-01 16:38:20,653  INFO                com.example.demo.PerfTest: 130 - Done fetching movies
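
For readers who want the numbers rather than the raw log lines, here is a minimal sketch that computes the elapsed time of both runs from the timestamps above. The class and method names (FetchTiming, elapsedMillis) are made up for this illustration and are not part of the sample project.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class FetchTiming {
    // Timestamp layout of the log lines above (comma before the milliseconds).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the time between two log timestamps in milliseconds.
    public static long elapsedMillis(String start, String end) {
        return Duration.between(
                LocalDateTime.parse(start, FORMAT),
                LocalDateTime.parse(end, FORMAT)).toMillis();
    }

    public static void main(String[] args) {
        long before = elapsedMillis("2021-10-01 16:37:18,774", "2021-10-01 16:37:30,632");
        long after = elapsedMillis("2021-10-01 16:38:19,778", "2021-10-01 16:38:20,653");
        System.out.println("before: " + before + " ms, after: " + after + " ms");
    }
}
```

That works out to 11858 ms before the change and 875 ms after it, so the fetch in this test went from roughly 11.9 seconds to under one second.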

I still have some ideas, but thanks already for the input.
The methods in question were, for now, MappingSupport#extractNodes and MappingSupport#extractRelationships. At least on my side, most of the time (2/3) was spent on preparing the data before mapping.
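
To give a rough idea of what "preparing the data" can involve when many paths are returned: paths that start at the same node repeat that node (and often more), so the nodes have to be flattened and de-duplicated before mapping. The following is only an illustrative sketch under that assumption. It is not the SDN code, and Node, PathFlattening and distinctNodes are hypothetical names, not Neo4j driver or SDN types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathFlattening {
    // Hypothetical stand-in for a returned node; not the Neo4j driver type.
    public static final class Node {
        final long id;
        final String label;

        public Node(long id, String label) {
            this.id = id;
            this.label = label;
        }
    }

    // Collects every node once, keyed by id, in a single pass over all paths.
    // A hash map keeps the work linear in the total number of node occurrences.
    public static List<Node> distinctNodes(List<List<Node>> paths) {
        Map<Long, Node> byId = new LinkedHashMap<>();
        for (List<Node> path : paths) {
            for (Node node : path) {
                byId.putIfAbsent(node.id, node);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        Node movie = new Node(1, "Movie");
        Node actor = new Node(2, "Actor");
        Node hobby = new Node(3, "Hobby");
        // A [*0..] pattern yields one path per prefix, so nodes repeat across paths.
        List<List<Node>> paths = List.of(
                List.of(movie),
                List.of(movie, actor),
                List.of(movie, actor, hobby));
        System.out.println(distinctNodes(paths).size() + " distinct nodes");
    }
}
```

In this toy input, six node occurrences collapse into three distinct nodes. With thousands of overlapping paths the ratio is much larger, which is why this step can matter.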

There is also now an issue to track this: Improve mapping performance for custom queries and paths. · Issue #2391 · spring-projects/spring-dat...
