Neo4j driver very-very slow, need help

mikhail

Hello guys, I have a problem, and my feeling is that I am doing something wrong with the Java driver (org.neo4j.driver:neo4j-java-driver:1.4.4).

Let me tell you about the architecture. Everything is in AWS, in one region. I have a Kinesis stream and a Lambda function that consumes events from the stream. Based on the event's type, it executes a Cypher query through the Neo4j Java driver. The queries are light (creating a relationship between two or more nodes, for example), but there are a lot of them in the stream. Right now it's really slow: it takes the Lambda function 30 seconds to execute 2000 requests. This time is the same for small (1024 MB) and big (3008 MB) Lambdas, and the same for a single instance (t2.medium) running Neo4j Community Edition and for an Enterprise causal cluster with 3 core servers (r4.large each).

So I think that I am doing something wrong with the driver (or Bolt just can't do better)...

Here is the part of my code.

As you can see, I try to reuse the session for all events in a batch.

I really believe that Neo4j can handle much more per second, so the problem is either in me or in Bolt. Can you give me any advice on how to improve performance?

6 REPLIES

david_allen
Neo4j

I don't know a lot about how this code gets instantiated or when it runs, but when performance is very slow as you say, the culprit is often recreation of the driver or the session object. The setup process does take time and back and forth handshakes which you don't want to be doing for every query.

Seems you've already thought of that though, so did you try to gather any debugging output to ensure this isn't happening at runtime? I see you've taken steps to avoid this in your code under the assumption that KinesisConsumer objects are rarely created and long-living in memory. However I'm also wondering about the parallelism on AWS's side. For example when you deploy a piece of code, there's no guarantee that AWS is only running one copy of it. This is just speculation on my part but it'd be interesting to see logs of this slow performance where your code is indicating when it's connecting and when it's creating a new session, to help ferret out whether that's happening many times due to AWS's execution model rather than your code.
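The concern about recreating the driver can be illustrated without any Neo4j dependency. What follows is a minimal sketch, not the original poster's code: `SharedResource` is a hypothetical helper name, and the supplier stands in for whatever actually builds the driver. It creates the expensive object at most once and counts creations, so that a log line can show whether setup is being repeated.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one expensive, long-lived object
 * (for example a database driver) and creates it at most once.
 */
final class SharedResource<T> {
    private final Supplier<T> factory;
    private final AtomicInteger creations = new AtomicInteger();
    private volatile T instance;

    SharedResource(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    creations.incrementAndGet();
                    instance = local;
                }
            }
        }
        return local;
    }

    /** Number of times the factory has run; expected to stay at 1. */
    int creationCount() {
        return creations.get();
    }
}
```

In a Lambda handler this would typically sit in a static field so that warm invocations reuse it, with the supplier wrapping a call such as `GraphDatabase.driver(uri, AuthTokens.basic(user, password))` (where `uri`, `user` and `password` are placeholders). Each separate Lambda container would still create its own instance, which is exactly the AWS-side parallelism the logs are meant to reveal.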

Hi David, thank you for your answer.

I added logging of the execution time (lines 18 and 32 in the gist)

and can now confirm that all of the time is spent by the driver executing queries (there is no new session creation or anything else), which is very depressing.

Please provide the log outputs requested. The information you're providing so far isn't enough to help you.

Maybe something like this:

log.info("execution time {}, tx count {}", System.currentTimeMillis() - start, transaction_count);

and log when you create the driver. Then for a given TX load you can see how many actual connections are created, and how many TXs and so on.
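One way to collect the numbers requested here is a small counter object that the consumer updates as it works. This is only a sketch under stated assumptions: `BatchStats` and its method names are hypothetical, and the code that creates the driver, opens sessions and runs transactions is assumed to call the matching method each time.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that tallies driver, session and transaction activity. */
final class BatchStats {
    private final AtomicLong driversCreated = new AtomicLong();
    private final AtomicLong sessionsOpened = new AtomicLong();
    private final AtomicLong transactions = new AtomicLong();
    private final long startNanos = System.nanoTime();

    // Call these from the places where the real work happens.
    void driverCreated()  { driversCreated.incrementAndGet(); }
    void sessionOpened()  { sessionsOpened.incrementAndGet(); }
    void transactionRun() { transactions.incrementAndGet(); }

    long driverCount()      { return driversCreated.get(); }
    long sessionCount()     { return sessionsOpened.get(); }
    long transactionCount() { return transactions.get(); }

    /** One line suitable for a log statement at the end of a batch. */
    String summary() {
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        return String.format(
            "execution time %d ms, tx count %d, sessions opened %d, drivers created %d",
            elapsedMs, transactions.get(), sessionsOpened.get(), driversCreated.get());
    }
}
```

Logging `summary()` once per batch should show whether a 2000-event batch really uses one driver and one session, or whether many are being created behind the scenes.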

Regarding the performance of a transaction, it's also pretty useful to understand what MAKE_SOMETHING_BASED_ON_EVENT is in your code, as there are a lot of factors there that can impact performance (complexity, presence/absence of indexes, formulation of the query, and so on).

If 2000 queries were to be executed serially in 30 sec, that'd be 15ms per query which doesn't seem bad at all considering your database is across a network. What results are you looking for?
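The estimate above is simple arithmetic, sketched here for reuse with other measurements. The class and method names are hypothetical, and the calculation assumes the queries run strictly one after another.

```java
/** Hypothetical helper for back-of-the-envelope latency figures. */
final class LatencyEstimate {
    /** Average time per query in milliseconds, assuming serial execution. */
    static double millisPerQuery(double totalSeconds, int queries) {
        return totalSeconds * 1000.0 / queries;
    }

    /** Average throughput in queries per second over the same period. */
    static double queriesPerSecond(double totalSeconds, int queries) {
        return queries / totalSeconds;
    }

    public static void main(String[] args) {
        // Figures from the thread: 2000 queries in 30 seconds.
        System.out.println(millisPerQuery(30, 2000));   // 15.0
        System.out.println(queriesPerSecond(30, 2000));
    }
}
```

At roughly 15 ms per query, most of the time is plausibly network round trips rather than work inside the database, which is why the figure stays the same regardless of Lambda or server size.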

Yes, you are probably right - this is not bad performance.
I had the wrong (or very high) expectations. I just wanted to be sure that I'm using the driver correctly.

Apparently, for my case I need to use another mechanism (maybe an unmanaged extension).

I have the same problem when I use Cypher to write data to Neo4j. I create only one session and call session.run() in a loop. My logs show that executing 4,000 statements takes 20 minutes. I am so confused. Have you solved this problem?

I think you need to create a separate question for this, and provide information that will let us do more than just blindly speculate. The query, the approach you're using with the driver, the indexes you have present, and any PROFILE plans of the queries would help.
