Neo4j High CPU and Read query times

I am using Neo4j to support an abandoned cart/search use case.
I store User -> Product relationships.

But at peak load I mostly see the Neo4j CPU choking, and query times are poor.
The CPU mostly chokes on IOWait.

NeoVersion - 3.5.6
Instance - i3.2xlarge (8 vCPUS)
Page cache- 40g
Heap-15g

This is a causal cluster with 3 read replicas, and ~20 Python servers querying Neo4j.

Concurrent writes/deletions (in batches) are also happening on the database.

I have a read throughput of 10k/min. Do we need to tune the Bolt configs? I am using the bolt+routing Python driver.

My graph size is ~200GB.

I want to know the recent products that a user has abandoned in cart or search.

User -[r:ADDED_TO_CART]-> Product
User -[r:PURCHASED]-> Product

The query uses bulk reads, 100 users at a time.

Query -

// $start_time and $end_time are placeholder names for the lower and upper bound of the time window
MATCH (user:mapUser20201_14)-[action_rel:ADDED_TO_CART]->(product:mapProduct20201_14)
WHERE user.user_id IN $user_ids
  AND action_rel.action_time >= $start_time
  AND action_rel.action_time < $end_time
WITH action_rel, user, product ORDER BY action_rel.action_time DESC
RETURN DISTINCT user{.user_id}, product{.product_id},
       head(collect(action_rel{.action_time})) AS action_rel

I am using DISTINCT as I want the top unique product and the time of that action.

Checkpointing was taking ~50m, so I set iops.limit=-1 and it is now down to ~2m. I also changed the checkpoint interval from the default 15m to 1h.

I usually see queries take ~500-2000ms, and CPU is very high (User+IOWait > 200%).

As there is no way to index a relationship property, I can't do much here.

I already have indexes on mapUser20201_14{user_id} and mapProduct20201_14{product_id} (unique constraints, which also create an index).

  1. I don't want my CPU to spike so much, and I want queries to return results in less time.
  2. Is there something which can be done here?
  3. Can writes affect read query performance?
  4. I also analysed the GC logs, but there are not many GC pauses.
  5. If there are too many DB hits, could that be a potential cause? How can I reduce them? I need to query based on timestamp.
  6. The query is not spending time in the planning phase, as it is mostly 0 in query.log, but I see a lot of page misses.

I have tried so many things but nothing is working out. Probably Neo4j is not a fit for a use case where the query involves timestamps or relationship properties.

I see basically 2 strategies to improve the runtime of your queries:

  1. operationally

Have more RAM, e.g. use a 256GB machine. Assign 200 GB to the page cache; then all your queries will operate 100% on cache and won't produce IO load.
If adding RAM is not an option, you can go with an instance using directly attached SSD instead of EBS volumes.
If this is also not an option, make sure you have ENA enabled: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html
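
As a back-of-the-envelope check of the numbers quoted in this thread (a ~200GB graph against a 40g page cache), here is a minimal Python sketch. The function name is made up for illustration, and it assumes reads are spread evenly over the store, which a real workload with a hot subset will not strictly follow.

```python
def page_cache_coverage(store_gb: float, page_cache_gb: float) -> float:
    """Fraction of the store that can be resident in the page cache, capped at 1.0."""
    if store_gb <= 0:
        raise ValueError("store_gb must be positive")
    return min(1.0, page_cache_gb / store_gb)


# Numbers from this thread: ~200GB graph, 40g page cache today, 200 GB proposed.
current = page_cache_coverage(200, 40)
proposed = page_cache_coverage(200, 200)
print(f"current: {current:.0%} of the store fits in cache, proposed: {proposed:.0%}")
```

With the current settings only about a fifth of the store can be cached at once, which is consistent with the page misses and IOWait described above.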

  2. optimize model:
    You might have a chance to encode parts of the action time into the relationship type. Assuming e.g. you want to find all actions for the year 2019, you add _2019 as a suffix to your ADDED_TO_CART relationship type -> ADDED_TO_CART_2019. Now your Cypher statement can be more selective before going through the filter, which results in less CPU activity.
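
On the write side, the year suffix has to be computed by the client, because relationship types cannot be passed as query parameters in Neo4j 3.5. A minimal Python sketch of that step follows. It assumes action_time is stored as epoch milliseconds in UTC, which the thread does not state; the helper names and the use of MERGE are also just for illustration.

```python
import re
from datetime import datetime, timezone

# Only allow simple upper-case names, since the type is spliced into the query text.
_ACTION_RE = re.compile(r"[A-Z][A-Z_]*")


def rel_type_for(action: str, action_time_ms: int) -> str:
    """Return e.g. 'ADDED_TO_CART_2019' for an epoch-millisecond timestamp (assumed UTC)."""
    if not _ACTION_RE.fullmatch(action):
        raise ValueError(f"unexpected action name: {action!r}")
    year = datetime.fromtimestamp(action_time_ms // 1000, tz=timezone.utc).year
    return f"{action}_{year}"


def write_statement(rel_type: str) -> str:
    """Build the Cypher text; property values still travel as parameters."""
    return (
        "MATCH (u:mapUser20201_14 {user_id: $user_id}), "
        "(p:mapProduct20201_14 {product_id: $product_id}) "
        f"MERGE (u)-[r:{rel_type}]->(p) "
        "SET r.action_time = $action_time"
    )


print(write_statement(rel_type_for("ADDED_TO_CART", 1546300800000)))
```

The action_time property is kept on the relationship so that finer-grained filtering inside a year is still possible.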

@stefan.armbruster We are using i3.2xlarge. This is an instance store with directly attached SSD. The IO stats are from that setup.

I will try changing the data model. But if I have ADDED_TO_CART_2019, there is still no index.
So will the CPU load drop because the year is now part of the Cypher pattern?

I'd really give a machine with more RAM a test shot.

@stefan.armbruster Do you see a benefit in the data model change? Shall I try it?

I'd probably give a quick shot at increasing RAM first. Data model changes will for sure help as well, but I guess they require more effort.

From a modeling perspective you can elevate the time property of your relationships to the rel-type, e.g. to :ADDED_TO_CART_yyyy_mm_dd so you can subfilter much quicker on the time information.

Otherwise, as Stefan said, it will just thrash memory all the time, having to reload data from disk.
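
On the read side, the per-day relationship types suggested above could be used roughly as sketched below: the client lists the types for the requested window and joins them with `|` in the pattern. The function names are hypothetical. Using max() to pick the latest time is my own simplification here, not the query from the original post.

```python
from datetime import date, timedelta


def daily_rel_types(action: str, start: date, end: date) -> list:
    """Relationship types for every day in [start, end), e.g. ADDED_TO_CART_2019_12_31."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return [f"{action}_{start + timedelta(days=i):%Y_%m_%d}" for i in range(days)]


def read_query(rel_types: list) -> str:
    """Cypher text that only expands the relationship types inside the window."""
    pattern = "|".join(rel_types)
    return (
        f"MATCH (user:mapUser20201_14)-[action_rel:{pattern}]->(product:mapProduct20201_14) "
        "WHERE user.user_id IN $user_ids "
        "WITH user, product, max(action_rel.action_time) AS action_time "
        "RETURN user.user_id, product.product_id, action_time"
    )


types = daily_rel_types("ADDED_TO_CART", date(2019, 12, 30), date(2020, 1, 2))
print(read_query(types))
```

If the window does not start and end on day boundaries, the action_time predicate from the original query would still be needed for the first and last day.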

You should also not do DISTINCT on properties. Aggregate on the nodes and read the properties afterwards:

WITH DISTINCT user, product, head(collect(action_rel)) AS rel
RETURN user.user_id, product.product_id, rel.action_time
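
For checking query output against a small sample, it can help to spell out what the ordered head(collect(...)) is after: the most recent action_time per (user, product) pair. A plain-Python emulation of that grouping is below. The function name and the row layout are made up for illustration and are not part of any driver API.

```python
def latest_action_per_pair(rows):
    """rows: iterable of (user_id, product_id, action_time) tuples.

    Returns a dict mapping (user_id, product_id) to the latest action_time seen.
    """
    latest = {}
    for user_id, product_id, action_time in rows:
        key = (user_id, product_id)
        if key not in latest or action_time > latest[key]:
            latest[key] = action_time
    return latest


sample = [
    ("u1", "p1", 10),
    ("u1", "p1", 30),
    ("u1", "p2", 20),
    ("u2", "p1", 5),
]
print(latest_action_per_pair(sample))
```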