Hi,
I am evaluating Neo4j 4.2 with data that has millions of nodes and relationships, but write performance is quite slow. My sample queries are given below:
create (session:Session {session_id: 'session_id1'})
;
match (s:Session) where s.session_id='session_id1' with s
create (e1:Event {insert_id: "insert_id1"}) set e1:SeenPage
create (s)-[:CONTAINS]->(e1)
create (s)-[:FIRST_EVENT]->(e1)
merge (pp1:Properties {value: "sample-url-1"}) set pp1:Page merge (e1)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e1)-[:RELATED_TO]->(pp2)
;
match (e1:SeenPage) where e1.insert_id='insert_id1' with e1
create (e2:Event {insert_id: "insert_id2"}) set e2:Show merge (e1)-[:NEXT]->(e2) with e2
match (s:Session) where s.session_id='session_id1' with s, e2
create (s)-[:CONTAINS]->(e2)
merge (pp1:Properties {value: "occasions"}) set pp1:Category merge (e2)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sample-url-2"}) set pp2:Page merge (e2)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e2)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "child category"}) set pp4:Sub_Category merge (e2)-[:RELATED_TO]->(pp4)
;
match (e2:Show) where e2.insert_id='insert_id2' with e2
create (e3:Event {insert_id: "insert_id3"}) set e3:SeenPage merge (e2)-[:NEXT]->(e3) with e3
match (s:Session) where s.session_id='session_id1' with s, e3
create (s)-[:CONTAINS]->(e3)
merge (pp1:Properties {value: "/p-page-0"}) set pp1:Page merge (e3)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e3)-[:RELATED_TO]->(pp2)
;
match (e3:SeenPage) where e3.insert_id='insert_id3' with e3
create (e4:Event {insert_id: "insert_id4"}) set e4:Show merge (e3)-[:NEXT]->(e4) with e4
match (s:Session) where s.session_id='session_id1' with s, e4
create (s)-[:CONTAINS]->(e4)
merge (pp1:Properties {value: "rect1"}) set pp1:Category merge (e4)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "/p-page-1"}) set pp2:Page merge (e4)-[:RELATED_TO]->(pp2)
merge (pp3:Properties {value: "pl"}) set pp3:Type merge (e4)-[:RELATED_TO]->(pp3)
merge (pp4:Properties {value: "him"}) set pp4:Sub_Category merge (e4)-[:RELATED_TO]->(pp4)
;
match (e4:Show) where e4.insert_id='insert_id4' with e4
create (e5:Event {insert_id: "insert_id5"}) set e5:SeenPage merge (e4)-[:NEXT]->(e5) with e5
match (s:Session) where s.session_id='session_id1' with s, e5
create (s)-[:CONTAINS]->(e5)
merge (pp1:Properties {value: "/p-page-2"}) set pp1:Page merge (e5)-[:RELATED_TO]->(pp1)
merge (pp2:Properties {value: "sp"}) set pp2:Type merge (e5)-[:RELATED_TO]->(pp2)
;
This data represents the journey of a user on a website. The user starts a session and browses pages. Each action taken by the user is recorded as an event, and each event has its own unique id. The sequence of events is then connected with the :NEXT relationship, and every event is attached to its session with :CONTAINS. Events are not unique, which is why I had to use create rather than merge. Properties of events are unique, so they are created as nodes and then linked to the event with a RELATED_TO relationship.
It's like this:
# session contains events
# events are connected with :NEXT
(Session)-[:CONTAINS]->(Event1)-[:NEXT]->(Event2)<-[:CONTAINS]-(Session)
A session can contain hundreds of events. The current write speed is quite slow: it takes about 4 hours to write 10k sessions, with each session containing roughly 10 events on average. I am writing the data event by event using the Python Bolt driver.
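For reference, my writer loop looks roughly like the sketch below. It is a minimal sketch using the official neo4j Python driver; the connection details and the sessions_to_load input structure are just placeholders, and it omits the :NEXT / :FIRST_EVENT relationships and the extra event labels, but the one-transaction-per-event pattern is the same as in my real code.

from neo4j import GraphDatabase

# placeholder connection details
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def write_event(tx, session_id, event):
    # attach the new event node to its session ...
    tx.run(
        "MATCH (s:Session {session_id: $session_id}) "
        "CREATE (e:Event {insert_id: $insert_id}) "
        "CREATE (s)-[:CONTAINS]->(e)",
        session_id=session_id, insert_id=event["insert_id"])
    # ... then merge each unique property node and link it to the event
    for value in event["properties"]:
        tx.run(
            "MATCH (e:Event {insert_id: $insert_id}) "
            "MERGE (p:Properties {value: $value}) "
            "MERGE (e)-[:RELATED_TO]->(p)",
            insert_id=event["insert_id"], value=value)

with driver.session() as db_session:
    for session_id, events in sessions_to_load:  # my input data, one entry per user session
        for event in events:
            # a separate write transaction for every event
            db_session.write_transaction(write_event, session_id, event)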
Any help would be really appreciated.